arkhipenko / TaskScheduler

Cooperative multitasking for Arduino, ESPx, STM32, nRF and other microcontrollers
http://playground.arduino.cc/Code/TaskScheduler
BSD 3-Clause "New" or "Revised" License

Scheduler seems to crash/hang #33

costyn closed this issue 7 years ago

costyn commented 7 years ago

Hello,

Thanks for your cool library! But ... :-) I'm running into a weird issue where the scheduler seems to get stuck while running one particular task. At that point all tasks stop and no work is done. I would like to ask for your assistance in troubleshooting. Are there extra flags, options, or debug statements I could include to pinpoint the issue?

My code can be found here: https://github.com/costyn/Glow-Fur-for-Pro-Mini-with-MPU6050

So at a certain point, only if Task taskGetDMPData is enabled, runner.execute() will not return in void loop(). I have included a heartbeat in loop() to confirm this. I have tried re-enabling tasks in loop() but that didn't make a difference.

I have tried to see if there is a pattern in the number of iterations at which it crashes, but I have not been able to spot a pattern.

The code for the MPU6050 works fine without the scheduler. It's only if I run the data retrieval from the gyro with the scheduler that it stops.

I also checked for memory issues (heap/stack crashes), but the SRAM usage stays constant while running.

Pulling pin 3 to GND triggers an interrupt with some println's, and these still work after the scheduler hangs. So I know the AVR itself has not crashed.

Any tips welcome! Thanks!

Kind regards,

Costyn.

arkhipenko commented 7 years ago

Hi Costyn,

I have worked with the MPU6050 before and never ran into a problem like that. From what I understand, it might not be a problem with the scheduler, but rather a problem with the callback method never returning control to the scheduler.

I am very suspicious of this line of code:

while (fifoCount < packetSize) fifoCount = mpu.getFIFOCount();

Because it has the potential to generate an infinite loop. Could you add a timeout condition, or rewrite the method in such a way that getFIFOCount() is called once per scheduler iteration (so you return control back to the scheduler)?
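A minimal sketch of what "once per iteration, with a timeout" could look like, written as portable C++ so it can be tried off-target. The names FifoPoller, PollResult and poll are hypothetical and are not part of TaskScheduler or the MPU6050 library; the caller is assumed to pass in the current time and the result of a single getFIFOCount() call.

```cpp
#include <cstdint>

// Outcome of one non-blocking check of the sensor FIFO.
enum class PollResult { NotReady, Ready, TimedOut };

// Hypothetical helper: looks at the FIFO count exactly once per call and never loops.
class FifoPoller {
public:
    FifoPoller(uint16_t packetSize, uint32_t timeoutMs)
        : packetSize_(packetSize), timeoutMs_(timeoutMs), startedAtMs_(0), waiting_(false) {}

    // nowMs: current time in milliseconds; fifoCount: result of ONE getFIFOCount() call.
    PollResult poll(uint32_t nowMs, uint16_t fifoCount) {
        if (!waiting_) {              // first call of a new wait: start the clock
            waiting_ = true;
            startedAtMs_ = nowMs;
        }
        if (fifoCount >= packetSize_) {
            waiting_ = false;
            return PollResult::Ready;     // a full packet can be read now
        }
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (nowMs - startedAtMs_ >= timeoutMs_) {
            waiting_ = false;
            return PollResult::TimedOut;  // give up instead of spinning forever
        }
        return PollResult::NotReady;      // return to the scheduler, retry next pass
    }

private:
    uint16_t packetSize_;
    uint32_t timeoutMs_;
    uint32_t startedAtMs_;
    bool waiting_;
};
```

In a task callback this would presumably be called as poller.poll(millis(), mpu.getFIFOCount()), reading the packet only on Ready and simply returning otherwise, so the callback always hands control back within one pass.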

StatusRequest functionality is great for situations where one task is waiting for another one, which in turn is waiting for an event (a FIFO queue to fill up, for instance).

I will have a closer look, but this is what caught my eye immediately.

Let me know if you need help rewriting the GetDMPData method in an event-driven way with StatusRequest objects.

Cheers, Anatoli

Sent from a mobile device. Apologies for accidental typos.

costyn commented 7 years ago

Hello Anatoli,

Thanks for your quick and detailed response!

I was also very suspicious of that line, and I think I have ruled it out, but I didn't want to make my initial post too long ;-) What I did was add some debugging output (a simple ',' to see if it got stuck there). But at the time of the hang, it doesn't start printing long lines of ,,,,,,,, so it doesn't seem to be stuck in that loop.

I naively tried removing the while loop before getFIFOCount(), but it didn't change things.

I decided to use/adapt Jeff Rowberg's example code for using the MPU with the DMP, mostly out of laziness at not having to do my own calculations for the gyro. And also because the math/code involved are above my abilities. ;-) And lastly because the DMP data is a lot more accurate. For my real-world usage, the MPU will be moving in 3 dimensions, and has to read various yaw/pitch/roll angles during different orientations.

I see you don't use the DMP in your scarab project, so I guess the issue is definitely somewhere in the DMP code. What I don't get though, is why that infinite loop situation isn't hit when I don't use the scheduler.

One last question: Timer interrupts aren't related to pin interrupts, right? I mean, it's not a problem that I use pin 2 and 3 for interrupts? I assume you use Timer interrupts for the scheduler? I'm still a little fuzzy on this topic. Because, as I was typing this, I did one more experiment where I disconnected the interrupt pin and now it has been running fine for a long time without a hang. I had tried to remove the interrupt code from the MPU example before, but that led to unexpected results. Now it seems to work OK when getDMPData() is called by the scheduler.

Thank you for your offer to help out with StatusRequest. Perhaps it's not necessary any more. ;-)

Kind regards,

Costyn.

costyn commented 7 years ago

So further experimentation seems to rule out the interrupt issue. Changing the Task interval to 3 ms instead of 1 provides stability. Edit: no, the issue is still there. ;-) Disconnecting the interrupt pin is the most reliable fix.

I also found this thread on the MPU https://github.com/jrowberg/i2cdevlib/issues/252 and it is probably what I am running into. I'll spare you having to read through the whole thing: the summary is that there is likely a problem in the Wire library when using Chinese Arduino clones (my board is a Gravitech Nano - not an official Italian one as far as I can see).

arkhipenko commented 7 years ago

Hi Costyn,

Now I remember running into a similar issue with Wire! Don't remember the details, but remember struggling with intermittent hangs as well. I even added watchdog functionality to selectively disable certain tasks (after reboot) if an I2C device failed and was causing Wire to hang. The Wdt example, I believe, is based on that code.

Regarding interrupts: TaskScheduler does not use any interrupts, and that is why the switching overhead is so low. However, it does mean that tasks are not preempted and have to be coded in a cooperative way (i.e. no long-running tasks, short loops, NO DELAY!).
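To make the cooperative point concrete, here is a toy model of a single scheduling pass in portable C++. It is a sketch for illustration only and is not TaskScheduler's actual implementation; ToyTask and runPass are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Toy model of a cooperative task; not TaskScheduler's real data structure.
struct ToyTask {
    uint32_t intervalMs;  // how often the callback should run
    uint32_t lastRunMs;   // when it last ran
    void (*callback)();   // must return quickly: nothing can interrupt it
};

// One scheduling pass: run every task that is due, one after another.
// Returns the number of callbacks that ran during this pass.
size_t runPass(ToyTask* tasks, size_t count, uint32_t nowMs) {
    size_t ran = 0;
    for (size_t i = 0; i < count; ++i) {
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (nowMs - tasks[i].lastRunMs >= tasks[i].intervalMs) {
            tasks[i].lastRunMs = nowMs;
            tasks[i].callback();  // if this never returns, the pass never finishes
            ++ran;
        }
    }
    return ran;
}
```

Because each callback runs to completion inside the loop, a single callback that blocks (a delay, or a busy-wait on a device) stalls every other task, which matches the symptom of runner.execute() never returning.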

Hope this helps.

I recommend you look into StatusRequest functionality though. It is a great way to implement event-driven rather than time-driven processing. I used it much more extensively in my IoT2 project. Just a thought.

Regards, and good luck with your project, Anatoli

PS: Can I reference your project in the TaskScheduler readme? What is it? Do you have a page (other than GitHub)?

Thanks.

Sent from a mobile device. Apologies for accidental typos.

costyn commented 7 years ago

Hi Anatoli,

Thanks again for the long response, from a mobile device? Wow. :)

Yes, thanks. I certainly have no delays and for loops are short (<90 iterations on an array). It is/was also a requirement for the MPU which needs quite a lot of attention to prevent buffer overflows. :)

As for event driven, I don't think I want to delve into that at the moment. There are only 2 events, a button push and an interrupt from the MPU, and the latter I don't even really need, it now seems. I will certainly look into it for future projects. And I am certain I am going to use TaskScheduler more often. It's awesome!

Of course you can reference it, I would be honored! It is still heavily in development so I haven't made a website/page about it yet. But basically it's this: https://learn.adafruit.com/animated-neopixel-gemma-glow-fur-scarf?embeds=allow on steroids. With many more patterns, and the plan is for mine to also react to movement by the wearer.

Cheers,

Costyn.

arkhipenko commented 7 years ago

Hi Costyn,

Great project! I am waiting for my kids to grow up a bit before we start doing something like this together. A few more years...

A few things from my experience with TaskScheduler:

  1. It is great for interrupt handling - the requirement for an ISR is that it is as fast as possible. So what you do is create an interrupt handling task and either keep it disabled, or schedule it waiting for an event via a StatusRequest object. Then in the ISR you need to either just enable the task or complete the StatusRequest object. That's it! The scheduler will run the task as soon as its turn in the chain comes up. I use that technique in my robotic arm project for the wifi communications - check it out! The receiving code only triggers when data is received.

  2. Do you know that you can use pin interrupts on all pins of the nano? I use this library:

https://github.com/GreyGnome/EnableInterrupt

and it works great. However, I believe if you put the chip to deep sleep, you have to use pins 2 and 3 as those are the only ones that can wake it up. I used that technique in my pumpkin project - the chip is asleep until motion or a sound is detected. Another side effect of this is that you can detect serial communications via a pin interrupt and enable a data processing task, so there is no need to constantly poll for data (great if you use Bluetooth for comms, for instance).

  3. Using EnableInterrupt and TaskScheduler you can easily react to button presses and debounce them in the process: you connect buttons to any pin, EnableInterrupt on those pins, then use the task scheduling technique I explained in point #1, but trigger a task in the ISR with some delay (say 30 ms). As a result, multiple triggers caused by electric sparks are simply ignored. I used that technique in the APIS project (the first plant watering).

  4. Another great library is DigitalIO.

https://github.com/greiman/DigitalIO

It allows you to use pins as variables and is much faster as well. Great for the MPU and other devices requiring fast processing. I am sure you noticed it in the scarab project. Highly recommend!

  5. And finally, the layered prioritization is something you want to use for your project for the tasks that have to run as soon as they are ready. Again, scarab uses it for gyro and accelerometer handling. Look into it!
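The ISR hand-off and delayed-trigger debounce from points 1 and 3 can be sketched in portable C++ roughly as follows. This is a stand-in that uses a plain flag rather than the library's own enable or StatusRequest calls; DeferredEvent, onInterrupt and serviceEvent are hypothetical names, and the 30 ms delay in the usage note is the value suggested above.

```cpp
#include <cstdint>

// Hypothetical record shared between an ISR and the main loop.
struct DeferredEvent {
    volatile bool pending = false;  // set by the ISR, cleared by the main loop
    volatile uint32_t dueAtMs = 0;  // earliest time the handler may run
};

// Called from the ISR: only records the event, so it returns almost immediately.
// A trigger that arrives while one is already pending is ignored (the debounce).
inline void onInterrupt(DeferredEvent& ev, uint32_t nowMs, uint32_t delayMs) {
    if (ev.pending) return;
    ev.dueAtMs = nowMs + delayMs;
    ev.pending = true;  // written last, so dueAtMs is valid once pending is seen
}

// Called from the main loop on every pass: runs the handler once the delay
// has elapsed. Returns true if the handler ran.
inline bool serviceEvent(DeferredEvent& ev, uint32_t nowMs, void (*handler)()) {
    if (!ev.pending) return false;
    // Signed difference keeps the comparison correct across counter wrap-around.
    if (static_cast<int32_t>(nowMs - ev.dueAtMs) < 0) return false;
    ev.pending = false;
    handler();  // the slow work happens here, outside the ISR
    return true;
}
```

With onInterrupt(ev, millis(), 30) in the pin ISR and serviceEvent(ev, millis(), handleButton) in a frequently scheduled task, a burst of bounces within the 30 ms window would produce a single call to the (hypothetical) handleButton function.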

Good luck with your projects. And don't hesitate to ask questions.

Regards Anatoli.

costyn commented 7 years ago

Ha! You keep upping your game with longer replies! :-) Impressive!

BTW, your email address has been exposed above. Maybe you want to sanitize it, or perhaps you trust Hotmail's spam filters ;-)

Thank you so much for all the tips. I'm sure they'll be useful! I'm learning so much with this project. Has been the biggest Arduino project I've worked on so far.

I also looked into Blynk, that looks hugely cool! I don't need it yet, but I will definitely use it in the future!

Cheers!

arkhipenko commented 7 years ago

About Blynk: some of the methods in their library are blocking, so it might not be 100% compatible with cooperative multitasking.

I even forked their library to make it cooperative, but then dropped the idea.

I have used their REST protocol instead in my IoT2 project, have a look when you are ready to give it a try.

https://www.instructables.com/id/IoT-APIS-V2-Autonomous-IoT-enabled-Automated-Plant/

Cheers, Anatoli


costyn commented 7 years ago

Ah, I thought you were one of the devs because I saw it as one of your repos, but it's just a fork.

Thanks for the tips/links.

Well I have had no more issues after disconnecting the interrupt pin, haha. So let's close this issue. Not quite sure where exactly the problem is, but at least I found a fix.