arkhipenko / TaskScheduler

Cooperative multitasking for Arduino, ESPx, STM32, nRF and other microcontrollers
http://playground.arduino.cc/Code/TaskScheduler
BSD 3-Clause "New" or "Revised" License

Exception during destruction of Task #38

Closed ilg closed 7 years ago

ilg commented 7 years ago

I'm using the testing branch (3fc2676) with _TASK_STD_FUNCTION defined in an ESP8266 project and running into an exception that I'm having trouble understanding—I'm hoping I'm missing something obvious. I have a class that has a std::unique_ptr<Task> and when my class is being destroyed, I get an exception that decodes as:


Exception 9: LoadStoreAlignmentCause: Load or store to an unaligned address
Decoding 15 results
0x4020494a: Task::disable() at /Users/ilg/Documents/code/esp8266-wemos-test/esp8266-wemos-test/./Library/TaskScheduler/src/TaskScheduler.h line 608
0x4020c06c: Print::write(unsigned char const*, unsigned int) at /Users/ilg/Library/Arduino15/packages/esp8266/hardware/esp8266/2.3.0/cores/esp8266/Print.cpp line 64
0x402049bc: Task::~Task() at /Users/ilg/Documents/code/esp8266-wemos-test/esp8266-wemos-test/main.cpp line 62
0x40204a28: std::default_delete ::operator()(Task*) const at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/bits/unique_ptr.h line 67
0x402374c4: ip_output_if at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/ipv4/ip.c line 631
0x40204a4d: ~unique_ptr at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/bits/unique_ptr.h line 185

[followed by my class's destructor, etc.]

In particular, I'm confused by the Print::write() call in there, since Task::~Task() is

Task::~Task() {
    disable();
    if (iScheduler)
        iScheduler->deleteTask(*this);
}

and line 608 of Task::disable() is bool previousEnabled = iStatus.enabled; in

bool Task::disable() {
    bool previousEnabled = iStatus.enabled;
    iStatus.enabled = false;
    iStatus.inonenable = false; 
    if (previousEnabled && iOnDisable) {
        Task *current = iScheduler->iCurrent;
        iScheduler->iCurrent = this;
        iOnDisable();
        iScheduler->iCurrent = current;
    }
#ifdef _TASK_STATUS_REQUEST
    iMyStatusRequest.signalComplete();
#endif
    return (previousEnabled);
}

If more context is necessary, I'll see if I can work up a reasonably-sized example to reproduce the issue. Thanks.

arkhipenko commented 7 years ago

Hmm... Could it be a namespace issue (hitting some other package's Task methods)? Quite honestly, I would not use the testing branch, since I have not fully tested it yet (and there are other issues with it as well).

I always created my Tasks to be static global objects, so never needed a destructor to begin with, which worked well so far (after all, we are talking about micro-controllers with [not-so anymore] limited resources here).

TaskScheduler was also never intended to be part of another class, as it is meant to be THE orchestrator of all activities in a sketch, and therefore lives above all the objects.

Having said that, I will try to replicate this if I can. A piece of code would be helpful though.

ilg commented 7 years ago

I'm pretty sure I don't have anything else named Task kicking around. I needed the std::function changes that are in the testing branch in order to use capturing lambdas as task completion handlers—so yeah, I'm taking the risk on unstable code and the answer might just end up being "nope, doesn't work yet."

Overall, I'm trying to take some synchronous networking code and make it asynchronous so that waiting on networking doesn't block other things. There's some potential for multiple networking calls to happen simultaneously, so it seemed like each networking call should have its own Task that existed only for the duration of the network call (and encapsulated all in a class), but perhaps that's the wrong approach. I suppose it'd be possible to have a fixed set of tasks for the networking "library" and have each task handle whatever piece of the process it's handling for each of the various running network calls.

My TaskScheduler object is a single global object and is passed into the object constructor to create task(s), and not held onto by the class/object. I actually have this pattern working well in another class.
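The ownership pattern described above can be sketched in plain C++ as follows. MiniTask, MiniScheduler and NetworkCall are hypothetical stand-ins written for this illustration, not the TaskScheduler API; the only behaviour borrowed from the library is that a task removes itself from its scheduler when destroyed, as Task::~Task() does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class MiniTask;

// Stand-in scheduler (hypothetical): keeps non-owning pointers to tasks.
class MiniScheduler {
public:
    void add(MiniTask* t) { tasks_.push_back(t); }
    void remove(MiniTask* t) {
        tasks_.erase(std::remove(tasks_.begin(), tasks_.end(), t), tasks_.end());
    }
    void execute();  // defined below, once MiniTask is complete
    std::size_t size() const { return tasks_.size(); }
private:
    std::vector<MiniTask*> tasks_;
};

// Stand-in task (hypothetical): registers on construction, unregisters on destruction.
class MiniTask {
public:
    MiniTask(MiniScheduler& scheduler, std::function<void()> callback)
        : scheduler_(&scheduler), callback_(std::move(callback)) {
        scheduler_->add(this);
    }
    ~MiniTask() { scheduler_->remove(this); }
    MiniTask(const MiniTask&) = delete;
    MiniTask& operator=(const MiniTask&) = delete;
    void run() { if (callback_) callback_(); }
private:
    MiniScheduler* scheduler_;
    std::function<void()> callback_;
};

inline void MiniScheduler::execute() {
    // Iterate over a snapshot so callbacks may register new tasks.
    std::vector<MiniTask*> snapshot = tasks_;
    for (MiniTask* t : snapshot) {
        if (std::find(tasks_.begin(), tasks_.end(), t) != tasks_.end()) t->run();
    }
}

// The object receives the global scheduler in its constructor, creates its
// task, and does not keep the scheduler itself.
class NetworkCall {
public:
    NetworkCall(MiniScheduler& scheduler, int& counter)
        : task_(new MiniTask(scheduler, [&counter] { ++counter; })) {}
private:
    std::unique_ptr<MiniTask> task_;
};

// Usage sketch: the task exists only as long as the owning object does.
inline int demo_scoped_task() {
    MiniScheduler scheduler;
    int runs = 0;
    {
        NetworkCall call(scheduler, runs);
        scheduler.execute();  // the task runs once
    }                         // ~NetworkCall -> ~MiniTask -> scheduler.remove()
    scheduler.execute();      // nothing left to run
    assert(scheduler.size() == 0);
    return runs;
}
```

The sketch deliberately does not cover a task that is destroyed from inside its own callback; that case needs extra care in any design of this shape.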

The code I have is big enough that I don't want to dump it all in here and I don't think it'd be reasonable to expect someone else to dig through it all, so I'll see if I can create a more reasonably-sized example to replicate the issue.

Thanks for the help—and thanks for the amazing library.

ilg commented 7 years ago

Attachment: TaskScheduler-issue-38.zip

Here's a fairly small sample project that reliably crashes on every run, like my full project... except the crashes are a bit different:

Exception 9: LoadStoreAlignmentCause: Load or store to an unaligned address
Decoding 13 results
0x40202abb: std::_Function_base::~_Function_base() at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2029
0x40201cd2: std::function ::operator()() const at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2465
0x40201e04: operator() at /var/folders/jx/_ng199qd4ks972_kpj7nxdy80000gn/T/arduino_build_167489/sketch/Foo.cpp line 43
:  (inlined by) _M_invoke at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2071
0x40201cd2: std::function ::operator()() const at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2465
0x40201de8: std::_Function_handler ::_M_invoke(std::_Any_data const&) at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2073
0x40201df4: _M_invoke at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2069
0x40201cd2: std::function ::operator()() const at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2465
0x40202310: Scheduler::execute() at /var/folders/jx/_ng199qd4ks972_kpj7nxdy80000gn/T/arduino_build_167489/sketch/TaskScheduler.h line 884
0x40202188: _M_manager at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 1931
0x40202170: operator() at /Users/ilg/Documents/Arduino/TaskScheduler-issue-38/TaskScheduler-issue-38.ino line 17
:  (inlined by) _M_invoke at /Users/ilg/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2071
0x4020235c: loop at /Users/ilg/Documents/Arduino/TaskScheduler-issue-38/TaskScheduler-issue-38.ino line 24
0x402028f0: loop_wrapper at /Users/ilg/Library/Arduino15/packages/esp8266/hardware/esp8266/2.3.0/cores/esp8266/core_esp8266_main.cpp line 56
0x40100114: cont_norm at /Users/ilg/Library/Arduino15/packages/esp8266/hardware/esp8266/2.3.0/cores/esp8266/cont.S line 109

... and removing either const String a or const String b from the parameters of foo() stops the crashing. I'm thoroughly confused, but I'm thinking it's probably some C++ thing I'm doing wrong or maybe some quirk of Arduino and not a TaskScheduler issue.
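One general hazard with std::function handlers, which may or may not be what this trace shows, is a handler that destroys the object owning it while the handler is still running. The sketch below is an assumption-laden illustration, not code from the sample project: Request and its members are hypothetical names. It shows one way to make that situation well-defined, by moving the handler into a local variable before invoking it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical owner of a completion handler.
class Request {
public:
    explicit Request(std::function<void()> onDone) : onDone_(std::move(onDone)) {}

    void complete() {
        // Move the handler onto the stack first. If the handler destroys this
        // Request, the callable being executed is the local copy, not a
        // member of the object that was just freed.
        std::function<void()> handler = std::move(onDone_);
        onDone_ = nullptr;
        if (handler) handler();
        // No member access after this point: *this may no longer exist.
    }

private:
    std::function<void()> onDone_;
};

// Usage sketch: the handler releases the Request from inside its own call.
inline bool demo_self_destroying_callback() {
    bool finished = false;
    std::unique_ptr<Request> request;
    request.reset(new Request([&request, &finished] {
        finished = true;
        request.reset();  // destroys the Request while complete() is on the stack
    }));
    request->complete();
    return finished && !request;
}
```

Capturing by reference, as the lambda here does, is only safe because both captured variables outlive the call; a handler that runs later from a scheduler would need to capture by value or otherwise guarantee lifetimes.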

arkhipenko commented 7 years ago

This is unfortunately beyond my understanding of how lambda functions work... What I was able to read up on the exception is that Arduino (AVR) is an 8-bit and ESP is a 32-bit architecture, so word alignment is dealt with differently.

Line 608 of Task::disable() (bool previousEnabled = iStatus.enabled;) could be a good candidate, since who knows how bool is represented on ESP versus AVR? (I think on AVR it's 2 bytes.)

Anyway, I wish I could help you here...

Another thought: TaskScheduler is not a pre-emptive multitasking library, so using it to "wait" for network operations to complete may not be the best use for it. How do you make the wait non-blocking? I could see how that is possible if a task in the main thread is waiting on a StatusRequest object (basically a semaphore) and a network function in another thread triggers it. But that requires multi-threading right there... I was able to do it using pin interrupts generated by the network equipment's "data ready" pin. But then again, ISRs are not cooperative. Same exact use as yours: a one-time, zero-delay task "waiting" to be activated in the ISR. Just curious...

ilg commented 7 years ago

I'd expect that a bool is 1 byte, but most of my knowledge is Objective-C and C99, not so much C++, much less C++11.
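The C++ standard leaves sizeof(bool) implementation-defined, so the question can be settled by asking the compiler for the actual target. A minimal check, assuming a toolchain with C++11 support; report_bool_layout is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Prints the size and alignment of bool for the current target and returns
// the size. Build it with the same toolchain as the sketch under test.
inline std::size_t report_bool_layout() {
    std::printf("sizeof(bool)=%u alignof(bool)=%u\n",
                static_cast<unsigned>(sizeof(bool)),
                static_cast<unsigned>(alignof(bool)));
    // The size of any complete object type is a multiple of its alignment.
    assert(alignof(bool) <= sizeof(bool));
    return sizeof(bool);
}
```

On an Arduino-style target the same two values could be printed over Serial from setup() instead of through printf.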

Given that the only TaskScheduler reference in the sample project exception stack trace is Scheduler::execute(), I think it's reasonable to say it's not really a TaskScheduler issue, so I'm closing this issue.

A lot of the synchronous networking code I looked at was really asynchronous with a timeout-delay or waiting/polling loop under the hood, allowing it to be made asynchronous relatively easily. For example, in ESP8266WiFiGenericClass::hostByName, the underlying dns_gethostbyname call is already asynchronous, so I was able to replace calls to the synchronous WiFi.hostByName with direct calls to dns_gethostbyname (or, actually, write a nicer wrapper around it that was still asynchronous). This one didn't need TaskScheduler at all.

Looking at the NTPClient example, it has a more-or-less arbitrary 1 second delay for a UDP response after sending the UDP packet. For this one, I made it asynchronous by sending the UDP packet, then using TaskScheduler to check for the response repeatedly with a shorter interval (100 or 500 ms or something) and with a maximum number of runs of that task leading to a timeout.
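The poll-with-timeout approach can be sketched independently of any library. BoundedPoll, PollResult and demo_poll are hypothetical names for this illustration; in a real sketch the same role would be played by a TaskScheduler task with an interval and an iteration limit, and the check would look for a received UDP packet.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class PollResult { Pending, Done, TimedOut };

// Hypothetical helper: runs a check at most maxRuns times, no more often
// than every intervalMs, and reports a timeout if it never succeeds.
class BoundedPoll {
public:
    BoundedPoll(unsigned long intervalMs, unsigned maxRuns,
                std::function<bool()> check)
        : intervalMs_(intervalMs),
          runsLeft_(maxRuns),
          check_(std::move(check)),
          lastRunMs_(0),
          started_(false),
          result_(maxRuns == 0 ? PollResult::TimedOut : PollResult::Pending) {}

    // Call from the main loop with the current time in milliseconds.
    PollResult tick(unsigned long nowMs) {
        if (result_ != PollResult::Pending) return result_;
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (started_ && nowMs - lastRunMs_ < intervalMs_) return result_;
        started_ = true;
        lastRunMs_ = nowMs;
        if (check_()) {
            result_ = PollResult::Done;
        } else if (--runsLeft_ == 0) {
            result_ = PollResult::TimedOut;
        }
        return result_;
    }

private:
    unsigned long intervalMs_;
    unsigned runsLeft_;
    std::function<bool()> check_;
    unsigned long lastRunMs_;
    bool started_;
    PollResult result_;
};

// Usage sketch with a simulated clock: the "response" arrives on the given
// check number; the poll runs every 100 ms for at most 5 checks.
inline PollResult demo_poll(int respondOnCheck) {
    int checks = 0;
    BoundedPoll poll(100, 5, [&checks, respondOnCheck] {
        return ++checks >= respondOnCheck;
    });
    PollResult result = PollResult::Pending;
    for (unsigned long now = 0;
         result == PollResult::Pending && now < 10000; now += 50) {
        result = poll.tick(now);
    }
    assert(checks <= 5);
    return result;
}
```

Because tick() returns immediately when the interval has not elapsed, the main loop stays free to service other work between checks.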

The thing that led to this issue is a more complex web request (which needed the current time under the hood, hence my needing NTP, and NTP needed to get the IP address of time.nist.gov, hence hostByName)... though it's looking more and more like the actual problem is something at a lower level in the way I've constructed my class and/or lambdas, not in the complexity of the web request.