espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1

Intermittent crash RadioHead on Heltec LoRa32 #2245

Closed i4things closed 5 years ago

i4things commented 5 years ago

The error is:

16:32:46: Guru Meditation Error: Core 1 panic'ed (Cache disabled but cached memory region accessed)
16:32:46: Guru Meditation Error: Core 1 panic'ed (IllegalInstruction). Exception was unhandled.
16:32:46: Memory dump at 0x401fe728: bad00bad bad00bad bad00bad

Decoding 1 results
0x401fe728: std::ctype ::do_scan_not(char, wchar_t const, wchar_t const) const at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/libstdc++-v3/src/c++11/ctype_members.cc line 192

Hardware:

Board: Heltec LoRa32
Core Installation/update date: Latest commit 1085e9a (latest core)
IDE name: Arduino IDE
Flash Frequency: 80MHz
Upload Speed: 115200
Computer OS: Windows 10

Description:

Intermittent crash when trying to send data over LoRa (RF95)

i4things commented 5 years ago

On a second run I get the same exception at a different address according to the Exception Decoder. There is no stack trace in the exception output.

17:09:25: Guru Meditation Error: Core 1 panic'ed (Cache disabled but cached memory region accessed)
17:09:25: Guru Meditation Error: Core 1 panic'ed (IllegalInstruction). Exception was unhandled.
17:09:25: Memory dump at 0x401fe7d0: bad00bad bad00bad bad00bad

Decoding 1 results
0x401fe7d0: std::ctype ::do_tolower(char, char const) const at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/libstdc++-v3/src/c++11/ctype_configure_char.cc line 102

me-no-dev commented 5 years ago

Both errors point to flash being accessed from inside an IRAM interrupt handler (probably GPIO). Try to find another lib that is compatible with the ESP32.

i4things commented 5 years ago

The part of the lib that I use, RH_95Driver, is not complex; the code is pretty straightforward.

The only issue that I can see is that the lib was missing IRAM_ATTR on the ISR function and on the functions called from it. I added IRAM_ATTR and the crash became a lot rarer, but it still presents itself from time to time.

btw: I have had the same lib running on 4 Heltec LoRa32 devices for about 8 months, 24/7, without a single crash and with no IRAM_ATTR, but on a much older version of the ESP32 core.

i4things commented 5 years ago

I will comment out all use of the RadioHead library and run the app again for hours to see if it crashes again. Maybe I'm wrong and the problem is not in RadioHead...

me-no-dev commented 5 years ago

Every function that is called from inside an IRAM function also has to be in IRAM, otherwise flash will be accessed.
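A minimal sketch of that rule, assuming a hypothetical handler `onRadioIrq` and helper `recordInterrupt` (neither name comes from RadioHead). The empty `IRAM_ATTR` fallback is only there so the snippet also builds on a desktop compiler; on the ESP32 the real macro from the core is used.

```cpp
#include <cstdint>

// On the ESP32 Arduino core IRAM_ATTR comes from the SDK headers. The empty
// fallback exists only so this sketch also compiles off-target.
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif

// Hypothetical state shared between the ISR and the main code.
volatile bool packetPending = false;
volatile uint32_t irqCount = 0;

// Helper called from the ISR. It needs IRAM_ATTR as well, otherwise the
// call from the handler would jump into flash-resident code.
void IRAM_ATTR recordInterrupt() {
    irqCount = irqCount + 1;
    packetPending = true;
}

// The handler itself (on real hardware this would be the function handed
// to attachInterrupt()).
void IRAM_ATTR onRadioIrq() {
    recordInterrupt();
}
```

The handler only updates two variables and returns; anything heavier is left to code that runs outside the interrupt.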

i4things commented 5 years ago

How does IRAM_ATTR work for methods inside classes? Does it apply to the whole class? To virtual methods? To static ones?

Or does it only work for normal static functions?

stickbreaker commented 5 years ago

@i4things any function called while in an interrupt must be marked IRAM. Keep the code simple and short: no delay(), no heap allocation, no device driver calls (WiFi, Serial, I2C, I2S). If your interrupt needs to initiate a complex task, create a specific TASK and use one of the FreeRTOS semaphore or task signal functions to release the task. Your interrupt code needs to complete within 10ms or less.

Chuck.

i4things commented 5 years ago

Is it possible to use SPI from inside an interrupt, to read/write registers and read data from the LoRa chip, or does all of this need to be done in a TASK? (The completion time is not an issue; it should be microseconds.)

i4things commented 5 years ago

In brief, the suggestion is that the ISR should do nothing except build a TASK, and everything that needs to be done happens in this task when it executes, right?

stickbreaker commented 5 years ago

I wouldn't. Can you guarantee that your SPI code will never be interrupted? I use I2C for multiple sensors, and I have a 4x4 keypad attached through an MCP23008 I2C IO expander. It generates an interrupt when it detects a keypress event. I have a separate task that hangs on a semaphore. The semaphore is posted inside the keypress ISR. The I2C bus is shared between tasks with mutex semaphores. So, when I am updating my 20x4 LCD or 128x64 OLED and a keypress interrupt is generated, the keypad task is released from its wait status by the ISR posting to the semaphore. Then this task gains control of the I2C bus mutex (with a 250ms timeout). It then processes the I2C commands necessary to detect the key press, adds this key to the input buffer, releases the I2C bus semaphore, and goes back into its wait status. The interrupt is quick, in and out; the actual key read can take multiple milliseconds.

Chuck.

stickbreaker commented 5 years ago

@i4things the task should exist before the interrupt is configured. I would pass the semaphore handle to the ISR when you allocate the interrupt.

You cannot 'build' a task during an interrupt context.
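As an illustration of passing a handle to the ISR at registration time, here is a desktop-compilable sketch. `RadioIrqContext`, `registerIrq` and `fireIrq` are hypothetical stand-ins invented for this example; on the ESP32 the context member would be a FreeRTOS `SemaphoreHandle_t`, and registration would go through a call that accepts a user argument, such as `attachInterruptArg()` in the Arduino core.

```cpp
#include <cstdint>

#ifndef IRAM_ATTR
#define IRAM_ATTR   // empty fallback so the sketch builds off-target
#endif

// Hypothetical context, created once BEFORE the interrupt is attached.
// On the ESP32 'signalled' would be replaced by a semaphore handle.
struct RadioIrqContext {
    volatile uint32_t signalled;
};

// ISR with a user argument: it touches only the context it was handed,
// so it needs no globals and creates nothing at interrupt time.
void IRAM_ATTR radioIsr(void* arg) {
    RadioIrqContext* ctx = static_cast<RadioIrqContext*>(arg);
    ctx->signalled = ctx->signalled + 1;
}

// Stand-in for the registration call: it just stores handler and argument
// so that a simulated interrupt can be fired later.
struct IrqSlot {
    void (*handler)(void*);
    void* arg;
};

IrqSlot registerIrq(void (*handler)(void*), void* arg) {
    return IrqSlot{handler, arg};
}

void fireIrq(const IrqSlot& slot) {
    slot.handler(slot.arg);
}
```

The order matters: the context (and, on real hardware, the task and semaphore) is set up first, then the handler is registered with a pointer to it.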

i4things commented 5 years ago

OK, then the logic should be:

Create a TASK that blocks on a semaphore; the ISR unblocks the semaphore, and the task can do the job later...

There is a possible race condition if an interrupt fires again while the task has not finished yet. Two options here:

  1. skip the second interrupt if the task has not finished the previous one: possible loss of data (may be OK, depending on how often it happens)
  2. block the second interrupt until the task has finished: this will most likely lead to a deadlock

stickbreaker commented 5 years ago

Option 1. If the task cannot be completed within the interrupt interval, the task is too complex. Real world is real. If a task needs to execute every 100ms, it cannot take longer than 100ms to execute. There are multiple types of semaphores. One is a counting semaphore: it can be given multiple times (each give increases the count), but it can only be taken if the count is > 0.

Chuck.
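The difference between a binary and a counting semaphore for back-to-back interrupts can be modelled on a desktop with a small class. This is only a model of the counting semantics, not FreeRTOS code: `ModelSemaphore` and `handledEvents` are hypothetical names, and on the ESP32 the real objects would come from `xSemaphoreCreateBinary()` or `xSemaphoreCreateCounting()`, with `xSemaphoreGiveFromISR()` in the handler and `xSemaphoreTake()` in the task.

```cpp
#include <cstdint>

// Desktop model of semaphore counting semantics (single-threaded).
class ModelSemaphore {
public:
    explicit ModelSemaphore(uint32_t maxCount)
        : maxCount_(maxCount), count_(0) {}

    // Called from the simulated ISR. Returns false when the count is
    // already at its maximum, i.e. this event could not be recorded.
    bool give() {
        if (count_ >= maxCount_) return false;
        ++count_;
        return true;
    }

    // Called from the simulated task. Non-blocking: succeeds only if count > 0.
    bool tryTake() {
        if (count_ == 0) return false;
        --count_;
        return true;
    }

private:
    uint32_t maxCount_;
    uint32_t count_;
};

// Fire 'interrupts' events back to back, then let the task drain the
// semaphore. Returns how many events the task actually got to handle.
uint32_t handledEvents(ModelSemaphore& sem, uint32_t interrupts) {
    for (uint32_t i = 0; i < interrupts; ++i) {
        sem.give();
    }
    uint32_t handled = 0;
    while (sem.tryTake()) {
        ++handled;
    }
    return handled;
}
```

With a maximum count of 1 (binary behaviour), three quick interrupts collapse into one wake-up of the task, which corresponds to skipping the extra interrupts. With a larger maximum, each interrupt is remembered and the task runs once per event.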

i4things commented 5 years ago

Thanks

Will give it a go :)

Vasko

stickbreaker commented 5 years ago

If you are satisfied, close this issue.

Chuck.

i4things commented 5 years ago

Solution found.

Thank you!