biologist79 / ESPuino

RFID-controlled musicplayer powered by ESP32
https://forum.espuino.de
GNU General Public License v3.0

Esp32 Reboots sometimes when NVS Data is written #29

Closed grch87 closed 3 years ago

grch87 commented 3 years ago

Hi, first of all thank you for the great work you did!

I'm facing a problem where the ESP32 reboots from time to time. I narrowed it down: it happens when NVS data is written. The reason is a core panic:

`Title was paused at position 483427. Writing '#/ManaMana#483427#3#0' to NVS for RFID card ID 164177162177 with playmode 3 and last track 0
/ManaMana#483427#3#0
Guru Meditation Error: Core 0 panic'ed (Cache disabled but cached memory region accessed)
Core 0 register dump:
PC : 0x401826cc PS : 0x00060034 A0 : 0x800813dd A1 : 0x3ffbe180
A2 : 0x00000000 A3 : 0x3ffc1530 A4 : 0xc86f19be A5 : 0x00001004
A6 : 0x3ffc5ad0 A7 : 0xffffeffb A8 : 0x80081332 A9 : 0x3ffbe160
A10 : 0x00000000 A11 : 0x00000000 A12 : 0x0ffd114c A13 : 0x00000000
A14 : 0x000f8023 A15 : 0x00000002 SAR : 0x00000014 EXCCAUSE: 0x00000007
EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000
Core 0 was running in ISR context:
EPC1 : 0x4008ad89 EPC2 : 0x00000000 EPC3 : 0x40088e44 EPC4 : 0x401826cc`
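For readers unfamiliar with the value being written, the log suggests a '#'-delimited record of file path, play position, playmode, and last track. A minimal sketch of reading such a record follows; the field order is inferred from the log line only, and the names `NvsEntry` and `parseNvsEntry` are hypothetical, not taken from the firmware.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Fields as they appear in the logged value '#/ManaMana#483427#3#0'.
// The order is an assumption inferred from the log message.
struct NvsEntry {
    std::string path;        // e.g. "/ManaMana"
    unsigned long position;  // e.g. 483427
    int playMode;            // e.g. 3
    int lastTrack;           // e.g. 0
};

// Splits the record on '#' and converts the numeric fields.
// Returns false if the leading '#' is missing, the field count is not 4,
// or a numeric field cannot be parsed at all.
bool parseNvsEntry(const std::string &raw, NvsEntry &out) {
    if (raw.empty() || raw[0] != '#') return false;

    std::vector<std::string> parts;
    std::size_t start = 1;  // skip the leading '#'
    for (;;) {
        std::size_t end = raw.find('#', start);
        std::size_t len = (end == std::string::npos) ? std::string::npos : end - start;
        parts.push_back(raw.substr(start, len));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    if (parts.size() != 4) return false;

    try {
        out.path = parts[0];
        out.position = std::stoul(parts[1]);
        out.playMode = std::stoi(parts[2]);
        out.lastTrack = std::stoi(parts[3]);
    } catch (const std::exception &) {
        return false;  // thrown by stoul/stoi on non-numeric input
    }
    return true;
}
```

Calling `parseNvsEntry("#/ManaMana#483427#3#0", entry)` would fill `entry` with the path, position, playmode, and track shown in the log.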

To reproduce it, I use audiobook mode and toggle the play/pause button until the problem occurs.

With the help of Google, I seem to have found the reason: https://esp32.com/viewtopic.php?t=7684

However, I haven't been able to solve it so far, so I created this issue. Is this a known issue?

biologist79 commented 3 years ago

Hi,

hmm, that's interesting. Indeed, in the project's description I recently stated: "Currently, when re-learning a RFID-tag, Tonuino restarts. Almost certainly it's a memory-issue. Still have to point out." I discovered this only happens when music is already playing. If music is stopped when the NVS operation takes place, it's fine. I already decoded the stack trace:

Decoding stack results
0x40170624: rmt_set_tx_intr_en at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/driver/rmt.c line 366
0x40081305: ESP32RMTController::interruptHandler(void*) at .pio/libdeps/lolin32/FastLED/src/platforms/esp/32/clockless_rmt_esp32.cpp line 323
0x40087a78: spi_flash_op_block_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/spi_flash/cache_utils.c line 82
0x40083a97: ipc_task at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/ipc.c line 62
0x40089c11: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143

But I haven't figured out the reason so far. Maybe the problem is the same. I'll try to investigate. "Try" because the linked thread is about ESP-IDF and not Arduino. We'll see...

Thanks for the hint.

biologist79 commented 3 years ago

Have one thing to add: the problem I described doesn't occur if Neopixel is not enabled.

grch87 commented 3 years ago

Hehe, I just wanted to add the same hint about Neopixel. It seems I have a fix for it as well, but I don't know yet exactly why it works ^^ When pinning the showLed task to core 1 instead of core 0, the problem is gone.

Maybe it helps to separate the RFID Task with SPI from the showLed Task?
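The core change described above can be sketched roughly as follows. This assumes the Arduino-ESP32 core, where `xTaskCreatePinnedToCore` takes the core number as its last argument; the task body, stack size, priority, and the names `showLedTask` and `startLedTask` are placeholders rather than the project's actual values. The `#else` branch only holds stand-ins so the snippet also compiles off-target.

```cpp
#include <cstdint>

#if defined(ARDUINO_ARCH_ESP32)
#include <Arduino.h>  // Arduino-ESP32 core; brings in the FreeRTOS task API
#else
// Off-target stand-ins (assumption: present only so the sketch compiles on a PC).
using TaskHandle_t = void *;
static int hostLastCore = -1;  // records which core the stub was asked for
static int xTaskCreatePinnedToCore(void (*)(void *), const char *, uint32_t,
                                   void *, unsigned, TaskHandle_t *, int core) {
    hostLastCore = core;
    return 1;  // same value as pdPASS
}
static void vTaskDelay(uint32_t) {}
#define pdMS_TO_TICKS(ms) (ms)
#define pdPASS 1
#endif

static TaskHandle_t ledTaskHandle = nullptr;

// Hypothetical LED task body; the real one would update the Neopixel ring.
static void showLedTask(void *) {
    for (;;) {
        // ... compute the animation frame and push it to the LEDs ...
        vTaskDelay(pdMS_TO_TICKS(20));
    }
}

// Creates the LED task pinned to core 1 instead of core 0.
// Returns true if the task was created.
bool startLedTask() {
    const int ledCore = 1;
    return xTaskCreatePinnedToCore(showLedTask, "showLed", 2048, nullptr,
                                   1, &ledTaskHandle, ledCore) == pdPASS;
}
```

On the device, `startLedTask()` would typically be called once from `setup()`.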

Edit: As a side effect, my Neopixel ring no longer shows "error" pixels. Previously, pixels sometimes flickered or showed the wrong color. After trying a lot of things on the hardware side, I came to the conclusion that it is a timing issue on the software side. I have a ring with 24 LEDs; when selecting just 16, it worked fine. Now this problem is gone :D

biologist79 commented 3 years ago

Cool. Indeed, switching to core 1 seems to solve the problem 👍 Regarding the flickering: for me it improved massively by using a JST-PH 2.0 connector instead of jumper wires. But timing is indeed an issue, as the serial output indicates with "BAIL" messages. I don't know if that's because it's running as a task.

biologist79 commented 3 years ago

With your fix applied: are you still able to reproduce the audiobook problem when toggling play/pause? I'm not :-)

grch87 commented 3 years ago

With the core change I'm not able to reproduce the problem, so the issue can be closed. Nevertheless, I want to understand the problem and will invest some time in it. If I have any news, I'll let you know! Thank you for the hint about the JST-PH connector. I will consider that in the next board revision. Currently the Neopixel cables are soldered onto the board.

biologist79 commented 3 years ago

I guess it's something with interrupts as Neopixel is interrupt-driven. Disabling Neopixel "fixes" the problem as well. However, big thanks for your input!

biologist79 commented 3 years ago

As it turns out, moving the Neopixel task to the second CPU core has a negative impact on the FTP transfer rate (from 185 kB/s down to 165 kB/s). However, it can be recovered somewhat if the Neopixel animation is disabled while an FTP transfer is active (176 kB/s). Will have to test a bit...

biologist79 commented 3 years ago

Hi @grch87, since FTP performance suffers noticeably when moving Neopixel to the second core, I'm trying something else: while an NVS write is in progress, Neopixel signalling is disabled. I committed it recently and tested it by pressing pause/play multiple times in audiobook mode. Worked for me. Please test whether it also works for you.
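The idea can be illustrated with a small, hardware-free sketch. It assumes a shared flag (named `pauseNeopixel`, as mentioned later in this thread) that the LED loop checks before each refresh; the guard type, the counter, and the helper names are hypothetical stand-ins, and a flag like this does not by itself stop a transfer that is already in flight.

```cpp
#include <atomic>

// Shared flag: true while an NVS write is in progress.
std::atomic<bool> pauseNeopixel{false};

// Hypothetical RAII helper: LED output stays paused for the object's lifetime.
struct NeopixelPause {
    NeopixelPause() { pauseNeopixel.store(true); }
    ~NeopixelPause() { pauseNeopixel.store(false); }
};

// Stand-in for the number of LED refreshes (FastLED.show() calls on the device).
int ledRefreshCount = 0;

// One iteration of the LED task loop: skip the refresh while paused.
void ledTaskStep() {
    if (pauseNeopixel.load()) return;
    ++ledRefreshCount;  // real firmware would push pixel data here
}

// Runs the given write operation with LED output paused around it.
template <typename WriteFn>
void writeNvsWithLedsPaused(WriteFn write) {
    NeopixelPause guard;  // sets the flag now, clears it on return
    write();              // stand-in for the actual NVS write
}
```

Any `ledTaskStep()` call made while `writeNvsWithLedsPaused` is running returns without refreshing, and refreshes resume once the write has finished.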

grch87 commented 3 years ago

Hi @biologist79, yes, your fix is working :) However, it seems that my LED task freezes shortly after audio starts playing.

What I know so far: the new pauseNeopixel flag is not the reason. Either the LED task gets stuck somewhere in the loop (but then the watchdog should be triggered?), the task is killed (but I have no indication of that in the serial monitor), or the task is no longer getting any computation time?

I guess you don't see that issue?

grch87 commented 3 years ago

OK, found the issue: there is something wrong in the FastLED library. It seems to crash silently. I then tried older FastLED versions. 3.3.3 crashed. 3.2.10 seems to work. A lesson learned would be to use tagged versions of the libraries this project is using.

In platformio.ini the reference would look like this: https://github.com/FastLED/FastLED.git#3.2.10

What do you think?
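A minimal sketch of such a pinned reference in platformio.ini follows. The environment name `lolin32` is taken from the library path in the decoded stack trace above; the `platform`, `board`, and `framework` values are assumptions about the setup, not copied from the project's file.

```ini
; Assumed environment; only the lib_deps entry is the point of this sketch.
[env:lolin32]
platform = espressif32
board = lolin32
framework = arduino
; Pin FastLED to the 3.2.10 tag instead of tracking the default branch.
lib_deps =
    https://github.com/FastLED/FastLED.git#3.2.10
```

The part after `#` selects a tag, branch, or commit of the Git repository, so builds stay on a known library version until the reference is changed deliberately.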

biologist79 commented 3 years ago

Hm ok, gonna add this version to platformio.ini. Thanks for your contribution for MMC. Will revise this tomorrow I guess.

biologist79 commented 3 years ago

Seems there was a fix 2 hours ago. I discovered that when music was playing and I tried to access the web GUI, Tonuino restarted. Not every time, but very often. However, this seems to fix that problem: https://github.com/FastLED/FastLED/pull/1144 The BAIL messages are also gone from the logs now.