atanisoft / ESP32CommandStation

An ESP32 based DCC Command Station with integrated OpenLCB (LCC) --- NOTE: this project is not under active development.
https://atanisoft.github.io/ESP32CommandStation/
GNU General Public License v3.0
90 stars 34 forks source link

IDF v4.1: CS crashes during init of status LEDs #50

Closed TrainzLuvr closed 4 years ago

TrainzLuvr commented 4 years ago

17:08:48.433 -> ESP32 Command Station v1.5.0 (v1.5.0-alpha3-13-g1e4fb03) starting up... 17:08:48.440 -> Compiled on Sep 3 2020 11:21:47 using IDF v4.1 17:08:48.440 -> Running from: ota_0 17:08:48.487 -> ESP32 Command Station uses the OpenMRN library 17:08:48.487 -> Copyright (c) 2019-2020, OpenMRN 17:08:48.487 -> All rights reserved. 17:08:48.487 -> [NVS] Initializing NVS 17:08:48.487 -> [TimeZone] EST5EDT 17:08:48.487 -> [ADC] Configure 12-bit ADC resolution 17:08:48.535 -> [FS] SD card 'SD32G' mounted, max capacity 29832.00 MB 17:08:48.568 -> [FS] SD FAT usage: 0.34/29820.00 MB 17:08:48.568 -> [FS] SD will be used for persistent storage. 17:08:48.568 -> [FS] Dumping content of filesystem: /cfg 17:08:48.568 -> [FS] /cfg/TEST.TXT (154 bytes) mtime: Sat Jan 1 01:00:00 2000 17:08:48.568 -> [FS] /cfg/ESP32CS/wifi-sta (24 bytes) mtime: Wed Dec 31 19:05:16 1980 17:08:48.568 -> [FS] /cfg/ESP32CS/trains.json (216 bytes) mtime: Wed Dec 31 19:03:00 1980 17:08:48.568 -> [FS] /cfg/ESP32CS/wifi-ap (15 bytes) mtime: Wed Dec 31 19:07:24 1980 17:08:48.615 -> [FS] /cfg/LCC/cdi.xml (5574 bytes) mtime: Wed Dec 31 19:00:06 1980 17:08:48.615 -> [FS] /cfg/LCC/config (450 bytes) mtime: Wed Dec 31 19:06:50 1980 17:08:48.615 -> [FS] /cfg/LCC/train.xml (5406 bytes) mtime: Wed Dec 31 19:00:06 1980 17:08:48.615 -> [FS] /cfg/LCC/tmptrain.xml (2593 bytes) mtime: Wed Dec 31 19:00:06 1980 17:08:48.615 -> [LCC] Initializing Stack (node-id: 50101013900) 17:08:48.615 -> [LCC] Enabling CAN interface (rx: 27, tx: 32) 17:08:48.662 -> [WiFi] Starting WiFiManager 17:08:48.662 -> [Turnout] Initializing DCC Turnout database 17:08:48.662 -> [FS] /cfg/ESP32CS/turnouts.json does not exist, returning blank json object 17:08:48.662 -> [Turnout] Loaded 0 DCC turnout(s) 17:08:48.662 -> [Track] Registering /dev/track VFS interface 17:08:48.662 -> [RailCom] Initializing detector using UART 2 17:08:48.662 -> [RailCom] Configuring hardware timer (0:0)... 17:08:48.662 -> [OPS] Configuring H-Bridge (LMD18200 3000 mA max) using ADC 1:0 17:08:48.662 -> [OPS] Short limit 3686/4096 (2699.71 mA), events (on: 0, off: 0) 17:08:48.708 -> [OPS] Shutdown limit 4080/4096 (2988.28 mA), events (on: 0, off: 0) 17:08:48.708 -> [PROG] Configuring H-Bridge (L298 2000 mA max) using ADC 1:3 17:08:48.708 -> [PROG] Short limit 512/4096 (250.00 mA), events (on: 0, off: 0) 17:08:48.708 -> [PROG] Shutdown limit 1024/4096 (500.00 mA), events (on: 0, off: 0) 17:08:48.708 -> [PROG] Prog ACK: 122/4096 (59.57 mA) 17:08:48.708 -> [Track] Registering LCC EventConsumer for Track Power (On:10000000000fffe, Off:10000000000ffff) 17:08:48.708 -> [OPS] DCC config: zero: 96uS, one: 58uS, preamble-bits: 16, wave: high, low 17:08:48.708 -> [eStop] Registering emergency stop handler (On: 10000000000fffd, Off:10000000000fffc) 17:08:48.754 -> [OPS] signal pin: 17, RMT(ch:0,mem:73[2],clk-div:80,clk-src:APB) 17:08:48.754 -> [Track] Connecting track interface (track:OPS, fd:0) 17:08:48.754 -> [OPS] Starting signal generator 17:08:48.754 -> [Track] Connecting track interface (track:PROG, fd:3) 17:08:48.754 -> [PROG] DCC config: zero: 96uS, one: 58uS, preamble-bits: 22, wave: high, low 17:08:48.754 -> [PROG] signal pin: 19, RMT(ch:3,mem:79[2],clk-div:80,clk-src:APB) 17:08:48.754 -> [CDI] Checking /cfg/LCC/cdi.xml... 17:08:48.754 -> [PROG] Starting signal generator 17:08:48.754 -> [CDI] File /cfg/LCC/cdi.xml appears up-to-date (len 5574 vs 5574) 17:08:48.801 -> [CDI] Registering CDI with stack... 17:08:48.801 -> [LCC] Config file opened using fd:6 17:08:48.801 -> [LCC] Creating automatic fsync(6) calls every 10 seconds. 17:08:48.801 -> [CDI] Checking /cfg/LCC/train.xml... 17:08:48.801 -> [CDI] File /cfg/LCC/train.xml appears up-to-date (len 5406 vs 5406) 17:08:48.801 -> [CDI] Checking /cfg/LCC/tmptrain.xml... 17:08:48.848 -> [CDI] File /cfg/LCC/tmptrain.xml appears up-to-date (len 2593 vs 2593) 17:08:48.848 -> [TrainDB] Initializing... 17:08:48.848 -> [TrainDB] Registering 5903 - 5903 (idle: On, limited: Off) 17:08:48.848 -> [TrainDB] Found 1 persistent roster entries. 17:08:48.848 -> 17:08:48.848 -> 17:08:48.848 -> ESP32 Command Station Startup complete! 17:08:48.848 -> 17:08:48.848 -> [Status] Initializing LEDs (color-mode:RGB, protocol:RMT(6)-APA106, pin: 22, brightness: 128) 17:08:50.641 -> Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0) 17:08:50.641 -> Core 0 register dump: 17:08:50.641 -> PC : 0x40082664 PS : 0x00050034 A0 : 0x401a9dc9 A1 : 0x3ffb0690
17:08:50.688 -> A2 : 0x00000000 A3 : 0x00000040 A4 : 0x00080000 A5 : 0x3ffb0670
17:08:50.688 -> A6 : 0x00000101 A7 : 0x3ff6e000 A8 : 0x00005c84 A9 : 0x40088378
17:08:50.688 -> A10 : 0x00017030 A11 : 0x00000180 A12 : 0x3ffdcf88 A13 : 0x00000000
17:08:50.688 -> A14 : 0x00000000 A15 : 0x00000000 SAR : 0x0000001a EXCCAUSE: 0x00000005
17:08:50.688 -> EXCVADDR: 0x00000000 LBEG : 0x40101ced LEND : 0x40101d00 LCOUNT : 0x00000000
17:08:50.688 -> Core 0 was running in ISR context: 17:08:50.688 -> EPC1 : 0x401a9dd2 EPC2 : 0x401a9dc9 EPC3 : 0x00000000 EPC4 : 0x40082664 17:08:50.734 -> 17:08:50.734 -> ELF file SHA256: 733616ca23d49ef5 17:08:50.734 -> 17:08:50.734 -> Backtrace: 0x40082661:0x3ffb0690 0x401a9dc6:0x3ffbf150 0x40138dee:0x3ffbf170 0x40102195:0x3ffbf1c0 0x401023b5:0x3ffbf1f0 0x4011a555:0x3ffbf240 0x4011a361:0x3ffbf270 0x400da209:0x3ffbf2a0 0x400d375b:0x3ffbfc30 17:08:50.734 -> 17:08:50.734 -> Core 1 register dump: 17:08:50.734 -> PC : 0x401aa006 PS : 0x00060534 A0 : 0x800d488c A1 : 0x3ffc0b10
17:08:50.734 -> A2 : 0x00000000 A3 : 0x00000001 A4 : 0x800891a2 A5 : 0x3ffb0c40
17:08:50.734 -> A6 : 0x00000003 A7 : 0x00060623 A8 : 0x800d43da A9 : 0x3ffc0ae0
17:08:50.775 -> A10 : 0x00000000 A11 : 0x00000000 A12 : 0x800f419d A13 : 0x3ffb0bf0
17:08:50.775 -> A14 : 0x00020f01 A15 : 0x00060023 SAR : 0x00000000 EXCCAUSE: 0x00000005
17:08:50.775 -> EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000
17:08:50.775 -> 17:08:50.775 -> ELF file SHA256: 733616ca23d49ef5 17:08:50.775 -> 17:08:50.775 -> Backtrace: 0x401aa003:0x3ffc0b10 0x400d4889:0x3ffc0b30 0x4008974c:0x3ffc0b50 17:08:50.775 -> 17:08:50.775 -> Rebooting...

atanisoft commented 4 years ago

This looks related to an issue I'm trying to resolve with NeoPixelBus which is used by the status leds. If you can try disabling CONFIG_STATUS_LED it should fix it while I resolve the issue with NeoPixelBus.

atanisoft commented 4 years ago

@TrainzLuvr can you test status LED with https://github.com/atanisoft/ESP32CommandStation/commit/ead408a20356cff21333ea6b546c46a6e49f71b9 ? I've done a small bit of testing locally and it looks like it works.

TrainzLuvr commented 4 years ago

The LEDs are back but I don't know about stability.

It crashed and rebooted at least once, after I force-reloaded the main UI page:

22:58:41.652 -> [WS 192.168.x.x/51] Connected 22:58:41.746 -> Incoming connection from 192.168.x.x, fd 52. 22:58:42.875 -> [WS 192.168.x.x/51] Disconnected 22:58:42.922 -> Incoming connection from 192.168.x.x, fd 51. 22:58:42.922 -> Incoming connection from 192.168.x.x, fd 52. 22:58:42.922 -> Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled. 22:58:42.970 -> Core 0 register dump: 22:58:42.970 -> PC : 0x40157d06 PS : 0x00060730 A0 : 0x80159ae7 A1 : 0x3ffda510
22:58:42.970 -> A2 : 0x0000009e A3 : 0x0000059b A4 : 0x3ffd9758 A5 : 0x00000000
22:58:42.970 -> A6 : 0x3ffd9768 A7 : 0x00000000 A8 : 0x8014b611 A9 : 0x3ffda4f0
22:58:42.970 -> A10 : 0x00000000 A11 : 0x3ffb3e68 A12 : 0x800883eb A13 : 0x00000000
22:58:42.970 -> A14 : 0x00000003 A15 : 0x00060023 SAR : 0x00000019 EXCCAUSE: 0x0000001c
22:58:42.970 -> EXCVADDR: 0x0000009e LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
22:58:43.018 -> 22:58:43.018 -> ELF file SHA256: 3bc37063189aed11 22:58:43.018 -> 22:58:43.018 -> Backtrace: 0x40157d03:0x3ffda510 0x40159ae4:0x3ffda530 0x40148719:0x3ffda550 22:58:43.018 -> 22:58:43.018 -> Rebooting... 22:58:43.018 -> ets Jun 8 2016 00:22:57

I think if I keep CTRL+R or CTRL+F5 to reload it would crash again after a few times. Also, 2nd LED from the left turned red at one point, then CS crashed and rebooted.

atanisoft commented 4 years ago

Can you run this through IDF.py monitor so that it decodes the backtraces?

As for the led, that is likely due to a small delay in the data line when the rmt reloads the buffer. I have a couple ideas of how to fix it so I'll do some testing tomorrow and push an update.

TrainzLuvr commented 4 years ago

Another crash, after editing one loco in my roster and turning track on and off:

23:07:29.346 -> Incoming connection from 192.168.x.x, fd 53. 23:07:29.440 -> abort() was called at PC 0x400d6977 on core 0 23:07:29.440 -> 23:07:29.440 -> ELF file SHA256: 3bc37063189aed11 23:07:29.440 -> 23:07:29.440 -> Backtrace: 0x40086f24:0x3ffd4b90 0x4008734d:0x3ffd4bb0 0x400d6977:0x3ffd4bd0 0x401968e9:0x3ffd4bf0 0x40196dcd:0x3ffd4c10 0x40196f4c:0x3ffd4c30 0x40197455:0x3ffd4c70 0x401974a3:0x3ffd4c90 0x400da581:0x3ffd4cb0 0x400e2581:0x3ffd4cd0 0x400e2717:0x3ffd4d00 0x400e2593:0x3ffd4d30 0x400e29be:0x3ffd4d60 0x400e58b5:0x3ffd5010 0x400e5b1e:0x3ffd52e0 0x400ebf29:0x3ffd5330 0x401a7e8b:0x3ffd53f0 0x40115e23:0x3ffd5410 0x4011a579:0x3ffd5460 0x4011a385:0x3ffd5490 0x401a8b15:0x3ffd54c0 0x4012596f:0x3ffd54e0 23:07:29.487 -> 23:07:29.487 -> Rebooting... 23:07:29.487 -> ets Jun 8 2016 00:22:57

TrainzLuvr commented 4 years ago

Connecting from idf.py monitor then CTRL+C and then running idf.py monitor again causes it to reboot.

atanisoft commented 4 years ago

IDF.py monitor toggles the DTR/RTS lines on connect so a restart is expected unfortunately.

The abort() looks memory related but to know for sure we will need the decode of the backtrace.

TrainzLuvr commented 4 years ago

How do we get the decode of the backtrace?

atanisoft commented 4 years ago

How do we get the decode of the backtrace?

idf.py monitor should be doing that for you, but if you are not using that you can download decoder.py and run it as: decoder.py -p ESP32 -t ~/.espressif/tools/xtensa-espre32-elf/esp-2020r2-8.2.0/xtensa-esp32-elf -e build/ESP32CommandStation.elf {file containing stacktrace}

TrainzLuvr commented 4 years ago

Well the last abort() crash was obtained via idf.py monitor, yet there was no decode of the backtrace so that's why I'm asking.

atanisoft commented 4 years ago
17:08:48.848 -> [Status] Initializing LEDs (color-mode:RGB, protocol:RMT(6)-APA106, pin: 22, brightness: 128)
17:08:50.641 -> Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
17:08:50.641 -> Core 0 register dump:

This stems from the RailCom UART ISR and Status LED RMT ISR competing for ISR time cycles resulting in effectively a deadlock. I'd suggest disable either the status led or RailCom while I rework the RailCom UART ISR so it doesn't block.

Well the last abort() crash was obtained via idf.py monitor, yet there was no decode of the backtrace so that's why I'm asking.

Were you running idf.py monitor from the CS code directory (where sdkconfig and build directory are present)? In any case, the above usage of decoder.py should be able to decode it.

TrainzLuvr commented 4 years ago

Yes, I was in the code top folder.

Should the Railcom and Status LED be competing for ISR cycles like this - sounds like a problem with ISR then?

atanisoft commented 4 years ago

Should the Railcom and Status LED be competing for ISR cycles like this - sounds like a problem with ISR then?

Normally no they should not be competing at all. But there is a race condition that wasn't occurring in IDF v4.0 that is now showing up in IDF v4.1+ relating to the amount of data being sent to the RMT.

I've partially resolved the race condition as part of https://github.com/atanisoft/ESP32CommandStation/commit/ead408a20356cff21333ea6b546c46a6e49f71b9 but it has the side effect of the LED blinking red after the first 64 bits have been transmitted to the LEDs. I've fixed that blink locally but it brings back the ISR WDT failure.

TrainzLuvr commented 4 years ago

Sounds like this is a regression in IDF v4.1 that Espressif needs to look into?

atanisoft commented 4 years ago

Sounds like this is a regression in IDF v4.1 that Espressif needs to look into?

There is likely a change in the IDF RMT ISR that is exposing this but with the DCC code not using the IDF API to write data to the RMT it likely would be rejected as an issue. I'll dig into this further and see what can be done to mitigate the race condition properly.

atanisoft commented 4 years ago

Fixed the RailCom side here: https://github.com/atanisoft/ESP32CommandStation/commit/5baf5276f024766ad850c9c6601af93014dd530b Added note about IDF v4.1 issues in https://github.com/atanisoft/ESP32CommandStation/commit/4751f60f4e43dc0a305cfcdca7d6b006c9d14a1d.

In short, IDF v4.1 while being "stable" it has a few issues which break it's usability for all modules and as such is not recommended.

atanisoft commented 3 years ago

Filed https://github.com/espressif/esp-idf/issues/5992 to track the RMT truncation.