atanisoft / ESP32CommandStation

An ESP32 based DCC Command Station with integrated OpenLCB (LCC) --- NOTE: this project is not under active development.
https://atanisoft.github.io/ESP32CommandStation/
GNU General Public License v3.0

Several crashes + train does not move #115

Closed UpBlueio closed 10 months ago

UpBlueio commented 1 year ago

Hi again.

Thanks for fixing the settings page yesterday. Today I am trying to run a train with the ESP32 and stock motor shield.

The ESP32 crashed several times during different actions. First of all, I couldn't get my train running: it does not move, whatever command I give. Only when the ESP32 crashes during one of the other actions listed below does it sometimes move a few cm before stopping.

Crashes:

Deleting a train from the roster:

Assertion failed in file ../components/OpenMRNIDF/src/openlcb/If.hxx line 344: assert(it != localNodes_.end())

assert failed: void openlcb::If::remove_local_node_from_map(openlcb::Node*) If.hxx:344 (0)

Backtrace:0x40081c2e:0x3ffc04000x400898dd:0x3ffc0420 0x40090c79:0x3ffc0440 0x40151abe:0x3ffc0570 0x4013e539:0x3ffc05a0 0x400fcd34:0x3ffc05c0 0x400ff6d2:0x3ffc05e0 0x400ee518:0x3ffc0610 0x400d9fed:0x3ffc0630 0x400eecfd:0x3ffc0650 0x40123d5b:0x3ffc0670 0x400df181:0x3ffc06a0 0x401cedd3:0x3ffc1550 0x4008cf7d:0x3ffc1570

ELF file SHA256: c69b3535851093b9

Rebooting...

While booting I get this core dump:

Core dump: Task:main (1073485316) crashed at PC 40081c2e Registers: A00: 0x800898e0 A01: 0x3ffc0400 A02: 0x3ffc0484 A03: 0x00000002 A04: 0x0000000a A05: 0x0000ff00 A06: 0x00000017 A07: 0xff000000 A08: 0x00000000 A09: 0x00000001 A10: 0x3ffc044f A11: 0x3ffc044f A12: 0x00000001 A13: 0x3ffaf07c A14: 0x00000000 A15: 0x00000000 EXCCAUSE: 0000001d EXCVADDR: 00000000 EPCX:0:40168d7b 1:00000000 2:00000000 3:00000000 4:00000000 5:00000000 Backtrace: 0x40081c2e 0x400898dd 0x40090c79 0x40151abe 0x4013e539 0x400fcd34 0x400ff6d2 0x400ee518 0x400d9fed 0x400eecfd 0x40123d5b 0x400df181 0x401cedd3 0x4008cf7d

Crash when changing the locomotive on the throttle page:

abort() was called at PC 0x4017d302 on core 1

Backtrace:0x40081c2e:0x3ffcb5400x400898dd:0x3ffcb560 0x40090b55:0x3ffcb580 0x4017d302:0x3ffcb600 0x4017d32f:0x3ffcb620 0x401b782b:0x3ffcb640 0x401b7c5e:0x3ffcb660 0x401b83b1:0x3ffcb680 0x400dc391:0x3ffcb6a0 0x40116791:0x3ffcb6d0 0x400e2691:0x3ffcb6f0 0x401c8f77:0x3ffcb7c0 0x4011bafa:0x3ffcb7e0 0x4011bc27:0x3ffcb810 0x40124519:0x3ffcb880 0x40123d5b:0x3ffcb8d0 0x40125123:0x3ffcb900 0x4008cf7d:0x3ffcb920

ELF file SHA256: c69b3535851093b9

Rebooting... ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT) configsip: 0, SPIWP:0xee clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00 mode:DIO, clock div:2 load:0x3fff0030,len:4704 load:0x40078000,len:14252 load:0x40080400,len:3464 entry 0x40080638

ESP32 Command Station starting up...

Core dump: Task:Esp32WiFiConn (1073527224) crashed at PC 40081c2e Registers: A00: 0x800898e0 A01: 0x3ffcb540 A02: 0x3ffcb5aa A03: 0x3ffcb5d7 A04: 0x0000000a A05: 0x00000000 A06: 0x3ffdce50 A07: 0x6e6f736a A08: 0x00000000 A09: 0x00000001 A10: 0x3ffcb58e A11: 0x3ffcb58e A12: 0x00000000 A13: 0x00060e23 A14: 0x00000001 A15: 0x000024f7 EXCCAUSE: 0000001d EXCVADDR: 00000000 EPCX:0:40168d7b 1:00000000 2:00000000 3:40084151 4:00000000 5:00000000 Backtrace: 0x40081c2e 0x400898dd 0x40090b55 0x4017d302 0x4017d32f 0x401b782b 0x401b7c5e 0x401b83b1 0x400dc391 0x40116791 0x400e2691 0x401c8f77 0x4011bafa 0x4011bc27 0x40124519 0x40123d5b 0x00000011

When using the throttle in JMRI:

Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.

Core 1 register dump: PC : 0x4017a1b5 PS : 0x00060f30 A0 : 0x8017c295 A1 : 0x3ffccb10
A2 : 0x00000002 A3 : 0x00000000 A4 : 0x00000033 A5 : 0x000000f0
A6 : 0x00000000 A7 : 0x00000001 A8 : 0x8016e0ea A9 : 0x3ffccaf0
A10 : 0x00000001 A11 : 0x00000000 A12 : 0x00000000 A13 : 0x00000000
A14 : 0x3ffc0480 A15 : 0x00000000 SAR : 0x00000010 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000002 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff

Backtrace:0x4017a1b2:0x3ffccb100x4017c292:0x3ffccb30 0x4017c851:0x3ffccb60 0x4016f39d:0x3ffccb80 0x4016f440:0x3ffccba0 0x4008cf7d:0x3ffccbd0

ELF file SHA256: c69b3535851093b9

Rebooting... ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT) configsip: 0, SPIWP:0xee clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00 mode:DIO, clock div:2 load:0x3fff0030,len:4704 load:0x40078000,len:14252 load:0x40080400,len:3464 entry 0x40080638

ESP32 Command Station starting up...

Core dump: Task:tiT (1073532008) crashed at PC 4017a1b2 Registers: A00: 0x8017c295 A01: 0x3ffccb10 A02: 0x00000002 A03: 0x00000000 A04: 0x00000033 A05: 0x000000f0 A06: 0x00000000 A07: 0x00000001 A08: 0x8016e0ea A09: 0x3ffccaf0 A10: 0x00000001 A11: 0x00000000 A12: 0x00000000 A13: 0x00000000 A14: 0x3ffc0480 A15: 0x00000000 EXCCAUSE: 0000001c EXCVADDR: 00000002 EPCX:0:40168d7b 1:00000000 2:00000000 3:40084151 4:00000000 5:00000000 Backtrace: 0x4017a1b2 0x4017c292 0x4017c851 0x4016f39d 0x4016f440 0x4008cf7d
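For readers comparing the three dumps above, the EXCCAUSE and EXCVADDR fields can be pulled out mechanically. This is only an illustrative sketch: `summarize_exception` is a hypothetical helper, and the cause table covers just the two Xtensa codes that appear in this thread (0x1c, which the panic message above names LoadProhibited, and 0x1d, StoreProhibited).

```python
import re

# Only the two cause codes seen in this thread; the full Xtensa table is longer.
EXCCAUSE_NAMES = {0x1C: "LoadProhibited", 0x1D: "StoreProhibited"}

# Accepts both "EXCCAUSE: 0000001c" (core dump) and "EXCCAUSE: 0x0000001c" (panic).
_FIELD = r"{name}\s*:\s*(?:0x)?([0-9a-fA-F]+)"


def summarize_exception(dump_text):
    """Return (cause_name, faulting_address) from a register or core dump."""
    cause = re.search(_FIELD.format(name="EXCCAUSE"), dump_text)
    vaddr = re.search(_FIELD.format(name="EXCVADDR"), dump_text)
    if cause is None or vaddr is None:
        raise ValueError("dump does not contain EXCCAUSE and EXCVADDR")
    code = int(cause.group(1), 16)
    name = EXCCAUSE_NAMES.get(code, "unknown (0x%02x)" % code)
    return name, int(vaddr.group(1), 16)


print(summarize_exception("EXCCAUSE: 0000001c EXCVADDR: 00000002"))
```

Applied to the tiT dump, this reports a LoadProhibited fault at address 0x2, while the two other dumps report StoreProhibited at address 0.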

When there are no crashes, the train does not run.

Hope you can solve the issues.

atanisoft commented 1 year ago

Today I am trying to run a train with the ESP32 and stock motor shield.

Which ESP32 model are you using? If it is one that is shaped like the Arduino Uno it is very likely that the A0 and A1 pins are not usable (ESP32 limitation on the ADC pins). Add a jumper wire from A0 to A2 and A1 to A3.

Core dump: Task:main (1073485316) crashed at PC 40081c2e Registers:

This is a duplicate of the "Assertion failed in file ../components/OpenMRNIDF/src/openlcb/If.hxx line 344: assert(it != localNodes_.end())" crash. For this one, was the train deleted while it was actively running, or was it deleted from the Locomotive Roster page?

If you can give me exact steps you took for this I'll see if I can reproduce and trace through the code as it shouldn't be crashing here.

Core dump: Task:Esp32WiFiConn (1073527224) crashed at PC 40081c2e Registers:

This is a duplicate of abort() was called at PC 0x4017d302 on core 1. Was there any information prior to the abort() call? I'll see if I can decode this trace and see if I can find any clues there.
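Decoding a trace like this usually means extracting the program-counter addresses and passing them to addr2line together with the matching ELF file. The following is a minimal sketch of that step, not the tooling actually used here: `backtrace_pcs` and `addr2line_command` are hypothetical helper names, and `firmware.elf` is a placeholder path.

```python
import re

# Matches "0xPC:0xSP" pairs (panic output) and bare "0xPC" values (core dump
# summary). It does not rely on whitespace, because the first two pairs in the
# logs above are printed without a separating space.
_ENTRY = re.compile(r"0x([0-9a-fA-F]{8})(?::0x[0-9a-fA-F]{8})?")


def backtrace_pcs(line):
    """Return the program-counter addresses from one Backtrace line."""
    _, sep, rest = line.partition("Backtrace:")
    if not sep:
        raise ValueError("not a Backtrace line")
    return ["0x" + m.group(1) for m in _ENTRY.finditer(rest)]


def addr2line_command(line, elf_path):
    """Build (but do not run) an addr2line invocation for the backtrace."""
    # elf_path is a placeholder: it must point at the ELF that corresponds to
    # the "ELF file SHA256" value printed with the crash.
    return ["xtensa-esp32-elf-addr2line", "-pfiaC", "-e", elf_path,
            *backtrace_pcs(line)]


sample = "Backtrace:0x4017a1b2:0x3ffccb100x4017c292:0x3ffccb30 0x4017c851:0x3ffccb60"
print(" ".join(addr2line_command(sample, "firmware.elf")))
```

The addresses only resolve to meaningful source lines when the ELF comes from the same build that crashed.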

Core dump: Task:tiT (1073532008) crashed at PC 4017a1b2 Registers:

This task is the LwIP driver task which handles all TCP/IP communication. It looks likely that it ran into an out-of-memory condition. Was JMRI connecting to the ESP32 CS as an OpenLCB hub? If so, can you try enabling the JMRI OpenLCB hub and have ESP32 CS connect to it instead? I'm wondering if JMRI requesting the locomotive FDI data (function configuration) or CDI data (locomotive configuration) is causing a backup of TCP/IP packets in LwIP leading to the crash due to running out of memory buffers.

I'll do some digging and see if I can reproduce this one as well.

UpBlueio commented 1 year ago

Which ESP32 model are you using? If it is one that is shaped like the Arduino Uno it is very likely that the A0 and A1 pins are not usable (ESP32 limitation on the ADC pins). Add a jumper wire from A0 to A2 and A1 to A3.

The Arduino Uno form factor. I connected 12 to 13, A0 to A2 and A1 to A3.

This is a duplicate of the "Assertion failed in file ../components/OpenMRNIDF/src/openlcb/If.hxx line 344: assert(it != localNodes_.end())" crash. For this one, was the train deleted while it was actively running, or was it deleted from the Locomotive Roster page?

I first opened the Throttle and tried a few addresses to find the address of my loco. Later I found out that this creates roster entries. When I try to delete these entries on the Roster page, I get the problem.

This is a duplicate of abort() was called at PC 0x4017d302 on core 1. Was there any information prior to the abort() call? I'll see if I can decode this trace and see if I can find any clues there.

Sadly no

This task is the LwIP driver task which handles all TCP/IP communication. It looks likely that it ran into an out-of-memory condition. Was JMRI connecting to the ESP32 CS as an OpenLCB hub?

The ESP32 is the hub and JMRI is the client, I guess.

If so, can you try enabling the JMRI OpenLCB hub and have ESP32 CS connect to it instead? I'm wondering if JMRI requesting the locomotive FDI data (function configuration) or CDI data (locomotive configuration) is causing a backup of TCP/IP packets in LwIP leading to the crash due to running out of memory buffers.

I will try that this weekend.

I'll do some digging and see if I can reproduce this one as well.

Thank you.

Is there some way I can debug why the train is not moving? The F functions also do not work.

atanisoft commented 1 year ago

@Nefkens I haven't been able to reproduce the issues you have reported unfortunately. I did find one bug related to deleting a roster entry that may result in the locomotive remaining as "active" which I'll be investigating.

I've made a couple updates on the trainmgr branch that should help improve performance / reliability of LwIP. I'll be merging these changes to master soon.

I'm also using Esp32OlcbNodeBrowser running on an ESP32-S3 for the "hub" which JMRI (and UWT-100/UWT-50) use to communicate with the CS via TWAI (CAN). If you have a spare ESP32 you can try using it as the hub. Note that there is a possible performance bug in JMRI that can result in rather slow CDI access over TCP/IP.

Is there some way I can debug why the train is not moving? The F functions also do not work.

The best way is to check the console output from the ESP32. Since nothing appears to be working, my guess is that the track output is not enabled (it defaults to off for safety).

You can enable the track output via the built-in web interface on the CS (click / tap the power icon in the top right; it should turn from red to green) or by sending the OpenLCB event 01.00.00.00.00.00.FF.FE. To send the event from within JMRI, select the Send Frame option from the OpenLCB menu, enter 01.00.00.00.00.00.FF.FE in the Event ID box of the Send OpenLCB event message section, and click the Send Event Produced button. This will enable the track output.
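The event ID above is eight bytes written as dotted hexadecimal. As a rough sketch for anyone scripting this instead of typing it into JMRI, the conversion to raw bytes could look like the following. `parse_event_id` and `format_event_id` are hypothetical helpers, and reading the bytes most significant first as a 64-bit integer is an assumption about how a given tool expects the value.

```python
def parse_event_id(text):
    """Convert a dotted-hex event ID string into its 8 raw bytes."""
    parts = text.strip().split(".")
    if len(parts) != 8 or any(len(p) != 2 for p in parts):
        raise ValueError("expected 8 two-digit hex bytes, got %r" % text)
    # int(p, 16) raises ValueError on non-hex input, which is what we want.
    return bytes(int(p, 16) for p in parts)


def format_event_id(raw):
    """Inverse of parse_event_id: 8 bytes back to dotted upper-case hex."""
    if len(raw) != 8:
        raise ValueError("an event ID is exactly 8 bytes")
    return ".".join("%02X" % b for b in raw)


raw = parse_event_id("01.00.00.00.00.00.FF.FE")
# Assumption: big-endian (most significant byte first) integer form.
print(raw.hex(), hex(int.from_bytes(raw, "big")))
```

Validating the length up front catches the most common typing mistake, a dropped or doubled byte, before anything is sent to the bus.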

atanisoft commented 1 year ago

Assertion failed in file ../components/OpenMRNIDF/src/openlcb/If.hxx line 344: assert(it != localNodes_.end())

assert failed: void openlcb::If::remove_local_node_from_map(openlcb::Node*) If.hxx:344 (0)

Backtrace:0x40081c2e:0x3ffc04000x400898dd:0x3ffc0420 0x40090c79:0x3ffc0440 0x40151abe:0x3ffc0570 0x4013e539:0x3ffc05a0 0x400fcd34:0x3ffc05c0 0x400ff6d2:0x3ffc05e0 0x400ee518:0x3ffc0610 0x400d9fed:0x3ffc0630 0x400eecfd:0x3ffc0650 0x40123d5b:0x3ffc0670 0x400df181:0x3ffc06a0 0x401cedd3:0x3ffc1550 0x4008cf7d:0x3ffc1570

ELF file SHA256: c69b3535851093b9

Rebooting...

I was finally able to reproduce this one: there was a duplicate call to delete the local node. It is fixed in https://github.com/atanisoft/ESP32CommandStation/pull/111, which will be merged to master.
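To illustrate why a duplicate delete trips an assertion of the form assert(it != localNodes_.end()), here is a toy model in Python. It is not the OpenMRN code: `LocalNodeMap` and both removal methods are hypothetical, and the idempotent variant is shown only for contrast, since the fix described above removes the duplicate call rather than relaxing the check.

```python
class LocalNodeMap:
    """Toy stand-in for a node registry; not the OpenMRN implementation."""

    def __init__(self):
        self._nodes = {}

    def add(self, node_id, node):
        self._nodes[node_id] = node

    def remove_strict(self, node_id):
        # Same shape as the failing assertion: removing a node that is not
        # in the map is treated as a programming error.
        if node_id not in self._nodes:
            raise AssertionError("node %r is not registered" % (node_id,))
        del self._nodes[node_id]

    def remove_idempotent(self, node_id):
        # Contrast case: a second removal is a harmless no-op.
        if node_id in self._nodes:
            del self._nodes[node_id]
            return True
        return False


nodes = LocalNodeMap()
nodes.add(3, "train 3")
nodes.remove_strict(3)      # first delete succeeds
try:
    nodes.remove_strict(3)  # duplicate delete: the entry is already gone
except AssertionError as err:
    print("second delete failed:", err)
```

A strict check like this is what turned the duplicate call into a visible crash instead of a silent inconsistency.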

atanisoft commented 10 months ago

Closing this issue as the reported items have been fixed for nearly a year now.