atanisoft / ESP32CommandStation

An ESP32 based DCC Command Station with integrated OpenLCB (LCC) --- NOTE: this project is not under active development.
https://atanisoft.github.io/ESP32CommandStation/
GNU General Public License v3.0

Crash when powering on track #25

Closed (desertspider closed this issue 3 years ago)

desertspider commented 4 years ago

ESP32CommandStation crashes when powering on the track. No train is on the track. Hardware: ESP32 Uno board with an AliExpress Arduino motor board shield.

stack:

0x4014e4e9: sys_sem_signal at C:\Users\....\.platformio\packages\framework-espidf\components\lwip\port\esp32\freertos/sys_arch.c:203
0x40150b46: lwip_netconn_do_getaddr at C:\Users\...\.platformio\packages\framework-espidf\components\lwip\lwip\src\api/api_msg.c:1925 (discriminator 6)     
0x40141812: tcpip_thread at C:\Users\....\.platformio\packages\framework-espidf\components\lwip\lwip\src\api/tcpip.c:483
0x4008c2e9: vPortTaskWrapper at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/port.c:403

atanisoft commented 4 years ago

How are you powering on the track? Is it via web UI, JMRI, etc?

Also, this is not specific to powering the track. I've seen it at random but haven't been able to track down the cause yet. It appears to possibly be due to concurrent VFS select() calls. I'll do some further investigation.

desertspider commented 4 years ago

Web UI, but the same thing happens with JMRI. Also, the web UI is really slow, with a lot of timeouts. In version 1.3 this was not a problem.

atanisoft commented 4 years ago

Are you connecting from JMRI as a DCC++ connection?

I haven't seen timeouts in the web interface, but there could still be issues in this area. Which modules did you enable via Config.h?

desertspider commented 4 years ago

DCC++ connection indeed. Config:

// MAIN TRACK MOTORBOARD ENABLED PIN
#define OPS_ENABLE_PIN 25
// MAIN TRACK H-Bridge Thermal Warning Pin
#define OPS_THERMAL_PIN -1
// MAIN TRACK MOTORBOARD CURRENT SENSE ADC PIN
#define OPS_CURRENT_SENSE_ADC ADC1_CHANNEL_0
// MAIN TRACK MOTORBOARD MOTOR_BOARD_TYPE
#define OPS_HBRIDGE_TYPE L298

// PROG TRACK MOTORBOARD ENABLED PIN
#define PROG_ENABLE_PIN 23
// PROG TRACK MOTORBOARD CURRENT SENSE ADC PIN
#define PROG_CURRENT_SENSE_ADC ADC1_CHANNEL_3
// PROG TRACK MOTORBOARD MOTOR_BOARD_TYPE
#define PROG_HBRIDGE_TYPE L298

New stack trace:

stack:
0x4008c928: xQueueGenericSend at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/queue.c:2038
0x4014e4f2: sys_sem_signal at C:\Users\...\.platformio\packages\framework-espidf\components\lwip\port\esp32\freertos/sys_arch.c:203
0x4014026d: lwip_setsockopt_callback at C:\Users\...\.platformio\packages\framework-espidf\components\lwip\lwip\src\api/sockets.c:3532
0x40141871: tcpip_thread at C:\Users\...\.platformio\packages\framework-espidf\components\lwip\lwip\src\api/tcpip.c:483
0x4008c2e9: vPortTaskWrapper at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/port.c:403

desertspider commented 4 years ago

Even if the CommandStation does not crash, the trains are not picking up any signal.

atanisoft commented 4 years ago

Hmm, something isn't set up right. I'll do some digging and see what I can find. I can confirm that the signal IS being generated on the configured pins, but something seems amiss after that.

desertspider commented 4 years ago

Thanks. New backtrace when I send a drive signal from JMRI.

CORRUPT HEAP: Bad tail at 0x3ffe1de0. Expected 0xbaad5678 got 0xffffffff
assertion "head != NULL" failed: file "C:\Users\....\.platformio\packages\framework-espidf\components\heap\multi_heap_poisoning.c", line 214, function: multi_heap_free
abort() was called at PC 0x40187663 on core 0
0x4008c928: xQueueGenericSend at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/queue.c:2038
0x4014e4f2: sys_sem_signal at C:\Users\...\.platformio\packages\framework-espidf\components\lwip\port\esp32\freertos/sys_arch.c:203
0x4014026d: lwip_setsockopt_callback at C:\Users\...\.platformio\packages\framework-espidf\components\lwip\lwip\src\api/sockets.c:3532
0x40141871: tcpip_thread at C:\Users\...\.platformio\packages\framework-espidf\components\lwip\lwip\src\api/tcpip.c:483
0x4008c2e9: vPortTaskWrapper at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/port.c:403
PS C:\Users\...\Desktop\ESP32CommandStation-development> py decoder.py -p ESP32 -e .pio\build\esp32\firmware.elf stack2.txt -t C:\Users\...\.platformio\packages\toolchain-xtensa32
stack:
0x40086d55: invoke_abort at C:\Users\...\.platformio\packages\framework-espidf\components\esp32/panic.c:695
0x40086fe5: abort at C:\Users\...\.platformio\packages\framework-espidf\components\esp32/panic.c:695
0x40187663: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)
0x4008f59b: multi_heap_free at C:\Users\...\.platformio\packages\framework-espidf\components\heap/multi_heap_poisoning.c:301
0x40084365: heap_caps_free at C:\Users\...\.platformio\packages\framework-espidf\components\heap/heap_caps.c:354
0x40086995: _free_r at C:\Users\...\.platformio\packages\framework-espidf\components\newlib/syscalls.c:42
0x4008ce89: vQueueDelete at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/queue.c:2038
0x4012c3e9: _mdns_search_free at C:\Users\...\.platformio\packages\framework-espidf\components\mdns/mdns.c:4219
0x401306c2: mdns_query at C:\Users\...\.platformio\packages\framework-espidf\components\mdns/mdns.c:4609
0x401306f5: mdns_query_ptr at C:\Users\...\.platformio\packages\framework-espidf\components\mdns/mdns.c:4620
0x40107081: mdns_lookup(char const*, addrinfo*, addrinfo**) at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/utils/SocketClientParams.hxx:181
0x40112225: MDNS::lookup(char const*, addrinfo*, addrinfo**) at c:\Users\...\Desktop\ESP32CommandStation-development/lib\OpenMRNLite\src\os/MDNS.cpp:214
0x401073f4: SocketClient::mdns_lookup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/utils/SocketClientParams.hxx:181
0x40107495: SocketClient::start_mdns()::{lambda()#1}::operator()() const at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/utils/SocketClientParams.hxx:181
  \-> inlined by: std::_Function_handler<void (), SocketClient::start_mdns()::{lambda()#1}>::_M_invoke(std::_Any_data const&) at c:\users\...\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0/functional:1871
0x400d3f0a: std::function<void ()>::operator()() const at c:\users\...\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0\bits/shared_ptr_base.h:127 (discriminator 1)
  \-> inlined by: CallbackExecutable::run() at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/executor/Executable.hxx:84 (discriminator 1)
0x40104f3b: ExecutorBase::entry() at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/utils/Destructable.hxx:43
0x400ea77f: OSThread::inherit() at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/utils/Destructable.hxx:43
  \-> inlined by: Executor<5u>::thread_body() at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/executor/Executor.hxx:344
  \-> inlined by: openmrn_arduino::OpenMRN::loop_executor() at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/OpenMRNLite.h:381
  \-> inlined by: app_main at c:\Users\...\Desktop\ESP32CommandStation-development/src/ESP32CommandStation.cpp:283
0x4013ba90: main_task at C:\Users\...\.platformio\packages\framework-espidf\components\esp32/cpu_start.c:542
0x4008c2e9: vPortTaskWrapper at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/port.c:403

desertspider commented 4 years ago

Latest:

Also triggered by a drive signal from JMRI.

Assertion failed in file lib/OpenMRNLite/src/executor/Executor.hxx line 159: assert(os_thread_self() == thread_handle())
assertion "0" failed: file "lib/OpenMRNLite/src/executor/Executor.hxx", line 159, function: void ExecutorBase::assert_current()
abort() was called at PC 0x40187663 on core 1
E (1923) esp_apptrace: Application tracing via TRAX is disabled in menuconfig!
0x40086d55: invoke_abort at C:\Users\...\.platformio\packages\framework-espidf\components\esp32/panic.c:695
0x40086fe5: abort at C:\Users\...\.platformio\packages\framework-espidf\components\esp32/panic.c:695
0x40187663: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)
0x40112150: openlcb::TrainService::register_train(openlcb::TrainNode*) at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/openlcb/TractionTrain.hxx:226
0x40112171: openlcb::TrainNodeForProxy::TrainNodeForProxy(openlcb::TrainService*, openlcb::TrainImpl*) at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/openlcb/TractionTrain.hxx:226
0x4011e0f2: commandstation::AllTrainNodes::create_impl(int, commandstation::DccMode, int) at c:\users\...\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0\bits/stl_tree.h:1633
0x4011e246: commandstation::AllTrainNodes::allocate_node(commandstation::DccMode, int) at c:\users\...\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0\bits/stl_tree.h:1633
0x4011e414: commandstation::AllTrainNodes::get_train_impl(commandstation::DccMode, int) at c:\users\...\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0\bits/stl_tree.h:1633
0x400f20af: ThrottleCommandAdapter::process(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) at c:\Users\...\Desktop\ESP32CommandStation-development/src\Interfaces/DCCppProtocol.cpp:613
  \-> inlined by: ThrottleCommandAdapter::process(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) at c:\Users\...\Desktop\ESP32CommandStation-development/src\Interfaces/DCCppProtocol.cpp:269
0x400f26bc: DCCPPProtocolHandler::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at c:\Users\...\Desktop\ESP32CommandStation-development/src\Interfaces/DCCppProtocol.cpp:613
0x400f286d: DCCPPProtocolConsumer::processData[abi:cxx11]() at c:\Users\...\Desktop\ESP32CommandStation-development/src\Interfaces/DCCppProtocol.cpp:613
0x400f2914: DCCPPProtocolConsumer::feed[abi:cxx11](unsigned char*, unsigned int) at c:\Users\...\Desktop\ESP32CommandStation-development/src\Interfaces/DCCppProtocol.cpp:613
0x400f6737: jmriClientHandler(void*) at c:\Users\...\Desktop\ESP32CommandStation-development/src\Interfaces/WiFiInterface.cpp:152
0x40112477: os_thread_start at c:\Users\...\Desktop\ESP32CommandStation-development/lib/OpenMRNLite/src/os/os.c:571
0x4008c2e9: vPortTaskWrapper at C:\Users\...\.platformio\packages\framework-espidf\components\freertos/port.c:403

atanisoft commented 4 years ago

The stacktrace about heap corruption points to the LCC uplink code being problematic and possibly leaking memory or double freeing memory.

Something to try:

  1. Connect to the web UI.
  2. Navigate to the Command Station tab.
  3. Navigate to the Configuration section.
  4. Click on "LCC WiFi Uplink".
  5. Set "LCC Uplink Mode" to "Manual Only".
  6. Click Save.

The CS may restart after this is updated (verify the console output when clicking save).

...

The backtrace for openlcb::TrainService::register_train looks like a bug in the JMRI code in the CS, which needs a bit more of an overhaul and is tracked here as the only JMRI-tagged item left to fix. I'll take care of this as part of my next set of changes on the development branch, but in the meantime comment out this line as a temporary workaround.
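
The assertion in the backtrace above, assert(os_thread_self() == thread_handle()), fires when code that must run on an executor's own thread is called from a different one; in that trace the call arrives from the jmriClientHandler thread. One common remedy for this class of bug is to hand the work to the executor instead of calling it directly. The following is a minimal sketch of that idea in standard C++. It is not OpenMRN code and not necessarily the fix that went into the development branch; MiniExecutor, post, on_executor_thread and run_on_executor are all hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical single-threaded executor; NOT the OpenMRN Executor class.
class MiniExecutor {
public:
    MiniExecutor() : worker_([this] { run(); }) {}

    ~MiniExecutor() {
        post(nullptr);  // an empty task tells the worker loop to stop
        worker_.join();
    }

    // Safe to call from any thread: queues work for the executor thread.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

    // True only when called from the executor's own thread.
    bool on_executor_thread() const {
        return std::this_thread::get_id() == worker_.get_id();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            if (!task) return;  // stop request
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    std::thread worker_;  // declared last so the members above exist first
};

// Runs fn on the executor thread and blocks the caller until it returns.
bool run_on_executor(MiniExecutor& ex, std::function<bool()> fn) {
    auto result = std::make_shared<std::promise<bool>>();
    std::future<bool> done = result->get_future();
    ex.post([result, fn] { result->set_value(fn()); });
    return done.get();
}
```

A thread-affinity check such as on_executor_thread() returns false when called directly from a client-handling thread and true when the same check is wrapped in run_on_executor, which is the behaviour the original assertion enforces.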

desertspider commented 4 years ago

This part helped with the crash when sending the move command. Thanks. Sadly, the train does not move.

atanisoft commented 4 years ago

@desertspider JMRI DCC++ compatibility has been fixed on the development branch now. No workarounds are necessary for that. I'm still investigating the track signal issue; it should be functional, since the digital scope I've used shows the DCC signal being generated on the correct pin.

atanisoft commented 4 years ago

@desertspider Can you do a test for the DCC signal by adding a 1K resistor from pin 25 to GND and one from pin 23 to GND? This should be on the DIRA/B pins of the motor shield.

desertspider commented 4 years ago

Did not have the time to upload today's code. 1k resistor to PWMA/ground has no effect.

atanisoft commented 4 years ago

1k resistor to PWMA/ground has no effect.

The PWM pin is constant HIGH when the signal is ON, so the resistor on that pin won't help. The DIRA and DIRB pins DO change as part of the signal, and that is where we will need to put a pull-up/pull-down (PU/PD) resistor.

Try connecting motor board pin 12 (DIRA) to GND (pin next to 13) and if that doesn't work let's try pin 12 (DIRA) to 3v3 (opposite side of motor shield).

desertspider commented 4 years ago

Try connecting motor board pin 12 (DIRA) to GND (pin next to 13) and if that doesn't work let's try pin 12 (DIRA) to 3v3 (opposite side of motor shield).

Tried both, no effect. Tried sending commands from both the web UI and JMRI.

atanisoft commented 4 years ago

Ok, let me do some more digging. The PU/PD was suggested by someone at Espressif.

desertspider commented 4 years ago

In version 1.2.3 the train did drive, only very slowly and stuttering. Reading CV did not work, but the train did do the motor stutter. Was trying to lower the sensing threshold because I use n scale. Can't test that version anymore since the python 2 thing.

Because something did work in 1.2.3, I don't think it is a hardware thing.

atanisoft commented 4 years ago

Reading CV did not work, but the train did do the motor stutter.

Did you add the jumpers from A0->A2, A1->A3? On the ESP32 "uno" form factor board the A0 and A1 inputs are not usable.

Was trying to lower the sensing threshold because I use n scale.

Changing sensitivity is not necessary for various scales. The L298 maxes out at 2A per output and the CS will allow up to about 75-80% of this by default before it considers it a short (it will go way over this limit if it is really a short).
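
As a rough illustration of the numbers above, the short-circuit limit can be worked out as a percentage of the h-bridge maximum. This is a sketch of the arithmetic only, not the command station's actual detection code; short_limit_ma and is_short are hypothetical names, and the 400mA reading in the usage note is an invented example value.

```cpp
#include <cassert>

// Maximum current per L298 output in milliamps (figure quoted in the thread).
const int kL298MaxMilliAmps = 2000;

// Returns the current (mA) above which a load is treated as a short.
// `percent` is the share of the h-bridge maximum that is still allowed.
int short_limit_ma(int max_ma, int percent) {
    assert(max_ma > 0 && percent > 0 && percent <= 100);
    return max_ma * percent / 100;
}

// Hypothetical helper: true when a measured current exceeds the limit.
bool is_short(int measured_ma, int max_ma, int percent) {
    return measured_ma > short_limit_ma(max_ma, percent);
}
```

With the 75-80% range mentioned above, the limit for an L298 lands between 1500mA and 1600mA, so a locomotive drawing, say, 400mA stays well below it regardless of scale.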

In version 1.2.3 the train did drive, only very slowly and stuttering.

There should definitely be no stuttering. That sounds like dirty track or track feeders too far apart, or possibly too low a track voltage (use a minimum of a 12V DC supply).

Because something did work in 1.2.3, I don't think it is a hardware thing.

Likely it is not a hardware issue for the esp32/motor shield itself. On the software side the primary difference between v1.2.x and v1.5.x is the usage of the RMT peripheral instead of a hardware timer to generate the DCC signal on the DIR pin. The underlying hardware should be using the same voltage level for HIGH/LOW with both methods but there may be a slight difference in the RMT which we need to confirm.
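
Whichever peripheral produces it, the waveform on the DIR pin should follow the same DCC bit timing: a "1" bit is two half-waves of nominally 58µs each, and a "0" bit is two half-waves of nominally 100µs each (NMRA S-9.1). The sketch below only builds the list of half-wave durations for a packet so the expected timing can be compared against a scope capture. It does not use the RMT API and is not the project's signal generator; encode_dcc_packet, append_bit and the default of 16 preamble bits are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Nominal half-bit durations in microseconds (NMRA S-9.1).
const int kOneHalfBitUs = 58;    // each half of a "1" bit
const int kZeroHalfBitUs = 100;  // each half of a "0" bit (nominal)

// Appends one bit as two equal half-wave durations of opposite polarity.
void append_bit(std::vector<int>& halves, bool bit) {
    const int us = bit ? kOneHalfBitUs : kZeroHalfBitUs;
    halves.push_back(us);
    halves.push_back(us);
}

// Builds the half-wave durations for one packet: a preamble of "1" bits, then
// each byte preceded by a "0" start bit, with an XOR error-detection byte
// appended after the payload, and a final "1" packet end bit.
std::vector<int> encode_dcc_packet(const std::vector<uint8_t>& payload,
                                   int preamble_bits = 16) {
    assert(!payload.empty() && preamble_bits >= 14);
    std::vector<int> halves;
    for (int i = 0; i < preamble_bits; ++i) append_bit(halves, true);

    std::vector<uint8_t> bytes(payload);
    uint8_t checksum = 0;
    for (uint8_t b : payload) checksum ^= b;
    bytes.push_back(checksum);

    for (uint8_t b : bytes) {
        append_bit(halves, false);  // byte start bit
        for (int bit = 7; bit >= 0; --bit) append_bit(halves, (b >> bit) & 1);
    }
    append_bit(halves, true);  // packet end bit
    return halves;
}
```

For example, encode_dcc_packet({0xFF, 0x00}) yields the timing of the DCC idle packet, which a command station sends when it has nothing else to transmit.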

desertspider commented 4 years ago

Did you add the jumpers from A0->A2, A1->A3? On the ESP32 "uno" form factor board the A0 and A1 inputs are not usable.

I did A0 to A4 and A1 to A5

Changing sensitivity is not necessary for various scales. The L298 maxes out at 2A per output and the CS will allow up to about 75-80% of this by default before it considers it a short (it will go way over this limit if it is really a short).

In DCC++ I had to change #define ACK_SAMPLE_THRESHOLD to 10 because otherwise it didn't work. Not sure where this setting is in ESP32CommandStation.

There should definitely be no stuttering. That sounds like dirty track or track feeders too far apart, or possibly too low a track voltage (use a minimum of a 12V DC supply).

On the same track, with the same power supply, same train and same motor board, but on a Mega, it works just fine.

atanisoft commented 4 years ago

In DCC++ I had to change #define ACK_SAMPLE_THRESHOLD to 10 because otherwise it didn't work. Not sure where this setting is in ESP32CommandStation.

That setting does not exist on the ESP32 CS, and it is not necessary to adjust it: the current sense code scales the NMRA DCC spec value of 60mA to the appropriate ADC value. In the AVR DCC++ code this threshold was approximated and hard coded regardless of the h-bridge used. Combined with the nature of the ADC on the AVR, that could lead to inaccurate detection of the ACK pulses when they did not cross the hard-coded threshold.
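
To illustrate the scaling idea, the sketch below converts the 60mA ACK current into an ADC reading. It assumes a 12-bit ADC whose full-scale value (4095) corresponds linearly to the h-bridge's maximum current; that mapping, and the names milliamps_to_adc and ack_threshold_adc, are assumptions for illustration and not the command station's actual code.

```cpp
#include <cassert>

// ACK pulse current in milliamps (the NMRA DCC figure quoted above).
const int kAckMilliAmps = 60;

// ASSUMPTION: 12-bit ADC, full scale (4095) == h-bridge maximum current.
// Real boards may use a different sense resistor, gain or attenuation.
const int kAdcFullScale = 4095;

// Converts a current in mA to the expected ADC reading under that assumption.
int milliamps_to_adc(int milliamps, int max_ma) {
    assert(max_ma > 0 && milliamps >= 0);
    return milliamps * kAdcFullScale / max_ma;
}

// ADC reading that a 60mA ACK pulse corresponds to for a given h-bridge.
int ack_threshold_adc(int max_ma) {
    return milliamps_to_adc(kAckMilliAmps, max_ma);
}
```

Under these assumptions an L298 (2000mA maximum) gives a threshold of about 122 counts, and the value shifts automatically for an h-bridge with a different maximum, which is why a single hard-coded number is not needed.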

I did A0 to A4 and A1 to A5

That should work fine, as the defaults are set to those mappings for most of the "uno" form factor ESP32 boards. Please ensure that A4 is GPIO 36 and A5 is GPIO 39 on the ESP32. If the silk screen shows a different pin, you will need to update Config_HBridge.h according to the pin mappings table in the comments.
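
For reference, the ADC1 channel numbers used in the configuration posted earlier map to fixed GPIOs on the original ESP32. The lookup below is a small self-contained sketch of that mapping using plain integers instead of the ESP-IDF enum; adc1_channel_to_gpio and check_posted_config are hypothetical helper names.

```cpp
#include <cassert>

// GPIO number for each ADC1 channel on the original ESP32 (index = channel).
const int kAdc1ChannelGpio[8] = {36, 37, 38, 39, 32, 33, 34, 35};

// Returns the GPIO for an ADC1 channel, or -1 if the channel is out of range.
int adc1_channel_to_gpio(int channel) {
    if (channel < 0 || channel >= 8) return -1;
    return kAdc1ChannelGpio[channel];
}

// Checks the two channels from the configuration posted earlier in the thread.
void check_posted_config() {
    assert(adc1_channel_to_gpio(0) == 36);  // OPS_CURRENT_SENSE_ADC ADC1_CHANNEL_0
    assert(adc1_channel_to_gpio(3) == 39);  // PROG_CURRENT_SENSE_ADC ADC1_CHANNEL_3
}
```

So ADC1_CHANNEL_0 and ADC1_CHANNEL_3 correspond to GPIO 36 and GPIO 39, which is what the A4 and A5 headers need to reach for the jumpers to work.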

On the same track, with the same power supply, same train and same motor board, but on a Mega, it works just fine.

Ok, something is definitely not adding up. I can't say for certain what it might be unfortunately. I haven't been able to reproduce this behavior in my testing thus far.

desertspider commented 4 years ago

Quick update. I installed 1.1.1, and it works flawlessly. So:

1.1.1: works
1.2.3: train stutters
latest: no train movement

Definitely not a hardware issue. Hope this helps.

atanisoft commented 3 years ago

I believe this has now been addressed.