Closed TrainzLuvr closed 2 years ago
Copying from https://github.com/atanisoft/ESP32CommandStation/pull/84:
abort() was called at PC 0x400d5c47 on core 1
0x400d5c47: __cxa_end_catch at C:/espressif/esp-idf/components/cxx/cxx_exception_stubs.cpp:13
Backtrace:0x40088c7e:0x3ffd2cf0 0x40089291:0x3ffd2d10 0x4008fd91:0x3ffd2d30 0x400d5c47:0x3ffd2db0 0x40195bb1:0x3ffd2dd0 0x40196159:0x3ffd2df0 0x401962d8:0x3ffd2e10 0x401967e1:0x3ffd2e50 0x4019685a:0x3ffd2e70 0x4019686d:0x3ffd2e90 0x400eb4f9:0x3ffd2eb0 0x401a7577:0x3ffd2f80 0x401159a1:0x3ffd2fa0 0x40119bf1:0x3ffd2ff0 0x401199fd:0x3ffd3020 0x401a8259:0x3ffd3050 0x40125e37:0x3ffd3070
0x40088c7e: panic_abort at C:/espressif/esp-idf/components/esp_system/panic.c:341
0x40089291: esp_system_abort at C:/espressif/esp-idf/components/esp_system/system_api.c:106
0x4008fd91: abort at C:/espressif/esp-idf/components/newlib/abort.c:46
0x400d5c47: __cxa_end_catch at C:/espressif/esp-idf/components/cxx/cxx_exception_stubs.cpp:13
0x40195bb1: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
0x40196159: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned int&, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/ext/new_allocator.h:111
(inlined by) ?? at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/alloc_traits.h:436
(inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned int&, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.tcc:153
0x401962d8: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned int, unsigned int, char const*, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.tcc:317
0x401967e1: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.tcc:370
0x4019685a: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::append(char const*) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.h:1259
0x4019686d: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator+=(char const*) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.h:1169
0x400eb4f9: process_loco(http::HttpRequest*) at r:\code\esp32commandstation\build/../main/WebServer.cpp:1071
0x401a7577: std::_Function_handler<http::AbstractHttpResponse* (http::HttpRequest*), http::AbstractHttpResponse* (*)(http::HttpRequest*)>::_M_invoke(std::_Any_data const&, http::HttpRequest*&&) at c:\espressif\tools\xtensa-esp32-elf\esp-2020r2-8.2.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.2.0\bits/std_function.h:282
0x401159a1: http::HttpRequestFlow::process_request_handler() at c:\espressif\tools\xtensa-esp32-elf\esp-2020r2-8.2.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.2.0\bits/std_function.h:687 (inlined by) http::HttpRequestFlow::process_request_handler() at r:\code\esp32commandstation\build/../components/HttpServer/src/HttpRequestFlow.cpp:398
0x40119bf1: StateFlowBase::run() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/executor/StateFlow.cpp:63 (discriminator 4)
0x401199fd: ExecutorBase::entry() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/executor/Executor.cpp:324
0x401a8259: OSThread::start(void*) at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/OS.hxx:193
0x40125e37: os_thread_start at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/os.c:391
This appears to be due to low heap and additional work will be needed both in the CS and HttpServer stack likely.
@TrainzLuvr can you test with CONFIG_LWIP_TCP_SND_BUF_DEFAULT=2048
in sdkconfig?
This on the master branch, what should I be looking for?
Testing to see how bad heap gets with UWT-100, CAN and Hub working. The question will then be how does this setting impact TCP/IP performance.
I see it's set to 5744 atm:
CONFIG_LWIP_TCP_SND_BUF_DEFAULT=5744
CONFIG_LWIP_TCP_WND_DEFAULT=5744
Yeah, it is 4x MSS by default. I'm looking at an option to set this only for GC TCP/IP connections as well but this is a question for test of the setting to see if it might help some
And only SND_BUF should be lowered
So then run full blown CS, not ESP32 in stand-alone mode?
I'll connect my USB-UART adapter to it, hopefully it does not blow my ESP32.
I haven't had a chance to make an opto-coupler addon for it, need to see if I have any logic level opto-couplers on hand.
I'd say start in standalone to see how it behaves as a quick test. If it seems stable try it on the full blown setup and I'll add some code to tweak this for just the hub mode
Ok but then there are no peripherals connected to it (no CAN, or DCC) except for WiFi.
Yeah, that should be fine. I suspect that is where the heap is going since the mainBufferPool grows and is used for most of the traffic
I grabbed another ESP32 to play with so I can keep the first one in the CS.
I went to check whether Hub was on and it wasn't so I turned it on. But then I noticed a crash in the monitor:
[LCC] Enabling CAN interface, reinitialization required.
[Reboot] Shutting down DCC signal generator(s)...
[Reboot] Shutting down DCC turnout manager...
[Reboot] Shutting down Train Database...
[Reboot] Waiting for httpd to shutdown...
[Reboot] Shutting down WiFi...
[Uplink] Disconnecting from uplink.
[Reboot] Shutting down Lcc stack...
[LCC] Disabling automatic fsync(7) calls...
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x400f86c4 PS : 0x00060a30 A0 : 0x800e8746 A1 : 0x3ffe6420
0x400f86c4: esp32cs::LCCStackManager::shutdown() at c:\espressif\tools\xtensa-esp32-elf\esp-2020r2-8.2.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.2.0\bits/move.h:193
(inlined by) ?? at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/utils/AutoSyncFileFlow.hxx:72
(inlined by) esp32cs::LCCStackManager::shutdown() at r:\code\esp32commandstation\build/../components/Configuration/LCCStackManager.cpp:190
A2 : 0x3ffbe108 A3 : 0x3ffb499c A4 : 0x3ffb49a4 A5 : 0xa5a5a5a5
A6 : 0x3ffe6428 A7 : 0x00000001 A8 : 0x00000001 A9 : 0x3ffe6400
A10 : 0x3ffda514 A11 : 0x00000000 A12 : 0x3ffc138c A13 : 0x00000000
A14 : 0x00000000 A15 : 0x00000001 SAR : 0x00000000 EXCCAUSE: 0x0000001c
EXCVADDR: 0xa5a5a611 LBEG : 0x4000142d LEND : 0x4000143a LCOUNT : 0xffffffdd
Backtrace:0x400f86c1:0x3ffe6420 0x400e8743:0x3ffe6450 0x40125db7:0x3ffe6470
0x400f86c1: esp32cs::LCCStackManager::shutdown() at r:\code\esp32commandstation\build/../components/Configuration/LCCStackManager.cpp:190
0x400e8743: node_reboot at r:\code\esp32commandstation\build/../main/OpenMRNEsp32Overrides.cpp:79
0x40125db7: os_thread_start at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/os.c:391
After the restart, I see:
[TaskMon] uptime: 00:03:00 freeHeap: 158532, largest free block: 113536, tasks: 17, mainBufferPool: 0.72kB
Which seems good for largest free block, compared to what I recall before being quite lower.
Then I proceed to connect UWT-100 to it.
Obviously it fails to select my loco but I continue to fiddle with switching between the loco # and default id 3, pressing other function buttons etc. None of that registers since the loco is not allocated, but at least I don't see any crashes. IIRC it would've crashed at some point after doing some of this fiddling.
Taskmon says:
[TaskMon] uptime: 00:10:30 freeHeap: 132524, largest free block: 96392, tasks: 17, mainBufferPool: 10.04kB.
I'll let it idle now and see what happens in 20-30 mins when I come back to it after dinner.
Oh and after I disconnected the throttle I saw this in the monitor, don't remember it before:
GCHubPort: Shutting down gridconnect port -1. (0x3ffe3170)
Taskmon then says:
[TaskMon] uptime: 00:12:45 freeHeap: 137092, largest free block: 95532, tasks: 17, mainBufferPool: 10.66kB
I guess some memory has been returned, which is good.
That seems promising kinda. There is likely a bug preventing the timely acquire of the loco. It does look like based on the memory pool that it did get partway through the acquire process at least. Since this seems to work so far I'll do some digging to add setting of the SNDBUF size only on GC connections.
The GCHubPort shutdown is normal for a detected disconnect.
No crashes so far, but there are two bugs I'll file separate from this.
That's good progress then! I'll review the new issues.
I guess the next test would be in a fully operational CS, with WiFi, CAN, DCC, et al.
Those changes from the dcc and perf branches were not merged into master yet, correct?
Dcc is not in yet but most of the perf branch has been merged and what is left is not critical and likely won't help.
After a single throttle is connected via WiFi in LCC mode, and a desktop computer connected to the WebUI, the freeHeap drops significantly.