atanisoft / ESP32CommandStation

An ESP32 based DCC Command Station with integrated OpenLCB (LCC) --- NOTE: this project is not under active development.
https://atanisoft.github.io/ESP32CommandStation/
GNU General Public License v3.0
90 stars 34 forks source link

Free heap can get extremely low when CAN and HUB are enabled with devices connected #78

Closed TrainzLuvr closed 2 years ago

TrainzLuvr commented 3 years ago

After a single throttle is connected via WiFi in LCC mode, and a desktop computer connected to the WebUI, the freeHeap drops significantly.

[TaskMon] uptime: 00:07:30 freeHeap: 111032, largest free block: 63788, tasks: 17, mainBufferPool: 0.72kB
....
Incoming connection from 192.168.X.X, fd 52. (TCS THROTTLE VIA WiFi LCC)
Allocating new alias 126 for node 06010000XXXX
Incoming connection from 192.168.X.X, fd 53. (WEB BROWSER CONNECTION FROM DESKTOP WS)
Incoming connection from 192.168.X.X, fd 54. (WEB BROWSER CONNECTION FROM DESKTOP WS)
....
[TaskMon] uptime: 00:08:15 freeHeap: 56000, largest free block: 12224, tasks: 17, mainBufferPool: 35.90kB
atanisoft commented 3 years ago

Copying from https://github.com/atanisoft/ESP32CommandStation/pull/84:

abort() was called at PC 0x400d5c47 on core 1
0x400d5c47: __cxa_end_catch at C:/espressif/esp-idf/components/cxx/cxx_exception_stubs.cpp:13

Backtrace:0x40088c7e:0x3ffd2cf0 0x40089291:0x3ffd2d10 0x4008fd91:0x3ffd2d30 0x400d5c47:0x3ffd2db0 0x40195bb1:0x3ffd2dd0 0x40196159:0x3ffd2df0 0x401962d8:0x3ffd2e10 0x401967e1:0x3ffd2e50 0x4019685a:0x3ffd2e70 0x4019686d:0x3ffd2e90 0x400eb4f9:0x3ffd2eb0 0x401a7577:0x3ffd2f80 0x401159a1:0x3ffd2fa0 0x40119bf1:0x3ffd2ff0 0x401199fd:0x3ffd3020 0x401a8259:0x3ffd3050 0x40125e37:0x3ffd3070

0x40088c7e: panic_abort at C:/espressif/esp-idf/components/esp_system/panic.c:341
0x40089291: esp_system_abort at C:/espressif/esp-idf/components/esp_system/system_api.c:106
0x4008fd91: abort at C:/espressif/esp-idf/components/newlib/abort.c:46
0x400d5c47: __cxa_end_catch at C:/espressif/esp-idf/components/cxx/cxx_exception_stubs.cpp:13
0x40195bb1: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
0x40196159: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned int&, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/ext/new_allocator.h:111
 (inlined by) ?? at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/alloc_traits.h:436
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned int&, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.tcc:153
0x401962d8: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned int, unsigned int, char const*, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.tcc:317
0x401967e1: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned int) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.tcc:370
0x4019685a: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::append(char const*) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.h:1259
0x4019686d: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator+=(char const*) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/build/build-cc-gcc-final/xtensa-esp32-elf/no-rtti/libstdc++-v3/include/bits/basic_string.h:1169
0x400eb4f9: process_loco(http::HttpRequest*) at r:\code\esp32commandstation\build/../main/WebServer.cpp:1071
0x401a7577: std::_Function_handler<http::AbstractHttpResponse* (http::HttpRequest*), http::AbstractHttpResponse* (*)(http::HttpRequest*)>::_M_invoke(std::_Any_data const&, http::HttpRequest*&&) at c:\espressif\tools\xtensa-esp32-elf\esp-2020r2-8.2.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.2.0\bits/std_function.h:282
0x401159a1: http::HttpRequestFlow::process_request_handler() at c:\espressif\tools\xtensa-esp32-elf\esp-2020r2-8.2.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.2.0\bits/std_function.h:687  (inlined by) http::HttpRequestFlow::process_request_handler() at r:\code\esp32commandstation\build/../components/HttpServer/src/HttpRequestFlow.cpp:398
0x40119bf1: StateFlowBase::run() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/executor/StateFlow.cpp:63 (discriminator 4)
0x401199fd: ExecutorBase::entry() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/executor/Executor.cpp:324
0x401a8259: OSThread::start(void*) at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/OS.hxx:193
0x40125e37: os_thread_start at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/os.c:391

This appears to be due to low heap and additional work will be needed both in the CS and HttpServer stack likely.

atanisoft commented 3 years ago

@TrainzLuvr can you test with CONFIG_LWIP_TCP_SND_BUF_DEFAULT=2048 in sdkconfig?

TrainzLuvr commented 3 years ago

This on the master branch, what should I be looking for?

atanisoft commented 3 years ago

Testing to see how bad heap gets with UWT-100, CAN and Hub working. The question will then be how does this setting impact TCP/IP performance.

TrainzLuvr commented 3 years ago

I see it's set to 5744 atm:

CONFIG_LWIP_TCP_SND_BUF_DEFAULT=5744
CONFIG_LWIP_TCP_WND_DEFAULT=5744
atanisoft commented 3 years ago

Yeah, it is 4x MSS by default. I'm looking at an option to set this only for GC TCP/IP connections as well but this is a question for test of the setting to see if it might help some

atanisoft commented 3 years ago

And only SND_BUF should be lowered

TrainzLuvr commented 3 years ago

So then run full blown CS, not ESP32 in stand-alone mode?

I'll connect my USB-UART adapter to it, hopefully it does not blow my ESP32.

I haven't had a chance to make an opto-coupler addon for it, need to see if I have any logic level opto-couplers on hand.

atanisoft commented 3 years ago

I'd say start in standalone to see how it behaves as a quick test. If it seems stable try it on the full blown setup and I'll add some code to tweak this for just the hub mode

TrainzLuvr commented 3 years ago

Ok but then there are no peripherals connected to it (no CAN, or DCC) except for WiFi.

atanisoft commented 3 years ago

Yeah, that should be fine. I suspect that is where the heap is going since the mainBufferPool grows and is used for most of the traffic

TrainzLuvr commented 3 years ago

I grabbed another ESP32 to play with so I can keep the first one in the CS.

I went to check whether Hub was on and it wasn't so I turned it on. But then I noticed a crash in the monitor:

[LCC] Enabling CAN interface, reinitialization required.
[Reboot] Shutting down DCC signal generator(s)...
[Reboot] Shutting down DCC turnout manager...
[Reboot] Shutting down Train Database...
[Reboot] Waiting for httpd to shutdown...
[Reboot] Shutting down WiFi...
[Uplink] Disconnecting from uplink.
[Reboot] Shutting down Lcc stack...
[LCC] Disabling automatic fsync(7) calls...
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x400f86c4  PS      : 0x00060a30  A0      : 0x800e8746  A1      : 0x3ffe6420
0x400f86c4: esp32cs::LCCStackManager::shutdown() at c:\espressif\tools\xtensa-esp32-elf\esp-2020r2-8.2.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.2.0\bits/move.h:193
 (inlined by) ?? at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/utils/AutoSyncFileFlow.hxx:72
 (inlined by) esp32cs::LCCStackManager::shutdown() at r:\code\esp32commandstation\build/../components/Configuration/LCCStackManager.cpp:190

A2      : 0x3ffbe108  A3      : 0x3ffb499c  A4      : 0x3ffb49a4  A5      : 0xa5a5a5a5
A6      : 0x3ffe6428  A7      : 0x00000001  A8      : 0x00000001  A9      : 0x3ffe6400
A10     : 0x3ffda514  A11     : 0x00000000  A12     : 0x3ffc138c  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x00000001  SAR     : 0x00000000  EXCCAUSE: 0x0000001c
EXCVADDR: 0xa5a5a611  LBEG    : 0x4000142d  LEND    : 0x4000143a  LCOUNT  : 0xffffffdd

Backtrace:0x400f86c1:0x3ffe6420 0x400e8743:0x3ffe6450 0x40125db7:0x3ffe6470
0x400f86c1: esp32cs::LCCStackManager::shutdown() at r:\code\esp32commandstation\build/../components/Configuration/LCCStackManager.cpp:190

0x400e8743: node_reboot at r:\code\esp32commandstation\build/../main/OpenMRNEsp32Overrides.cpp:79

0x40125db7: os_thread_start at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/os.c:391

After the restart, I see:

[TaskMon] uptime: 00:03:00 freeHeap: 158532, largest free block: 113536, tasks: 17, mainBufferPool: 0.72kB

Which seems good for largest free block, compared to what I recall before being quite lower.

Then I proceed to connect UWT-100 to it.

Obviously it fails to select my loco but I continue to fiddle with switching between the loco # and default id 3, pressing other function buttons etc. None of that registers since the loco is not allocated, but at least I don't see any crashes. IIRC it would've crashed at some point after doing some of this fiddling.

Taskmon says:

[TaskMon] uptime: 00:10:30 freeHeap: 132524, largest free block: 96392, tasks: 17, mainBufferPool: 10.04kB.

I'll let it idle now and see what happens in 20-30 mins when I come back to it after dinner.

TrainzLuvr commented 3 years ago

Oh and after I disconnected the throttle I saw this in the monitor, don't remember it before:

GCHubPort: Shutting down gridconnect port -1. (0x3ffe3170)

Taskmon then says:

[TaskMon] uptime: 00:12:45 freeHeap: 137092, largest free block: 95532, tasks: 17, mainBufferPool: 10.66kB

I guess some memory has been returned, which is good.

atanisoft commented 3 years ago

That seems promising kinda. There is likely a bug preventing the timely acquire of the loco. It does look like based on the memory pool that it did get partway through the acquire process at least. Since this seems to work so far I'll do some digging to add setting of the SNDBUF size only on GC connections.

The GCHubPort shutdown is normal for a detected disconnect.

TrainzLuvr commented 3 years ago

No crashes so far, but there are two bugs I'll file separate from this.

atanisoft commented 3 years ago

That's good progress then! I'll review the new issues.

TrainzLuvr commented 3 years ago

I guess the next test would be in a fully operational CS, with WiFi, CAN, DCC, et al.

Those changes from the dcc and perf branches were not merged into master yet, correct?

atanisoft commented 3 years ago

Dcc is not in yet but most of the perf branch has been merged and what is left is not critical and likely won't help.