atanisoft / ESP32CommandStation

An ESP32 based DCC Command Station with integrated OpenLCB (LCC) --- NOTE: this project is not under active development.
https://atanisoft.github.io/ESP32CommandStation/
GNU General Public License v3.0
90 stars 34 forks source link

Update dependencies, rework WebUI, new features #96

Closed atanisoft closed 3 years ago

TrainzLuvr commented 3 years ago

When I did save and update complete, these are the last messages, nothing after:

{"req":"cdi-get","offs":932,"size":8,"type":"eventid","target":"cdi_0_04382203987529776"}
{"req":"cdi-set","offs":1,"size":63,"type":"string","target":"cdi_0_832503141433458","value":"LCC Command Station"}
{"req":"update-complete"}
atanisoft commented 3 years ago

In that case it looks like there is a problem between the browser and the esp32. Are you seeing something like:

{"req":"info"} | 14 | 14:29:00.388 |  
-- | -- | -- | --
{"res":"info","build":"v1.5.0-beta1-57-ge14b810-dirty","timestamp":"Jan 4 2021 14:09:12","ota":"ota_0","snip_name":"ESP32 Command Stati......

This is the first message sent as soon as the page loads. When you click on the config tab there should be 39 or so cdi-get requests sent which will be processed sequentially.

TrainzLuvr commented 3 years ago

Yeah I get those and the messages when the Config tab opens. But then when I go to update the name or description, it sends the request and nothing.

atanisoft commented 3 years ago

Change the log levels from VERBOSE to INFO in https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L151 (all occurrences in that class). That should help narrow down where it is getting stuck in your env.

TrainzLuvr commented 3 years ago

So I changed the log level but I do not see anything of significance in the serial console.

But now looking at the Network tab of my browser, after trying to update User Name, I see new WS is being spun up - every few seconds a new connection. The browser Console says that WS has been closed, and reconnecting. Reason: undefined, repeatedly.

Trying to refresh the page in the browser does not work, it says waiting... while the Console says:

(index):679 Uncaught DOMException: Failed to execute 'send' on 'WebSocket': Still in CONNECTING state.
    at WebSocket.ws_opened (http://192.168.254.52/:679:12)
ws_opened @ (index):679
(index):698 WebSocket connection to 'ws://192.168.254.52/wsjson' failed: Connection closed before receiving a handshake response
TrainzLuvr commented 3 years ago

Oh, CS finally crashed:

[WebSocket fd:60] Failed to allocate websocket, aborting!
CORRUPT HEAP: Bad head at 0x3ffef044. Expected 0xabba1234 got 0x3ffef5f0

abort() was called at PC 0x40082a28 on core 1
0x40082a28: lock_acquire_generic at C:/espressif/esp-idf/components/newlib/locks.c:142

Backtrace:0x40088f8e:0x3ffd32e0 0x400895a1:0x3ffd3300 0x40090109:0x3ffd3320 0x40082a28:0x3ffd33a0 0x40082b41:0x3ffd33e0 0x4019d88d:0x3ffd3400 0x401979dd:0x3ffd36c0 0x40197929:0x3ffd3710 0x4008cf13:0x3ffd3740 0x40081615:0x3ffd3760 0x40090131:0x3ffd3780 0x4010aaaa:0x3ffd37a0 0x4010aafd:0x3ffd37c0 0x401a8a2b:0x3ffd37e0 0x400d65da:0x3ffd3800 0x400d8a35:0x3ffd3820 0x401167d1:0x3ffd3840 0x401a89e5:0x3ffd3870 0x4012415b:0x3ffd3890
0x40088f8e: panic_abort at C:/espressif/esp-idf/components/esp_system/panic.c:341

0x400895a1: esp_system_abort at C:/espressif/esp-idf/components/esp_system/system_api.c:106

0x40090109: abort at C:/espressif/esp-idf/components/newlib/abort.c:46

0x40082a28: lock_acquire_generic at C:/espressif/esp-idf/components/newlib/locks.c:142

0x40082b41: _lock_acquire_recursive at C:/espressif/esp-idf/components/newlib/locks.c:170

0x4019d88d: _vfiprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c:853 (discriminator 2)

0x401979dd: fiprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fiprintf.c:48

0x40197929: __assert_func at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/assert.c:58 (discriminator 8)

0x4008cf13: multi_heap_free at C:/espressif/esp-idf/components/heap/multi_heap_poisoning.c:267
 (inlined by) multi_heap_free at C:/espressif/esp-idf/components/heap/multi_heap_poisoning.c:259

0x40081615: heap_caps_free at C:/espressif/esp-idf/components/heap/heap_caps.c:305

0x40090131: free at C:/espressif/esp-idf/components/newlib/heap.c:47

0x4010aaaa: http::WebSocketFlow::~WebSocketFlow() at r:\code\esp32commandstation\build/../components/HttpServer/src/HttpRequestWebSocket.cpp:131

0x4010aafd: http::WebSocketFlow::~WebSocketFlow() at r:\code\esp32commandstation\build/../components/HttpServer/src/HttpRequestWebSocket.cpp:134

0x401a8a2b: operator() at r:\code\esp32commandstation\build/../components/HttpServer/src/HttpServer.cpp:289
 (inlined by) _M_invoke at c:\espressif\tools\xtensa-esp32-elf\esp-2020r3-8.4.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h:297

0x400d65da: std::function<void ()>::operator()() const at c:\espressif\tools\xtensa-esp32-elf\esp-2020r3-8.4.0\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h:687

0x400d8a35: CallbackExecutable::run() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/executor/Executable.hxx:84

0x401167d1: ExecutorBase::entry() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/executor/Executor.cpp:324

0x401a89e5: OSThread::start(void*) at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/OS.hxx:193

0x4012415b: os_thread_start at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/os.c:391
atanisoft commented 3 years ago

Interesting. I've got a local change that I'm testing which will help with the heap usage a bit.

The WS closing and reconnecting is due to no response from the CS within ~2sec.

atanisoft commented 3 years ago

@TrainzLuvr give this latest update a try. I've seen a couple heap corruptions from the WS client cleanup which I'll be investigating further

TrainzLuvr commented 3 years ago

Now it does not even want to connect to my WiFi, it keeps trying and waiting to connect to SSID. I power cycled the CS several times, still the same.

I think we are regressing more and more...

atanisoft commented 3 years ago

It sounds like wifi interference or too low of tx power settings in CDI.

TrainzLuvr commented 3 years ago

Nothing else was changed since the previous code compile.

I erased flash but it made no difference.

After it times out trying to connect to WiFi, it crashes:

0x401173ed: openmrn_arduino::Esp32WiFiManager::start_wifi_system() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/freertos_drivers/esp32/Esp32WiFiManager.cpp:935 (discriminator 4)

Backtrace:0x40088f8e:0x3ffd7e10 0x400895a1:0x3ffd7e30 0x40090109:0x3ffd7e50 0x401173ed:0x3ffd7ed0 0x40119205:0x3ffd8030 0x40123f77:0x3ffd8050
0x40088f8e: panic_abort at C:/espressif/esp-idf/components/esp_system/panic.c:341

0x400895a1: esp_system_abort at C:/espressif/esp-idf/components/esp_system/system_api.c:106

0x40090109: abort at C:/espressif/esp-idf/components/newlib/abort.c:46

0x401173ed: openmrn_arduino::Esp32WiFiManager::start_wifi_system() at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/freertos_drivers/esp32/Esp32WiFiManager.cpp:935 (discriminator 4)

0x40119205: openmrn_arduino::Esp32WiFiManager::wifi_manager_task(void*) at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/freertos_drivers/esp32/Esp32WiFiManager.cpp:963

0x40123f77: os_thread_start at r:\code\esp32commandstation\build/../components/OpenMRNLite/src/os/os.c:391
atanisoft commented 3 years ago

The crash at timeout is normal, it forces a restart of the esp32 so it can retry connecting to wifi. Nothing has changed in this area.

It still sounds like an environmental or issue with the ssid/pw in your config. It is rather unfortunate that the esp32 doesn't really help when there is a password failure. Also make sure that if you are using WPA (any variety) that the password is at least eight characters.

TrainzLuvr commented 3 years ago

I have not changed anything in my environment, or the ESP32CS configuration since the last git pull.

atanisoft commented 3 years ago

I'm not sure then. I'm not seeing any issues like this except when the password etc is wrong (even a stray space in the value can do it)

TrainzLuvr commented 3 years ago

Ok, erasing the SD card fixed the connection issue.

Now back to the hanging issue. I see when I try to update User Name, it sends the request, but never receives a response after I click Save so it keeps spinning and hangs the server.

{"req":"cdi","o":1,"s":63,"t":"string","tgt":"cdi_0_41636424117479787","v":"LCC Command Station","id":42}

When I do Update Complete immediately after I press Save, it sends a request and gets a response, but the id variable name is not properly closed - missing an opening "

{"req":"update-complete","id":44}
{"res":"update-complete",id":44}

Perhaps there are other places where the variable names are not closed properly causing this hanging issue/lack of response?

atanisoft commented 3 years ago

but never receives a response after I click Save so it keeps spinning and hangs the server.

{"req":"cdi","o":1,"s":63,"t":"string","tgt":"cdi_0_41636424117479787","v":"LCC Command Station","id":42}

Check in the browser the exact length of the message being sent. I'm investigating an issue with the WS code where it does not process the message correctly if the length is >125 characters.

Ok, erasing the SD card fixed the connection issue.

Odd, that points to something corrupted in the config file.

TrainzLuvr commented 3 years ago

https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L577

This is where the " is missing

TrainzLuvr commented 3 years ago

You do not appear to have a handler for parameter type "t" that is being sent in the above request.

Looking at the code from https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L602-L644 it does not handle parameter "t", with a string value nor is it null terminating it.

Is it necessary to do so, or is your next line 645 catch-all?

atanisoft commented 3 years ago

https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L577

This is where the " is missing

Good catch, I looked over the rest and didn't spot any others.

atanisoft commented 3 years ago

Is it necessary to do so, or is your next line 645 catch-all?

not needed. The block you linked to checks the "t" parameter (https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L583-L584) to determine how to convert the "v" field into something that is blindly writable to CDI.

TrainzLuvr commented 3 years ago

It still does not work - basically it hangs when it's supposed to save my User Name to CDI and does not send a response.

Even if I don't try to save anything, but just open the Configuration tab, the Refresh buttons keep spinning on the User Name and User Description.

If I change to any of the sub-tabs under Configuration, they are not populated at all with values, and Refresh buttons next to each of the parameters are spinning as well.

TrainzLuvr commented 3 years ago

So where do you send a response after the value is supposedly written to CDI (or read from)?

I don't see it here: https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L648-L651 so it basically does not send a response neither during the Read or the Write phase.

atanisoft commented 3 years ago

So where do you send a response after the value is supposedly written to CDI (or read from)?

https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L317 (READ) https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L337 (WRITE)

TrainzLuvr commented 3 years ago

It does not appear to ever reach there though so none of the values are populated.

atanisoft commented 3 years ago

It does not appear to ever reach there though so none of the values are populated.

It's getting there.. I found a bug in the write code that I'm working on fixing now.

[WS:CDI-READ] {"res":"field","tgt":"cdi_0_35301670748681535","value":"5020103fffe000a","type":"eventid","id":39}
[WS:CDI-READ] {"res":"field","tgt":"cdi_0_6485711272724943","value":"5020103fffe000b","type":"eventid","id":40}
[CDI:cdi_0_45710242475669927] Writing 63 bytes from offset 1
[WS:CDI-WRITE] {"res":"saved","tgt":"cdi_0_45710242475669927","id":42}
[WS:CDI-READ] {"res":"field","tgt":"cdi_0_45710242475669927","value":"ESP32 Command Station01234567890�������������������������������","type":"string","id":45}
TrainzLuvr commented 3 years ago

Well, I'm telling you it never reaches read_complete or write_complete here.

I see this for example

[Web] WS: {"req":"info","id":1} -> {"res":"info","build":"v1.5.0-beta1-60-gb445cb0b","timestamp":"Jan 5 2021 21:23:04","ota":"ota_0","snip_name":"ESP32 Command Station","snip_hw":"ESP32-v1","snip_sw":"1.5.0","node_id":"5010101xx00","s88":false,"sensorIDBase":0,"outputs":false,"sensors":false,"id":1}

But if I go to Configuration, I don't see an appropriate response for any of the requests from it.

TrainzLuvr commented 3 years ago

...and I added log output in both calls. Never gets to the end.

atanisoft commented 3 years ago

Well, I'm telling you it never reaches read_complete or write_complete here.

It won't be directly called, it is called async and the response is not sent within the process_wsjson method.

But if I go to Configuration, I don't see an appropriate response for any of the requests from it.

They should show up within 1-2 sec after accessing the page. This is the same behavior as seen on the Esp32OlcbHub (though in that case it does have more heap churn than the CS now)

TrainzLuvr commented 3 years ago

It never gets called...it sends the first blast of data for "info" but then for "cdi" nothing...silence.

atanisoft commented 3 years ago

It never gets called...it sends the first blast of data for "info" but then for "cdi" nothing...silence.

Enable these lines: https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L241 https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L249 https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L278 https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L317 https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L338

I can guarantee it is reaching these lines otherwise it wouldn't work at all even for me.

TrainzLuvr commented 3 years ago

Now I see none of the LOG output in the serial console?!?! I turned all those lines to INFO, and nada, before at least I was getting something.

atanisoft commented 3 years ago

Something definitely weird going on. Confirm in the browser that it didn't kill the socket and reconnect. We may need to adjust the socket timeout from 2000ms to 5000ms and bump the ping time to at least 7500ms.

TrainzLuvr commented 3 years ago

I am seeing NO debug output messages in the serial console anymore.

Prior to your last change (before adding those 5 lines) I had my own and at least they were displaying in the serial console.

atanisoft commented 3 years ago

I'm not sure what might be broken in your case as it is working correctly in all of my tests (both read and write)

TrainzLuvr commented 3 years ago

This is my last pull before things stopped displaying:

git pull
remote: Enumerating objects: 21, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 12 (delta 10), reused 12 (delta 10), pack-reused 0
Unpacking objects: 100% (12/12), 1.79 KiB | 4.00 KiB/s, done.
From https://github.com/atanisoft/ESP32CommandStation
   b445cb0b..04a0c84b  sync       -> upstream/sync
Updating b445cb0b..04a0c84b
Fast-forward
 components/HttpServer/src/HttpRequestWebSocket.cpp | 39 +++++++++-------------
 data/index.html                                    | 22 +++++++++---
 main/WebServer.cpp                                 | 34 ++++++++++---------
 3 files changed, 52 insertions(+), 43 deletions(-)

Why did it pull HttpRequestWebSocket.cpp when the last commit there was in October 2020?!

atanisoft commented 3 years ago

There was a bug fix in the websocket code today relating to the 126+ character messages. I'll be rolling that change out to the other repos tomorrow.

TrainzLuvr commented 3 years ago

I am going to delete my local copy and reclone the repo, then try again tomorrow.

TrainzLuvr commented 3 years ago

It made no difference.

This code https://github.com/atanisoft/ESP32CommandStation/blob/sync/main/WebServer.cpp#L235-L263 never executes - I put a LOG statement before L238 and it never hit it.

atanisoft commented 3 years ago

I'm not sure what could be different in your setup compared to mine. I'm using the default config updated only to enable wifi station mode.

Are you able to monitor the lcc bus traffic to see if the requests are somehow ending up on the bus rather than processed locally? It should be processed locally since the node handle is the local cs node handle.

TrainzLuvr commented 3 years ago

I'm running the CS through USB power only, not through full setup, so don't have LCC bus up.

I'll record my settings, delete my config, reconfig and recompile. I don't know what else is left to try...

atanisoft commented 3 years ago

I'm running the CS through USB power only, not through full setup, so don't have LCC bus up.

Ok, I'm also running the CS via USB but I do have it seated in the CS PCB (but no CAN connection currently).

I don't know what else is left to try.

Same here, I'm going through the code to add a few more debug statements and moving the update-complete to be implemented async as well. This code is fairly generic and is planned to be used for a generic CDI editor long term.

TrainzLuvr commented 3 years ago

Yes, ESP32 in the CS board and all, everything is connected, only USB powered.

Update-Complete is one of those things that actually works here, because it's not async.

atanisoft commented 3 years ago

Update-Complete is one of those things that actually works here, because it's not async.

It's async but the response is sent immediately. The only change will be in that the response will be sent once the update-complete message has actually been sent.

TrainzLuvr commented 3 years ago

Still nothing. I only see [Web] WS: LOG output (info at the beginning and ping/pong).

Configuration still not populated, Refresh buttons still spinning.

atanisoft commented 3 years ago

Configuration still not populated, Refresh buttons still spinning.

Check the browser console/network details to see if it did send the message or if it got dropped as not understood.

TrainzLuvr commented 3 years ago

No errors in the browser console. The CS/WS is simpy not responding to reqs.

atanisoft commented 3 years ago

The CS/WS is simpy not responding to reqs.

pull the latest code and enable the various VERBOSE LOG statements. I'm not seeing the same behavior you are of the requests being dropped.

TrainzLuvr commented 3 years ago

I changed all VERBOSE to INFO in that class.

After clicking on the Configuration tab I got an alert popup in the browser "request is missing one (or more) required parameters"

The WS inspector shows similar response: {"res":"error", "error":"request is missing one (or more) required parameters","id":3} and so does the serial console

[WSJSON:4] One or more required parameters are missing: {"req":"cdi","o":64,"s":64,"t":"str","tgt":"cdi_0_4345226546825096","id":4}
[WSJSON:5] One or more required parameters are missing: {"req":"cdi","o":132,"s":1,"t":"int","tgt":"cdi_0_23423935627194137","id":5}
[WSJSON:6] One or more required parameters are missing: {"req":"cdi","o":133,"s":21,"t":"str","tgt":"cdi_0_8798068416960174","id":6}
[WSJSON:7] One or more required parameters are missing: {"req":"cdi","o":154,"s":32,"t":"str","tgt":"cdi_0_6990582236195353","id":7}
[WSJSON:8] One or more required parameters are missing: {"req":"cdi","o":186,"s":128,"t":"str","tgt":"cdi_0_013274707212409087","id":8}
[WSJSON:9] One or more required parameters are missing: {"req":"cdi","o":314,"s":32,"t":"str","tgt":"cdi_0_42365144095072793","id":9}
[WSJSON:10] One or more required parameters are missing: {"req":"cdi","o":346,"s":128,"t":"str","tgt":"cdi_0_3268559261877111","id":10}
[WSJSON:11] One or more required parameters are missing: {"req":"cdi","o":474,"s":1,"t":"int","tgt":"cdi_0_8648675543942514","id":11}
.....
atanisoft commented 3 years ago

After clicking on the Configuration tab I got an alert popup in the browser "request is missing one (or more) required parameters"

your browser is using a cached copy of the html. Make sure to delete data/*.gz and then reupload to the esp32 and finally force a refresh of the index.html.

TrainzLuvr commented 3 years ago

I did that every time previously. It made no difference.

Did it again, and now I don't have logging output again. WTH?