Aircoookie / WLED

Control WS2812B and many more types of digital RGB LEDs with an ESP8266 or ESP32 over WiFi!
https://kno.wled.ge
MIT License

Websocket connections not working anymore for >0.14.1 #3855

Closed: smhex closed this issue 4 months ago

smhex commented 6 months ago

What happened?

I am using a third-party library (wled-client) to access my WLED over WebSockets. After upgrading my WLED to any version newer than 0.14.1, the library stopped working. I replaced it with a native WebSocket implementation and it didn't work either. I always get ECONNRESET when trying to open the socket. After downgrading to 0.14.1, everything works again. The web interface is accessible without any issues, and the HTTP API also works as expected.

To Reproduce Bug

Use any WebSocket client and connect to ws://<WLED-IP>/ws. It should immediately return the WLED state upon connecting. In my case, it returns nothing.

Expected Behavior

Upon successful connection the state object should be returned.
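A client can check this expectation directly on the first frame it receives. The sketch below assumes WLED pushes a JSON text frame containing a `state` object (and usually an `info` object) right after the handshake; the `on` and `bri` fields come from WLED's JSON API and serve only as a sanity check, and `parseWledMessage` is a hypothetical helper name.

```typescript
// Shape assumed for WLED's first WebSocket message; only the fields checked
// below are relied on, everything else is passed through untouched.
interface WledState {
  on: boolean;
  bri: number;
  [key: string]: unknown;
}

interface WledMessage {
  state: WledState;
  info?: Record<string, unknown>;
}

// Hypothetical helper: returns the parsed message if it looks like a WLED
// state push, or null for anything else (empty frame, non-JSON, wrong shape).
function parseWledMessage(raw: string): WledMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all (e.g. an empty frame)
  }
  if (typeof data !== "object" || data === null) return null;
  const state = (data as { state?: unknown }).state;
  if (typeof state !== "object" || state === null) return null;
  const s = state as Record<string, unknown>;
  if (typeof s.on !== "boolean" || typeof s.bri !== "number") return null;
  return data as WledMessage;
}
```

A repro script can then treat "connection closed before a message for which `parseWledMessage` returns non-null" as the failure described in this report.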

Install Method

Binary from WLED.me

What version of WLED?

0.14.2

Which microcontroller/board are you seeing the problem on?

ESP32

Relevant log/trace output

No response

Anything else?

No response


Jedden19 commented 6 months ago

Same with ESP8266

blazoncek commented 6 months ago

Please post your WS connection details; @willmmiles may have a solution.

willmmiles commented 6 months ago

Replicated on 0.14.2, works on 0.15. Investigating. I'm using websocat.

willmmiles commented 6 months ago

Seems to work on 0.14.3 as well. I suspect this may have been fixed by the websocket memory management fixes in the newer version of AsyncWebServer.

smhex commented 6 months ago

In the meantime I was able to capture a serial dump, but my WLED build does not include debug symbols, so there is nothing useful to see :-(. I tested with 0.15.0-b1 and it fails again: the connection request triggers a reboot of my device. If someone can provide a WLED_0.15.0-b1_ESP32.bin or any other affected version with debug symbols enabled, I am happy to test.

How do I get 0.14.3?

blazoncek commented 6 months ago

Both versions are available on Discord.

smhex commented 6 months ago

Thx, will download and continue testing...

smhex commented 6 months ago

Okay, here is the serial dump from a 0.14.3 debug build. As soon as my client connects, a reboot is triggered. I did this several times (see attached log).

putty.log

Is there anything more I can test/provide?

willmmiles commented 6 months ago

Can you please send a backup of your cfg and presets? Then erase the flash storage (pio run -t erase -e <your_envname>), and reload your firmware and cfg from scratch.

I had a case yesterday where the filesystem seemed to be corrupted, and reading presets.json was causing crashes and other weird behaviour. Rebuilding the filesystem seems to fix it. I'm not sure what caused the corruption yet - I go back and forth between versions a lot.

smhex commented 6 months ago

Here are the requested files. I changed the server/user credentials; I hope that is not the deciding factor. At the moment I do not have PlatformIO installed. Will a factory reset via the web interface also rebuild the file system?

Wled_setup.zip

Thanks for your help!

blazoncek commented 6 months ago

> Is there anything more I can test/provide?

Please add this to your environment and monitor from within PIO:

monitor_filters = esp32_exception_decoder

It will tell us where it crashes.

It looks to me as if it does not crash within WLED's own functions, though.
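The decoder filter works from the `Backtrace:` line of the crash dump: the program-counter half of each `PC:SP` pair is resolved against the firmware ELF with the toolchain's `addr2line`, which produces the symbolized frames seen in a decoded trace. As an illustration only (this is not PlatformIO's implementation), the sketch below pulls the PC addresses out of such a line and builds an argument list for GNU `addr2line`; the function names and the `firmware.elf` path are hypothetical, and on the ESP32 the binary to call would be `xtensa-esp32-elf-addr2line`.

```typescript
// Extract program-counter addresses from an ESP32 "Backtrace:" line.
// Each entry has the form PC:SP (e.g. 0x40089af8:0x3ffd8c40); only the PC
// half is needed to look up a source location.
function backtracePCs(line: string): string[] {
  const marker = "Backtrace:";
  const start = line.indexOf(marker);
  if (start === -1) return [];
  const rest = line.slice(start + marker.length);
  const pair = /(0x[0-9a-fA-F]{8}):(0x[0-9a-fA-F]{8})/g;
  const pcs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pair.exec(rest)) !== null) {
    pcs.push(m[1]); // keep the PC, drop the stack pointer
  }
  return pcs;
}

// Arguments for GNU addr2line: -p pretty-print, -f function names,
// -i inlined frames, -a print addresses, -C demangle C++ symbols.
// elfPath is a placeholder for the firmware ELF built with debug info.
function addr2lineArgs(elfPath: string, pcs: string[]): string[] {
  return ["-pfiaC", "-e", elfPath, ...pcs];
}
```

Fed the first three entries of the trace above, `backtracePCs` yields `0x40089af8`, `0x40089e55` and `0x4008eb39`, which correspond to frames #0 to #2 (`invoke_abort`, `abort`, `multi_heap_free`) in the decoded output.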

smhex commented 6 months ago

Okay, please give me some time to setup everything. It has been a long time since I have been working with PlatformIO 😅.

theapache64 commented 6 months ago

Happening to me as well. Downgrading to 0.13.1 fixes the issue.

smhex commented 6 months ago

I got the toolchain working :-). Here is the output when connecting via websocket to a freshly uploaded esp32 image.

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1084
load:0x40078000,len:11220
load:0x40080400,len:5360
entry 0x4008067c
Ada
CORRUPT HEAP: Bad head at 0x3ffdd26c. Expected 0xabba1234 got 0x3ffdd558
abort() was called at PC 0x4008eb39 on core 1

ELF file SHA256: 0000000000000000

Backtrace: 0x40089af8:0x3ffd8c40 0x40089e55:0x3ffd8c60 0x4008eb39:0x3ffd8c80 0x4008543a:0x3ffd8ca0 0x40085805:0x3ffd8cc0 0x4000bec7:0x3ffd8ce0 0x401ac671:0x3ffd8d00 0x40128839:0x3ffd8d20 0x401288e0:0x3ffd8d40 0x4012899a:0x3ffd8d90 0x40128aa9:0x3ffd8de0 0x40128d0d:0x3ffd8e20 0x4011eb2d:0x3ffd8e40 0x4011ebc1:0x3ffd8e80 0x4011f1f2:0x3ffd8ea0 0x4008b89e:0x3ffd8ed0
  #0  0x40089af8:0x3ffd8c40 in invoke_abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
  #1  0x40089e55:0x3ffd8c60 in abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
  #2  0x4008eb39:0x3ffd8c80 in multi_heap_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:321
  #3  0x4008543a:0x3ffd8ca0 in heap_caps_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:232
  #4  0x40085805:0x3ffd8cc0 in _free_r at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/newlib/syscalls.c:42
  #5  0x4000bec7:0x3ffd8ce0 in ?? ??:0
  #6  0x401ac671:0x3ffd8d00 in operator delete(void*) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/del_op.cc:46
  #7  0x40128839:0x3ffd8d20 in LinkedList<AsyncWebHeader, LinkedListNode>::_remove(LinkedListNode<AsyncWebHeader>*, LinkedListNode<AsyncWebHeader>*) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015   
  #8  0x401288e0:0x3ffd8d40 in LinkedList<AsyncWebHeader, LinkedListNode>::remove(LinkedList<AsyncWebHeader, LinkedListNode>::Iterator const&, LinkedList<AsyncWebHeader, LinkedListNode>::Iterator const&) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
      (inlined by) AsyncWebServerRequest::_removeNotInterestingHeaders() at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:197
  #9  0x4012899a:0x3ffd8d90 in AsyncWebServerRequest::_parseLine() at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
  #10 0x40128aa9:0x3ffd8de0 in AsyncWebServerRequest::_onData(void*, unsigned int) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
  #11 0x40128d0d:0x3ffd8e20 in std::_Function_handler<void (void*, AsyncClient*, void*, unsigned int), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*, void*, unsigned int)#5}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, std::_Any_data const&, unsigned int&&) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
      (inlined by) _M_invoke at c:\users\thomas\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0/functional:1871
  #12 0x4011eb2d:0x3ffd8e40 in std::function<void (void*, AsyncClient*, void*, unsigned int)>::operator()(void*, AsyncClient*, void*, unsigned int) const at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:1153
      (inlined by) AsyncClient::_recv(tcp_pcb*, pbuf*, signed char) at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:968
  #13 0x4011ebc1:0x3ffd8e80 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, signed char) at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:1153
  #14 0x4011f1f2:0x3ffd8ea0 in _async_service_task(void*) at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:1153
      (inlined by) _async_service_task at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:201
  #15 0x4008b89e:0x3ffd8ed0 in vPortTaskWrapper at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

Rebooting...
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1084
load:0x40078000,len:11220
load:0x40080400,len:5360
entry 0x4008067c
Ada

My WLED has no configuration except an IP address for my Wi-Fi. No presets or other changes were made. If you need more information, just tell me. The reboot is triggered by a simple node app trying to connect...

blazoncek commented 6 months ago

As I suspected, the error is not in WLED code but, unfortunately, in the AsyncWebServer library. Hopefully @willmmiles will find what's wrong.

Can you capture a packet that causes the crash? Using Wireshark or similar.

willmmiles commented 6 months ago

Or alternately, post a link to your node app. There's something different about the headers from the usual browser connections.
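For comparing clients, it helps to know what a WebSocket opening handshake contains. Per RFC 6455, every client sends an HTTP/1.1 `GET` with `Upgrade: websocket`, `Connection: Upgrade`, a random base64 `Sec-WebSocket-Key` and `Sec-WebSocket-Version: 13`; clients then differ in optional headers (extensions, subprotocols, origin, user agent). The thread does not say which of node's headers triggered the crash. The sketch below builds such a request so headers can be inspected or varied by hand; `buildUpgradeRequest`, `newKey` and `expectedAccept` are hypothetical names, and the host in any call is a placeholder.

```typescript
import { createHash, randomBytes } from "node:crypto";

// GUID fixed by RFC 6455 for computing Sec-WebSocket-Accept.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Raw HTTP/1.1 upgrade request for WLED's /ws endpoint. extraHeaders lets you
// append whatever a particular client sends, to compare against a browser.
function buildUpgradeRequest(
  host: string,
  key: string,
  extraHeaders: Record<string, string> = {},
): string {
  const headers: Record<string, string> = {
    Host: host,
    Upgrade: "websocket",
    Connection: "Upgrade",
    "Sec-WebSocket-Key": key,
    "Sec-WebSocket-Version": "13",
    ...extraHeaders,
  };
  const lines = Object.entries(headers).map(([name, value]) => `${name}: ${value}`);
  // Two trailing empty entries produce the blank line that ends the header block.
  return ["GET /ws HTTP/1.1", ...lines, "", ""].join("\r\n");
}

// Random 16-byte nonce, base64-encoded, as the spec requires for the key.
function newKey(): string {
  return randomBytes(16).toString("base64");
}

// Value a compliant server must return in Sec-WebSocket-Accept for this key.
function expectedAccept(key: string): string {
  return createHash("sha1").update(key + WS_GUID).digest("base64");
}
```

The request string could be written to a raw TCP socket (for example one opened with Node's `net.connect`) and compared side by side with the browser's handshake in the Wireshark capture.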

smhex commented 6 months ago

Here are both: the very basic node app and the Wireshark recordings...

Wireshark.zip wled-ws-test.zip

Please change your WLED's IP in index.ts and run

npm install
npx tsc
node index

If everything works, WLED's current state should be logged to the console.
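The attached `index.ts` is not reproduced in the thread. As a hypothetical stand-in, the sketch below does the same job with the `WebSocket` global built into Node.js 22 and later rather than a third-party package: it opens `ws://<ip>/ws`, logs the `state` part of the first message (assuming the message carries a `state` key), and fails on error or timeout. Reading the IP from a `WLED_IP` environment variable instead of editing the file, the function names, and the 5-second timeout are all assumptions of this sketch.

```typescript
// Hypothetical minimal repro client; requires a runtime with a global
// WebSocket (e.g. Node.js 22 or later).
function wledWsUrl(ip: string): string {
  return `ws://${ip}/ws`;
}

// Connect, log the state from the first message, then close.
// Rejects on a socket error (e.g. the reset seen on affected firmware)
// or if nothing arrives within timeoutMs.
function logFirstMessage(ip: string, timeoutMs = 5000): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(wledWsUrl(ip));
    const timer = setTimeout(() => {
      ws.close();
      reject(new Error(`no message within ${timeoutMs} ms`));
    }, timeoutMs);
    ws.addEventListener("message", (event) => {
      clearTimeout(timer);
      const data = JSON.parse(String(event.data));
      console.log(data.state); // WLED's current state, if the key is present
      ws.close();
      resolve(data);
    });
    ws.addEventListener("error", () => {
      clearTimeout(timer);
      reject(new Error("WebSocket error (connection reset?)"));
    });
  });
}

// Only connect when an address is supplied, so importing this file is harmless.
const targetIp = process.env.WLED_IP;
if (targetIp) {
  logFirstMessage(targetIp).catch((err: Error) => {
    console.error(err.message);
    process.exitCode = 1;
  });
}
```

Compiled with `npx tsc` and run as `WLED_IP=192.168.1.50 node index` (IP is a placeholder), it should print the state object on an unaffected firmware and, per the report above, an error on an affected one.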

Happy Easter

willmmiles commented 6 months ago

Thanks for bearing with me! I was able to replicate it using your code and config, and tracked this to a use-after-free in AsyncWebServer. It was indeed triggered by the headers sent by node, and wouldn't replicate unless some other code caught it with its pants down, so to speak.

I've pushed AsyncWebServer v2.2.1 which has the fix, and opened PR #3873 to adopt it.

smhex commented 6 months ago

Perfect, I will re-run my tests after the PR merge.

blazoncek commented 6 months ago

> Perfect, I will re-run my tests after the PR merge.

You can test it immediately by temporarily changing your copy of platformio.ini.

smhex commented 6 months ago

You're right. I quickly rebuilt the image with version 2.2.1 of AsyncWebServer, restored my presets and other settings and .... everything works again 👍

I enabled my Homebridge plugin again and, as far as I could see, communication worked as it should, just as it last did in WLED 0.14.1. I will keep an eye on it for a few days...

Thank you very much @willmmiles , @blazoncek !