Aircoookie / WLED

Control WS2812B and many more types of digital RGB LEDs with an ESP8266 or ESP32 over WiFi!
https://kno.wled.ge
MIT License
15k stars 3.24k forks source link

Corrupted data in response, when changing a preset #3641

Closed nilvanis closed 10 months ago

nilvanis commented 10 months ago

What happened?

When requesting a preset change via JSON, I see some corrupted data in the response. Preset changes successfully, but the response generates an error in Home Assistant, e.g.:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 861: invalid start byte

It happens most of the times. Very rarely response is correct. It happens when trying to activate any configured preset.

To Reproduce Bug

  1. Use ESP8266_2M (Athom LS-4P in my case)
  2. Create multiple segments and presets.
  3. Change preset via JSON. Example:
    
    $ curl -D - -s -H 'Content-Type: application/json' -X POST http://192.168.1.10/json/state -d '{"ps":1,"v":true}'
    HTTP/1.1 200 OK
    Content-Length: 1708
    Content-Type: application/json
    Access-Control-Allow-Origin: *
    Access-Control-Allow-Methods: *
    Access-Control-Allow-Headers: *
    Connection: close
    Accept-Ranges: none

{"on":true,"bri":93,"transition":7,"ps":2,"pl":-1,"nl":{"on":false,"dur":60,"mode":1,"tbri":0,"rem":-1},"udpn":{"send":false,"recv":true,"sgrp":1,"rgrp":1},"lor":0,"mainseg":3,"seg":[{"id":0,"start":22,"stop":89,"len":67,"grp":1,"spc":0,"of":0,"on":true,"frz":false,"bri":255,"cct":127,"set":0,"n":"Prawo","col":[[255,160,0],[0,0,0],[0,0,0]],"fx":28,"sx":94,"ix":128,"pal":0,"c1":128,"c2":128,"c3":16,"sel":false,"rev":false,"mi":false,"o1":false,"o2":false,"o3":false,"si":0,"m12":1},{"id":1,"start":89,"stop":128,"len":39,"grp":1,"spc":0,"of":0,"on":true,"frz":false,"bri":255,"cct":127,"set":0,"n":"Furtka","col":[[255,170,0],[0,0,0],[0,0,0]],"fx":0,"sx":128,"ix":128,"pal":0,"c1":128,"c2":128,"c3":16,"sel":false,"rev":false,"mi":false,"o1":false,"o2":false,"o3":false,"si":0,"m12":1},{"id":2,"start":128,"stop":137,"len":9,"grp":1,"spc":0,"of":0,"on":true,E8|▒▒▒*▒▒u▒▒::▒▒▒p

                                                                                                                                                     <<▒.

00▒] ,,▒,,▒▒ ,, ,,(#<<PF$00▒▒

V O 8s-Control-Allow-Origin: Access-Control-Allow-Methods: Access-Control-Allow-Headers: * Connection: close Accept-Ranges: none

"202

V HTTP/1.1 200 OK Content-Length: 1708 Content-Type: application/json Access-Control-Allow-Origin: Access-Control-Allow-Methods: Access-Control-Allow-Headers: * Connection: close Accept-Ranges: none

{"on":true,"bri":93,"transition":7,"ps":2,"pl":-1,"nl":{"on":false,"dur":60,"mode":1,"tbri":0,"rem":-1},"udpn":{"send":false,"recv":true,"sgrp":1,"rgrp":1},"lor":0,"mainseg":3,"seg":[{"id":0,"start":22,"stop":89,"len":67,"grp":1,"spc":0,"of":0,"on":true,"frz":


### Expected Behavior

$ curl -D - -s -H 'Content-Type: application/json' -X POST http://192.168.1.10/json/state -d '{"ps":1,"v":true}' HTTP/1.1 200 OK Content-Length: 1715 Content-Type: application/json Access-Control-Allow-Origin: Access-Control-Allow-Methods: Access-Control-Allow-Headers: * Connection: close Accept-Ranges: none

{"on":true,"bri":93,"transition":7,"ps":1,"pl":-1,"nl":{"on":false,"dur":60,"mode":1,"tbri":0,"rem":-1},"udpn":{"send":false,"recv":true,"sgrp":1,"rgrp":1},"lor":0,"mainseg":0,"seg":[{"id":0,"start":22,"stop":89,"len":67,"grp":1,"spc":0,"of":0,"on":true,"frz":false,"bri":255,"cct":127,"set":0,"n":"Prawo","col":[[255,160,0],[0,0,0],[0,0,0]],"fx":49,"sx":128,"ix":128,"pal":0,"c1":128,"c2":128,"c3":16,"sel":true,"rev":false,"mi":false,"o1":false,"o2":false,"o3":false,"si":0,"m12":0},{"id":1,"start":89,"stop":128,"len":39,"grp":1,"spc":0,"of":0,"on":true,"frz":false,"bri":255,"cct":127,"set":0,"n":"Furtka","col":[[255,170,0],[0,0,0],[0,0,0]],"fx":49,"sx":128,"ix":128,"pal":0,"c1":128,"c2":128,"c3":16,"sel":false,"rev":false,"mi":false,"o1":false,"o2":false,"o3":false,"si":0,"m12":0},{"id":2,"start":128,"stop":137,"len":9,"grp":1,"spc":0,"of":0,"on":true,"frz":false,"bri":255,"cct":127,"set":0,"n":"SlupekPrawy","col":[[255,170,0],[0,0,0],[0,0,0]],"fx":49,"sx":128,"ix":128,"pal":0,"c1":128,"c2":128,"c3":16,"sel":false,"rev":false,"mi":false,"o1":false,"o2":false,"o3":false,"si":0,"m12":0},{"id":3,"start":150,"stop":161,"len":11,"grp":1,"spc":0,"of":0,"on":true,"frz":false,"bri":255,"cct":127,"set":0,"n":"SlupekLewy","col":[[255,170,0],[0,0,0],[0,0,0]],"fx":49,"sx":128,"ix":128,"pal":0,"c1":128,"c2":128,"c3":16,"sel":false,"rev":false,"mi":false,"o1":false,"o2":false,"o3":false,"si":0,"m12":0},{"id":4,"start":161,"stop":200,"len":39,"grp":1,"spc":0,"of":0,"on":true,"frz":false,"bri":255,"cct":127,"set":0,"n":"Lewo","col":[[255,170,0],[0,0,0],[0,0,0]],"fx":49,"sx":128,"ix":128,"pal":0,"c1":128,"c2":128,"c3":16,"sel":false,"rev":false,"mi":false,"o1":false,"o2":false,"o3":false,"si":0,"m12":0}]}



### Install Method

Self-Compiled

### What version of WLED?

0.15, 202401010

### Which microcontroller/board are you seeing the problem on?

ESP8266

### Relevant log/trace output

_No response_

### Anything else?

Issue probably exists also on 0.14.0, see: https://github.com/home-assistant/core/issues/106375

### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
blazoncek commented 10 months ago

Known issue. Cause still unknown, suspected TCP stack or AsyncTCP error (reuse pointer after free). Possible duplicate of: #3599 #3447

EDIT: It will not happen on all ESP8266. In my experience original Lolin Wemos D1 mini will not exhibit this error.

nilvanis commented 10 months ago

Actually I experienced the https://github.com/Aircoookie/WLED/issues/3447 issue in 0.14, but I don't see it in 0.15

blazoncek commented 10 months ago

The issue appeared when we updated ESP8266 platform in PIO. We cannot revert because newer NeoPixelBus requires it.

willmmiles commented 10 months ago

I'm not 100% finished debugging yet, but I'm out of time for the moment, so I'll give a quick report. It's looking to me like the issue is re-use of the global ArduinoJSON buffer before the AsyncJsonResponse from serveJSON is finished with it. In the case when the response is too big to send in a single packet case, the AsyncWebServer seems to hold a reference to the JSON object until it receives an ack packet from the client, at which point it generates the rest of the response. However serveJson immediately releases the lock, and in the trace when I got a corrupted packet, I found another client took the lock (handlePresets) and started using the buffer before the final reply packet was sent. So when AsyncWebServer went to re-generate the reply text to send the second packet, it wound up walking through now invalid pointers.

blazoncek commented 10 months ago

@willmmiles thanks to your hints I can now reliably reproduce corruption (not exactly the same as above, but it is consistent). It is indeed linked to use-after-free situation regarding JSON buffer. But not JSON document itself but strings within it. It will always happen in serializeModeData() and serializeModeNames() which fill JSON array with names extracted from PROGMEM.

I wonder why that doesn't happen on ESP32 (and variants) but only on ESP8266. And not always at that.

willmmiles commented 10 months ago

I wonder why that doesn't happen on ESP32 (and variants) but only on ESP8266. And not always at that.

You're safe if the reply is less than the outbound TCP buffer size, so the server code could queue the entire response to the underlying IP stack before returning to the wled code. On ESP8266 that's 1024 bytes, which I think will cover most use cases of 2 segments or less. A quick google search suggests the ESP32 defaults to 5760 bytes, which will cover a lot more segments and effects without issues.

And even then, it'll only have problems if something else tries to use the JSON buffer and overwrites the strings before the wifi code finishes the transfer (ie. a classic race condition). This didn't consistenly happen in my test setup -- depending on what else was going on, most of the time it would still complete successfully for me.

blazoncek commented 10 months ago

Closing as corrupt JSON has been fixed in 0.14.1-b3. If the issue persist in 0.14.1-b3 or later, please open a new issue.