arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0
22.11k stars 4.79k forks source link

WebQuery get right answer on 8266 but erratic on ESP32 #20102

Closed ygelt closed 11 months ago

ygelt commented 11 months ago

PROBLEM DESCRIPTION

I use the same script on 4 tasmota devices : 3 with 8266, 1 with ESP32 which are all querying the same web service every 3 seconds using WebQuery GET. This works perfectly on the 8266 devices, I mean they receives the web service answer on each call; while the ESP32 does not receive the data most of the time, quite erratically. My web service is correctly called by the device (correct headers and parameter).

It "receives the answer" means it logs like this : 23:49:00.760 RSL: RESULT = {"WebQuery":{"cmd":0, "con":0, "alv":172297}} 23:49:00.771 RSL: RESULT = {"WebQuery":"Done"}

It "does not receive the answer" means it logs like this : 23:46:34.111 RSL: RESULT = {"WebQuery":{"WebQuery":} 23:46:34.120 RSL: RESULT = {"WebQuery":"Done"}

The calling WebQuery is: =>WebQuery http://www.xxxx.xxxx/connect/v1/device?al=%0nWS% GET [id:%topic%|ip:%lip%] (The issue is the same in http and https)

Resulting in log: 00:09:07.260 CMD: Grp 0, Cmd 'WEBQUERY', Idx 1, Len 107, Pld -99, Data 'http://www.xxxx.xxxx/connect/v1/device?al=29236 GET [id:01234567890123456789012345678901|ip:192.168.0.38]'

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

- [N/A ] If using rules, provide the output of this command: `Backlog Rule1; Rule2; Rule3`:
```lua
  Rules output here:
- [x ] Set `weblog` to 4 and then, when you experience your issue, provide the output of the Console log:
```lua
  Console output here:
00:09:07.260 CMD: Grp 0, Cmd 'WEBQUERY', Idx 1, Len 107, Pld -99, Data 'http://www.xxxx.xxxx/connect/v1/device?al=29236 GET [id:01234567890123456789012345678901|ip:192.168.0.38]'
00:09:07.461 RSL: RESULT = {"WebQuery":{"WebQuery":}
00:09:07.470 RSL: RESULT = {"WebQuery":"Done"}

TO REPRODUCE

Steps to reproduce the behavior: I can send my script in mp for a real-time test

EXPECTED BEHAVIOUR

A clear and concise description of what you expected to happen. I expect this answer as on 8266 devices: 23:49:00.760 RSL: RESULT = {"WebQuery":{"cmd":0, "con":0, "alv":172297, "cf1":2, "cf2":3, "cf3":4, "cf4":4, "cf5":20}}

SCREENSHOTS

If applicable, add screenshots to help explain your problem.

ADDITIONAL CONTEXT

Add any other context about the problem here. I compiled Tasmota 13.0.0 using TasmoCompiler including Script and TLS MQTT support but not Berry. Here are my user_config_override.h options:

ifndef ESP32

define ESP32

endif

ifdef USE_BERRY

undef USE_BERRY

endif

ifdef SCRIPT_MAXSSIZE

undef SCRIPT_MAXSSIZE

endif

define SCRIPT_MAXSSIZE 64

ifndef USE_SCRIPT

define USE_SCRIPT

endif

ifndef USE_WEBSEND_RESPONSE

define USE_WEBSEND_RESPONSE

endif

ifdef USE_RULES

undef USE_RULES

endif

define PRECONFIGURED_SCRIPT ">D 128\n;--- start setup ---\n;......"

ifndef START_SCRIPT_FROM_BOOT

define START_SCRIPT_FROM_BOOT

endif

(Please, remember to close the issue when the problem has been addressed)

barbudor commented 11 months ago

I've seen cases of corrupted response buffer. I can't remember if it was exclusively on ESP esp32 but, yes it might be the case. It has been in my to do list for months I need to put it back on top of the list

sfromis commented 11 months ago

May not be related, by recently another case of broken WebQuery responses was mentioned. IDK if the investigation got further. https://discord.com/channels/479389167382691863/790182324939128832/1175492686833266739

barbudor commented 11 months ago

I get some funny corruptions where part of the previous log message got it's way in the WebQuery response

18:04:00.948 CMD: backlog rule1 1 ; rule2 1
18:04:00.982 MQT: stat/esp32dev3/RULE = {"Rule1":{"State":"ON","Once":"OFF","StopOnError":"OFF","Length":66,"Free":445,"Rules":"on tele-Time do webquery http://192.168.168.244/test.php GET endon"}}
18:04:01.210 MQT: stat/esp32dev3/RULE = {"Rule2":{"State":"ON","Once":"OFF","StopOnError":"OFF","Length":38,"Free":473,"Rules":"on WebQuery#Time do var1 %value% endon"}}
18:04:03.964 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:04:03+01:00","Uptime":"0T00:01:35","UptimeSec":95,"Heap":169,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":100,"Signal":-45,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:04:03.996 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:04:04.031 {"Time":"2023-11-26T18:04:04+01:00"}
18:04:04.032 AssumeJSON=1
18:04:04.039 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=1}
18:04:04.053 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:04:13.927 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:04:13+01:00","Uptime":"0T00:01:45","UptimeSec":105,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":100,"Signal":-50,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:04:13.960 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:04:14.002 {"Time":"2023-11-26T18:04:14+01:00"}
18:04:14.004 AssumeJSON=1
18:04:14.010 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=1}
18:04:14.021 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:04:23.946 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:04:23+01:00","Uptime":"0T00:01:55","UptimeSec":115,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":100,"Signal":-50,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:04:23.983 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:04:24.029 {"Time":"2023-11-26T18:04:24+01:00"}
18:04:24.031 AssumeJSON=1
18:04:24.038 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=1}
18:04:24.049 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:04:33.925 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:04:33+01:00","Uptime":"0T00:02:05","UptimeSec":125,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":98,"Signal":-51,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:04:33.958 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:04:34.022 {"Time":"2023-11-26T18:04:34+01:00"}
18:04:34.024 AssumeJSON=1
18:04:34.033 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{"Time":"2023-11-26T18:04:34+01:00"}}
18:04:34.049 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:04:43.921 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:04:43+01:00","Uptime":"0T00:02:15","UptimeSec":135,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":96,"Signal":-52,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:04:43.954 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:04:43.993 {"Time":"2023-11-26T18:04:44+01:00"}
18:04:43.994 AssumeJSON=1
18:04:44.001 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=1}
18:04:44.014 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:04:53.937 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:04:53+01:00","Uptime":"0T00:02:25","UptimeSec":145,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":92,"Signal":-54,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:04:53.970 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:04:54.007 {"Time":"2023-11-26T18:04:54+01:00"}
18:04:54.008 AssumeJSON=1
18:04:54.016 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=1}
18:04:54.032 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:05:03.954 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:05:03+01:00","Uptime":"0T00:02:35","UptimeSec":155,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":96,"Signal":-52,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:05:03.988 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:05:04.229 {"Time":"2023-11-26T18:05:04+01:00"}
18:05:04.231 AssumeJSON=1
18:05:04.238 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{}
18:05:04.251 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:05:13.923 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:05:13+01:00","Uptime":"0T00:02:45","UptimeSec":165,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":96,"Signal":-52,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:05:13.957 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:05:14.035 {"Time":"2023-11-26T18:05:14+01:00"}
18:05:14.036 AssumeJSON=1
18:05:14.045 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{"Time":"2023-11-26T18:05:14+01:00"}}
18:05:14.063 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:05:23.934 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:05:23+01:00","Uptime":"0T00:02:55","UptimeSec":175,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":100,"Signal":-47,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:05:23.967 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:05:24.203 {"Time":"2023-11-26T18:05:24+01:00"}
18:05:24.204 AssumeJSON=1
18:05:24.212 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=xV���?�d}
18:05:24.226 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:05:33.951 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:05:33+01:00","Uptime":"0T00:03:05","UptimeSec":185,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":100,"Signal":-48,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:05:33.984 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:05:34.018 {"Time":"2023-11-26T18:05:34+01:00"}
18:05:34.019 AssumeJSON=1
18:05:34.027 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=xV���?,d}
18:05:34.041 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:05:43.965 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:05:43+01:00","Uptime":"0T00:03:15","UptimeSec":195,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":100,"Signal":-49,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:05:44.000 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:05:44.043 {"Time":"2023-11-26T18:05:44+01:00"}
18:05:44.045 AssumeJSON=1
18:05:44.053 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=xV���?`d}
18:05:44.066 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
18:05:53.940 MQT: tele/esp32dev3/STATE = {"Time":"2023-11-26T18:05:53+01:00","Uptime":"0T00:03:25","UptimeSec":205,"Heap":170,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":100,"Signal":-49,"LinkCount":1,"Downtime":"0T00:00:10"}}
18:05:53.976 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
18:05:54.009 {"Time":"2023-11-26T18:05:54+01:00"}
18:05:54.010 AssumeJSON=1
18:05:54.017 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{ssumeJSON=xV���?�d}
18:05:54.031 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}
barbudor commented 11 months ago

@arendst I believe we've found a memory coruption problem around Response_P() and AddLog() functions in WebQuery Reponse. Probably pointing to ext_vsnprintf_P(...). The problem occurs only on ESP32, not on ESP8266. I've seen such behavior for quite some time, months is not years but didn't had the time to investigate further before.

I have added some serial log here: https://github.com/barbudor/Tasmota/blob/5d2a8d1c792305e18e6b7e74be3a40ac284396f3/tasmota/tasmota_xdrv_driver/xdrv_01_9_webserver.ino#L3357-L3383

When it works fine and there is no corruption, it looks like:

19:03:52.946 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
a={"Time":"2023-11-26T19:03:53+01:00"}
b={"Time":"2023-11-26T19:03:53+01:00"}
c={"Time":"2023-11-26T19:03:53+01:00"}
y={"Time":"2023-11-26T19:03:53+01:00"}
z={"Time":"2023-11-26T19:03:53+01:00"}
19:03:52.000 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{"Time":"2023-11-26T19:03:53+01:00"}}
19:03:53.015 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}

But randomly (often) it shows things like:

18:58:42.951 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
a={"Time":"2023-11-26T18:58:43+01:00"}
b={"Time":"2023-11-26T18:58:43+01:00"}
c={"WebQuery":
y={"WebQuery":
z={"WebQuery":
18:58:42.993 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{"WebQuery":}
18:58:43.017 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}

or like:

19:02:02.983 RUL: TELE-TIME performs "webquery http://192.168.168.244/test.php GET"
a={"Time":"2023-11-26T19:02:03+01:00"}
b={"Time":"2023-11-26T19:02:03+01:00"}
c={"Time":"2023-11-26T19:02:03+01:00"}
y=
z=
19:02:03.022 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":{"Time":"2<}
19:02:03.041 MQT: stat/esp32dev3/WEBQUERY = {"WebQuery":"Done"}

It's not related to the length of the HTTP response which here is quite short (less than 40 bytes).

IMHO, this makes obvious that the buffer holding the WebQuery response (read0 in the code linked) gets corrupted when calling Response_P(...). The same problems also affect AddLog(...) because before using Serial.print I was using AddLog and the log message was written in the WebQuery MQTT response.

arendst commented 11 months ago

What happens if you copy the result from http.getstring() into a new variable? It might well be the http result is freed by garbage collection before the rest of the code is finished....

barbudor commented 11 months ago

It looks better. No corruption so far I need to find a way to get a proper test running for the night

barbudor commented 11 months ago

PR sent