Closed markus-seidl closed 2 years ago
Hi,
Seems to be more a Wi-Fi issue rather than MQTT.
Try to do a reset 3
in the console to recalibrate the Wi-Fi Radio.
I noticed that Tasmota reports
{"Info3":{"RestartReason":{"Exception":0,"Reason":"Blocked Loop","EPC":["00000000","00000000","00000000"],"EXCVADDR":"00000000","DEPC":"00000000","CallChain":["40201604","401007f4","4023847a","40263ac8","4023902e","40263ad5","40263b1a","40235c44","40000f49","40000f49","40000e19","401055a1","401055a7","4010000d","4025c564","4025c515","401007f4","40105d41","4000050c","4010315f","40103144","401007f4","40102619","401021e8","4010549b","40102d7f","40102f60","40103422","4000050c","40103144","401007f4"]}}}
twice last night, I don't know if this is related.
This is the missing WebLog 4 of the issue
03:33:40.233 CFG: Saved to flash at F6, Count 2184, Bytes 4096
03:33:49.439 WIF: Checking connection...
03:34:00.654 MQT: Attempting connection...
03:34:00.962 MQT: TLS connected in 304 ms, max ThunkStack used 3700
03:34:00.965 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
03:34:00.967 MQT: Connected
03:34:01.018 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
... more Checking connection...
03:36:30.500 WIF: Checking connection...
03:36:32.750 MQT: Attempting connection...
03:36:33.049 MQT: TLS connected in 294 ms, max ThunkStack used 3700
03:36:33.051 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
03:36:33.053 MQT: Connected
03:36:33.106 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
03:36:33.114 MQT: f2/buero/shutter/cmnd/shutter_0EB06C/POWER =
03:36:33.117 MQT: Subscribe to f2/buero/shutter/cmnd/shutter_0EB06C/#
03:36:33.130 MQT: Subscribe to f2/buero/shutter/cmnd/tasmotas/#
03:36:33.139 MQT: Subscribe to cmnd/DVES_0EB06C_fb/#
03:36:34.408 MQT: f2/buero/shutter/tele/shutter_0EB06C/STATE = {"Time":"2021-06-30T03:36:34","Uptime":"0T00:05:12","UptimeSec":312,"Heap":25,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":24,"MqttCount":3,"POWER1":"OFF","POWER2":"OFF","Wifi":{"AP":1,"SSId":"Germain","BSSId":"68:D7:9A:45:B3:32","Channel":11,"Mode":"11n","RSSI":76,"Signal":-62,"LinkCount":1,"Downtime":"0T00:00:05"}}
03:36:34.448 MQT: f2/buero/shutter/tele/shutter_0EB06C/SENSOR = {"Time":"2021-06-30T03:36:34","Switch1":"OFF","Switch2":"OFF","Shutter1":{"Position":96,"Direction":0,"Target":96}}
03:36:50.477 WIF: Checking connection...
03:37:10.735 WIF: Checking connection...
03:37:30.720 WIF: Checking connection...
03:37:33.979 MQT: Attempting connection...
03:37:34.279 MQT: TLS connected in 298 ms, max ThunkStack used 3700
03:37:34.282 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
03:37:34.284 MQT: Connected
03:37:34.345 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
I've executed now a RESET 3
and am monitoring the results.
RESET 3
hasn't brought any relief. The connection between MQTT and Tasmota is still flaky and only working causally.
The single 9.5 device is the only one that produces the reconnect logs over 4h (with around ~20 9.3 devices on the same MQTT server).
mqtt_1 | 1625097207: New connection from 192.168.20.24:61446 on port 8883.
mqtt_1 | 1625097207: New client connected from 192.168.20.24:61446 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625097478: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625097478: New connection from 192.168.20.24:60074 on port 8883.
mqtt_1 | 1625097478: New client connected from 192.168.20.24:60074 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625098079: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625098079: New connection from 192.168.20.24:64885 on port 8883.
mqtt_1 | 1625098080: New client connected from 192.168.20.24:64885 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625098380: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625098381: New connection from 192.168.20.24:53598 on port 8883.
mqtt_1 | 1625098381: New client connected from 192.168.20.24:53598 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625098644: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1 | 1625100445: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1 | 1625100483: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625100484: New connection from 192.168.20.24:52636 on port 8883.
mqtt_1 | 1625100484: New client connected from 192.168.20.24:52636 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625101385: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625101385: New connection from 192.168.20.24:52881 on port 8883.
mqtt_1 | 1625101385: New client connected from 192.168.20.24:52881 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625102246: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1 | 1625102887: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625102887: New connection from 192.168.20.24:53987 on port 8883.
mqtt_1 | 1625102887: New client connected from 192.168.20.24:53987 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625104047: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1 | 1625104389: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625104389: New connection from 192.168.20.24:59563 on port 8883.
mqtt_1 | 1625104389: New client connected from 192.168.20.24:59563 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1 | 1625105848: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1 | 1625107543: Client DVES_0EB06C closed its connection.
mqtt_1 | 1625107543: New connection from 192.168.20.24:52684 on port 8883.
mqtt_1 | 1625107543: New client connected from 192.168.20.24:52684 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C')
The logs in Tasmota haven't changed, it just reports it lost connection:
04:22:38.001 WIF: Checking connection...
04:22:50.323 MQT: Attempting connection...
04:22:50.625 MQT: TLS connected in 298 ms, max ThunkStack used 3700
04:22:50.627 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
04:22:50.629 MQT: Connected
04:22:50.683 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
04:22:50.691 MQT: f2/buero/shutter/cmnd/shutter_0EB06C/POWER =
04:22:50.694 MQT: Subscribe to f2/buero/shutter/cmnd/shutter_0EB06C/#
04:22:50.703 MQT: Subscribe to f2/buero/shutter/cmnd/tasmotas/#
04:22:50.713 MQT: Subscribe to cmnd/DVES_0EB06C_fb/#
04:22:58.226 WIF: Checking connection...
04:23:18.238 WIF: Checking connection...
04:23:38.754 WIF: Checking connection...
04:23:58.988 WIF: Checking connection...
According to the server it says, that the client has closed the connection. Which sounds to me it's actively doing it. Mosquito has a different log if the server closes the connection (mostly something with "timeout") and/or if a client re-connects without closing the old one ("Client XXX already connected, closing old connection."). Both is not happening with this device.
What else to test? I could try downgrading to 9.3 again to check if the problem goes away. Anything else?
I think I've made an additional discovery: If I leave the web console open, the device is much more responsive to MQTT commands.
It's still the only devices that regularly reports restarts because of exceptions. Is there a way to decode these? Maybe these help.
{"Info3":{"RestartReason":{"Exception":3,"Reason":"Exception","EPC":["4010130d","00000000","00000000"],"EXCVADDR":"40035910","DEPC":"00000000","CallChain":["4010140f","4010140f","40101697","402544f4","402461c8","402461c8","4023ebf5","4023ecec","4023f773","40101697","4023f79c","402416bd","401016ce","40245915","40101697","4025da50","4023eab1","4023e8d7","40267efe","40267f0f","40263713","40000f49","40000f49","40000e19","401055a1","401055a7","4010000d","4025c564","4025c515","401007f4","401021e8"]}}}
Same problem here with 9.x version (both ESP32 and ESP8266 - Sonoff mini, mini R2 and dual R3). Until 8.5.1 everything is ok. If I upgrade to 9.3.1 or 9.4 or 9.5 I experience very slow web interface (almost unusable), mqtt reconnections, unresponsive wall switches. I noted that we have in common the fritz box as router (but the strange thing is that the problem is present only with 9.x version). Can I contribute in some way? (every change of firmware was made via serial interface).
I have the same issue, orbi rbr50. Keeps on reconnecting. Time it disconnects for varies.
Might not be related to this though as it could be an issue with my mesh wifi.
... it could be an issue with my mesh wifi.
Maybe. Please, try sleep 0
in your device. Thanks.
Sleep 0 seems to have improved a bit the mqtt connection, but Dual R3 keeps on reconnecting (v9.3.1). I downgraded all other sonoff to 8.5.1 and they have no problem now (fast and reliable).
Play with command MqttTimeout
which is now 4 seconds where it used to be 15 seconds and/or command MqttWifiTimeout
which is now 200 ms where it used to be 5000 ms.
I don't know what helped on my side, I did a few RESET 3
, also a restart of the MQTT Broker. Since a few days I cannot reproduce the problem any more, without a clear indicator what was the culprit.
From my point of view I would close the defect, feel free to reopen it if there are others that have the same problem.
Also the MqttTimeout and MqttWifiTimeout sound a reasonable advice to play around with, that's something I'll do if it happens again. I'll try to report it if I encounter that problem again.
Thanks for you help and time!
@markus-seidl, I have about 35 sonoffs (D1 dimmers, R2 mini , 4ch pro) Ans since I've updated from 9.4.0 to 9.5.0 I've multiple D1 not connecting to wifi (stable) anymore. There must have been some changes that seriously disrupt the wifi. Coverage can't be a problem as I have multiple AP's per floor and signal status value is never under 62%. I will likely flash the problematic devices back to 4.2.0.
Niels
Thanks for informing me @NielsPiersma , but I'm an user too, not a committer.
You could try what @arendst said (he also highlighted the changes).
@markus-seidl , just responded as you were the first to notice :).
I was able to " improve" my wifi by killing syslog. I was syslogging from all my tastmota devices to a central syslog server.
I group disabled logging by issuing the following to the topic " cmnd/tasmotas/backlog "
payload: "SerialLog 0; loghost 1; syslog 0; SLEEP 0; MqttTimeout 15; MqttWifiTimeout 5000; reset 3"
This killed all logging and set the parameters as reported by @arendst .
So I think the issue has more to do with syslog than with wifi...
NIels
Play with command
MqttTimeout
which is now 4 seconds where it used to be 15 seconds and/or commandMqttWifiTimeout
which is now 200 ms where it used to be 5000 ms.
Does anybody know why the default value for MqttTimeout was reduced from 15 to 4 seconds? I doesn´t really make sense to me.
After updating all my tasmota devices to latest version (10.1.0) I was getting a lot mqtt disconnections, and some of them were "Client DVES_xxxxxx has exceeded timeout, disconnecting" (others are "Socket error on client DVES_xxxxxxx, disconnecting", still trying to figure out what is wrong there) and I fixed them by changing the MqttTimeout from 4 to 60 seconds, shouldn´t the default value for MqttTimeout be 15 seconds at least as it was before?
PROBLEM DESCRIPTION
I have around 20 devices that are running Tasmota 9.3 and I've upgraded one to 9.5. Since then this single devices has problems to connect to the MQTT server and reconnects often. Additionally the response to MQTT commands is sporadic.
REQUESTED INFORMATION
Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!
Backlog Template; Module; GPIO 255
:Status 0
:TO REPRODUCE
Upgraded to 9.5 from 9.3 and expected to have a stable MQTT connection.
EXPECTED BEHAVIOUR
Stable MQTT connection.
SCREENSHOTS
If applicable, add screenshots to help explain your problem.
ADDITIONAL CONTEXT
Server Log File of MQTT. In the timespan of a few hours, the 9.5 device is the only one that reconnects (the log is not filtered in that timespan, these are the full last lines).
Mosquitto Configuration file
Server Version 2.0.10.
user_config_override.h (I've stripped most comments, for improved readability)