Closed helmar74 closed 5 years ago
Hi, the real fix was that he uses core 2.3.0.
Some routers are better for core 2.3.0 and some others for newer cores. You need to test it for your case.
OK, I'm running a Fritzbox 7590 wit 1750E Accesspoints in MESH Configuration. Is someone here, who have the same setup and is not suffering this problem?
The link above shows for CORE 2.3.0: Some Routers of brands Ubiquity and Fritzbox produces conflicts and disconnections
As I'm running a Fritzbox Core 2.3.0 shouldn't be a solution....
If you look into issues you will see that fritzbox has issues with the Arduino core. Please, try core 2.3.0. The precompiled bins with core 2.3.0 are already compiled for you.
If the Fritzbox is the cause, it's more a WIfI issue, than an MQTT issue? (I also was thinking of changing MQTT Library to MQTT_TASMOTAMQTT) Currently I'm using MQTT_PUBSUBCLIENT
I also turned off mDNS, until now the issue doesn't occured on that devices any more.. Have to wait longer, will provide feedback...
The mqtt reconnections are for socket error. Sockets are managed by the core. The core 2.3.0 may give you the solution. Please please test it. You can just do an OTA with the precompiled bin.
TasmotaMQTT is deprecated and no longer supported. It will not solve your issue. In fact it will give you under some circunstances software watchdogs restarts.
mDNS is unrelated to mqtt and sockets.
After studying https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#wifi-issues-arduino-core-versions-and-expressif-sdk
My "plan" is like that:
If the disconnects occurs again, I'll try sleep 0
If this doesn't work I'll try Core 2.3.0 Library
I keep you informed...
Setting sleep 0 isn't really an option as the Load grows dramatically. So I decided to "downgrade" to Core 2.3.0 all my devices. Will provide feedback. Thanks for your help so far!
Great! Will confirm/test the fix too then. I am also running into the issues with mqtt disconnects. I am having a fritz box 7490. Could probably be the issue. Was suspecting node-red as culprit, but this could also explain it.
Downgraded to: Program Version 6.4.1(release-sonoff) Build Date & Time 2018-12-24T15:51:34 Core/SDK Version 2_3_0/1.5.3(aec24ac9)
for now. Will give feedback in a day or two. Should be easy to spot. As I had disconnects every 20-40 minutes
Looks like I did not get a single disconnect since downgrading. I noticed the Vcc reading changed a bit.
Should be a measurement change, not some actual change, right? (I used the Vcc reported from the telemetry)
I'll leave it this way until tomorrow. I will then switch back to the latest version and check if it starts again.
The change in VCC measurement is a known issue of the old core. Don't worry. It is ok.
I updated 27 devices with core 2.3.0. Still having MQTT reconnects on 3 of them it is one Sonoff S20, one Wemos Mini D (with WS 8212 connected) and one Shelly2... The other Devices are working fine. Not quite sure if Core 2.3.0 really fixes the issue...
1549566494: New client connected from 192.168.1.180 as sonoffShelly2-1 (c1, k10, u'DVES_USER'). 1549566520: Client sonoffShelly2-1 has exceeded timeout, disconnecting. 1549566520: Socket error on client sonoffShelly2-1, disconnecting. 1549566541: New connection from 192.168.1.180 on port 1883.
1549561198: Client S20Steckdose3 has exceeded timeout, disconnecting. 1549561198: Socket error on client S20Steckdose3, disconnecting. 1549561199: New connection from 192.168.1.151 on port 1883. 1549561199: New client connected from 192.168.1.151 as S20Steckdose3 (c1, k10, u'DVES_USER').
Client sonoffWemosMini-7-GRBW has exceeded timeout, disconnecting. 1549564518: Socket error on client sonoffWemosMini-7-GRBW, disconnecting. 1549564527: New connection from 192.168.1.181 on port 1883. 1549564527: New client connected from 192.168.1.181 as sonoffWemosMini-7-GRBW (c1, k10, u'DVES_USER').
When the Shelly2 starts reconnecting, it happened very often:
Disconnected Shelly2, giving again Feedback
I had a single disconnect since installing the Core 2.3.0 version. However, this was when the Fritz-Box switched the wireless channel to a different one for performance reasons. (My device count is only 3, so it cannot be compared to your setup.)
Were the now disconnecting devices having issues before?
@helmar74 can you config your router to use a fixed channel?
@AndyPuettmann : Yes the Devices, which are disconnecting now, had also trouble before @ascillato2 : I configured all of my Accesspoints (and the AVM MESH WIFI Network) to fixed channels.
Will provide feedback.........
I have a FB 7590 and 7490 (mesh repeater) so 2 strikes for me but the 2.3.0 core is 100% stable here. 2.4.2 and 2.5.0(beta 2 or 3) were an absolute nightmare! 2.3.0 is rock solid. I was also previously using auto for wifi channel but did fix that to a static channel... haven’t tried 2.3.0 core with auto channel..
Okay. Well. Not a single disconnect since switching version.
I the cause known exactly or pinned down specific parts of the Fritz-Box? I am playing with the idea to get an old router with openwrt in access point mode for the sonoffs. Also possibly the raspberry pis with wlan to lan bridge via hostapd so they serve as access points in a transparent mode. Would this fix the issue? I.e. is the direct connection to the fritzbox the problem or the fritzbox as dhcp server?
Hi everbody,
what I've done so far: I switched my Wifi completely to fixed channels. I "upgraded" 9 of my S20 Sonoffs to Tasmota 6.4.1 with Core Version 2.5.0 (non BETA) The FW was compiled by myself with some personal settings. I was used to disable the mDNS usage in my_user_config.h, as I'm getting DNS failed messages every second on a few of the S20's after a while. (not after startup, it took ~60 minutes)
After about 90 minutes the first MQTT reconnects occured. When the MQTT reconnect occurs, I also see, that there has been a WIFI reconnect on the Access Point:
The Tasmota Device didn't logged a Wifi reconnect in his webconsole log. (I raised webconsole loglevel to debug) The Device I was looking on is in the same room as the wifi access point.
Until now there were two reconnects of one device within 3 hours.
All other Tasmota Devices, which are running on a 2.3.0 Core based Firmware aren't reconnecting.... at the moment
I'll give regular feedback in this post.
Regards
helmar
Nice to know that fixing the wifi channel in the router and using core 2.3.0, the reconnection issue has been improved. Anyway, this is a known issue of the core that it is outside the control of Tasmota.
Closing this issue as it is not a Tasmota bug, but anyway, feel free to ask and comment on this trying to find a better way to workaround this core problem with fritzbox routers.
OK, will do so ;-)
Here's the weblog of a 2.5.0 CORE Device, when MQTT Reconnect occurs:
17:25:23 MQT: tele/sonoffS20_7/STATE = {"Time":"2019-02-08T17:25:23","Uptime":"0T03:40:16","Vcc":3.552,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"E8:DF:70:67:B9:00","Channel":11,"RSSI":68}} 17:28:14 WIF: verbinden mit AP1 SSID in Modus 11N wie S20Steckdose7... 17:28:18 WIF: verbunden 17:28:19 MQT: Verbindungsversuch... 17:28:19 MQT: verbunden 17:28:19 MQT: tele/sonoffS20_7/LWT = Online (beibehalten) 17:28:19 MQT: cmnd/sonoffS20_7/POWER = 17:30:24 MQT: tele/sonoffS20_7/STATE = {"Time":"2019-02-08T17:30:24","Uptime":"0T03:45:17","Vcc":3.540,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"E8:DF:70:67:B9:00","Channel":11,"RSSI":72}}
I increased weblog level to 4. Now I'm getting the following messages in webconsole:
17:33:08 WIF: Prüfe Verbindung... 17:33:08 WIF: verbunden 17:33:28 WIF: Prüfe Verbindung... 17:33:28 WIF: verbunden 17:33:48 WIF: Prüfe Verbindung... 17:33:48 WIF: verbunden
Will post update, when the reconnect occurs.....
After about 3:30 hours it seems that the reconnects occurs more often. Now I have reconnects of more and more 2.5.0 devices....
Helmar,
Do you have a system where you can set up a syslog server? If so, it may be prudent to do so and to configure your devices to send their logs there so you don't miss any details. YOu can set the same level of logging (SysLog 4) and perhaps back down your WebLog level since it will be redundant once the detailed information is being sent to the server.
Mike
When the MQTT Reconnect occurs, the 2.5.0 Device is logging this:
20:17:26 WIF: Prüfe Verbindung... 20:17:26 WIF: verbunden 20:17:44 MQT: Verbindungsversuch... 20:17:45 MQT: verbunden 20:17:45 MQT: tele/sonoffS20_8/LWT = Online (beibehalten) 20:17:45 MQT: cmnd/sonoffS20_8/POWER = 20:17:45 MQT: abonniere cmnd/sonoffS20_8/# 20:17:45 MQT: abonniere cmnd/sonoffStandardGruppe/# 20:17:45 MQT: abonniere cmnd/S20Steckdose8_fb/# 20:17:47 WIF: Prüfe Verbindung... 20:17:47 WIF: verbunden 20:18:07 WIF: Prüfe Verbindung...
During the night (about 14 hours) all Devices on CORE 2.3.0 hadn't any disconnects. now I tried to flash CORE 2.5.0 on 2 devices again, with only 1 Accesspoint configured. One of the devices i do a reset 5 after flashing and I reconfigured new.
Waiting for any results... will provide Feedback. I think, this will be my last try.
Had reconnects again. Also tried sleep 0, (with core 2.5.0) as suggested here: https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#disconnects-and-reconnects
Now (of course) the load get's higher. Is it a Problem?
15:18:29 MQT: tele/sonoffS26_1/STATE = {"Time":"2019-02-09T15:18:29","Uptime":"0T06:51:53","Vcc":3.532,"SleepMode":"Dynamic","Sleep":0,"LoadAvg":677,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"9C:C7:A6:F7:72:A2","Channel":11,"RSSI":80}}
I've done the following with one of my core 2.5.0 S20s which was having permanent MQTT reconnects:
Erase the complete flash with command esptool.py --port COM5 erase_flash Load Tasmota into a device with command esptool.py --port COM5 write_flash -fs 1MB -fm dout 0x0 sonoff.bin
Will provide feedback....
Sleep 0 prevents the cpu from going to sleep. So it does only increase the power consumption from something around 0,6W up to around 1.1W.
This may make a difference on your bill for electricity if running a ton of those devices. But should not have any effect other than this. (Could possibly get warmer if its inside an air-tight spot, should examine it after a day or two.
I flashed a few S20 by wire. First I erased flash with esptool like suggested here: https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#disconnects-and-reconnects
After a while a few of the S20 reconnected, others didn't. What I tried next:
Will see, what happens.... Provide again feedback
Hi,
I do have permanent MQTT Disconnects/Connects in my mosquitto MQTT Broker log:
1549537348: New client connected from 192.168.1.XXX as S20Steckdose5 (c1, k10, u'DVES_USER'). 1549537348: New connection from 192.168.1.YYY on port 1883. 1549537348: New client connected from 192.168.1.YYY as S20Steckdose4 (c1, k10, u'DVES_USER'). 1549537349: New connection from 192.168.1.ZZZ on port 1883. 1549537349: New client connected from 192.168.1.ZZZ as S20Steckdose7 (c1, k10, u'DVES_USER'). 1549537577: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
I had this issue when using CORE Library 2.4.2. I updated to CORE Library 2.5.0, but the issue still persists. Does anyone know, how to fix this issue?
I also found this interessting articles: https://community.home-assistant.io/t/constant-mqtt-devices-disconnections-socket-error/83477/50
He fixed it like that:
He raised KEEPALIVE and TIMEOUT Parameter for MQTT connection. Is this "recommend" to fix the issue, or is there an other solution?
Thank's a lot
Helmar
IMPORTANT NOTICE If you do not complete the template below it is likely that your issue will not be addressed. When providing information about your issue please be as extensive as possible so that it can be solved by as little as possible responses.
FAILURE TO COMPLETE THE REQUESTED INFORMATION WILL RESULT IN YOUR ISSUE BEING CLOSED
Make sure these boxes are checked [x] before submitting your issue - Thank you!
status 0
:(Please, remember to close the issue when the problem has been addressed)