arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0
22.02k stars 4.77k forks source link

Permanently MQTT Reconnects #5146

Closed helmar74 closed 5 years ago

helmar74 commented 5 years ago

Hi,

I do have permanent MQTT Disconnects/Connects in my mosquitto MQTT Broker log: 1549537348: New client connected from 192.168.1.XXX as S20Steckdose5 (c1, k10, u'DVES_USER'). 1549537348: New connection from 192.168.1.YYY on port 1883. 1549537348: New client connected from 192.168.1.YYY as S20Steckdose4 (c1, k10, u'DVES_USER'). 1549537349: New connection from 192.168.1.ZZZ on port 1883. 1549537349: New client connected from 192.168.1.ZZZ as S20Steckdose7 (c1, k10, u'DVES_USER'). 1549537577: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.

I had this issue when using CORE Library 2.4.2. I updated to CORE Library 2.5.0, but the issue still persists. Does anyone know, how to fix this issue?

I also found this interessting articles: https://community.home-assistant.io/t/constant-mqtt-devices-disconnections-socket-error/83477/50

He fixed it like that: image

He raised KEEPALIVE and TIMEOUT Parameter for MQTT connection. Is this "recommend" to fix the issue, or is there an other solution?

Thank's a lot

Helmar

IMPORTANT NOTICE If you do not complete the template below it is likely that your issue will not be addressed. When providing information about your issue please be as extensive as possible so that it can be solved by as little as possible responses.

FAILURE TO COMPLETE THE REQUESTED INFORMATION WILL RESULT IN YOUR ISSUE BEING CLOSED

Make sure these boxes are checked [x] before submitting your issue - Thank you!

(Please, remember to close the issue when the problem has been addressed)

ascillato commented 5 years ago

Hi, the real fix was that he uses core 2.3.0.

Some routers are better for core 2.3.0 and some others for newer cores. You need to test it for your case.

ascillato commented 5 years ago

https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#wifi-issues-arduino-core-versions-and-expressif-sdk

helmar74 commented 5 years ago

OK, I'm running a Fritzbox 7590 wit 1750E Accesspoints in MESH Configuration. Is someone here, who have the same setup and is not suffering this problem?

The link above shows for CORE 2.3.0: Some Routers of brands Ubiquity and Fritzbox produces conflicts and disconnections

As I'm running a Fritzbox Core 2.3.0 shouldn't be a solution....

ascillato commented 5 years ago

If you look into issues you will see that fritzbox has issues with the Arduino core. Please, try core 2.3.0. The precompiled bins with core 2.3.0 are already compiled for you.

helmar74 commented 5 years ago

If the Fritzbox is the cause, it's more a WIfI issue, than an MQTT issue? (I also was thinking of changing MQTT Library to MQTT_TASMOTAMQTT) Currently I'm using MQTT_PUBSUBCLIENT

I also turned off mDNS, until now the issue doesn't occured on that devices any more.. Have to wait longer, will provide feedback...

ascillato2 commented 5 years ago

The mqtt reconnections are for socket error. Sockets are managed by the core. The core 2.3.0 may give you the solution. Please please test it. You can just do an OTA with the precompiled bin.

TasmotaMQTT is deprecated and no longer supported. It will not solve your issue. In fact it will give you under some circunstances software watchdogs restarts.

mDNS is unrelated to mqtt and sockets.

helmar74 commented 5 years ago

After studying https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#wifi-issues-arduino-core-versions-and-expressif-sdk

My "plan" is like that:

  1. If the disconnects occurs again, I'll try sleep 0 image

  2. If this doesn't work I'll try Core 2.3.0 Library

I keep you informed...

helmar74 commented 5 years ago

Setting sleep 0 isn't really an option as the Load grows dramatically. So I decided to "downgrade" to Core 2.3.0 all my devices. Will provide feedback. Thanks for your help so far!

AndyPuettmann commented 5 years ago

Great! Will confirm/test the fix too then. I am also running into the issues with mqtt disconnects. I am having a fritz box 7490. Could probably be the issue. Was suspecting node-red as culprit, but this could also explain it.

Downgraded to: Program Version 6.4.1(release-sonoff) Build Date & Time 2018-12-24T15:51:34 Core/SDK Version 2_3_0/1.5.3(aec24ac9)

for now. Will give feedback in a day or two. Should be easy to spot. As I had disconnects every 20-40 minutes

AndyPuettmann commented 5 years ago

Looks like I did not get a single disconnect since downgrading. I noticed the Vcc reading changed a bit.

image

Should be a measurement change, not some actual change, right? (I used the Vcc reported from the telemetry)

I'll leave it this way until tomorrow. I will then switch back to the latest version and check if it starts again.

ascillato commented 5 years ago

The change in VCC measurement is a known issue of the old core. Don't worry. It is ok.

helmar74 commented 5 years ago

I updated 27 devices with core 2.3.0. Still having MQTT reconnects on 3 of them it is one Sonoff S20, one Wemos Mini D (with WS 8212 connected) and one Shelly2... The other Devices are working fine. Not quite sure if Core 2.3.0 really fixes the issue...

1549566494: New client connected from 192.168.1.180 as sonoffShelly2-1 (c1, k10, u'DVES_USER'). 1549566520: Client sonoffShelly2-1 has exceeded timeout, disconnecting. 1549566520: Socket error on client sonoffShelly2-1, disconnecting. 1549566541: New connection from 192.168.1.180 on port 1883.

1549561198: Client S20Steckdose3 has exceeded timeout, disconnecting. 1549561198: Socket error on client S20Steckdose3, disconnecting. 1549561199: New connection from 192.168.1.151 on port 1883. 1549561199: New client connected from 192.168.1.151 as S20Steckdose3 (c1, k10, u'DVES_USER'). Client sonoffWemosMini-7-GRBW has exceeded timeout, disconnecting. 1549564518: Socket error on client sonoffWemosMini-7-GRBW, disconnecting. 1549564527: New connection from 192.168.1.181 on port 1883. 1549564527: New client connected from 192.168.1.181 as sonoffWemosMini-7-GRBW (c1, k10, u'DVES_USER').

When the Shelly2 starts reconnecting, it happened very often: image

helmar74 commented 5 years ago

Disconnected Shelly2, giving again Feedback

AndyPuettmann commented 5 years ago

I had a single disconnect since installing the Core 2.3.0 version. However, this was when the Fritz-Box switched the wireless channel to a different one for performance reasons. (My device count is only 3, so it cannot be compared to your setup.)

Were the now disconnecting devices having issues before?

ascillato2 commented 5 years ago

@helmar74 can you config your router to use a fixed channel?

helmar74 commented 5 years ago

@AndyPuettmann : Yes the Devices, which are disconnecting now, had also trouble before @ascillato2 : I configured all of my Accesspoints (and the AVM MESH WIFI Network) to fixed channels.

Will provide feedback.........

DavidFW1960 commented 5 years ago

I have a FB 7590 and 7490 (mesh repeater) so 2 strikes for me but the 2.3.0 core is 100% stable here. 2.4.2 and 2.5.0(beta 2 or 3) were an absolute nightmare! 2.3.0 is rock solid. I was also previously using auto for wifi channel but did fix that to a static channel... haven’t tried 2.3.0 core with auto channel..

AndyPuettmann commented 5 years ago

Okay. Well. Not a single disconnect since switching version.

I the cause known exactly or pinned down specific parts of the Fritz-Box? I am playing with the idea to get an old router with openwrt in access point mode for the sonoffs. Also possibly the raspberry pis with wlan to lan bridge via hostapd so they serve as access points in a transparent mode. Would this fix the issue? I.e. is the direct connection to the fritzbox the problem or the fritzbox as dhcp server?

helmar74 commented 5 years ago

Hi everbody,

what I've done so far: I switched my Wifi completely to fixed channels. I "upgraded" 9 of my S20 Sonoffs to Tasmota 6.4.1 with Core Version 2.5.0 (non BETA) The FW was compiled by myself with some personal settings. I was used to disable the mDNS usage in my_user_config.h, as I'm getting DNS failed messages every second on a few of the S20's after a while. (not after startup, it took ~60 minutes)

After about 90 minutes the first MQTT reconnects occured. When the MQTT reconnect occurs, I also see, that there has been a WIFI reconnect on the Access Point: image

The Tasmota Device didn't logged a Wifi reconnect in his webconsole log. (I raised webconsole loglevel to debug) The Device I was looking on is in the same room as the wifi access point.

Until now there were two reconnects of one device within 3 hours.

All other Tasmota Devices, which are running on a 2.3.0 Core based Firmware aren't reconnecting.... at the moment

I'll give regular feedback in this post.

Regards

helmar

ascillato2 commented 5 years ago

Nice to know that fixing the wifi channel in the router and using core 2.3.0, the reconnection issue has been improved. Anyway, this is a known issue of the core that it is outside the control of Tasmota.

Closing this issue as it is not a Tasmota bug, but anyway, feel free to ask and comment on this trying to find a better way to workaround this core problem with fritzbox routers.

helmar74 commented 5 years ago

OK, will do so ;-)

Here's the weblog of a 2.5.0 CORE Device, when MQTT Reconnect occurs: 17:25:23 MQT: tele/sonoffS20_7/STATE = {"Time":"2019-02-08T17:25:23","Uptime":"0T03:40:16","Vcc":3.552,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"E8:DF:70:67:B9:00","Channel":11,"RSSI":68}} 17:28:14 WIF: verbinden mit AP1 SSID in Modus 11N wie S20Steckdose7... 17:28:18 WIF: verbunden 17:28:19 MQT: Verbindungsversuch... 17:28:19 MQT: verbunden 17:28:19 MQT: tele/sonoffS20_7/LWT = Online (beibehalten) 17:28:19 MQT: cmnd/sonoffS20_7/POWER = 17:30:24 MQT: tele/sonoffS20_7/STATE = {"Time":"2019-02-08T17:30:24","Uptime":"0T03:45:17","Vcc":3.540,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"E8:DF:70:67:B9:00","Channel":11,"RSSI":72}}

helmar74 commented 5 years ago

I increased weblog level to 4. Now I'm getting the following messages in webconsole: 17:33:08 WIF: Prüfe Verbindung... 17:33:08 WIF: verbunden 17:33:28 WIF: Prüfe Verbindung... 17:33:28 WIF: verbunden 17:33:48 WIF: Prüfe Verbindung... 17:33:48 WIF: verbunden

Will post update, when the reconnect occurs.....

helmar74 commented 5 years ago

After about 3:30 hours it seems that the reconnects occurs more often. Now I have reconnects of more and more 2.5.0 devices....

meingraham commented 5 years ago

Helmar,

Do you have a system where you can set up a syslog server? If so, it may be prudent to do so and to configure your devices to send their logs there so you don't miss any details. YOu can set the same level of logging (SysLog 4) and perhaps back down your WebLog level since it will be redundant once the detailed information is being sent to the server.

Mike

helmar74 commented 5 years ago

When the MQTT Reconnect occurs, the 2.5.0 Device is logging this:

20:17:26 WIF: Prüfe Verbindung... 20:17:26 WIF: verbunden 20:17:44 MQT: Verbindungsversuch... 20:17:45 MQT: verbunden 20:17:45 MQT: tele/sonoffS20_8/LWT = Online (beibehalten) 20:17:45 MQT: cmnd/sonoffS20_8/POWER = 20:17:45 MQT: abonniere cmnd/sonoffS20_8/# 20:17:45 MQT: abonniere cmnd/sonoffStandardGruppe/# 20:17:45 MQT: abonniere cmnd/S20Steckdose8_fb/# 20:17:47 WIF: Prüfe Verbindung... 20:17:47 WIF: verbunden 20:18:07 WIF: Prüfe Verbindung...

helmar74 commented 5 years ago

During the night (about 14 hours) all Devices on CORE 2.3.0 hadn't any disconnects. now I tried to flash CORE 2.5.0 on 2 devices again, with only 1 Accesspoint configured. One of the devices i do a reset 5 after flashing and I reconfigured new.

Waiting for any results... will provide Feedback. I think, this will be my last try.

helmar74 commented 5 years ago

Had reconnects again. Also tried sleep 0, (with core 2.5.0) as suggested here: https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#disconnects-and-reconnects

Now (of course) the load get's higher. Is it a Problem?

15:18:29 MQT: tele/sonoffS26_1/STATE = {"Time":"2019-02-09T15:18:29","Uptime":"0T06:51:53","Vcc":3.532,"SleepMode":"Dynamic","Sleep":0,"LoadAvg":677,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"9C:C7:A6:F7:72:A2","Channel":11,"RSSI":80}}

helmar74 commented 5 years ago

I've done the following with one of my core 2.5.0 S20s which was having permanent MQTT reconnects:

Erase the complete flash with command esptool.py --port COM5 erase_flash Load Tasmota into a device with command esptool.py --port COM5 write_flash -fs 1MB -fm dout 0x0 sonoff.bin

Will provide feedback....

AndyPuettmann commented 5 years ago

Sleep 0 prevents the cpu from going to sleep. So it does only increase the power consumption from something around 0,6W up to around 1.1W.

This may make a difference on your bill for electricity if running a ton of those devices. But should not have any effect other than this. (Could possibly get warmer if its inside an air-tight spot, should examine it after a day or two.

helmar74 commented 5 years ago

I flashed a few S20 by wire. First I erased flash with esptool like suggested here: https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#disconnects-and-reconnects

After a while a few of the S20 reconnected, others didn't. What I tried next:

  1. One S20 is connected to a 1750E, which is part of a FB7590 WLAN (Both running mesh on WIFI CHANNEL 6)
  2. One S20 is connected to separat Fritzbox 7390 (WIFI CHANNEL 1)
  3. As said, the Fritzbox is set to fixed channel, not autochannel.
  4. The 1750E is connected with Wire to the 7590.
  5. Both S20 are placed in the same room as the wifi components. (only 50cm between S20 and 1750E/7390)

Will see, what happens.... Provide again feedback