arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0
21.75k stars 4.72k forks source link

Increased MQTT reconnections/Spontaneous MQTT response starting with 9.4/5.0 #12501

Closed markus-seidl closed 2 years ago

markus-seidl commented 3 years ago

PROBLEM DESCRIPTION

I have around 20 devices that are running Tasmota 9.3 and I've upgraded one to 9.5. Since then this single devices has problems to connect to the MQTT server and reconnects often. Additionally the response to MQTT commands is sporadic.

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

- [x] If using rules, provide the output of this command: `Backlog Rule1; Rule2; Rule3`:
```lua
  Rules output here:
04:59:53.575 MQT: f2/buero/shutter/stat/shutter_0EB06C/RESULT = {"Rule1":{"State":"ON","Once":"OFF","StopOnError":"OFF","Length":85,"Free":426,"Rules":"ON Shutter1#Position DO Publish2 f2/buero/shutter/tele/%topic%/POSITION %value% ENDON"}}
04:59:53.841 MQT: f2/buero/shutter/stat/shutter_0EB06C/RESULT = {"Rule2":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}
04:59:54.010 MQT: f2/buero/shutter/stat/shutter_0EB06C/RESULT = {"Rule3":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}
- [x] Set `weblog` to 4 and then, when you experience your issue, provide the output of the Console log:
```lua
  Console output here:
-- Currently I only have the log without weblog 4, working on it.
04:48:34.740 MQT: f2/buero/shutter/tele/shutter_0EB06C/SENSOR = {"Time":"2021-06-29T04:48:34","Switch1":"ON","Switch2":"OFF","Shutter1":{"Position":0,"Direction":0,"Target":0}}
04:51:52.632 MQT: Attempting connection...
04:51:52.928 MQT: TLS connected in 292 ms, max ThunkStack used 3700
04:51:52.930 MQT: Connected
04:51:52.981 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
04:51:52.989 MQT: f2/buero/shutter/cmnd/shutter_0EB06C/POWER = 

TO REPRODUCE

Upgraded to 9.5 from 9.3 and expected to have a stable MQTT connection.

EXPECTED BEHAVIOUR

Stable MQTT connection.

SCREENSHOTS

If applicable, add screenshots to help explain your problem.

ADDITIONAL CONTEXT

Server Log File of MQTT. In the timespan of a few hours, the 9.5 device is the only one that reconnects (the log is not filtered in that timespan, these are the full last lines).

mqtt_1             | 1624929708: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624929709: New connection from 192.168.20.24:52575 on port 8883.
mqtt_1             | 1624929709: New client connected from 192.168.20.24:52575 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624931151: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1624931361: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624931361: New connection from 192.168.20.24:62007 on port 8883.
mqtt_1             | 1624931361: New client connected from 192.168.20.24:62007 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624932563: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624932563: New connection from 192.168.20.24:54463 on port 8883.
mqtt_1             | 1624932563: New client connected from 192.168.20.24:54463 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624932952: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1624933194: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624933195: New connection from 192.168.20.24:59226 on port 8883.
mqtt_1             | 1624933195: New client connected from 192.168.20.24:59226 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624933766: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624933766: New connection from 192.168.20.24:62665 on port 8883.
mqtt_1             | 1624933766: New client connected from 192.168.20.24:62665 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624934067: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624934067: New connection from 192.168.20.24:62065 on port 8883.
mqtt_1             | 1624934067: New client connected from 192.168.20.24:62065 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624934668: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624934668: New connection from 192.168.20.24:52231 on port 8883.
mqtt_1             | 1624934669: New client connected from 192.168.20.24:52231 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624934753: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1624935750: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624935751: New connection from 192.168.20.24:59551 on port 8883.
mqtt_1             | 1624935751: New client connected from 192.168.20.24:59551 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1624935841: Client DVES_0EB06C closed its connection.
mqtt_1             | 1624935842: New connection from 192.168.20.24:59420 on port 8883.
mqtt_1             | 1624935842: New client connected from 192.168.20.24:59420 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').

Mosquitto Configuration file

persistence true
persistence_location /mosquitto/data
log_dest stdout
allow_anonymous false
password_file /mosquitto/config/passwd
listener 8883
certfile /mosquitto/config/tls.crt
keyfile /mosquitto/config/tls.key
acl_file /mosquitto/config/acl

Server Version 2.0.10.

user_config_override.h (I've stripped most comments, for improved readability)

#undef USE_WS2812                              // WS2812 Led string using library NeoPixelBus (+5k code, +1k mem, 232 iram) - Disable by //
#undef USE_MY92X1                               // Add support for MY92X1 RGBCW led controller as used in Sonoff B1, Ailight and Lohas
#undef USE_SM16716                              // Add support for SM16716 RGB LED controller (+0k7 code)
#undef USE_SM2135                               // Add support for SM2135 RGBCW led control as used in Action LSC (+0k6 code)
#undef USE_SONOFF_L1

#undef USE_I2C
#undef USE_SPI

#undef USE_MHZ19                                // Add support for MH-Z19 CO2 sensor (+2k code)
#undef USE_SENSEAIR                             // Add support for SenseAir K30, K70 and S8 CO2 sensor (+2k3 code)
#undef USE_PMS5003                              // Add support for PMS5003 and PMS7003 particle concentration sensor (+1k3 code)
#undef USE_NOVA_SDS                             // Add support for SDS011 and SDS021 particle concentration sensor (+1k5 code)
#undef USE_SERIAL_BRIDGE                        // Add support for software Serial Bridge (+0k8 code)

#undef USE_DHT
#undef USE_IR_REMOTE

#undef USE_SR04
#undef USE_HX711

#define USE_JAVASCRIPT_ES6 
#undef USE_EMULATION_HUE
#undef USE_EMULATION_WEMO

#undef USE_DISCOVERY

#undef USE_TIMERS

#undef USE_SONOFF_RF                            // Add support for Sonoff Rf Bridge (+3k2 code)
#undef USE_SONOFF_SC                            // Add support for Sonoff Sc (+1k1 code)
#undef USE_ARMTRONIX_DIMMERS                    // Add support for Armtronix Dimmers (+1k4 code)
#undef USE_PS_16_DZ                             // Add support for PS-16-DZ Dimmer (+2k code)
#undef USE_SONOFF_IFAN                          // Add support for Sonoff iFan02 and iFan03 (+2k code)
#undef USE_BUZZER                               // Add support for a buzzer (+0k6 code)
#undef USE_ARILUX_RF                            // Add support for Arilux RF remote controller (+0k8 code, 252 iram (non 2.3.0))
#undef USE_EXS_DIMMER                           // Add support for ES-Store WiFi Dimmer (+1k5 code)
#undef USE_DEVICE_GROUPS                        // Add support for device groups (+5k5 code)
#undef USE_PWM_DIMMER                           // Add support for MJ-SD01/acenx/NTONPOWER PWM dimmers (+2k3 code, DGR=0k7)
#undef USE_SHELLY_DIMMER                        // Add support for Shelly Dimmer (+3k code)

#undef USE_DS18x20

#undef SHELLY_FW_UPGRADE
#undef USE_HOME_ASSISTANT
#undef USE_DOMOTICZ
#define USE_WEBSERVER

#define USE_TUYA_MCU                             // Add support for Tuya Serial MCU

#undef USE_SDM72

// - SETTINGS

#define STA_SSID1              "MINE2"                // [Ssid1] Wifi SSID
#define STA_PASS1              "XXX"                // [Password1] Wifi password

#define USE_MQTT_TLS
#define MQTT_PORT   8883

#define MQTT_FINGERPRINT1      0xCB,0x87,0x25,0xB5,0xA8,0xA3,0x65,0xEB,0x39,0x46,0x5F,0x3D,0xA8,0xC5,0x9B,0xC6,0xB1,0xEE,0x6C,0x8A  // [MqttFingerprint1] (auto-learn)
#define MQTT_FINGERPRINT2      0xCB,0x87,0x25,0xB5,0xA8,0xA3,0x65,0xEB,0x39,0x46,0x5F,0x3D,0xA8,0xC5,0x9B,0xC6,0xB1,0xEE,0x6C,0x8A  // [MqttFingerprint2] (auto-learn)

#define MQTT_HOST               "192.168.1.121"
#define MQTT_FULLTOPIC          "unadopted/%prefix%/%topic%/"
#define MQTT_USER              "unadopted"
#define MQTT_PASS              "XXXX"
#define TELE_PERIOD            300               // [TelePeriod] Telemetry (0 = disable, 10 - 3600 seconds)
#define TELE_ON_POWER          true             // [SetOption59] send tele/STATE together with stat/RESULT (false = Disable, true = Enable)
#define WEB_PASSWORD            "XXX"              // [WebPassword] Web server Admin mode Password for WEB_USERNAME (empty string = Disable)
#define OTA_URL                "CUSTOM_URL"  // [OtaUrl]
#define NTP_SERVER1             "fritz.box" // [NtpServer1] Select first NTP server by name or IP address
#define CODE_IMAGE_STR "tasmota 20.06.21"
ascillato commented 3 years ago

Hi,

Seems to be more a Wi-Fi issue rather than MQTT.

Try to do a reset 3 in the console to recalibrate the Wi-Fi Radio.

markus-seidl commented 3 years ago

I noticed that Tasmota reports

{"Info3":{"RestartReason":{"Exception":0,"Reason":"Blocked Loop","EPC":["00000000","00000000","00000000"],"EXCVADDR":"00000000","DEPC":"00000000","CallChain":["40201604","401007f4","4023847a","40263ac8","4023902e","40263ad5","40263b1a","40235c44","40000f49","40000f49","40000e19","401055a1","401055a7","4010000d","4025c564","4025c515","401007f4","40105d41","4000050c","4010315f","40103144","401007f4","40102619","401021e8","4010549b","40102d7f","40102f60","40103422","4000050c","40103144","401007f4"]}}}

twice last night, I don't know if this is related.

This is the missing WebLog 4 of the issue

03:33:40.233 CFG: Saved to flash at F6, Count 2184, Bytes 4096
03:33:49.439 WIF: Checking connection...
03:34:00.654 MQT: Attempting connection...
03:34:00.962 MQT: TLS connected in 304 ms, max ThunkStack used 3700
03:34:00.965 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
03:34:00.967 MQT: Connected
03:34:01.018 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
... more Checking connection...
03:36:30.500 WIF: Checking connection...
03:36:32.750 MQT: Attempting connection...
03:36:33.049 MQT: TLS connected in 294 ms, max ThunkStack used 3700
03:36:33.051 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
03:36:33.053 MQT: Connected
03:36:33.106 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
03:36:33.114 MQT: f2/buero/shutter/cmnd/shutter_0EB06C/POWER = 
03:36:33.117 MQT: Subscribe to f2/buero/shutter/cmnd/shutter_0EB06C/#
03:36:33.130 MQT: Subscribe to f2/buero/shutter/cmnd/tasmotas/#
03:36:33.139 MQT: Subscribe to cmnd/DVES_0EB06C_fb/#
03:36:34.408 MQT: f2/buero/shutter/tele/shutter_0EB06C/STATE = {"Time":"2021-06-30T03:36:34","Uptime":"0T00:05:12","UptimeSec":312,"Heap":25,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":24,"MqttCount":3,"POWER1":"OFF","POWER2":"OFF","Wifi":{"AP":1,"SSId":"Germain","BSSId":"68:D7:9A:45:B3:32","Channel":11,"Mode":"11n","RSSI":76,"Signal":-62,"LinkCount":1,"Downtime":"0T00:00:05"}}
03:36:34.448 MQT: f2/buero/shutter/tele/shutter_0EB06C/SENSOR = {"Time":"2021-06-30T03:36:34","Switch1":"OFF","Switch2":"OFF","Shutter1":{"Position":96,"Direction":0,"Target":96}}
03:36:50.477 WIF: Checking connection...
03:37:10.735 WIF: Checking connection...
03:37:30.720 WIF: Checking connection...
03:37:33.979 MQT: Attempting connection...
03:37:34.279 MQT: TLS connected in 298 ms, max ThunkStack used 3700
03:37:34.282 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
03:37:34.284 MQT: Connected
03:37:34.345 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)

I've executed now a RESET 3 and am monitoring the results.

markus-seidl commented 3 years ago

RESET 3 hasn't brought any relief. The connection between MQTT and Tasmota is still flaky and only working causally. The single 9.5 device is the only one that produces the reconnect logs over 4h (with around ~20 9.3 devices on the same MQTT server).

mqtt_1             | 1625097207: New connection from 192.168.20.24:61446 on port 8883.
mqtt_1             | 1625097207: New client connected from 192.168.20.24:61446 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625097478: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625097478: New connection from 192.168.20.24:60074 on port 8883.
mqtt_1             | 1625097478: New client connected from 192.168.20.24:60074 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625098079: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625098079: New connection from 192.168.20.24:64885 on port 8883.
mqtt_1             | 1625098080: New client connected from 192.168.20.24:64885 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625098380: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625098381: New connection from 192.168.20.24:53598 on port 8883.
mqtt_1             | 1625098381: New client connected from 192.168.20.24:53598 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625098644: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1625100445: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1625100483: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625100484: New connection from 192.168.20.24:52636 on port 8883.
mqtt_1             | 1625100484: New client connected from 192.168.20.24:52636 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625101385: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625101385: New connection from 192.168.20.24:52881 on port 8883.
mqtt_1             | 1625101385: New client connected from 192.168.20.24:52881 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625102246: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1625102887: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625102887: New connection from 192.168.20.24:53987 on port 8883.
mqtt_1             | 1625102887: New client connected from 192.168.20.24:53987 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625104047: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1625104389: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625104389: New connection from 192.168.20.24:59563 on port 8883.
mqtt_1             | 1625104389: New client connected from 192.168.20.24:59563 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C').
mqtt_1             | 1625105848: Saving in-memory database to /mosquitto/data/mosquitto.db.
mqtt_1             | 1625107543: Client DVES_0EB06C closed its connection.
mqtt_1             | 1625107543: New connection from 192.168.20.24:52684 on port 8883.
mqtt_1             | 1625107543: New client connected from 192.168.20.24:52684 as DVES_0EB06C (p2, c1, k30, u'shutter_0EB06C')

The logs in Tasmota haven't changed, it just reports it lost connection:

04:22:38.001 WIF: Checking connection...
04:22:50.323 MQT: Attempting connection...
04:22:50.625 MQT: TLS connected in 298 ms, max ThunkStack used 3700
04:22:50.627 MQT: Server fingerprint: CB 87 25 B5 A8 A3 65 EB 39 46 5F 3D A8 C5 9B C6 B1 EE 6C 8A
04:22:50.629 MQT: Connected
04:22:50.683 MQT: f2/buero/shutter/tele/shutter_0EB06C/LWT = Online (retained)
04:22:50.691 MQT: f2/buero/shutter/cmnd/shutter_0EB06C/POWER = 
04:22:50.694 MQT: Subscribe to f2/buero/shutter/cmnd/shutter_0EB06C/#
04:22:50.703 MQT: Subscribe to f2/buero/shutter/cmnd/tasmotas/#
04:22:50.713 MQT: Subscribe to cmnd/DVES_0EB06C_fb/#
04:22:58.226 WIF: Checking connection...
04:23:18.238 WIF: Checking connection...
04:23:38.754 WIF: Checking connection...
04:23:58.988 WIF: Checking connection...

According to the server it says, that the client has closed the connection. Which sounds to me it's actively doing it. Mosquito has a different log if the server closes the connection (mostly something with "timeout") and/or if a client re-connects without closing the old one ("Client XXX already connected, closing old connection."). Both is not happening with this device.

What else to test? I could try downgrading to 9.3 again to check if the problem goes away. Anything else?

markus-seidl commented 3 years ago

I think I've made an additional discovery: If I leave the web console open, the device is much more responsive to MQTT commands.

It's still the only devices that regularly reports restarts because of exceptions. Is there a way to decode these? Maybe these help.

{"Info3":{"RestartReason":{"Exception":3,"Reason":"Exception","EPC":["4010130d","00000000","00000000"],"EXCVADDR":"40035910","DEPC":"00000000","CallChain":["4010140f","4010140f","40101697","402544f4","402461c8","402461c8","4023ebf5","4023ecec","4023f773","40101697","4023f79c","402416bd","401016ce","40245915","40101697","4025da50","4023eab1","4023e8d7","40267efe","40267f0f","40263713","40000f49","40000f49","40000e19","401055a1","401055a7","4010000d","4025c564","4025c515","401007f4","401021e8"]}}}
crllsn commented 3 years ago

Same problem here with 9.x version (both ESP32 and ESP8266 - Sonoff mini, mini R2 and dual R3). Until 8.5.1 everything is ok. If I upgrade to 9.3.1 or 9.4 or 9.5 I experience very slow web interface (almost unusable), mqtt reconnections, unresponsive wall switches. I noted that we have in common the fritz box as router (but the strange thing is that the problem is present only with 9.x version). Can I contribute in some way? (every change of firmware was made via serial interface).

Final-Hawk commented 3 years ago

I have the same issue, orbi rbr50. Keeps on reconnecting. Time it disconnects for varies.

Might not be related to this though as it could be an issue with my mesh wifi.

ascillato commented 3 years ago

... it could be an issue with my mesh wifi.

Maybe. Please, try sleep 0 in your device. Thanks.

crllsn commented 3 years ago

Sleep 0 seems to have improved a bit the mqtt connection, but Dual R3 keeps on reconnecting (v9.3.1). I downgraded all other sonoff to 8.5.1 and they have no problem now (fast and reliable).

arendst commented 2 years ago

Play with command MqttTimeout which is now 4 seconds where it used to be 15 seconds and/or command MqttWifiTimeout which is now 200 ms where it used to be 5000 ms.

markus-seidl commented 2 years ago

I don't know what helped on my side, I did a few RESET 3, also a restart of the MQTT Broker. Since a few days I cannot reproduce the problem any more, without a clear indicator what was the culprit.

From my point of view I would close the defect, feel free to reopen it if there are others that have the same problem.

Also the MqttTimeout and MqttWifiTimeout sound a reasonable advice to play around with, that's something I'll do if it happens again. I'll try to report it if I encounter that problem again.

Thanks for you help and time!

NielsPiersma commented 2 years ago

@markus-seidl, I have about 35 sonoffs (D1 dimmers, R2 mini , 4ch pro) Ans since I've updated from 9.4.0 to 9.5.0 I've multiple D1 not connecting to wifi (stable) anymore. There must have been some changes that seriously disrupt the wifi. Coverage can't be a problem as I have multiple AP's per floor and signal status value is never under 62%. I will likely flash the problematic devices back to 4.2.0.

Niels

markus-seidl commented 2 years ago

Thanks for informing me @NielsPiersma , but I'm an user too, not a committer.

You could try what @arendst said (he also highlighted the changes).

NielsPiersma commented 2 years ago

@markus-seidl , just responded as you were the first to notice :).

I was able to " improve" my wifi by killing syslog. I was syslogging from all my tastmota devices to a central syslog server.

I group disabled logging by issuing the following to the topic " cmnd/tasmotas/backlog "

payload: "SerialLog 0; loghost 1; syslog 0; SLEEP 0; MqttTimeout 15; MqttWifiTimeout 5000; reset 3"

This killed all logging and set the parameters as reported by @arendst .

So I think the issue has more to do with syslog than with wifi...

NIels

heisenberg2980 commented 2 years ago

Play with command MqttTimeout which is now 4 seconds where it used to be 15 seconds and/or command MqttWifiTimeout which is now 200 ms where it used to be 5000 ms.

Does anybody know why the default value for MqttTimeout was reduced from 15 to 4 seconds? I doesn´t really make sense to me.

After updating all my tasmota devices to latest version (10.1.0) I was getting a lot mqtt disconnections, and some of them were "Client DVES_xxxxxx has exceeded timeout, disconnecting" (others are "Socket error on client DVES_xxxxxxx, disconnecting", still trying to figure out what is wrong there) and I fixed them by changing the MqttTimeout from 4 to 60 seconds, shouldn´t the default value for MqttTimeout be 15 seconds at least as it was before?