emsesp / EMS-ESP32

ESP32 firmware to read and control EMS and Heatronic compatible equipment such as boilers, thermostats, solar modules, and heat pumps
https://emsesp.github.io/docs
GNU Lesser General Public License v3.0
579 stars 101 forks source link

MQTT disconnects after ~3hrs on ESP32 #19

Closed proddy closed 3 years ago

proddy commented 4 years ago

Should i make a new issue for the esp32-mqtt-disconnects? I have no idea what causes it. What i know about this issue:

since mqtt reconnects immediately it does not harm much.

Originally posted by @MichaelDvP in https://github.com/proddy/EMS-ESP/issues/460#issuecomment-683403367

proddy commented 4 years ago

@MichaelDvP I'll run some tests too.

One thing worth trying is whether it still crashes when the ESP32 is standalone and USB powered (only the heartbeat is sent)

MichaelDvP commented 4 years ago

Tried the usb-powered standalone in parallel, connected to a different mqtt broker, only heartbeat and syslog. But same disconnects (half a minute delayed to the other, but i think thats due to AP-settings. Standalone: AP until connect, other AP off.)

Standalone:

Aug 30 16:20:25 ems-esp2 - 000+00:00:00.000 I 0: [syslog] Log level set to INFO
Aug 30 16:20:25 ems-esp2 - 000+00:00:00.000 I 1: [emsesp] EMS Device library loaded with 70 records
Aug 30 16:20:25 ems-esp2 - 000+00:00:30.521 I 2: [mqtt] MQTT connected
Aug 30 19:40:56 ems-esp2 - 000+03:21:01.910 I 3: [mqtt] MQTT disconnected: TCP
Aug 30 19:40:57 ems-esp2 - 000+03:21:02.930 I 4: [mqtt] MQTT connected
Aug 30 23:01:28 ems-esp2 - 000+06:41:34.243 I 5: [mqtt] MQTT disconnected: TCP
Aug 30 23:01:29 ems-esp2 - 000+06:41:35.262 I 6: [mqtt] MQTT connected
Aug 31 02:22:00 ems-esp2 - 000+10:02:06.794 I 7: [mqtt] MQTT disconnected: TCP
Aug 31 02:22:01 ems-esp2 - 000+10:02:07.812 I 8: [mqtt] MQTT connected
Aug 31 05:42:33 ems-esp2 - 000+13:22:39.386 I 9: [mqtt] MQTT disconnected: TCP
Aug 31 05:42:34 ems-esp2 - 000+13:22:40.408 I 10: [mqtt] MQTT connected

Working

Aug 30 19:13:10 ems - 000+00:00:00.000 I 0: [syslog] Log level set to INFO
Aug 30 19:13:10 ems - 000+00:00:00.000 I 1: [emsesp] EMS Device library loaded with 70 records
Aug 30 19:13:10 ems - 000+00:00:00.000 E 2: [telegram] Rx: F5 B5 7D 55 FD FD 15 (CRC 15 != 94)
Aug 30 19:13:10 ems - 000+00:00:00.566 I 3: [mqtt] MQTT connected
Aug 30 19:13:14 ems - 000+00:00:04.513 E 4: [telegram] Last Tx Read operation failed after 3 retries. Ignoring request.
Aug 30 21:41:51 ems - 000+02:28:41.989 E 5: [telegram] Rx: 19 00 AF 00 00 DC 00 B5 (CRC B5 != BD)
Aug 30 22:33:42 ems - 000+03:20:32.630 I 6: [mqtt] MQTT disconnected: TCP
Aug 30 22:33:43 ems - 000+03:20:33.648 I 7: [mqtt] MQTT connected
Aug 30 22:41:11 ems - 000+03:28:01.251 E 8: [telegram] Rx: 19 00 AB 00 00 DD 00 BF (CRC BF != FF)
Aug 31 01:54:14 ems - 000+06:41:05.085 I 9: [mqtt] MQTT disconnected: TCP
Aug 31 01:54:15 ems - 000+06:41:06.103 I 10: [mqtt] MQTT connected
Aug 31 03:45:58 ems - 000+08:32:49.164 E 11: [telegram] Rx: BF 0C 08 00 34 00 0A 01 F9 7D 00 20 00 00 03 00 01 19 45 00 18 B8 1F (CRC 1F != 4D)
Aug 31 05:14:47 ems - 000+10:01:37.572 I 12: [mqtt] MQTT disconnected: TCP
Aug 31 05:14:48 ems - 000+10:01:38.591 I 13: [mqtt] MQTT connected
proddy commented 4 years ago

I'm running some tests too on an ESP32, USB powered and no EMS bus attached. I'll see if it croaks after 3hrs. BTW MQTT is now part of the Espressif ESP32 core (https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/protocols/mqtt.html). Not that we should re-write everything, but nice to know.

proddy commented 4 years ago

Been running for 5hrs, no MQTT disconnects. ESP32 (https://docs.wemos.cc/en/latest/d32/d32.html) with firmware 2.0.1b1, USB power, no gateway/ems attached. Using mosquitto on Ubuntu.

MichaelDvP commented 4 years ago

Was syslog activated? I tried also usb powered, no connection to EMS, with syslog debug. After your post i setup a mosquitto on a rapberry and connect mqtt to this. My Hardware is a MH-ET D1 Mini 32 Edit: and this test was with version v2b11 (before the cmd-build).

Starting on my default mqtt broker:

Aug 31 13:08:39 ems-esp2 - 000+00:00:00.000 I 0: [syslog] Log level set to DEBUG
Aug 31 13:08:39 ems-esp2 - 000+00:00:00.000 D 1: [telegram] Tx read request to device 0x08 for type ID 0x07
Aug 31 13:08:39 ems-esp2 - 000+00:00:04.221 D 2: [emsesp] WiFi Connected with IP=192.168.0.23, hostname=ems-esp2
Aug 31 13:08:39 ems-esp2 - 000+00:00:04.242 I 3: [mqtt] MQTT connected
Aug 31 13:08:39 ems-esp2 - 000+00:00:04.242 D 4: [mqtt] Publishing topic ems-esp2/heartbeat (#00, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 13:08:39 ems-esp2 - 000+00:00:04.443 D 5: [mqtt] Publishing topic ems-esp2/info (#01, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 13:08:40 ems-esp2 - 000+00:00:04.644 D 6: [mqtt] Publishing topic ems-esp2/status (#02, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 13:08:40 ems-esp2 - 000+00:00:04.845 D 7: [mqtt] Subscribing to topic: ems-esp2/cmd
Aug 31 13:09:36 ems-esp2 - 000+00:01:01.529 D 8: [mqtt] Publishing topic ems-esp2/heartbeat (#04, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 13:10:36 ems-esp2 - 000+00:02:01.628 D 9: [mqtt] Publishing topic ems-esp2/heartbeat (#05, attempt emsesp/EMS-ESP#1, pid 1)
...
Aug 31 14:27:36 ems-esp2 - 000+01:19:01.716 D 99: [mqtt] Publishing topic ems-esp2/heartbeat (#85, attempt emsesp/EMS-ESP#1, pid 1)

after 1h20 i change the mqtt setting to the mosquitto

Aug 31 14:28:16 ems-esp2 - 000+01:19:41.125 I 100: [mqtt] MQTT disconnected: TCP
Aug 31 14:28:16 ems-esp2 - 000+01:19:41.172 I 101: [mqtt] MQTT connected
Aug 31 14:28:16 ems-esp2 - 000+01:19:41.314 D 102: [mqtt] Publishing topic ems-esp2/info (#86, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 14:28:16 ems-esp2 - 000+01:19:41.515 D 103: [mqtt] Publishing topic ems-esp2/status (#87, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 14:28:16 ems-esp2 - 000+01:19:41.716 D 104: [mqtt] Subscribing to topic: ems-esp2/cmd
Aug 31 14:28:36 ems-esp2 - 000+01:20:01.616 D 105: [mqtt] Publishing topic ems-esp2/heartbeat (#89, attempt emsesp/EMS-ESP#1, pid 1)
...
Aug 31 16:27:36 ems-esp2 - 000+03:19:01.758 D 224: [mqtt] Publishing topic ems-esp2/heartbeat (#208, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 16:28:36 ems-esp2 - 000+03:20:01.857 D 225: [mqtt] Publishing topic ems-esp2/heartbeat (#209, attempt emsesp/EMS-ESP#1, pid 1)

and after 3h20 from start without doing something:

Aug 31 16:29:12 ems-esp2 - 000+03:20:37.028 I 226: [mqtt] MQTT disconnected: TCP
Aug 31 16:29:17 ems-esp2 - 000+03:20:42.086 I 227: [mqtt] MQTT connected
Aug 31 16:29:17 ems-esp2 - 000+03:20:42.086 D 228: [mqtt] Publishing topic ems-esp2/info (#210, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 16:29:17 ems-esp2 - 000+03:20:42.287 D 229: [mqtt] Publishing topic ems-esp2/status (#211, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 16:29:17 ems-esp2 - 000+03:20:42.488 D 230: [mqtt] Subscribing to topic: ems-esp2/cmd
Aug 31 16:29:36 ems-esp2 - 000+03:21:01.784 D 231: [mqtt] Publishing topic ems-esp2/heartbeat (#213, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 16:30:36 ems-esp2 - 000+03:22:01.883 D 232: [mqtt] Publishing topic ems-esp2/heartbeat (#214, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 16:31:36 ems-esp2 - 000+03:23:01.781 D 233: [mqtt] Publishing topic ems-esp2/heartbeat (#215, attempt emsesp/EMS-ESP#1, pid 1)
Aug 31 16:32:36 ems-esp2 - 000+03:24:01.880 D 234: [mqtt] Publishing topic ems-esp2/heartbeat (#216, attempt emsesp/EMS-ESP#1, pid 1)

The log from mosquitto:

1594380036: mosquitto version 1.5.7 starting
1594380036: Config loaded from /etc/mosquitto/mosquitto.conf.
1594380036: Opening ipv4 listen socket on port 1883.
1594380036: Opening ipv6 listen socket on port 1883.
1598876896: New connection from 192.168.0.23 on port 1883.
1598876896: New client connected from 192.168.0.23 as ems-esp2 (c1, k120).
1598878410: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598880212: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598882013: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598883813: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598884152: Socket error on client ems-esp2, disconnecting.
1598884157: New connection from 192.168.0.23 on port 1883.

I'll flash the precompiled travis-bin, set syslog off and test again.

proddy commented 4 years ago

I had syslog on. I'll run some more tests.

MichaelDvP commented 4 years ago

No luck with the travis bin. No EMS, no syslog, no NTP, QoS 0, no username/password, only mqtt-heartbeat to mosquitto on rasperry 3b (rasbian buster) and this raspberry runs nothing but mosquitto. mosquitto.log:

1598886218: New client connected from 192.168.0.23 as ems-esp2 (c1, k120).
1598887416: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598889216: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598891017: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598892818: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598894620: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598896420: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598898221: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1598898251: Socket error on client ems-esp2, disconnecting.
1598898252: New connection from 192.168.0.23 on port 1883.
1598898252: New client connected from 192.168.0.23 as ems-esp2 (c1, k120).
proddy commented 4 years ago

If the heartbeat is disabled (i.e. no Publishing) does the same thing happen?

I think we need to create a new branch with a stripped-down version of EMS-ESP with nothing except the Mqtt library and the heartbeat. No web, console etc.

proddy commented 4 years ago

Also Pablo2048 has an updated version of the AsyncMQTT Library with the PRs 156, 157, 159 and 180 applied. Also he modified it to use the TCP_FLAG_WRITE_COPY and says it solves a lot of the problems he was seeing. Worth a try?

async-mqtt-client (1).zip

He also suggests to modify the ESPAsyncTCP library with correct 'cansend' version from here: https://github.com/glmnet/ESPAsyncTCP/tree/patch-cansend

MichaelDvP commented 4 years ago

Worth a try? async-mqtt-client (1).zip

Tried, get some warnings and errors when compiling, but can fix it. Now 3h20 later i got the same reconnect-error as before. I go back to standard lib.

He also suggests to modify the ESPAsyncTCP library

This is esp8266 only, but 8266 is working for me without any error.

I give up searching for the cause, the reconnects do not really bother. Maybe with the next core update they disappear.

proddy commented 4 years ago

Then lets close this and re-open if others have similar issues. In my testing I wasn't able to reproduce the MQTT disconnects