LAN/WiFi interface for Boiler-System-Bus (BSB) and Local Process Bus (LPB) and Punkt-zu-Punkt Schnittstelle (PPS) with a Siemens® controller used by Elco®, Brötje® and similar heating systems
[BUG] Device hangs for some seconds on socket error with MQTT #638
To Reproduce
Start device and monitor log
Log files - Bug reports without log files will be closed
Published status 'online' to topic 'BSB2/status'
[ 38796][D][WiFiClient.cpp:536] connected(): Disconnected: RES: 0, ERR: 128
Client ID: BSB-LAN2
Will topic: BSB2/status
[ 38797][E][WiFiGeneric.cpp:1584] hostByName(): DNS Failed for
Failed to connect to MQTT broker, retrying...
[ 39801][E][WiFiGeneric.cpp:1584] hostByName(): DNS Failed for
Failed to connect to MQTT broker, retrying...
[ 40801][E][WiFiGeneric.cpp:1584] hostByName(): DNS Failed for
Failed to connect to MQTT broker, retrying...
Client ID: BSB-LAN2
Will topic: BSB2/status
Connect to MQTT broker, updating will topic
Expected behavior
Try to connect without blocking on failure
Additional context
Maybe this is caused by an unstable USB power supply. I'll try to improve the connection by adding a small capacitor to the power supply.
BSB-LAN Version
master
BSB-LAN version: 3.4.1-20240313181534
Architecture
ESP32 WT32-ETH01
Bus system
BSB, but not relevant
Describe the bug
The loop contains some long delays, causing the device to hang for several seconds whenever such a delay is reached.
The main loop keeps checking the socket by calling MQTTPubSubClient->connected() in mqtt_handler.
This calls: https://github.com/espressif/arduino-esp32/blob/master/libraries/WiFi/src/WiFiClient.cpp#L549
It sometimes returns a socket error: errno 128, "Transport endpoint is not connected".
The Ethernet connection does not drop; it is still connected. Only the socket fails.
Identified causes:
PubSubClient implementation of synchronous socket connection
mqtt_connect is called in the main loop and can cause it to hang. If the socket connection takes 3 seconds to establish, nothing else will happen in the meantime. That is a design issue of PubSubClient, and unless Async-mqtt-client is used instead, it cannot be solved.
We have to rely on a fast-responding MQTT server, which we can assume to be the case.
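To confirm how long a synchronous call actually stalls the loop, the time of each step can be recorded. The following is only a minimal sketch in plain C++: `StallMonitor` is a hypothetical helper, not part of BSB-LAN or PubSubClient, and the clock is passed in as a callable so that on the device it could wrap millis().

```cpp
#include <cstdint>

// Hypothetical diagnostic helper (an assumption, not existing BSB-LAN code):
// records the longest time a single step of the main loop has taken.
// `Clock` is any callable returning a millisecond counter as uint32_t.
template <typename Clock>
class StallMonitor {
 public:
  explicit StallMonitor(Clock clock) : clock_(clock) {}

  // Runs one step and remembers the worst duration seen so far.
  // Unsigned subtraction stays correct if the counter wraps around.
  template <typename Step>
  void run(Step step) {
    const uint32_t start = clock_();
    step();
    const uint32_t elapsed = clock_() - start;
    if (elapsed > worst_ms_) worst_ms_ = elapsed;
  }

  uint32_t worst_ms() const { return worst_ms_; }

 private:
  Clock clock_;
  uint32_t worst_ms_ = 0;
};
```

Wrapping the MQTT handling step in run() and logging worst_ms() from time to time would show whether the multi-second hang really comes from that step.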
MQTT Connection retry & delay
If the socket cannot be established, the MQTT handler blocks the main loop with 3 retries of 1 second each, here: https://github.com/fredlcore/BSB-LAN/blob/master/BSB_LAN/include/mqtt_handler.h#L223
This retry must be dropped. The next attempt will then occur on the next call to loop. To avoid flooding the TCP connection with 0-length packets when the server is not responding, a retry delay should elapse before retrying.
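The proposal (one attempt per call to loop, gated by a retry delay) could look roughly like the sketch below. It is written as plain C++ so it can be tried off-device. `RetryGate` and `mqtt_poll_once` are hypothetical names, not existing BSB-LAN code, and `try_connect` stands in for the real connect call, which itself remains synchronous. The 5000 ms delay in the usage note is an arbitrary assumption.

```cpp
#include <cstdint>

// Hypothetical helper (not part of BSB-LAN or PubSubClient): decides whether
// a new connection attempt is allowed at time now_ms.
class RetryGate {
 public:
  explicit RetryGate(uint32_t retry_delay_ms) : retry_delay_ms_(retry_delay_ms) {}

  // Returns true at most once per retry_delay_ms_. Unsigned subtraction keeps
  // the comparison correct when a millis()-style counter wraps around.
  bool should_attempt(uint32_t now_ms) {
    if (has_attempted_ && (now_ms - last_attempt_ms_) < retry_delay_ms_) {
      return false;
    }
    has_attempted_ = true;
    last_attempt_ms_ = now_ms;
    return true;
  }

 private:
  uint32_t retry_delay_ms_;
  uint32_t last_attempt_ms_ = 0;
  bool has_attempted_ = false;
};

// One pass of a hypothetical MQTT handler: at most one connect attempt,
// no delay() and no retry loop. try_connect returns true on success.
template <typename ConnectFn>
bool mqtt_poll_once(bool connected, uint32_t now_ms, RetryGate& gate,
                    ConnectFn try_connect) {
  if (connected) return true;
  if (!gate.should_attempt(now_ms)) return false;  // too early, back to loop
  return try_connect();
}
```

On the device, this would be called once per loop iteration with the current millis() value and a gate such as RetryGate gate(5000). A failed attempt then costs the duration of a single connect call instead of three attempts plus three 1-second waits.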