fredlcore / BSB-LAN

LAN/WiFi interface for Boiler-System-Bus (BSB) and Local Process Bus (LPB) and Punkt-zu-Punkt Schnittstelle (PPS) with a Siemens® controller used by Elco®, Brötje® and similar heating systems

[BUG] Device hangs for some seconds on socket error with MQTT #638

Closed jbaudoux closed 3 months ago

jbaudoux commented 3 months ago

BSB-LAN Version master BSB-LAN version: 3.4.1-20240313181534

Architecture ESP32 WT32-ETH01

Bus system BSB but not relevant

Describe the bug The main loop contains some long delays, which cause the device to hang for several seconds whenever one of them is hit.

The main loop keeps checking the socket by calling MQTTPubSubClient->connected() in mqtt_handler. This calls: https://github.com/espressif/arduino-esp32/blob/master/libraries/WiFi/src/WiFiClient.cpp#L549 It sometimes returns a socket error: errno 128, "Transport endpoint is not connected".

The Ethernet connection does not stop, it is still connected. Only the socket fails.

Identified causes:

PubSubClient implementation of synchronous socket connection

mqtt_connect is called in the main loop and can cause it to hang. If the socket connection takes 3 seconds to establish, nothing else happens during that time.

That's a design issue of PubSubClient and unless Async-mqtt-client is used instead, this cannot be solved.

We have to rely on a fast-responding MQTT server, which we can assume to be the case.

MQTT Connection retry & delay

If the socket connection cannot be established, the MQTT handler blocks the main loop with 3 retries of 1 second each, here: https://github.com/fredlcore/BSB-LAN/blob/master/BSB_LAN/include/mqtt_handler.h#L223

This retry must be dropped. The next attempt will then occur on the next call to loop. To avoid flooding the TCP connection with 0-length packets when the server is not responding, a retry delay should elapse before the next attempt.
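A minimal sketch of the retry-delay idea, not the actual BSB-LAN code: the class name RetryGate, its method shouldAttempt, and the 5000 ms interval in the usage comment are hypothetical and chosen only for illustration. It assumes a millisecond timestamp such as the value returned by Arduino's millis() is passed in. Instead of sleeping between retries, the loop asks the gate whether enough time has passed since the last attempt and otherwise carries on with its other work.

```cpp
#include <cstdint>

// Hypothetical helper (not part of BSB-LAN or PubSubClient):
// rate-limits connection attempts without ever blocking the caller.
class RetryGate {
public:
  explicit RetryGate(uint32_t retry_interval_ms)
      : interval_ms_(retry_interval_ms) {}

  // Returns true if a connection attempt may be made at time now_ms,
  // and records that time as the latest attempt. Returns false if the
  // retry interval has not yet elapsed since the previous attempt.
  bool shouldAttempt(uint32_t now_ms) {
    // Unsigned subtraction stays correct when the millisecond counter
    // wraps around (roughly every 49.7 days for a 32-bit counter).
    if (has_attempted_ &&
        static_cast<uint32_t>(now_ms - last_attempt_ms_) < interval_ms_) {
      return false;
    }
    has_attempted_ = true;
    last_attempt_ms_ = now_ms;
    return true;
  }

private:
  uint32_t interval_ms_;
  uint32_t last_attempt_ms_ = 0;
  bool has_attempted_ = false;
};

// Possible use inside the main loop (assumed names, Arduino environment):
//
//   static RetryGate mqtt_gate(5000);  // at most one attempt every 5 s
//   if (!client.connected() && mqtt_gate.shouldAttempt(millis())) {
//     client.connect(client_id);       // single attempt, no retry loop
//   }
```

With this approach a failed attempt costs at most one connect call per interval. The connect call itself remains synchronous, so the PubSubClient limitation described above still applies.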

To Reproduce Start device and monitor log

Log files - Bug reports without log files will be closed

Published status 'online' to topic 'BSB2/status'
[ 38796][D][WiFiClient.cpp:536] connected(): Disconnected: RES: 0, ERR: 128
Client ID: BSB-LAN2
Will topic: BSB2/status
[ 38797][E][WiFiGeneric.cpp:1584] hostByName(): DNS Failed for 
Failed to connect to MQTT broker, retrying...
[ 39801][E][WiFiGeneric.cpp:1584] hostByName(): DNS Failed for 
Failed to connect to MQTT broker, retrying...
[ 40801][E][WiFiGeneric.cpp:1584] hostByName(): DNS Failed for 
Failed to connect to MQTT broker, retrying...
Client ID: BSB-LAN2
Will topic: BSB2/status
Connect to MQTT broker, updating will topic

Expected behavior Try to connect without blocking on failure

Additional context Maybe this is caused by an unstable USB power supply. I'll try to improve connection by adding a small capacitor on power supply.

fredlcore commented 3 months ago

The bug report has a clearly defined structure and information required. You decided to remove several parts of it. Why?

fredlcore commented 3 months ago

Solved due to your PR.