Closed tbnobody closed 1 year ago
I don't know what could cause this. I have limited internet access at the moment so can't do much now.
One thing you can do to improve your code though is to do something with the return value of the connect-call. It returns true is successful of false if not. I don't think this wil solve the issue though.
I'll have a look when I'm back at my computer but I'm very busy with something else now.
Looking back at the connect-code, it can be improved though:
I'll have a look when I'm back at my computer but I'm very busy with something else now.
Not a problem. Take your time!
I will also try to generate a sample code with much less application code around to prevent any other issues in my code.
I can also help out here. @MichaelDvP ported our mqtt code for the EMS-ESP project from asyncmqttclient
to espMqttClient
and so far it's been great. We're still stress testing but when it goes into the next stable build of our project it will be used by thousands of end users, so a good source of user testing :)
@tbnobody You reconnect (line 79) only if you get the onDisconnect callback, but you only call connect() without checkiing if it connects. if connect() fails you ran into this situation. I've mainly used the code from examples and poll connection state in the loop every 2 seconds. Try to trigger the reconnect also in line 127 to
if (!mqttClient->connect()) {
mqttReconnectTimer.once( 2, +[](MqttSettingsClass* instance) { instance->performConnect(); }, this);
}
But i think polling the state is better.
Thank you for this hint! Will doublecheck later!
My current assumption was, that in case of a failure I will receive a onDisconnect event. If this is not the case, then indeed the polling method (or checking the return value of the connect method) is required.
The connect()
tries to send out a connect packet, if the packet is created it returns true, state goes to "connectingTCP1" and you'll get the the onDisconnect
if the connAck is missing. But if connect packet creation failed state is still "disconnected" and connect()
returns false. I think this happens to you. See line103-131
That would indicate a memory issue though. And in that case leaks are the first thing to wacht out for. (I valgrinded this lib but not each and every scenario)
I've just added some debug output in case that the connect method returns false. Now I am waiting... I am courious if this is the cause.
The underlying WiFiClient doesn't cleanup immediately after failing/closing so memory could become an issue. Eventually it recovers though.
Just a small response of my latest tests... It took ~8h before the reconnect stopped working. The connect()
method did NOT return false
on the last attempt.
My next step would be two write a smaller example and try to reproduce it there.
- the return value has to be initialized to false and set to true only when successful (bug)
- state has to be set before resuming the task (just-in-case-improvement)
Could you test with these changes? I can't prepare a PR or branch myself. Still away from computer.
I have just applied this patch:
diff --git a/src/MqttClient.cpp b/src/MqttClient.cpp
index bae592e..256fc2f 100644
--- a/src/MqttClient.cpp
+++ b/src/MqttClient.cpp
@@ -101,7 +101,7 @@ bool MqttClient::disconnected() const {
}
bool MqttClient::connect() {
- bool result = true;
+ bool result = false;
if (_state == State::disconnected) {
EMC_SEMAPHORE_TAKE();
if (_addPacketFront(_cleanSession,
@@ -114,17 +114,17 @@ bool MqttClient::connect() {
_willPayloadLength,
(uint16_t)(_keepAlive / 1000), // 32b to 16b doesn't overflow because it comes from 16b orignally
_clientId)) {
+ result = true;
+ _state = State::connectingTcp1;
#if defined(ARDUINO_ARCH_ESP32)
if (_useInternalTask == espMqttClientTypes::UseInternalTask::YES) {
vTaskResume(_taskHandle);
}
#endif
- _state = State::connectingTcp1;
} else {
EMC_SEMAPHORE_GIVE();
emc_log_e("Could not create CONNECT packet");
_onError(0, Error::OUT_OF_MEMORY);
- result = false;
}
EMC_SEMAPHORE_GIVE();
}
Now waiting for something to happen (or not) :-)
The reconnect is working since ~24h with this patch. IMHO, the return value does not have any effect on this issue. It should be related to the reordering of the vTaskResume
and _state = State::connectingTcp1;
But in any case, it seems to solve the problem!
That explains why we don't see this issue, we use UseInternalTask::NO
(to save memory), so the state is always set before the loop checks.
The reconnect is working since ~24h with this patch. IMHO, the return value does not have any effect on this issue. It should be related to the reordering of the
vTaskResume
and_state = State::connectingTcp1;
But in any case, it seems to solve the problem!
You're welcome to create a PR. 😉
~40hours working without problems... I will create two PR's as the change of the return value is theoretical a breaking change as it changes the behavior of the connect method when it is called again while the state is still not disconnected
You're right about the breaking change. It was unintended though.
I will merge both and create a new release as soon as I'm back at my computer.
Fixed in https://github.com/bertmelis/espMqttClient/releases/tag/v1.4.4
Please reopen if the fix doesn't hold.
Hi, I had the same observation due to an instable WiFi connection to an ESP32-CAM module. http Server was always reachable after WiFi reconnects, but the MQTT stops after some time.
Then I changed to this 1.4.4. version, but still have this problem. In the http page I added two links for ESP.restart() and for mqttClient.disconnect(). Both are working. But after some time, when MQTT is dead, the mqttClient.disconnect() is not helping.
I want not to open this issue as it is may be another reason, but MQTT is not stable.
Environment:
Dependency Graph
|-- espMqttClient @ 1.4.4
|-- AsyncElegantOTA @ 2.2.7
|-- AsyncTCP-esphome @ 2.0.0
|-- ESPAsyncWebServer-esphome @ 3.0.0
|-- SPIFFS @ 2.0.0
|-- AsyncTCP @ 1.1.1
|-- Ticker @ 2.0.0
|-- WiFi @ 2.0.0
Code snippets:
void onMqttDisconnect(espMqttClientTypes::DisconnectReason reason) {
#ifdef DEBUG
Serial.printf("Disconnected from MQTT: %u.\n", static_cast<uint8_t>(reason));
#endif
if (WiFi.isConnected()) {
reconnectMqtt = true;
lastReconnect = millis();
}
}
// Reconnect MQTT via Webinterface using disconnect() and auto-reconnect from async MQTT
httpserver.on("/MQTTrestart", HTTP_GET, [](AsyncWebServerRequest *request)
{
String htmlBody = htmlHead_pre_b + htmlHead + "<h1>MQTT restarting ...</h1>";
request->send(200, "text/html", htmlBody);
delay(500);
if (!mqttClient.disconnect()) {
#ifdef DEBUG
Serial.println("Disconnecting failed.");
#endif
} else {
#ifdef DEBUG
Serial.println("Disconnection done.");
#endif
}
});
Debugs: After ESP32 restart:
Connecting to MQTT...
Connection done.
Connected to MQTT broker: 192.168.xxx.xxx, port: 1883
************************************************
Session present: 0
Last Will -online- setting- QoS 1/retain false, packetId: 1
Subscribing at QoS 1, packetId: 2
Subscribing at QoS 1, packetId: 3
Subscribing at QoS 1, packetId: 4
Subscribing at QoS 1, packetId: 5
Subscribing at QoS 1, packetId: 6
Subscribing at QoS 1, packetId: 7
Subscribing at QoS 1, packetId: 8
Subscribing at QoS 1, packetId: 9
MQTT Publish succesful: 10
Publish acknowledged.
packetId: 1
Subscribe acknowledged.
packetId: 2
qos: 1
Subscribe acknowledged.
packetId: 3
qos: 1
Subscribe acknowledged.
packetId: 4
qos: 1
Subscribe acknowledged.
packetId: 5
qos: 1
Subscribe acknowledged.
packetId: 6
qos: 1
Subscribe acknowledged.
packetId: 7
qos: 1
Subscribe acknowledged.
packetId: 8
qos: 1
Subscribe acknowledged.
packetId: 9
qos: 1
Publish acknowledged.
packetId: 10
After mqttClient.disconnect():
Disconnection done.
Disconnected from MQTT: 0.
Connecting to MQTT...
Connection done.
Connected to MQTT broker: 192.168.xxx.xxx, port: 1883
************************************************
Session present: 0
Last Will -online- setting- QoS 1/retain false, packetId: 11
Subscribing at QoS 1, packetId: 12
Subscribing at QoS 1, packetId: 13
Subscribing at QoS 1, packetId: 14
Subscribing at QoS 1, packetId: 15
Subscribing at QoS 1, packetId: 16
Subscribing at QoS 1, packetId: 17
Subscribing at QoS 1, packetId: 18
Subscribing at QoS 1, packetId: 19
MQTT Publish succesful: 20
Publish acknowledged.
packetId: 11
Subscribe acknowledged.
packetId: 12
qos: 1
Subscribe acknowledged.
packetId: 13
qos: 1
Subscribe acknowledged.
packetId: 14
qos: 1
Subscribe acknowledged.
packetId: 15
qos: 1
Subscribe acknowledged.
packetId: 16
qos: 1
Subscribe acknowledged.
packetId: 17
qos: 1
Subscribe acknowledged.
packetId: 18
qos: 1
Subscribe acknowledged.
packetId: 19
qos: 1
Publish acknowledged.
packetId: 20
I currently face the problem, that if the broker goes offline the library tries to reconnect for about 1,5h (a second test took about 5 hours until it stopped) and then it stops reconnecting. It seems that the disconnect event is missing. I've implemented it as in your previous examples using a timer in the disconnect event. (Please see code samples below)
If I stop the machine with the broker I get the following output (as expected):
This repeats again and again... If I start the broker quick enough (lets say after 10minutes) everything works fine again. For testing purposes I just let the broker offline and got the following behavior:
Here the output just stops... no more reconnect attempts. The connection state returns false: https://github.com/tbnobody/OpenDTU/blob/9b0d2ff25f48fd39155894351d4069fba5900efa/src/MqttSettings.cpp#L150
Which platform, esp8266 or esp32? ESP32 Do you use TLS or not? NO TLS Do you use an IDE (Arduino, Platformio...)? PlatformIO Which version of the Arduino framework? espressif32@6.3.1
Example code Connect: https://github.com/tbnobody/OpenDTU/blob/9b0d2ff25f48fd39155894351d4069fba5900efa/src/MqttSettings.cpp#L127 Disconnect event: https://github.com/tbnobody/OpenDTU/blob/9b0d2ff25f48fd39155894351d4069fba5900efa/src/MqttSettings.cpp#L52-L81
Other Remarks Not sure why you removed the example with the timer in the disconnect event. I could also add a workaround to check wether the connection state is disconnected and the last connect attempt is older then e.g. 1 minute. I am just not sure what should be the right behavior.