256dpi / arduino-mqtt

MQTT library for Arduino
MIT License
1.01k stars 230 forks source link

Unreliable behaviour with reconnections #301

Open magmilo opened 1 year ago

magmilo commented 1 year ago

Hi, first of all thank you for creating and sharing this library.

I'm facing a problem using the arduino-mqtt library in combination with SSLClient and WiFi.h WiFiClient to connect to a HiveMQ server. The goal is to get a reliable reconnection setup of the client going using Arduino's Connection Handler to detect network disconnects. When starting the application I can publish messages without any issues. The frequency of published messages is once every two seconds, and the MQTT QoS is set to 1. As soon as the WiFi connection is lost, one of four things occur, seemingly random:

  1. The client keeps trying to publish messages, and reports a publish success.
  2. The client keeps trying to publish a message and reports a publish failure.
  3. The board stops responding to input and also doesn't print Serial outputs anymore.
  4. The connection handler recognizes the network disconnect and I stop trying to publish messages until the connection has been reestablished.

The development board in use is a Arduino Portenta H7.

The codebase in use is quite large, but I will try to replicate the behaviour using a minimal example to post the code in the following days. I'd appreciate any hints or proposals for a better solution.

magmilo commented 1 year ago

I created a minimal (working) example for the Arduino IDE:

#include <Arduino_ConnectionHandler.h>
#include <WiFi.h>
#include <SSLClient.h>
#include <MQTT.h>
#include "certificates.h"

WiFiClient tcpClient;
SSLClient sslClient = SSLClient(tcpClient, TAs, (size_t)TAs_NUM, A0, SSLClient::SSL_WARN);
MQTTClient* mqttClient = new MQTTClient(192);
WiFiConnectionHandler wifiConnectionHandler(ssid, password);
int lastPublish = 0;

void setup() {
  wifiConnectionHandler.addCallback(NetworkConnectionEvent::CONNECTED, onNetworkConnect);
  wifiConnectionHandler.addCallback(NetworkConnectionEvent::DISCONNECTED, onNetworkDisconnect);
  wifiConnectionHandler.addCallback(NetworkConnectionEvent::ERROR, onNetworkError);
  mqttClient->begin(HiveMQ_URL, 8883, sslClient);
}

void loop() {
  wifiConnectionHandler.check();
  if (mqttClient->connected()) {
    mqttClient->loop();
    int currentMillis = millis();
    if (currentMillis - lastPublish > 10000) {
      mqttClient->publish("topic", "log");
      lastPublish = currentMillis;
    }
  }
}

void onNetworkConnect() {
  Serial.println(">>>> CONNECTED to network");
  mqttClient->connect(clientId, username, password);
}

void onNetworkDisconnect() {
  Serial.println(">>>> DISCONNECTED from network");
}

void onNetworkError() {
  Serial.println(">>>> ERROR");
}

To make the example work, a certificates.h needs to be generated from the SSLClient library. If you are having problems with this and using HiveMQ to test, I can provide you with my generated file.

Hope someone can help me figure this out :)

Decezaris commented 1 year ago

Same problem here, I'm using TinyGSM with qos = 0 everithing works fine, even in reconnections... when I change to qos = 1 and a disconnection occurs, it becomes very unstable.

obs.: I'm using AWS IoT Core broker.

Decezaris commented 1 year ago

Hi @magmilo , updates here!

I solved this problem here for qos1 at least, because AWS IoT Core has no support for qos2, the solution was holding the next publications depending on PUBACK. I think it's something related to how to handle the PUBACK from broker, but doing that worked fine here. Including prepareDuplicated function with this adapted solution (#293) , everything works great!

@256dpi First, congratulations for this excellent job, if you can add this to the Readme I think it will help a lot!

Another thing that you could add is that your library can be installed with arduino-cli named as MQTT

Hi @256dpi I think I have a "kind of bug" in the way the duplicate is handled.

If you use the new function prepareDuplicate(duplicateID) it will publish as duplicate OK. But internally, it will not reset the class variable nextDupPacketID after publishing, thus every packet published afterwars will be marked as duplicate with the same ID unless you call prepareDuplicate(0) . Maybe it should be a good idea to reset nextDupPacketID after publish (successful or not)?