256dpi / arduino-mqtt

MQTT library for Arduino
MIT License

Board hangs on QoS1, OK on QoS0 #184

Closed salvq closed 4 years ago

salvq commented 4 years ago

I have been running some tests with QoS1 instead of QoS0, and the board freezes after one message is sent to the broker.

I am using Arduino MKR GSM 1400.

  1. With QoS0: no issues sending messages for hours (no disconnects, no board freezes...)
  2. After changing from QoS0 to QoS1, the board freezes after the first message (the message is delivered to and received by the broker, but then the board hangs and the broker disconnects the client; the unit has to be reset by the watchdog timer or manually). Removing the if condition does not help either; the result is the same.

Are any of the settings wrong (message sizes, buffers, timing, etc.)? What should I change to debug this further?

Thanks

Code

MQTTClient client(384);

***********************

  client.begin("XXX.azure-devices.net", 8883, sslClient);
  client.setWill(outTopic.c_str(), willMessage);
  client.setOptions(10, true, 1000);

***********************

     StaticJsonDocument<300> doc;  /* Reserve memory space for the Json buffer */

      doc["deviceId"] = deviceId;
      doc["messageId"] = messageCount++;
      doc["yellow"] = arrayData[0];
      doc["green"] = arrayData[1];
      doc["cycle"] = arrayData[2];
      doc["red"] = arrayData[3];
      doc["reject"] = arrayData[4];

      char buffer[256];  /* character string containing the JSON payload */
      serializeJson(doc, buffer);  /* convert to buffer, which is the string to be used as the message */

      Serial.print(">>>client.lastError: ");
      Serial.println(client.lastError());
      Serial.print(">>>client.returnCode: ");
      Serial.println(client.returnCode());

      if (client.publish(outTopic.c_str(), buffer, false, 1)) {
        Serial.println("---Message has been sent");
      } else {
        Serial.println("---Message has failed to send");
      }

      Serial.print("<<<client.lastError: ");
      Serial.println(client.lastError());
      Serial.print("<<<client.returnCode: ");
      Serial.println(client.returnCode());
      Serial.println("");

***********************
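One of the questions above is whether message sizes or buffers could be at fault. As a rough check, the size of an MQTT 3.1.1 PUBLISH packet can be computed on the host and compared with the client buffer. This is only a sketch: `remainingLengthBytes` and `publishPacketSize` are hypothetical helper names, not library functions, and it assumes the whole packet has to fit into the buffer size passed to the `MQTTClient` constructor.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to encode the MQTT "remaining length" field (1 to 4 bytes,
// 7 bits of length per byte).
std::size_t remainingLengthBytes(std::size_t remaining) {
  std::size_t n = 1;
  while (remaining >= 128) {
    remaining /= 128;
    ++n;
  }
  return n;
}

// Total size of an MQTT 3.1.1 PUBLISH packet:
//   1 byte fixed header + remaining-length field
//   + 2-byte topic length + topic
//   + 2-byte packet identifier (only for QoS 1 and 2)
//   + payload
std::size_t publishPacketSize(std::size_t topicLen, std::size_t payloadLen, int qos) {
  std::size_t remaining = 2 + topicLen + (qos > 0 ? 2 : 0) + payloadLen;
  return 1 + remainingLengthBytes(remaining) + remaining;
}
```

With the 384-byte client buffer and a payload of at most 255 characters from the code above, a hypothetical 50-character topic gives `publishPacketSize(50, 255, 1) == 312`, so under these assumptions the buffer would not be the limiting factor.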

QoS0: no issue sending / receiving

11:49:51.748 -> >>>client.lastError: 0
11:49:51.748 -> >>>client.returnCode: 0
11:49:51.884 -> ---Message has been sent
11:49:51.884 -> <<<client.lastError: 0
11:49:51.884 -> <<<client.returnCode: 0
11:49:51.884 -> 
11:50:21.946 -> >>>client.lastError: 0
11:50:21.946 -> >>>client.returnCode: 0
11:50:22.083 -> ---Message has been sent
11:50:22.083 -> <<<client.lastError: 0
11:50:22.083 -> <<<client.returnCode: 0
11:50:22.083 -> 
11:50:52.074 -> >>>client.lastError: 0
11:50:52.074 -> >>>client.returnCode: 0
11:50:52.209 -> ---Message has been sent
11:50:52.209 -> <<<client.lastError: 0
11:50:52.209 -> <<<client.returnCode: 0

QoS1: the first message is sent and received by the broker, but then the board hangs / freezes and no more messages are sent

11:44:58.193 -> >>>client.lastError: 0
11:44:58.193 -> >>>client.returnCode: 0

diegoamayaw commented 4 years ago

Does the disconnection happen instantly after sending the message? I have a similar problem: I send a message with QoS 1 and, even though the client is disconnected (the error code should be -9 or something similar), it returns error code 0. My code is very similar to yours, but I am using an ESP32 board.

salvq commented 4 years ago

Yes, it does. I have not debugged it any further.

I switched to a different MQTT client and it works perfectly with QoS 1: no more freezes or any other issues.

salvq commented 4 years ago

I do not remember the exact situation, but I have had several scenarios where the error codes did not show the status I expected, so I would not rely on conditions based on the error codes.

diegoamayaw commented 4 years ago

@salvq Great info, thanks

256dpi commented 4 years ago

Hi everyone! I just checked this with an ESP32 board, the latest Arduino IDE and the latest ESP32 board plugin. QoS1 and QoS2 work as expected with "broker.shiftr.io" and "mqtt.eclipse.org". Despite the slow network I'm currently using, I see no freezes or delays. Hence, the issues you're describing must be caused by the underlying network stack of the board.

Unfortunately, I do not have access to an MKR GSM 1400 to debug this myself. Could you run one of the basic example sketches (plus modifications for the GSM) and test whether QoS 1 and 2 are working? Ideally also against the "broker.shiftr.io" and "mqtt.eclipse.org" brokers.

256dpi commented 4 years ago

@diegoamayaw If the client is disconnected, it won't publish the message and will just return. You're essentially reading the error value of the last command that actually did something, in this case client.disconnect().

diegoamayaw commented 4 years ago

@256dpi Thanks for the comment. So let me get this straight, if I send a message while disconnected I won't get the error code of the sent message, but rather the error code of the last executed command, which was client.disconnect()?

256dpi commented 4 years ago

Yes, that's because lastError just returns the last encountered LWMQTT error (hence the prefix "last"). The idea is to always check the returned bool to see if the command was successful and then check client.lastError for further info.

Here is the current code for publish:

bool MQTTClient::publish(const char topic[], const char payload[], int length, bool retained, int qos) {
  // return immediately if not connected
  if (!this->connected()) {
    return false;
  }

  // prepare message
  lwmqtt_message_t message = lwmqtt_default_message;
  message.payload = (uint8_t *)payload;
  message.payload_len = (size_t)length;
  message.retained = retained;
  message.qos = lwmqtt_qos_t(qos);

  // publish message
  this->_lastError = lwmqtt_publish(&this->client, lwmqtt_string(topic), message, this->timeout);
  if (this->_lastError != LWMQTT_SUCCESS) {
    // close connection
    this->close();

    return false;
  }

  return true;
}

In the future I might redesign that into a general error API that always sets an appropriate error.
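
A minimal host-side sketch of the "check the bool first, then lastError" pattern, assuming the behaviour of publish shown above. `FakeClient`, `tryPublish` and `PublishOutcome` are hypothetical names for illustration and are not part of the library; the error value -1 used with it is likewise a placeholder. Because publish closes the connection on failure, the sketch checks connected() before calling publish, so a stale lastError is never mistaken for the result of the current call:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for MQTTClient, reduced to the behaviour of
// publish() shown above. It is NOT the library class.
struct FakeClient {
  bool isConnected = false;
  int lastErr = 0;     // mirrors _lastError; 0 stands for LWMQTT_SUCCESS
  int nextResult = 0;  // result the fake network reports on the next publish

  bool connected() const { return isConnected; }
  int lastError() const { return lastErr; }

  bool publish(const std::string &topic, const std::string &payload) {
    (void)topic;
    (void)payload;
    if (!connected()) {
      return false;  // early return: lastErr is left untouched (stale)
    }
    lastErr = nextResult;
    if (lastErr != 0) {
      isConnected = false;  // mirrors this->close()
      return false;
    }
    return true;
  }
};

enum class PublishOutcome { Sent, NotConnected, Failed };

// Only consult lastError() when the outcome is Failed; in the
// NotConnected case it still holds the value of an earlier command.
PublishOutcome tryPublish(FakeClient &client, const std::string &topic,
                          const std::string &payload) {
  if (!client.connected()) {
    return PublishOutcome::NotConnected;
  }
  if (client.publish(topic, payload)) {
    return PublishOutcome::Sent;
  }
  return PublishOutcome::Failed;
}
```

In a real sketch the NotConnected branch would be the place to reconnect, and the Failed branch the place to print client.lastError().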

diegoamayaw commented 4 years ago

@256dpi Thanks for the info, it's all much clearer now

salvq commented 4 years ago

@256dpi there is a known issue with power distribution on the MKR GSM 1400 board. The QoS1 problem I have been facing might therefore be related to that. Let me try this again when I get the replacement board...

256dpi commented 4 years ago

I guess this issue is related to #188. Are you also calling client.publish in the messageCallback? If so, this can deadlock the client with QoS > 0, and sometimes even with QoS == 0.
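
One way to avoid publishing from inside the callback is to have the callback only queue the outgoing message and to send it later from the main loop. The following is a host-side sketch of that idea, not library code: `FakeClient`, `Outgoing`, `pending`, `deliver` and `flushPending` are hypothetical names, and the fake client only records publishes and flags any publish made while a callback is running.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

struct Outgoing {
  std::string topic;
  std::string payload;
};

// Hypothetical stand-in for the MQTT client. It records publishes and
// notes whether one happened while a message callback was running.
struct FakeClient {
  bool insideCallback = false;
  bool reentrantPublish = false;
  std::vector<Outgoing> sent;

  bool publish(const std::string &topic, const std::string &payload) {
    if (insideCallback) {
      reentrantPublish = true;  // the situation that can deadlock
    }
    sent.push_back({topic, payload});
    return true;
  }
};

// Messages queued by the callback, to be sent from the main loop.
std::deque<Outgoing> pending;

// Message callback: never touches the client, it only queues a reply.
void messageReceived(const std::string &topic, const std::string &payload) {
  pending.push_back({topic + "/reply", payload});
}

// Simulates the client dispatching an incoming message to the callback.
void deliver(FakeClient &client, const std::string &topic,
             const std::string &payload) {
  client.insideCallback = true;
  messageReceived(topic, payload);
  client.insideCallback = false;
}

// Called from the main loop, outside any callback: sends queued messages
// and keeps the rest queued if a publish fails.
void flushPending(FakeClient &client) {
  while (!pending.empty()) {
    const Outgoing &m = pending.front();
    if (!client.publish(m.topic, m.payload)) {
      break;
    }
    pending.pop_front();
  }
}
```

In an Arduino sketch, flushPending would be called from loop() after client.loop(), and a fixed-size buffer or a simple flag could replace the std::deque to avoid dynamic allocation.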

256dpi commented 4 years ago

Closing this for now. Please reopen if the problem persists.