Whem MQTT connection is closed from IoT Hub, the sendPacketItem function fails, but the response is not captured and thus no error is raised. This happends at the mqtt_client_dowork function inside mqtt_client file.
The mqtt_client_dowork function is the one below. It is responsible for doing a ping every timeSincePing (4 minutes in our case).
void mqtt_client_dowork(MQTT_CLIENT_HANDLE handle)
{
MQTT_CLIENT* mqtt_client = (MQTT_CLIENT*)handle;
/*Codes_SRS_MQTT_CLIENT_18_001: [If the client is disconnected, mqtt_client_dowork shall do nothing.]*/
/*Codes_SRS_MQTT_CLIENT_07_023: [If the parameter handle is NULL then mqtt_client_dowork shall do nothing.]*/
if (mqtt_client != NULL && mqtt_client->xioHandle != NULL)
{
if (mqtt_client->mqtt_status & MQTT_STATUS_PENDING_CLOSE)
{
close_connection(mqtt_client);
// turn off pending close
mqtt_client->mqtt_status &= ~MQTT_STATUS_PENDING_CLOSE;
}
else
{
/*Codes_SRS_MQTT_CLIENT_07_024: [mqtt_client_dowork shall call the xio_dowork function to complete operations.]*/
xio_dowork(mqtt_client->xioHandle);
/*Codes_SRS_MQTT_CLIENT_07_025: [mqtt_client_dowork shall retrieve the the last packet send value and ...]*/
if (mqtt_client->mqtt_status & MQTT_STATUS_SOCKET_CONNECTED &&
mqtt_client->mqtt_status & MQTT_STATUS_CLIENT_CONNECTED &&
mqtt_client->keepAliveInterval > 0)
{
tickcounter_ms_t current_ms;
if (tickcounter_get_current_ms(mqtt_client->packetTickCntr, ¤t_ms) != 0)
{
LogError("Error: tickcounter_get_current_ms failed");
}
else
{
/* Codes_SRS_MQTT_CLIENT_07_035: [If the timeSincePing has expired past the maxPingRespTime then mqtt_client_dowork shall call the Error Callback function with the message MQTT_CLIENT_NO_PING_RESPONSE] */
if (mqtt_client->timeSincePing > 0 && ((current_ms - mqtt_client->timeSincePing)/1000) > mqtt_client->maxPingRespTime)
{
// We haven't gotten a ping response in the alloted time
set_error_callback(mqtt_client, MQTT_CLIENT_NO_PING_RESPONSE);
mqtt_client->timeSincePing = 0;
mqtt_client->packetSendTimeMs = 0;
mqtt_client->packetState = UNKNOWN_TYPE;
}
else if (((current_ms - mqtt_client->packetSendTimeMs) / 1000) >= mqtt_client->keepAliveInterval)
{
/*Codes_SRS_MQTT_CLIENT_07_026: [if keepAliveInternal is > 0 and the send time is greater than the MQTT KeepAliveInterval then it shall construct an MQTT PINGREQ packet.]*/
BUFFER_HANDLE pingPacket = mqtt_codec_ping();
if (pingPacket != NULL)
{
size_t size = BUFFER_length(pingPacket);
(void)sendPacketItem(mqtt_client, BUFFER_u_char(pingPacket), size);
BUFFER_delete(pingPacket);
(void)tickcounter_get_current_ms(mqtt_client->packetTickCntr, &mqtt_client->timeSincePing);
if (is_trace_enabled(mqtt_client))
{
STRING_HANDLE trace_log = STRING_construct("PINGREQ");
log_outgoing_trace(mqtt_client, trace_log);
STRING_delete(trace_log);
}
}
}
}
}
}
}
}
To reproduce the issue we are finding in our production devices, we disable a device inside the IoT Hub and this forcefully closes the mqtt connection. Doing so produces the following traces:
As you can see, there is a trace that the mqtt connection was closed (esp-tls-mbedtls: read error : -0x004C) but the device is not aware of it. Then the periodic ping fails, but the connectionStatusCallback is not triggered.
This issue raises because in the following line of the mqtt_client_dowork function:
The result is not checked and thus when the ping fails, nothing happens. Actually, as the timeSinsPing is not updated, the sendPacketItem is called constantly and the timeout which triggers the connectionStatusCallback is never intercepted.
Can this issue be reviewed? If you have any questions we will be following this issue and provide all the necessary information. Can we expect a patch to be part of the next release?
Development Machine
ESP32
SDK Version
Release LTS_08_2024
Protocol
MQTT
Describe the Bug
Whem MQTT connection is closed from IoT Hub, the sendPacketItem function fails, but the response is not captured and thus no error is raised. This happends at the mqtt_client_dowork function inside mqtt_client file.
This issue is also mentioned in the following issues: https://github.com/Azure/azure-iot-sdk-c/issues/2627 https://github.com/espressif/esp-azure/issues/127
The first one was opened by us, but as we detected what the issue was, we are opening this new issue as a bug.
The mqtt_client.c file is the following: https://github.com/Azure/azure-umqtt-c/blob/f1cc7804df4213a8f4b87575b1272b2f16a27438/src/mqtt_client.c
The mqtt_client_dowork function is the one below. It is responsible for doing a ping every timeSincePing (4 minutes in our case).
To reproduce the issue we are finding in our production devices, we disable a device inside the IoT Hub and this forcefully closes the mqtt connection. Doing so produces the following traces:
As you can see, there is a trace that the mqtt connection was closed (esp-tls-mbedtls: read error : -0x004C) but the device is not aware of it. Then the periodic ping fails, but the connectionStatusCallback is not triggered.
This issue raises because in the following line of the mqtt_client_dowork function:
The result is not checked and thus when the ping fails, nothing happens. Actually, as the timeSinsPing is not updated, the sendPacketItem is called constantly and the timeout which triggers the connectionStatusCallback is never intercepted.
Temporary fix In order to resolve the issue, we did something like this patch: https://github.com/MvIt-Now/azure-umqtt-c/commit/bcb0b2387fd04cc32c2d53d4d65363c296188af8 As you can see, the result of the sendPacketItem is now catched and it correctly handles the failure.
Can this issue be reviewed? If you have any questions we will be following this issue and provide all the necessary information. Can we expect a patch to be part of the next release?
Thank you