Azure / azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-c
Other
588 stars 737 forks source link

[BUG]: MQTT_CLIENT_NO_PING_RESPONSE not triggered on Failure sending control packet data #2649

Open pcoll-jcm opened 1 month ago

pcoll-jcm commented 1 month ago

Development Machine

ESP32

SDK Version

Release LTS_08_2024

Protocol

MQTT

Describe the Bug

Whem MQTT connection is closed from IoT Hub, the sendPacketItem function fails, but the response is not captured and thus no error is raised. This happends at the mqtt_client_dowork function inside mqtt_client file.

This issue is also mentioned in the following issues: https://github.com/Azure/azure-iot-sdk-c/issues/2627 https://github.com/espressif/esp-azure/issues/127

The first one was opened by us, but as we detected what the issue was, we are opening this new issue as a bug.

The mqtt_client.c file is the following: https://github.com/Azure/azure-umqtt-c/blob/f1cc7804df4213a8f4b87575b1272b2f16a27438/src/mqtt_client.c

The mqtt_client_dowork function is the one below. It is responsible for doing a ping every timeSincePing (4 minutes in our case).

void mqtt_client_dowork(MQTT_CLIENT_HANDLE handle)
{
    MQTT_CLIENT* mqtt_client = (MQTT_CLIENT*)handle;
    /*Codes_SRS_MQTT_CLIENT_18_001: [If the client is disconnected, mqtt_client_dowork shall do nothing.]*/
    /*Codes_SRS_MQTT_CLIENT_07_023: [If the parameter handle is NULL then mqtt_client_dowork shall do nothing.]*/
    if (mqtt_client != NULL && mqtt_client->xioHandle != NULL)
    {
        if (mqtt_client->mqtt_status & MQTT_STATUS_PENDING_CLOSE)
        {
            close_connection(mqtt_client);
            // turn off pending close
            mqtt_client->mqtt_status &= ~MQTT_STATUS_PENDING_CLOSE;
        }
        else
        {
            /*Codes_SRS_MQTT_CLIENT_07_024: [mqtt_client_dowork shall call the xio_dowork function to complete operations.]*/
            xio_dowork(mqtt_client->xioHandle);

            /*Codes_SRS_MQTT_CLIENT_07_025: [mqtt_client_dowork shall retrieve the the last packet send value and ...]*/
            if (mqtt_client->mqtt_status & MQTT_STATUS_SOCKET_CONNECTED &&
                mqtt_client->mqtt_status & MQTT_STATUS_CLIENT_CONNECTED &&
                mqtt_client->keepAliveInterval > 0)
            {
                tickcounter_ms_t current_ms;
                if (tickcounter_get_current_ms(mqtt_client->packetTickCntr, &current_ms) != 0)
                {
                    LogError("Error: tickcounter_get_current_ms failed");
                }
                else
                {
                    /* Codes_SRS_MQTT_CLIENT_07_035: [If the timeSincePing has expired past the maxPingRespTime then mqtt_client_dowork shall call the Error Callback function with the message MQTT_CLIENT_NO_PING_RESPONSE] */
                    if (mqtt_client->timeSincePing > 0 && ((current_ms - mqtt_client->timeSincePing)/1000) > mqtt_client->maxPingRespTime)
                    {
                        // We haven't gotten a ping response in the alloted time
                        set_error_callback(mqtt_client, MQTT_CLIENT_NO_PING_RESPONSE);
                        mqtt_client->timeSincePing = 0;
                        mqtt_client->packetSendTimeMs = 0;
                        mqtt_client->packetState = UNKNOWN_TYPE;
                    }
                    else if (((current_ms - mqtt_client->packetSendTimeMs) / 1000) >= mqtt_client->keepAliveInterval)
                    {
                        /*Codes_SRS_MQTT_CLIENT_07_026: [if keepAliveInternal is > 0 and the send time is greater than the MQTT KeepAliveInterval then it shall construct an MQTT PINGREQ packet.]*/
                        BUFFER_HANDLE pingPacket = mqtt_codec_ping();
                        if (pingPacket != NULL)
                        {
                            size_t size = BUFFER_length(pingPacket);
                            (void)sendPacketItem(mqtt_client, BUFFER_u_char(pingPacket), size);
                            BUFFER_delete(pingPacket);
                            (void)tickcounter_get_current_ms(mqtt_client->packetTickCntr, &mqtt_client->timeSincePing);

                            if (is_trace_enabled(mqtt_client))
                            {
                                STRING_HANDLE trace_log = STRING_construct("PINGREQ");
                                log_outgoing_trace(mqtt_client, trace_log);
                                STRING_delete(trace_log);
                            }
                        }
                    }
                }
            }
        }
    }
}

To reproduce the issue we are finding in our production devices, we disable a device inside the IoT Hub and this forcefully closes the mqtt connection. Doing so produces the following traces:

image (1)

As you can see, there is a trace that the mqtt connection was closed (esp-tls-mbedtls: read error : -0x004C) but the device is not aware of it. Then the periodic ping fails, but the connectionStatusCallback is not triggered.

This issue raises because in the following line of the mqtt_client_dowork function:

(void)sendPacketItem(mqtt_client, BUFFER_u_char(pingPacket), size);

The result is not checked and thus when the ping fails, nothing happens. Actually, as the timeSinsPing is not updated, the sendPacketItem is called constantly and the timeout which triggers the connectionStatusCallback is never intercepted.

Temporary fix In order to resolve the issue, we did something like this patch: https://github.com/MvIt-Now/azure-umqtt-c/commit/bcb0b2387fd04cc32c2d53d4d65363c296188af8 As you can see, the result of the sendPacketItem is now catched and it correctly handles the failure.

Can this issue be reviewed? If you have any questions we will be following this issue and provide all the necessary information. Can we expect a patch to be part of the next release?

Thank you