Azure / azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-c
Other
588 stars 737 forks source link

AMQP_WS Stack corruption or double free in iothubtransport_amqp_connection.c over network disconnection/ reconnection #167

Closed jkriba closed 7 years ago

jkriba commented 7 years ago

Description of the issue:

Microsoft Device SDK's AMQP_WS Stack is Crashing in iothubtransport_amqp_connection.c when the network is getting disconnected and reconnected. This issue is 100% reproducible with the Microsoft Device SDK Sample code.

We were able to test this in both Windows / Linux Platforms and by removing the Ethernet manually or toggling the Wi-Fi .

Additionally we were able to link this particular issue with a backlog issue which was already there, https://github.com/Azure/azure-iot-sdk-c/issues/117 Any help will be really appreciated since this is impacting the reliability of the application.

Code sample exhibiting the issue:

We have used the Azure IoT Device sample "iothub_client_sample_amqp_websockets.c"

include

include

include "azure_c_shared_utility/platform.h"

include "azure_c_shared_utility/threadapi.h"

include "azure_c_shared_utility/crt_abstractions.h"

include "iothub_client.h"

include "iothub_message.h"

include "iothubtransportamqp_websockets.h"

//#include "certs.h"

//static const char connectionString = "[device connection string]"; static const char connectionString = "<>";

static int callbackCounter;

static char msgText[1024]; static char propText[1024];

define MESSAGE_COUNT 20

IOTHUB_CLIENT_HANDLE iotHubClientHandle;

typedef struct EVENT_INSTANCE_TAG { IOTHUB_MESSAGE_HANDLE messageHandle; int messageTrackingId; // For tracking the messages within the user callback. } EVENT_INSTANCE;

EVENT_INSTANCE messages[MESSAGE_COUNT];

static IOTHUBMESSAGE_DISPOSITION_RESULT ReceiveMessageCallback(IOTHUB_MESSAGE_HANDLE message, void userContextCallback) { int counter = (int)userContextCallback; const unsigned char buffer = NULL; size_t size = 0; const char messageId; const char correlationId;

// Message properties
if ((messageId = IoTHubMessage_GetMessageId(message)) == NULL)
{
    messageId = "<null>";
}

if ((correlationId = IoTHubMessage_GetCorrelationId(message)) == NULL)
{
    correlationId = "<null>";
}

// Message content
IOTHUBMESSAGE_CONTENT_TYPE contentType = IoTHubMessage_GetContentType(message);

if (contentType == IOTHUBMESSAGE_BYTEARRAY)
{
    if (IoTHubMessage_GetByteArray(message, &buffer, &size) == IOTHUB_MESSAGE_OK)
    {
        (void)printf("Received Message [%d]\r\n Message ID: %s\r\n Correlation ID: %s\r\n BINARY Data: <<<%.*s>>> & Size=%d\r\n", *counter, messageId, correlationId, (int)size, buffer, (int)size);
    }
    else
    {
        (void)printf("Failed getting the BINARY body of the message received.\r\n");
    }
}
else if (contentType == IOTHUBMESSAGE_STRING)
{
    if ((buffer = (const unsigned char*)IoTHubMessage_GetString(message)) != NULL && (size = strlen((const char*)buffer)) > 0)
    {
        (void)printf("Received Message [%d]\r\n Message ID: %s\r\n Correlation ID: %s\r\n STRING Data: <<<%.*s>>> & Size=%d\r\n", *counter, messageId, correlationId, (int)size, buffer, (int)size);
    }
    else
    {
        (void)printf("Failed getting the STRING body of the message received.\r\n");
    }
}
else
{
    (void)printf("Failed getting the body of the message received (type %i).\r\n", contentType);
}

// Retrieve properties from the message
MAP_HANDLE mapProperties = IoTHubMessage_Properties(message);
if (mapProperties != NULL)
{
    const char*const* keys;
    const char*const* values;
    size_t propertyCount = 0;
    if (Map_GetInternals(mapProperties, &keys, &values, &propertyCount) == MAP_OK)
    {
        if (propertyCount > 0)
        {
            size_t index;

            printf(" Message Properties:\r\n");
            for (index = 0; index < propertyCount; index++)
            {
                printf("\tKey: %s Value: %s\r\n", keys[index], values[index]);
            }
            printf("\r\n");
        }
    }
}

(*counter)++;

/* Some device specific action code goes here... */
return IOTHUBMESSAGE_ACCEPTED;

}

static void SendConfirmationCallback(IOTHUB_CLIENT_CONFIRMATION_RESULT result, void userContextCallback) { EVENT_INSTANCE eventInstance = (EVENT_INSTANCE)userContextCallback; (void)printf("Confirmation[%d] received for message tracking id = %d with result = %s\r\n", callbackCounter, eventInstance->messageTrackingId, ENUM_TO_STRING(IOTHUB_CLIENT_CONFIRMATION_RESULT, result)); / Some device specific action code goes here... */ callbackCounter++; IoTHubMessage_Destroy(eventInstance->messageHandle); }

void InitPlatform() { int receiveContext = 0;; if (platform_init() != 0) { (void)printf("ERROR: failed initializing the platform.\r\n"); } else if ((iotHubClientHandle = IoTHubClient_CreateFromConnectionString(connectionString, AMQP_Protocol_over_WebSocketsTls)) == NULL) { (void)printf("ERROR: iotHubClientHandle is NULL!\r\n"); platform_deinit(); }

/* Setting Message call back, so we can receive Commands. */
if (IoTHubClient_SetMessageCallback(iotHubClientHandle, ReceiveMessageCallback, &receiveContext) != IOTHUB_CLIENT_OK)
{
    (void)printf("ERROR: IoTHubClient_SetMessageCallback..........FAILED!\r\n");
}

}

void iothub_client_sample_amqp_websockets_run(void) {

srand((unsigned int)time(NULL));
double avgWindSpeed = 10.0;
double minTemperature = 20.0;
double minHumidity = 60.0;
double temperature = 0;
double humidity = 0;

callbackCounter = 0;

(void)printf("Starting the IoTHub client sample AMQP over WebSockets...\r\n");

(void)printf("IoTHubClient_SetMessageCallback...successful.\r\n");

/* Now that we are ready to receive commands, let's send some messages */
for (size_t i = 0; i < MESSAGE_COUNT; i++)
{
    temperature = minTemperature + (rand() % 10);
    humidity = minHumidity +  (rand() % 20);
    sprintf_s(msgText, sizeof(msgText), "{\"deviceId\":\"myFirstDevice\",\"windSpeed\":%.2f,\"temperature\":%.2f,\"humidity\":%.2f}", avgWindSpeed + (rand() % 4 + 2), temperature, humidity);
    if ((messages[i].messageHandle = IoTHubMessage_CreateFromByteArray((const unsigned char*)msgText, strlen(msgText))) == NULL)
    {
        (void)printf("ERROR: iotHubMessageHandle is NULL!\r\n");
    }
    else
    {
        messages[i].messageTrackingId = (int)i;

        MAP_HANDLE propMap = IoTHubMessage_Properties(messages[i].messageHandle);
        (void)sprintf_s(propText, sizeof(propText), temperature > 28 ? "true" : "false");
        if (Map_AddOrUpdate(propMap, "temperatureAlert", propText) != MAP_OK)
        {
            (void)printf("ERROR: Map_AddOrUpdate Failed!\r\n");
        }

        if (IoTHubClient_SendEventAsync(iotHubClientHandle, messages[i].messageHandle, SendConfirmationCallback, &messages[i]) != IOTHUB_CLIENT_OK)
        {
            (void)printf("ERROR: IoTHubClient_SendEventAsync..........FAILED!\r\n");
        }
        else
        {
            (void)printf("IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.\r\n");
        }
    }

    //IoTHubClient_Destroy(iotHubClientHandle);
    //platform_deinit();
}

/* Wait for Commands. */
(void)printf("Press any key to exit the application. \r\n");
(void)getchar();

}

Console log of the issue:

Hello World!!! Info: IoT Hub SDK for C, version 1.1.17

Info: Retry policy set (5, timeout = 0)

Starting the IoTHub client sample AMQP over WebSockets...

IoTHubClient_SetMessageCallback...successful.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

Press any key to exit the application.

Confirmation[0] received for message tracking id = 0 with result = IOTHUB_CLIENT_CONFIRMATION_OK

Confirmation[1] received for message tracking id = 1 with result = IOTHUB_CLIENT_CONFIRMATION_OK

Confirmation[2] received for message tracking id = 2 with result = IOTHUB_CLIENT_CONFIRMATION_OK

Confirmation[3] received for message tracking id = 3 with result = IOTHUB_CLIENT_CONFIRMATION_OK

Confirmation[4] received for message tracking id = 4 with result = IOTHUB_CLIENT_CONFIRMATION_OK

Starting the IoTHub client sample AMQP over WebSockets...

IoTHubClient_SetMessageCallback...successful.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

Press any key to exit the application.

Starting the IoTHub client sample AMQP over WebSockets... Error in `/home/avenger/AzureSample_LLAPi/AzureSDKBase': double free or corruption (fasttop): 0x00007f71fc019970

IoTHubClient_SetMessageCallback...successful.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.

Press any key to exit the application.

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/wsio.c Func:on_underlying_ws_error Line:448 on_underlying_ws_error called with error code 3

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c Func:on_amqp_connection_state_changed Line:622 Transport received an ERROR from the amqp_connection (state changed AMQP_CONNECTION_STATE_OPENED -> AMQP_CONNECTION_STATE_ERROR); it will be flagged for connection retry.

Info: Transport state changed from AMQP_TRANSPORT_STATE_CONNECTED to AMQP_TRANSPORT_STATE_RECONNECTION_REQUIRED

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/iothub_client/src/iothubtransport_amqp_connection.c Func:on_cbs_error Line:156 CBS Error occured

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/adapters/socketio_berkeley.c Func:socketio_send Line:814 Failure: sending socket failed. errno=107 (Transport endpoint is not connected).

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/tlsio_openssl.c Func:write_outgoing_bytes Line:566 Error in xio_send.

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/tlsio_openssl.c Func:tlsio_openssl_send Line:1284 Error in write_outgoing_bytes.

Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/uws_client.c Func:uws_client_send_frame_async Line:1773 Could not send bytes through the underlying IO

markrad commented 7 years ago

Investigation in progress.

AndWass commented 7 years ago

I also see the same behaviour but with a slightly different log output:

Error: Time:Wed Jul 5 12:09:01 2017 File:C:\offline\azure-iot-sdk-c\uamqp\src\message_sender.c Func:remove_pending_message_by_index Line:51 Removing pending message at index 0 testdevice1 Client connection status callback Message sent with sequence 5896 Error: Time:Wed Jul 5 12:09:23 2017 File:C:\offline\azure-iot-sdk-c\c-utility\adapters\socketio_win32.c Func:socketio_dowork Line:535 Socketio_Failure: Recieving data from endpoint: 10054. Error: Time:Wed Jul 5 12:0 9:23 2017 File:C:\offline\azure-iot-sdk-c\iothub_client\src\iothubtransport_amqp_common.c Func:on_amqp_connection_state_changed Line:624 Transport received an ERROR from the amqp_connection (state changed AMQP_CONNECTION_STATE_OPENED -> AMQP_CONNECTION_STATE_ERROR); it will be flagged for connection retry. Info: Transport state changed from AMQP_TRANSPORT_STATE_CONNECTED to AMQP_TRANSPORT_STATE_RECONNECTION_REQUIRED Error: Time:Wed Jul 5 12:09:23 2017 File:C:\offline\azure-iot-sdk-c\iothub_client\src\iothubtransport_amqp_connection.c Func:on_cbs_error Line:156 CBS Error occured Error: Time:Wed Jul 5 12:09:23 2017 File:C:\offline\azure-iot-sdk-c\uamqp\src\message.c Func:message_destroy Line:330 NULL

Backtrace also attached (Thread 4 is the crashing one) backtrace.txt

AndWass commented 7 years ago

Found a fix you might want to try as well.

In message_sender.c in function on_link_state_changed a call will be made to the function indicate_all_messages_as_error. This function frees all the messages and their associated memory. After this (a little while on) on_session_state_changed (in link.c) will be called with the new state set to SESSION_STATE_ERROR. This will call remove_all_pending_deliveries which will loop through all pending deliveries and call the on_delivery_settled callback. This callback will then touch and attempt to use the previously freed memory and finally also free the message again.

Simply removing all calls to indicate_all_messages_as_error in on_link_state_change (around lines 560 in message_sender.c in uamqp) will make the crashing go away but I don't know if this causes other problems though.

tameraw commented 7 years ago

Linking to issue in uamqp: https://github.com/Azure/azure-uamqp-c/issues/156

ewertons commented 7 years ago

This has been verified using the latest azure-iot-sdk-c code from master, and the issue no longer repros. The fix in uamqp resolved the issue. No regressions were observed so far on the iothub_client.

ewertons commented 7 years ago

If you have the opportunity of verifying the fix on your solution as well, please let us know the result for consistency.

Thanks for using the Azure IoT SDKs.

ewertons commented 7 years ago

So the issue is still occurring. However, one important clarification: This issue is not related to #160 nor Azure/azure-uamqp-c#156 The comments from @Shakti213 do not apply to this issue.

I'll provide a fix soon.

ewertons commented 7 years ago

A fix for this issue has been committed and is available in the master branch of Azure IoT C SDK.

Please try running your verification using the latest bits from master and let us know your results.

Thanks, Microsoft Azure IoT

ewertons commented 7 years ago

The fix has been validated by the customer.

Thanks for contributing to the Azure IoT SDKs,