Closed jkriba closed 7 years ago
Investigation in progress.
I also see the same behaviour but with a slightly different log output:
Error: Time:Wed Jul 5 12:09:01 2017 File:C:\offline\azure-iot-sdk-c\uamqp\src\message_sender.c Func:remove_pending_message_by_index Line:51 Removing pending message at index 0
testdevice1 Client connection status callback
Message sent with sequence 5896
Error: Time:Wed Jul 5 12:09:23 2017 File:C:\offline\azure-iot-sdk-c\c-utility\adapters\socketio_win32.c Func:socketio_dowork Line:535 Socketio_Failure: Recieving data from endpoint: 10054.
Error: Time:Wed Jul 5 12:09:23 2017 File:C:\offline\azure-iot-sdk-c\iothub_client\src\iothubtransport_amqp_common.c Func:on_amqp_connection_state_changed Line:624 Transport received an ERROR from the amqp_connection (state changed AMQP_CONNECTION_STATE_OPENED -> AMQP_CONNECTION_STATE_ERROR); it will be flagged for connection retry.
Info: Transport state changed from AMQP_TRANSPORT_STATE_CONNECTED to AMQP_TRANSPORT_STATE_RECONNECTION_REQUIRED
Error: Time:Wed Jul 5 12:09:23 2017 File:C:\offline\azure-iot-sdk-c\iothub_client\src\iothubtransport_amqp_connection.c Func:on_cbs_error Line:156 CBS Error occured
Error: Time:Wed Jul 5 12:09:23 2017 File:C:\offline\azure-iot-sdk-c\uamqp\src\message.c Func:message_destroy Line:330 NULL
A backtrace is also attached (thread 4 is the crashing one): backtrace.txt
Found a fix you might want to try as well.
In message_sender.c, the function on_link_state_changed calls indicate_all_messages_as_error. This function frees all the messages and their associated memory. A little while later, on_session_state_changed (in link.c) is called with the new state set to SESSION_STATE_ERROR. This calls remove_all_pending_deliveries, which loops through all pending deliveries and calls the on_delivery_settled callback. That callback then touches and attempts to use the previously freed memory, and finally frees the message again.
Simply removing all calls to indicate_all_messages_as_error in on_link_state_changed (around line 560 in message_sender.c in uamqp) makes the crash go away, but I don't know whether this causes other problems.
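To make the sequence above concrete, here is a minimal sketch of the general pattern: two error notifications both try to release the same pending message. It is not the uamqp code and not the fix that was eventually committed. All names in it (PENDING_MESSAGE, pending_message_settle, on_link_error, on_session_error, free_count) are hypothetical stand-ins chosen for illustration. The idea is simply that the release step is made idempotent, so the second notification becomes a no-op instead of a double free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pending message; NOT the uamqp type. */
typedef struct PENDING_MESSAGE_TAG
{
    char* payload;  /* heap memory owned by this entry */
    bool settled;   /* true once the payload has been released */
} PENDING_MESSAGE;

/* Counts how many payloads were actually released (for illustration only). */
int free_count = 0;

PENDING_MESSAGE* pending_message_create(const char* text)
{
    PENDING_MESSAGE* msg = malloc(sizeof(*msg));
    if (msg == NULL)
    {
        return NULL;
    }
    size_t len = strlen(text) + 1;
    msg->payload = malloc(len);
    if (msg->payload == NULL)
    {
        free(msg);
        return NULL;
    }
    memcpy(msg->payload, text, len);
    msg->settled = false;
    return msg;
}

/* Release the payload at most once, however many error paths call this. */
void pending_message_settle(PENDING_MESSAGE* msg)
{
    assert(msg != NULL);
    if (!msg->settled)
    {
        free(msg->payload);
        msg->payload = NULL;  /* a stale pointer can no longer be reused */
        msg->settled = true;
        free_count++;
    }
}

/* Hypothetical first notification (analogous to the link going to error). */
void on_link_error(PENDING_MESSAGE* msgs[], size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        pending_message_settle(msgs[i]);
    }
}

/* Hypothetical second notification (analogous to the session going to error). */
void on_session_error(PENDING_MESSAGE* msgs[], size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        pending_message_settle(msgs[i]);  /* no-op if already settled */
    }
}

/* Final teardown of the entry itself. */
void pending_message_destroy(PENDING_MESSAGE* msg)
{
    pending_message_settle(msg);
    free(msg);
}
```

Calling on_link_error and then on_session_error on the same array releases each payload once. Without the settled flag, the second call would free memory that was already freed, which is the kind of failure glibc reports as "double free or corruption".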
Linking to issue in uamqp: https://github.com/Azure/azure-uamqp-c/issues/156
This has been verified using the latest azure-iot-sdk-c code from master, and the issue no longer repros. The fix in uamqp resolved the issue. No regressions were observed so far on the iothub_client.
If you have the opportunity to verify the fix on your solution as well, please let us know the result for consistency.
Thanks for using the Azure IoT SDKs.
So the issue is still occurring. However, one important clarification: this issue is related to neither #160 nor Azure/azure-uamqp-c#156. The comments from @Shakti213 do not apply to this issue.
I'll provide a fix soon.
A fix for this issue has been committed and is available in the master branch of Azure IoT C SDK.
Please try running your verification using the latest bits from master and let us know your results.
Thanks, Microsoft Azure IoT
The fix has been validated by the customer.
Thanks for contributing to the Azure IoT SDKs.
OS and version used: Windows 7, Ubuntu 14.04
SDK version used: Device SDK for C 20170616
Description of the issue:
The AMQP_WS stack of the Microsoft Device SDK crashes in iothubtransport_amqp_connection.c when the network is disconnected and then reconnected. This issue is 100% reproducible with the Microsoft Device SDK sample code.
We reproduced this on both Windows and Linux, either by unplugging the Ethernet cable manually or by toggling Wi-Fi.
Additionally, we were able to link this issue to an existing backlog issue, https://github.com/Azure/azure-iot-sdk-c/issues/117. Any help would be much appreciated, since this is impacting the reliability of the application.
Code sample exhibiting the issue:
We have used the Azure IoT Device sample "iothub_client_sample_amqp_websockets.c"
#include <stdio.h>
#include <stdlib.h>
#include "azure_c_shared_utility/platform.h"
#include "azure_c_shared_utility/threadapi.h"
#include "azure_c_shared_utility/crt_abstractions.h"
#include "iothub_client.h"
#include "iothub_message.h"
#include "iothubtransportamqp_websockets.h"
//#include "certs.h"
//static const char* connectionString = "[device connection string]";
static const char* connectionString = "<>";

static int callbackCounter;
static char msgText[1024];
static char propText[1024];

#define MESSAGE_COUNT 20

IOTHUB_CLIENT_HANDLE iotHubClientHandle;

typedef struct EVENT_INSTANCE_TAG
{
    IOTHUB_MESSAGE_HANDLE messageHandle;
    int messageTrackingId; // For tracking the messages within the user callback.
} EVENT_INSTANCE;

EVENT_INSTANCE messages[MESSAGE_COUNT];

static IOTHUBMESSAGE_DISPOSITION_RESULT ReceiveMessageCallback(IOTHUB_MESSAGE_HANDLE message, void* userContextCallback)
{
    int* counter = (int*)userContextCallback;
    const unsigned char* buffer = NULL;
    size_t size = 0;
    const char* messageId;
    const char* correlationId;

}

static void SendConfirmationCallback(IOTHUB_CLIENT_CONFIRMATION_RESULT result, void* userContextCallback)
{
    EVENT_INSTANCE* eventInstance = (EVENT_INSTANCE*)userContextCallback;
    (void)printf("Confirmation[%d] received for message tracking id = %d with result = %s\r\n", callbackCounter, eventInstance->messageTrackingId, ENUM_TO_STRING(IOTHUB_CLIENT_CONFIRMATION_RESULT, result));
    /* Some device specific action code goes here... */
    callbackCounter++;
    IoTHubMessage_Destroy(eventInstance->messageHandle);
}

void InitPlatform()
{
    int receiveContext = 0;
    if (platform_init() != 0)
    {
        (void)printf("ERROR: failed initializing the platform.\r\n");
    }
    else if ((iotHubClientHandle = IoTHubClient_CreateFromConnectionString(connectionString, AMQP_Protocol_over_WebSocketsTls)) == NULL)
    {
        (void)printf("ERROR: iotHubClientHandle is NULL!\r\n");
        platform_deinit();
    }

}

void iothub_client_sample_amqp_websockets_run(void)
{

}
Console log of the issue:
Hello World!!!
Info: IoT Hub SDK for C, version 1.1.17
Info: Retry policy set (5, timeout = 0)
Starting the IoTHub client sample AMQP over WebSockets...
IoTHubClient_SetMessageCallback...successful.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
Press any key to exit the application.
Confirmation[0] received for message tracking id = 0 with result = IOTHUB_CLIENT_CONFIRMATION_OK
Confirmation[1] received for message tracking id = 1 with result = IOTHUB_CLIENT_CONFIRMATION_OK
Confirmation[2] received for message tracking id = 2 with result = IOTHUB_CLIENT_CONFIRMATION_OK
Confirmation[3] received for message tracking id = 3 with result = IOTHUB_CLIENT_CONFIRMATION_OK
Confirmation[4] received for message tracking id = 4 with result = IOTHUB_CLIENT_CONFIRMATION_OK
Starting the IoTHub client sample AMQP over WebSockets...
IoTHubClient_SetMessageCallback...successful.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
Press any key to exit the application.
Starting the IoTHub client sample AMQP over WebSockets...
Error in `/home/avenger/AzureSample_LLAPi/AzureSDKBase': double free or corruption (fasttop): 0x00007f71fc019970
IoTHubClient_SetMessageCallback...successful.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
IoTHubClient_SendEventAsync accepted data for transmission to IoT Hub.
Press any key to exit the application.
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/wsio.c Func:on_underlying_ws_error Line:448 on_underlying_ws_error called with error code 3
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c Func:on_amqp_connection_state_changed Line:622 Transport received an ERROR from the amqp_connection (state changed AMQP_CONNECTION_STATE_OPENED -> AMQP_CONNECTION_STATE_ERROR); it will be flagged for connection retry.
Info: Transport state changed from AMQP_TRANSPORT_STATE_CONNECTED to AMQP_TRANSPORT_STATE_RECONNECTION_REQUIRED
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/iothub_client/src/iothubtransport_amqp_connection.c Func:on_cbs_error Line:156 CBS Error occured
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/uamqp/src/message.c Func:message_destroy Line:330 NULL message
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/adapters/socketio_berkeley.c Func:socketio_send Line:814 Failure: sending socket failed. errno=107 (Transport endpoint is not connected).
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/tlsio_openssl.c Func:write_outgoing_bytes Line:566 Error in xio_send.
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/tlsio_openssl.c Func:tlsio_openssl_send Line:1284 Error in write_outgoing_bytes.
Error: Time:Wed Jun 28 13:36:29 2017 File:/home/avenger/AzureSDK_Latest/azure-iot-sdk-c/c-utility/src/uws_client.c Func:uws_client_send_frame_async Line:1773 Could not send bytes through the underlying IO