Azure / azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-c

Segmentation fault during device to cloud message #2592

Closed VishalKosana closed 7 months ago

VishalKosana commented 8 months ago

Hi, I am utilizing IoTHubDeviceClient_SendEventAsync to transmit messages to the Azure IoT Hub whenever a message is received from my sensor. However, I've observed instances where multiple message requests are generated for the same message. When this occurs, it often leads to segmentation fault errors. This issue doesn't occur consistently but tends to happen more frequently when multiple messages are received from the sensor simultaneously. Below are the logs:

2024-03-08 06:55:23 - fri - TRACE   - Processing Message no 3:
-> 06:55:23 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/iotdevice2/messages/events/MessageDirection=2&MessageType=1&SourceDeviceID=sensor&DestinationDeviceID= | PACKET_ID: 5 | PAYLOAD_LEN: 278
-> 06:55:23 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/iotdevice2/messages/events/MessageDirection=2&MessageType=1&SourceDeviceID=sensor&DestinationDeviceID= | PACKET_ID: 6 | PAYLOAD_LEN: 278

<- 06:55:23 PUBACK | PACKET_ID: 5
Message result : IOTHUB_CLIENT_CONFIRMATION_OK

2024-03-08 06:55:23 - fri - DEBUG   - IoTOutgoingHandlerThreadFunc:  Received Message Len is 266
2024-03-08 06:55:23 - fri - TRACE   - Processing Message no 4:
-> 06:55:23 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/iotdevice2/messages/events/MessageDirection=2&MessageType=1&SourceDeviceID=sensor&DestinationDeviceID= | PACKET_ID: 7 | PAYLOAD_LEN: 266
<- 06:55:24 PUBACK | PACKET_ID: 6
Segmentation fault

The gdb backtrace at the time of the crash:

#0  0x00000000000402b9 in ?? ()
#1  0x0000aaaaaaaf2b08 in IoTHubClientCore_LL_SendComplete ()
#2  0x0000aaaaaaafe89c in notifyApplicationOfSendMessageComplete ()
#3  0x0000aaaaaab01ebc in mqttOperationCompleteCallback ()
#4  0x0000aaaaaab22d10 in recvCompleteCallback ()
#5  0x0000aaaaaab25494 in completePacketData ()
#6  0x0000aaaaaab26190 in mqtt_codec_bytesReceived ()
#7  0x0000aaaaaab21cd8 in onBytesReceived ()
#8  0x0000aaaaaab41fd8 in decode_ssl_received_bytes ()
#9  0x0000aaaaaab42090 in on_underlying_io_bytes_received ()
#10 0x0000aaaaaab488f4 in socketio_dowork ()
#11 0x0000aaaaaab342a4 in xio_dowork ()
#12 0x0000aaaaaab434c4 in tlsio_openssl_dowork ()
#13 0x0000aaaaaab342a4 in xio_dowork ()
#14 0x0000aaaaaab240e0 in mqtt_client_dowork ()
#15 0x0000aaaaaab059a8 in IoTHubTransport_MQTT_Common_DoWork ()
#16 0x0000aaaaaaafdaf8 in IoTHubTransportMqtt_DoWork ()
#17 0x0000aaaaaaaf68b8 in IoTHubClientCore_LL_DoWork ()
#18 0x0000aaaaaaaf92b4 in IoTHubDeviceClient_LL_DoWork ()
#19 0x0000aaaaaaadf028 in IoTOutgoingHandlerThreadFunc (pParam=<optimized out>) at src/iot.c:529
#20 0x0000fffff7df5648 in start_thread (arg=0xffffdfffea50) at pthread_create.c:477
#21 0x0000fffff7d3401c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

The relevant code in the function IoTHubClientCore_LL_SendComplete():

        PDLIST_ENTRY oldest;
        while ((oldest = DList_RemoveHeadList(completed)) != completed)
        {
            IOTHUB_MESSAGE_LIST* messageList = (IOTHUB_MESSAGE_LIST*)containingRecord(oldest, IOTHUB_MESSAGE_LIST, entry);
            if (messageList->callback != NULL)
            {
                messageList->callback(result, messageList->context);
            }
            IoTHubMessage_Destroy(messageList->messageHandle);
            free(messageList);

The segmentation fault occurs every time at the line messageList->callback(result, messageList->context);

I would appreciate your help with this. Thanks.
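
For readers unfamiliar with the pattern in the excerpt above, here is a minimal, self-contained sketch of draining an intrusive list and invoking a per-record callback. All types and helpers (`list_entry`, `message_record`, `RECORD_FROM_ENTRY`, `drain_completed`) are hypothetical stand-ins written for illustration, not the SDK's `DList` or `IOTHUB_MESSAGE_LIST` definitions. Frame #0 of the backtrace sits at an address with no symbol, which would be consistent with a jump through a callback pointer that no longer refers to a valid function (for example, a record that was freed or overwritten before the loop reached it); that reading is an assumption and is not confirmed anywhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the SDK's DList or IOTHUB_MESSAGE_LIST types. */
typedef struct list_entry {
    struct list_entry *next;
    struct list_entry *prev;
} list_entry;

typedef void (*send_callback)(int result, void *context);

typedef struct message_record {
    send_callback callback;
    void *context;
    list_entry entry;   /* link embedded inside the record */
} message_record;

/* Recover the enclosing record from a pointer to its embedded link. */
#define RECORD_FROM_ENTRY(ptr) \
    ((message_record *)((char *)(ptr) - offsetof(message_record, entry)))

static void list_init(list_entry *head) { head->next = head->prev = head; }

static void list_append(list_entry *head, list_entry *e) {
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* Unlinks and returns the first entry; returns head itself when the list is empty. */
static list_entry *list_remove_head(list_entry *head) {
    list_entry *first = head->next;
    head->next = first->next;
    first->next->prev = head;
    return first;
}

static void count_confirmation(int result, void *context) {
    (void)result;
    ++*(int *)context;
}

/* Same loop shape as the excerpt: pop, call back, free. Returns records drained. */
static int drain_completed(list_entry *completed, int result) {
    int drained = 0;
    list_entry *oldest;
    while ((oldest = list_remove_head(completed)) != completed) {
        message_record *rec = RECORD_FROM_ENTRY(oldest);
        if (rec->callback != NULL) {
            /* Crashes here if rec or rec->callback is stale or corrupted. */
            rec->callback(result, rec->context);
        }
        free(rec);
        ++drained;
    }
    return drained;
}

/* Queues three records, drains them, and returns how many callbacks ran. */
int run_drain_demo(void) {
    list_entry completed;
    int confirmations = 0;
    list_init(&completed);
    for (int i = 0; i < 3; ++i) {
        message_record *rec = malloc(sizeof *rec);
        assert(rec != NULL);
        rec->callback = count_confirmation;
        rec->context = &confirmations;
        list_append(&completed, &rec->entry);
    }
    int drained = drain_completed(&completed, 0);
    assert(drained == 3);
    return confirmations;
}
```

Calling `run_drain_demo()` returns 3, one callback per record. The loop only checks the callback for NULL, so any non-NULL but invalid pointer is called as-is, which is why the validity of each record at drain time matters.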

ericwolz commented 8 months ago

What version of the SDK is this? You are using the non-multi-threaded APIs, and there is no resource locking on these. What are you doing in the callback for packet number 5?
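
To illustrate the locking point, the sketch below serializes every call into a non-thread-safe client behind one mutex. The client here is a hypothetical stand-in (`fake_client`, `fake_send_event_async`, `fake_do_work`) rather than the SDK itself; the backtrace above does show `IoTHubDeviceClient_LL_DoWork` being driven from an application thread, so the assumption is that sends and DoWork calls may come from different threads. Whether missing locking is the actual cause of this crash is not established in the thread.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a client whose functions are not thread-safe. */
typedef struct fake_client {
    int queued;     /* messages waiting to be sent */
    int confirmed;  /* messages acknowledged */
} fake_client;

static void fake_send_event_async(fake_client *c) { c->queued++; }

static void fake_do_work(fake_client *c) {
    while (c->queued > 0) {
        c->queued--;
        c->confirmed++;
    }
}

/* Wrapper that owns the lock; every access to the client goes through it. */
typedef struct locked_client {
    pthread_mutex_t lock;
    fake_client client;
} locked_client;

static void locked_send(locked_client *lc) {
    pthread_mutex_lock(&lc->lock);
    fake_send_event_async(&lc->client);
    pthread_mutex_unlock(&lc->lock);
}

static void locked_do_work(locked_client *lc) {
    pthread_mutex_lock(&lc->lock);
    fake_do_work(&lc->client);
    pthread_mutex_unlock(&lc->lock);
}

#define SENDS_PER_THREAD 1000

static void *sender_thread(void *arg) {
    locked_client *lc = arg;
    for (int i = 0; i < SENDS_PER_THREAD; ++i) {
        locked_send(lc);
    }
    return NULL;
}

static void *worker_thread(void *arg) {
    locked_client *lc = arg;
    for (int i = 0; i < SENDS_PER_THREAD; ++i) {
        locked_do_work(lc);
    }
    return NULL;
}

/* Runs one sender and one worker concurrently; returns confirmed messages. */
int run_locking_demo(void) {
    locked_client lc = { PTHREAD_MUTEX_INITIALIZER, { 0, 0 } };
    pthread_t sender, worker;
    int rc = pthread_create(&sender, NULL, sender_thread, &lc);
    assert(rc == 0);
    rc = pthread_create(&worker, NULL, worker_thread, &lc);
    assert(rc == 0);
    pthread_join(sender, NULL);
    pthread_join(worker, NULL);
    locked_do_work(&lc);  /* flush anything queued after the worker finished */
    pthread_mutex_destroy(&lc.lock);
    return lc.client.confirmed;
}
```

Built with `-pthread`, `run_locking_demo()` returns 1000: no send is lost even though two threads touch the client. One caveat with this design: if a confirmation callback runs while the lock is held and then calls `locked_send`, a non-recursive mutex will deadlock, so callbacks should avoid re-entering the wrapper.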

VishalKosana commented 8 months ago
  1. I am using the latest SDK.
  2. I am simply sending the message to the cloud and printing the confirmation result in the callback function. Once the confirmation is received, I try to send the next message in my queue.
  3. I am unclear how two publish requests are sent for a single message. Every time this happens, I get a segmentation fault.
-> 06:55:23 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/iotdevice/messages/events/MessageDirection=2&MessageType=1&SourceDeviceID=sensor&DestinationDeviceID= | PACKET_ID: 5 | PAYLOAD_LEN: 278
-> 06:55:23 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/iotdevice2/messages/events/MessageDirection=2&MessageType=1&SourceDeviceID=sensor&DestinationDeviceID= | PACKET_ID: 6 | PAYLOAD_LEN: 278

ericwolz commented 8 months ago

Why are there two different devices, iotdevice and iotdevice2?

VishalKosana commented 8 months ago

My apologies for the typing mistake. To clarify, there is a single IoT device (iotdevice2), and my application is trying to send a D2C message from that particular IoT device. I am calling IoTHubDeviceClient_SendEventAsync only once for each message. However, I've noticed that two publish requests are being created simultaneously.
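
One way to check the "only once for each message" claim, and to keep the send-next-on-confirmation flow simple, is sketched below. The confirmation callback only records that the in-flight message completed; a single loop decides when to send the next one and counts send calls per message. Everything here (`send_state`, `fake_send`, `fake_do_work`, `on_confirmation`) is a hypothetical single-threaded simulation, not SDK code, and whether sending from inside the callback plays any role in the crash is an open assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_LEN 4

typedef struct send_state {
    bool in_flight;             /* true while a message awaits confirmation */
    int next_index;             /* next queue slot to send */
    int send_calls[QUEUE_LEN];  /* how many times each message was submitted */
} send_state;

/* Hypothetical confirmation callback: records completion and never sends. */
static void on_confirmation(void *context) {
    send_state *s = context;
    s->in_flight = false;
}

/* Hypothetical stand-in for handing one message to the client. */
static void fake_send(send_state *s, int index) {
    s->send_calls[index]++;
    s->in_flight = true;
}

/* Hypothetical stand-in for the client's work loop: delivers the confirmation. */
static void fake_do_work(send_state *s) {
    if (s->in_flight) {
        on_confirmation(s);
    }
}

/* Sends the whole queue; returns how many messages were submitted exactly once. */
int run_queue_demo(void) {
    send_state s = { false, 0, { 0 } };
    while (s.next_index < QUEUE_LEN || s.in_flight) {
        /* The loop, not the callback, is the only place that sends. */
        if (!s.in_flight && s.next_index < QUEUE_LEN) {
            fake_send(&s, s.next_index);
            s.next_index++;
        }
        fake_do_work(&s);
    }
    int sent_once = 0;
    for (int i = 0; i < QUEUE_LEN; ++i) {
        if (s.send_calls[i] == 1) {
            sent_once++;
        }
    }
    return sent_once;
}
```

`run_queue_demo()` returns 4, meaning each queued message was submitted exactly once. Adding a similar per-message counter around the real send call would show whether the two PUBLISH packets come from two application-level sends or from something below the application.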

ericwolz commented 8 months ago

Can you try the previous release version?

https://github.com/Azure/azure-iot-sdk-c/releases/tag/LTS_08_2023

ewertons commented 8 months ago

Also, could you please share some more details?

  1. What is the platform you are running your application on? Linux? Which flavor?
  2. How frequently are you calling IoTHubDeviceClient_SendEventAsync when the issue (crash) occurs?
  3. For how long had your application been running when you experience the crash? How many messages were sent before the crash?
  4. How often have you observed this issue in your application?
  5. Have you tried previous versions of the azure-iot-sdk-c? If yes, did you experience this crash as well?

VishalKosana commented 8 months ago

What is the platform you are running your application on? Linux? Which flavor?
The application is running on Linux (Debian).

How frequently are you calling IoTHubDeviceClient_SendEventAsync when the issue (crash) occurs?
With a delay of ThreadAPI_Sleep(10) between calls, if multiple messages are available.

For how long had your application been running when you experience the crash? How many messages were sent before the crash?
The number of messages sent does not seem to trigger this fault. It can happen on the first message or the 100th.

How often have you observed this issue in your application?
It is random, but every time it happens I can see two MQTT publish requests going out, so that may be what triggers the issue.

Have you tried previous versions of the azure-iot-sdk-c? If yes, did you experience this crash as well?
Yes, I faced the same issue with that version as well.

ericwolz commented 8 months ago

You will have to provide a sample that reproduces this issue.

ewertons commented 7 months ago

We have tried reproducing this issue in-house and were not able to. We will close this for now, but if you would like to follow up and provide a sample that reproduces the issue, please feel free to reopen this GitHub issue. Thanks, Azure IoT SDK Team.