Azure / azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-c
Other
587 stars 739 forks source link

Duplicate messages seen in Azure IoT Hub #2514

Closed acode-x closed 1 year ago

acode-x commented 1 year ago

Hi,

I am using IoTHubDeviceClient_SendEventAsync to push messages to Azure IoT Hub every 1min.

IoTHubMessage_CreateFromString(content.c_str());
(void)IoTHubMessage_SetContentTypeSystemProperty(static_cast<MY_HANDLE>(msg), "application%2Fjson");
(void)IoTHubMessage_SetContentEncodingSystemProperty(static_cast<MY_HANDLE>(msg), "utf-8");

We notice that some messages are sent multiple times, and there are duplicate entries created in Azure Hub for the same message. Also, it reconnects quite often (not sure if bandwidth is a problem here as CONNACK and SUBACK is quickly received). This scenario occurs multiple times a day for quite some time. Is seeing the duplicate messages in Azure Hub expected behaviour when we use QoS 1? If so, could you guide me how to use QoS 2?

Attaching logs for reference.

-> 01:30:07 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 128 | USERNAME: aaa/temperature/?api-version=2020-09-30&DeviceClientType=iothubclient%2f1.10.0%20(native%3b%20Linux%3b%20x86_64) | CLEAN: 0
<- 01:30:07 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0
-> 01:30:07 SUBSCRIBE | PACKET_ID: 22534 | TOPIC_NAME: $iothub/twin/res/# | QOS: 0 | TOPIC_NAME: $iothub/twin/PATCH/properties/desired/# | QOS: 0
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21852 | PAYLOAD_LEN: 688479
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21853 | PAYLOAD_LEN: 188830
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21854 | PAYLOAD_LEN: 10177
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=22535 | PAYLOAD_LEN: 33
<- 01:30:07 SUBACK | PACKET_ID: 22534 | RETURN_CODE: 0 | RETURN_CODE: 0
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/GET/?$rid=22536
-> 01:30:07 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 128 | USERNAME: aaa/temperature/?api-version=2020-09-30&DeviceClientType=iothubclient%2f1.10.0%20(native%3b%20Linux%3b%20x86_64) | CLEAN: 0
<- 01:30:07 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0
-> 01:30:07 SUBSCRIBE | PACKET_ID: 22537 | TOPIC_NAME: $iothub/twin/res/# | QOS: 0 | TOPIC_NAME: $iothub/twin/PATCH/properties/desired/# | QOS: 0
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21852 | PAYLOAD_LEN: 688479
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21853 | PAYLOAD_LEN: 188830
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21854 | PAYLOAD_LEN: 10177
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=22538 | PAYLOAD_LEN: 33
<- 01:30:07 SUBACK | PACKET_ID: 22537 | RETURN_CODE: 0 | RETURN_CODE: 0
-> 01:30:07 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/GET/?$rid=22539
-> 01:30:08 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 128 | USERNAME: aaa/temperature/?api-version=2020-09-30&DeviceClientType=iothubclient%2f1.10.0%20(native%3b%20Linux%3b%20x86_64) | CLEAN: 0
<- 01:30:08 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0
-> 01:30:08 SUBSCRIBE | PACKET_ID: 22540 | TOPIC_NAME: $iothub/twin/res/# | QOS: 0 | TOPIC_NAME: $iothub/twin/PATCH/properties/desired/# | QOS: 0
-> 01:30:08 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21852 | PAYLOAD_LEN: 688479
-> 01:30:08 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21853 | PAYLOAD_LEN: 188830
-> 01:30:08 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 21854 | PAYLOAD_LEN: 10177
-> 01:30:08 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=22541 | PAYLOAD_LEN: 33
<- 01:30:08 SUBACK | PACKET_ID: 22540 | RETURN_CODE: 0 | RETURN_CODE: 0
-> 01:30:08 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/GET/?$rid=22542
-> 01:30:08 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 128 | USERNAME: aaa/temperature/?api-version=2020-09-30&DeviceClientType=iothubclient%2f1.10.0%20(native%3b%20Linux%3b%20x86_64) | CLEAN: 0
Error: Time:Mon Jul 31 01:30:08 2023 File:/tmp/ci-aF7DJiyScz/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c Func:ProcessPendingTelemetryMessages Line:2483 Disconnecting MQTT connection because message PUBACK (21852) timeout.
[2023-07-31 01:30:08.596841] WARNING SDK(1021): Error/timeout on IoTHub client confirmation - Dropping message. Delivery failed due to timeout. (skipped 4 similar messsages)
Error: Time:Mon Jul 31 01:30:08 2023 File:/tmp/ci-aF7DJiyScz/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c Func:ProcessPendingTelemetryMessages Line:2483 Disconnecting MQTT connection because message PUBACK (21853) timeout.
Error: Time:Mon Jul 31 01:30:08 2023 File:/tmp/ci-aF7DJiyScz/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c Func:ProcessPendingTelemetryMessages Line:2483 Disconnecting MQTT connection because message PUBACK (21854) timeout.
Error: Time:Mon Jul 31 01:30:09 2023 File:/tmp/ci-aF7DJiyScz/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:onOpenComplete Line:452 Error: failure opening connection to endpoint
Error: Time:Mon Jul 31 01:30:40 2023 File:/tmp/ci-aF7DJiyScz/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c Func:UpdateMqttConnectionStateIfNeeded Line:2844 mqtt_client timed out waiting for CONNACK: disconnecting MQTT connection
-> 01:30:40 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 128 | USERNAME: aaa/temperature/?api-version=2020-09-30&DeviceClientType=iothubclient%2f1.10.0%20(native%3b%20Linux%3b%20x86_64) | CLEAN: 0
<- 01:30:40 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0
-> 01:30:40 SUBSCRIBE | PACKET_ID: 22543 | TOPIC_NAME: $iothub/twin/res/# | QOS: 0 | TOPIC_NAME: $iothub/twin/PATCH/properties/desired/# | QOS: 0
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=22544 | PAYLOAD_LEN: 33
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22545 | PAYLOAD_LEN: 9794
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22546 | PAYLOAD_LEN: 9794
<- 01:30:40 SUBACK | PACKET_ID: 22543 | RETURN_CODE: 0 | RETURN_CODE: 0
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/GET/?$rid=22547
<- 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/res/200/?$rid=22547 | PAYLOAD_LEN: 87
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22545 | PAYLOAD_LEN: 9794
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22546 | PAYLOAD_LEN: 9794
-> 01:30:40 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 128 | USERNAME: aaa/temperature/?api-version=2020-09-30&DeviceClientType=iothubclient%2f1.10.0%20(native%3b%20Linux%3b%20x86_64) | CLEAN: 0
<- 01:30:40 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0
-> 01:30:40 SUBSCRIBE | PACKET_ID: 22548 | TOPIC_NAME: $iothub/twin/res/# | QOS: 0 | TOPIC_NAME: $iothub/twin/PATCH/properties/desired/# | QOS: 0
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22545 | PAYLOAD_LEN: 9794
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22546 | PAYLOAD_LEN: 9794
-> 01:30:40 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=22549 | PAYLOAD_LEN: 33
<- 01:30:41 SUBACK | PACKET_ID: 22548 | RETURN_CODE: 0 | RETURN_CODE: 0
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/GET/?$rid=22550
<- 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/res/200/?$rid=22550 | PAYLOAD_LEN: 87
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22545 | PAYLOAD_LEN: 9794
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22546 | PAYLOAD_LEN: 9794
-> 01:30:41 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 128 | USERNAME: aaa/temperature/?api-version=2020-09-30&DeviceClientType=iothubclient%2f1.10.0%20(native%3b%20Linux%3b%20x86_64) | CLEAN: 0
<- 01:30:41 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0
-> 01:30:41 SUBSCRIBE | PACKET_ID: 22551 | TOPIC_NAME: $iothub/twin/res/# | QOS: 0 | TOPIC_NAME: $iothub/twin/PATCH/properties/desired/# | QOS: 0
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22545 | PAYLOAD_LEN: 9794
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22546 | PAYLOAD_LEN: 9794
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=22552 | PAYLOAD_LEN: 33
<- 01:30:41 SUBACK | PACKET_ID: 22551 | RETURN_CODE: 0 | RETURN_CODE: 0
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/GET/?$rid=22553
<- 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/res/200/?$rid=22553 | PAYLOAD_LEN: 87
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22545 | PAYLOAD_LEN: 9794
-> 01:30:41 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/temperature/messages/events/%24.ct=application%2Fjson&%24.ce=utf-8 | PACKET_ID: 22546 | PAYLOAD_LEN: 9794

Note: The packetID doesn't change, unless there are some message timeouts.

Requesting your help on same. Thanks

acode-x commented 1 year ago

Summarising all my questions again:

  1. Although we get CONNACK and SUBACK quickly, why is there a problem with PUBACK?
  2. If Azure SDK is re-trying to publish the same packet on reconnect, why is the IS_DUP property not set to true?
  3. We notice duplicate packets (sometimes even 20) in azure hub. Is this expected? How do we fix this, so I don't store duplicate packets in Azure Hub?
acode-x commented 1 year ago

IOTHUB_CLIENT_CONNECTION_NO_NETWORK is the error we get in re-connect attempt failure.

gopicisco commented 1 year ago

This issue is seen frequently in our customer setup and causing a major problem of message quota getting filled up quickly due to burst of same messages delivered multiple times when any reconnects are seen frequently. When the same packet id is resent, not sure why azure c sdk client is sending with IS_DUP=false and server side should be able to detect that it is a duplicate pkt by using this packet id ?

Kindly help us in solving this issue. Thanks

CIPop commented 1 year ago

Is seeing the duplicate messages in Azure Hub expected behaviour when we use QoS 1? If so, could you guide me how to use QoS 2?

IoT Hub does not support QoS 2.

Based on the logs, it appears we do have a problem setting the DUP flag. @ewertons @ericwol-msft From the spec:

The DUP flag MUST be set to 1 by the Client or Server when it attempts to re-deliver a PUBLISH Packet [MQTT-3.3.1.-1]. The DUP flag MUST be set to 0 for all QoS 0 messages [MQTT-3.3.1-2].

@acode-x , @gopicisco The main issue is caused by rapid reconnects. I see 4 reconnects / second:

 01:30:07 CONNECT
 01:30:07 CONNECT
 01:30:08 CONNECT
 01:30:08 CONNECT

One immediate mitigation while we work on a fix, would be to switch the protocol from MQTT to AMQP and check if you still see duplicates.

We need more information to repro/analyze why this is happening:

  1. This is usually caused by two separate processes or devices using the same identity when connecting to hub. By design, the last one connecting will kick out all other instances. Please verify a single process/device connects with this identity at any point in time.
  2. Please follow https://github.com/Azure/azure-iot-sdk-c#support to open a support ticket from your Azure subscription. We would need private information (such as IoT Hub, device ID, etc. that must not be shared on GitHub) in order to check server-side logs.

We will keep this issue active to track MQTT DUP handling.

CIPop commented 1 year ago

@acode-x No more information for the last 2 weeks. Closing for now. Please feel free to add more information here and we can re-open the issue.

ewertons commented 1 year ago

The lack of DUP flag on resent telemetry messages is now fixed in the azure-iot-sdk-c (@main). If you still see issues with duplicate messages on event hub, please file a service request as recommended by @CIPop

gopicisco commented 1 year ago

Thanks. May I know when this fix will be available and in which official release? We will plan accordingly. Thanks

ericwolz commented 1 year ago

The next LTS is Jan 2024.

https://github.com/Azure/azure-iot-sdk-c#lts-schedule

AnilChoudhury-Eaton commented 1 month ago

@ericwolz @ewertons We are observing this issue now for AMQP also for 2019-05-16 version of the azure-iot-sdk-c. We see some messages even after receiving successful from sdk, are getting pushed multiple times (2 to 50s) in over 8-hour duration. I can't see similar fix in AMQP code. Is it really fixed with latest version of SDK, so that we can update our sdk to get it fixed?