Azure / azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-c
Other
584 stars 739 forks source link

Azure SDK unreliably connecting to IoT Hub. Error reason: IOTHUB_CLIENT_CONNECTION_NO_NETWORK #2571

Closed YrnehEpoc closed 3 months ago

YrnehEpoc commented 6 months ago

Development Machine, OS, Compiler (and Other Relevant Toolchain Info)

nRF52840 Cross Compiled on Windows 10 for MBED OS using IAR Arm 8.50.4

SDK Version (Please Give Commit SHA if Manually Compiling)

Release version 1.2.7

Protocol

MQTT over websockets

Describe the Bug

Potentially related to this issue however creating a new device/ regenerating SAS keys did not fix the issue

I'm trying to connect to my IoT hub using an nRF52840. I've loaded the Azure IoT C SDK version 1.2.7 onto it to send telemetry to the cloud. I'm using the SAS authorization to connect. My device also has the DigiCert Baltimore Root, Microsoft RSA Root Certificate Authority 2017, DigiCert Global Root G2, Microsoft ECC Root Certificate Authority 2017, and DigiCert Global Root G3 certificates for authentication. I'm able to connect my device to a WIFI Hotspot and attempt to send telemetry. Sometimes, telemetry is sent with no issue to my iotHub, which I verify by using IoT Explorer to view the telemetry. However, most of the time the device attempts to send telemetry and instead receives an NSAPI error of -3016 "Connection Lost", which shows up as connection not authenticated with the reason being IOTHUB_CLIENT_CONNECTION_NO_NETWORK. I'm unsure of what's causing this but I believe it's the IoT hub shutting the connection to my device for an unknown reason.

I've tried running the latest version of the telemetry sample (1.11.0) as well as the an equivalent version (1.2.7) on my windows laptop. It builds and connects to my IoT Hub without issue. I then connect my laptop to the WIFI hotspot and rerun the telemetry sample, which connects to my IoT Hub without issue. If I then use the nRF device to send telemetry using the WIFI Hotspot, it is able to send telemetry without issue as well. However, after power cycling the hotspot and trying to connect to my IoT Hub with my nRF device, I only get unreliable connections to the hub.

I'm unable to provide code as this is one component of a larger system and relies on threading and a messaging system to coordinate components

Console Logs Example of failing to connect the device via wifi hotspot: AZURE, Connection string: HostName=xxxxx.azure-devices.net;DeviceId=xxxxx;SharedAccessKey=xxxxxxxxx socket io_state (CREATED)->(CLOSED) AZURE, MQTT SM transitioning to send telemetry.. AZURE, Queue 2 active AZURE, Number of messages to be sent: 6 AZURE, Number of messages sent: 1 AZURE, MQTT SM transitioning to wait for response.. socketio_close: 2. socket io_state (CLOSED)->(DESTROYED) socket io_state (CREATED)->(CLOSED) socket io_state (CLOSED)->(OPEN) socketio_dowork: 3, send_result: -3016 socket io_state (OPEN)->(ERROR) socket io_state (ERROR)->(CLOSED) socketio_close: 4. AZURE, Connection not authenticated AZURE, reason is IOTHUB_CLIENT_CONNECTION_NO_NETWORK socketio_close: 2. socket io_state (CLOSED)->(DESTROYED)

Example of succeeding to connect the device via WiFI hotspot:

AZURE, RSSI is -35 AZURE, Connection string: HostName=xxxxxxxx.azure-devices.net;DeviceId=xxxx;SharedAccessKey=xxxxxxx socket io_state (CREATED)->(CLOSED) AZURE, MQTT SM transitioning to send telemetry.. AZURE, Queue 2 active AZURE, Number of messages to be sent: 6 AZURE, Number of messages sent: 1 AZURE, MQTT SM transitioning to wait for response.. socketio_close: 2. socket io_state (CLOSED)->(DESTROYED) socket io_state (CREATED)->(CLOSED) psocket port: 8883socket io_state (CLOSED)->(OPEN) -> 18:09:53 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 192 | USERNAME: xxxxx-Test.azure-devices.net/test/?api-version=2017-11-08-preview&DeviceClientType=iothubclient%2f1.2.7%20(native%3b%20mbed%20v5.x%3b%20undefined) | PWD: XXXX | CLEAN: 0 <- 18:09:53 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0 -> 18:09:54 SUBSCRIBE | PACKET_ID: 2 | TOPIC_NAME: devices/test/messages/devicebound/# | QOS: 1 <- 18:09:54 SUBACK | PACKET_ID: 2 | RETURN_CODE: 1 -> 18:09:54 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252Fjson&%24.ce=utf-8 | PACKET_ID: 3 | PAYLOAD_LEN: 11 <- 18:09:55 PUBACK | PACKET_ID: 3 AZURE, Confirmation received AZURE, Number of confirmations received: 1 AZURE, MQTT SM transitioning to send telemetry.. AZURE, Number of messages sent: 2 AZURE, MQTT SM transitioning to wait for response.. -> 18:09:55 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252Fjson&%24.ce=utf-8 | PACKET_ID: 4 | PAYLOAD_LEN: 11 <- 18:09:56 PUBACK | PACKET_ID: 4 AZURE, Confirmation received AZURE, Number of confirmations received: 2 AZURE, MQTT SM transitioning to send telemetry.. AZURE, Number of messages sent: 3 AZURE, MQTT SM transitioning to wait for response.. -> 18:09:56 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252Fjson&%24.ce=utf-8 | PACKET_ID: 5 | PAYLOAD_LEN: 11 <- 18:09:56 PUBACK | PACKET_ID: 5 AZURE, Confirmation received AZURE, Number of confirmations received: 3 AZURE, MQTT SM transitioning to send telemetry.. AZURE, Number of messages sent: 4 AZURE, MQTT SM transitioning to wait for response.. -> 18:09:57 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252Fjson&%24.ce=utf-8 | PACKET_ID: 6 | PAYLOAD_LEN: 11 <- 18:09:57 PUBACK | PACKET_ID: 6 AZURE, Confirmation received AZURE, Number of confirmations received: 4 AZURE, MQTT SM transitioning to send telemetry.. AZURE, Number of messages sent: 5 AZURE, MQTT SM transitioning to wait for response.. -> 18:09:58 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252Fjson&%24.ce=utf-8 | PACKET_ID: 7 | PAYLOAD_LEN: 11 <- 18:09:58 PUBACK | PACKET_ID: 7 AZURE, Confirmation received AZURE, Number of confirmations received: 5 AZURE, MQTT SM transitioning to send telemetry.. AZURE, Number of messages sent: 6 AZURE, MQTT SM transitioning to wait for response.. -> 18:09:58 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252Fjson&%24.ce=utf-8 | PACKET_ID: 8 | PAYLOAD_LEN: 11 <- 18:09:58 PUBACK | PACKET_ID: 8 AZURE, Confirmation received AZURE, Number of confirmations received: 6 socket io_state (OPEN)->(CLOSED) -> 18:09:59 DISCONNECT socketio_close: 2. socket io_state (CLOSED)->(DESTROYED)

Example of succeeding to connect the device via WiFi only to have it lose connection mid message exchange:

AZURE, RSSI is -43 AZURE, Connection string: HostName=xxxxxx.azure-devices.net;DeviceId=xxxxx;SharedAccessKey=xxxxxx socket io_state (CREATED)->(CLOSED) AZURE, MQTT SM transitioning to send telemetry.. AZURE, Queue 2 active AZURE, Number of messages to be sent: 18 AZURE, Number of messages sent: 1 AZURE, MQTT SM transitioning to wait for response.. socketio_close: 2. socket io_state (CLOSED)->(DESTROYED) socket io_state (CREATED)->(CLOSED) socket io_state (CLOSED)->(OPEN) -> 00:00:23 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 192 | USERNAME: xxxxxxx.azure-devices.net/test/?api-version=2017-11-08-preview&DeviceClientType=iothubclient%2f1.2.7%20(native%3b%20mbed%20v5.x%3b%20undefined) | PWD: XXXX | CLEAN: 0 <- 00:00:23 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0 -> 00:00:23 SUBSCRIBE | PACKET_ID: 2 | TOPIC_NAME: devices/test/messages/devicebound/# | QOS: 1 <- 00:00:24 SUBACK | PACKET_ID: 2 | RETURN_CODE: 1 -> 00:00:24 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252fjson&%24.ce=utf-8 | PACKET_ID: 3 | PAYLOAD_LEN: 7 <- 00:00:25 PUBACK | PACKET_ID: 3 AZURE, Confirmation received AZURE, Number of confirmations received: 1 AZURE, MQTT SM transitioning to send telemetry.. AZURE, Number of messages sent: 2 AZURE, MQTT SM transitioning to wait for response.. -> 00:00:25 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252fjson&%24.ce=utf-8 | PACKET_ID: 4 | PAYLOAD_LEN: 7 <- 00:00:25 PUBACK | PACKET_ID: 4 AZURE, Confirmation received AZURE, Number of confirmations received: 2 AZURE, MQTT SM transitioning to send telemetry.. AZURE, Number of messages sent: 3 AZURE, MQTT SM transitioning to wait for response.. -> 00:00:28 PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/test/messages/events/%24.cid=CORE_ID&%24.mid=MSG_ID&%24.ct=application%252fjson&%24.ce=utf-8 | PACKET_ID: 5 | PAYLOAD_LEN: 7 socketio_dowork: 3, send_result: -3016 socket io_state (OPEN)->(ERROR) AZURE, Connection not authenticated AZURE, reason is IOTHUB_CLIENT_CONNECTION_NO_NETWORK Error: Time:Mon Jan 1 00:00:32 1900 File:mqtt_client.c Func:sendPacketItem Line:344 540: Failure sending control packet data Error: Time:Mon Jan 1 00:00:32 1900 File:mqtt_client.c Func:mqtt_client_disconnect Line:1196 Error: mqtt_client_disconnect send failed socket io_state (ERROR)->(CLOSED) socket io_state (CLOSED)->(CLOSED) socketio_close: 4. socket io_state (CLOSED)->(DESTROYED)

ericwolz commented 6 months ago

This usually happens if you have two devices using the same device ID. Only one device with a specified device ID can be connected to a hub. If a new connection is established, the old connection is terminated by the IoT hub.

YrnehEpoc commented 6 months ago

Hmm, I only have one device that's attempting to connect, and looking through the code it only calls IoTHubClient_LL_CreateFromConnectionString() once before running into the IOTHUB_CLIENT_CONNECTION_NO_NETWORK error.

YrnehEpoc commented 6 months ago

I think I may have discovered the issue. according to the ReadMe, mqtt-ws only supports X-509 CA at the moment. I'm currently using SaS to authenticate which would explain the issue

YrnehEpoc commented 6 months ago

Disregard that, I tested with the IoT Hub Client telemetry sample using MQTT over Websockets with SaS and it was able to connect. Back to square one

ewertons commented 3 months ago

Hi @YrnehEpoc , have you figured out this issue? Sorry for the delay.

Based on your description, since you were able to connect and send at least some telemetry, it means your setup is nominally correct. Now to the IOTHUB_CLIENT_CONNECTION_NO_NETWORK error, since you are using MQTT transport it can only come from here: https://github.com/Azure/azure-iot-sdk-c/blob/main/iothub_client/src/iothubtransport_mqtt_common.c#L2277

processErrorCallback is called from uMQTT (the underlying mqtt-client support library), which will be called if:

The NSAPI error -3016 is indicative that the network connection was either flaky or not available at all. Have you tried connecting to other resources in the internet and seeing if the connection holds up?

Now unfortunately your logs above are old and we no longer have server logs. If you still have the problem and can get a repro, please share the verbose logs with time stamps and the hub name and we will verify for you.

ewertons commented 3 months ago

Hi @YrnehEpoc , we will close this issue for now, but please feel free to reopen it if you would like to follow up. Thanks, Azure IoT SDK Team