Azure / azure-iot-sdk-node

A Node.js SDK for connecting devices to Microsoft Azure IoT services
https://docs.microsoft.com/en-us/azure/iot-hub/

[Technical Question] File upload notifications gets long delay to received by client #919

Closed Zachery2008 closed 3 years ago

Zachery2008 commented 3 years ago

We have a strange case: a file notification was created at local time 2021-01-13 09:18:55 (UTC: 2021-01-13T17:18:55), but our service did not receive it. The service still seemed to receive file upload notifications from other devices without problems. Around 13:29 local time I restarted the service, and that notification was then delivered to the service at local time 2021-01-13 13:29:07. During that time no error was reported, such as a lost connection.

{
  "deviceId": "xxxxx",
  "blobUri": "https://xxxxx.blob.core.windows.net/xxxxx/xxxx_xxx.txt",
  "blobName": "xxxxx/xxxx_xxx.txt",
  "lastUpdatedTime": "2021-01-13T17:18:55+00:00",
  "blobSizeInBytes": 9567724,
  "enqueuedTimeUtc": "2021-01-13T17:19:25.1967744Z"
}

The service received this message at: Jan 13, 2021 @ 13:29:07.275

How can this be explained? Can anyone help? Thank you!
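
One way to make a delay like this visible is to log, for every notification, the gap between `enqueuedTimeUtc` and the time the service actually received it. The following is a minimal sketch, assuming the receipt time above (13:29:07.275 local) corresponds to 21:29:07.275 UTC, based on the 8-hour offset implied by the timestamps in this post. `deliveryDelayMs` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper (not part of azure-iothub): how long a notification
// waited between being enqueued by IoT Hub and being received by the service.
function deliveryDelayMs(notification, receivedAt = new Date()) {
  // Trim fractional seconds to milliseconds so parsing does not rely on
  // the JavaScript engine accepting 7-digit fractions.
  const trimmed = notification.enqueuedTimeUtc.replace(/(\.\d{3})\d+/, '$1');
  return receivedAt.getTime() - new Date(trimmed).getTime();
}

// Values from this post; the UTC receipt time is an assumption (local + 8 h).
const delayMs = deliveryDelayMs(
  { enqueuedTimeUtc: '2021-01-13T17:19:25.1967744Z' },
  new Date('2021-01-13T21:29:07.275Z')
);

console.log(`Delivery delay: ${delayMs} ms`); // 14982079 ms, about 4 h 10 min
```

Logging this value in the 'message' handler would show whether late deliveries cluster around restarts or reconnects.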

anthonyvercolano commented 3 years ago

Was there any indication that the connection for the receiver had a disconnection at some point? Is this repeatable? What version of the service (iothub-client) client are you using?

Zachery2008 commented 3 years ago

> Was there any indication that the connection for the receiver had a disconnection at some point? Is this repeatable? What version of the service (iothub-client) client are you using?

Thanks for your reply. Is there a way to check for a connection or disconnection? I also suspect the connection was lost, but how can I check that in code? All I did was create a client, call open() to open the connection, and then call getFileNotificationReceiver() to receive messages. The code is below.

const { Client } = require('azure-iothub')

// connectionString and Logger are defined elsewhere in our service
const iotHubNotificationClient = Client.fromConnectionString(connectionString)
iotHubNotificationClient.open((err) => {
  if (err) {
    Logger.emergency(`IoTHub connection failed. ${err.message}`)
  }
  iotHubNotificationClient.getFileNotificationReceiver((err, receiver) => {
    if (err) {
      Logger.emergency(`Failed to start IoTHub notification receiver. ${err.message}`)
    }
    receiver?.on('message', async(msg) => {
      Logger.info(`Received file upload notification from IoTHub: ${msg.data.toString()}`)

      receiver.complete(msg, async(err) => {
        if (err) {
          Logger.error(`Failed to complete file upload notification from IoTHub. ${err.message}`)
        } else {
          // doing something here
        }
      })
    })
  })
})

Do I need to add a `receiver.on('errorReceived', ...)` listener, like this?

receiver?.on('errorReceived', async(err) => {
  Logger.error(`Error received from IoT Hub file upload notification. ${err}`)
})

anthonyvercolano commented 3 years ago

Which version of the service client package (azure-iothub) are you using? Version 1.13.1 is the most current.

Zachery2008 commented 3 years ago

> Which version of the service client package (azure-iothub) are you using? Version 1.13.1 is the most current.

Yes, we're using version 1.13.1.

vishnureddy17 commented 3 years ago

Also, what version of Node are you using?

Zachery2008 commented 3 years ago

> Also, what version of Node are you using?

We are using the "node:11-alpine" image for our Kubernetes container. Could this be the reason?

anthonyvercolano commented 3 years ago

In the above post, the code is all working with a single IoT Hub with a single consumer of the notification messages. Since the service was receiving notifications before and after when the missing notification should have been received, it would seem to indicate that a disconnect did NOT occur.

BTW you should be listening for 'error' and 'disconnect'.

This sorta leaves us with the possibility that the missing notification had not in fact been sent. Is there some other interaction between your service and the file upload client such that terminating the service would provide some sort of kick to the file upload code?

As for your version of Node: I am not a container expert, but I do know folks who are pretty adept. I think the :11 indicates that you are using a container based on version 11 of Node. Aren't these odd-numbered versions never really "stable"? As for the alpine part, at least internally with our own testing infrastructure, alpine leaves a bit to be desired. To quote a colleague, "I recommend using images named “node”. Look for tags that have a suffix with the Debian build you like. Add “slim” to the tags to make them smaller."

One other point. There are a few places in the post where you are defining a function as async and specifying it as an event listener. These .on methods are not built to take callbacks that return promises. Specifically in this case: https://nodejs.org/api/events.html#events_capture_rejections_of_promises

But getting back to the code: What do you think of the possibility of the file uploader not completing?
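
On the point about async functions as event listeners, one possible workaround is to wrap the async handler so that any rejection is routed to an explicit error callback instead of becoming an unhandled rejection. This is only a sketch using Node's built-in EventEmitter as a stand-in for the receiver; `safeListener` is a hypothetical helper, not part of the SDK.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: adapts an async handler for use with emitter.on().
// A rejection (or synchronous throw) is passed to onError rather than
// surfacing as an unhandled promise rejection.
function safeListener(handler, onError) {
  return (...args) => {
    Promise.resolve()
      .then(() => handler(...args))
      .catch(onError);
  };
}

// Stand-in for the notification receiver; not the real SDK object.
const receiver = new EventEmitter();
const failures = [];

receiver.on('message', safeListener(
  async (msg) => {
    if (!msg.data) {
      throw new Error('empty notification');
    }
    // process the notification here
  },
  (err) => failures.push(err.message)
));

// Simulate a malformed message; the error ends up in `failures`.
receiver.emit('message', {});
```

The same wrapper could be applied to the 'message' and 'errorReceived' listeners shown earlier in the thread.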

Zachery2008 commented 3 years ago

@anthonyvercolano

Thank you for your suggestion. Last Friday we added listeners for 'error' and 'disconnect', and we now reconnect when either one fires. Yesterday we did observe a 'disconnect' event, after which the service reconnected.

It seems to be working fine now: the service may disconnect, and we simply reconnect it.
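
For anyone landing here later, the reconnect approach can be sketched roughly as follows. This is not our exact code. `keepConnected` and `createClient` are hypothetical names, and the fake client stands in for the object returned by `Client.fromConnectionString(...)`; it assumes only that the client emits 'error' and 'disconnect' as discussed above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: builds a client, and builds a fresh one whenever the
// current client reports 'error' or 'disconnect' or fails to open.
function keepConnected(createClient, onReady, retryDelayMs = 5000) {
  function connect() {
    const client = createClient();
    let retired = false; // ensures only one reconnect per client

    const reconnect = () => {
      if (retired) return; // 'error' and 'disconnect' may both fire
      retired = true;
      setTimeout(connect, retryDelayMs);
    };

    client.on('error', reconnect);
    client.on('disconnect', reconnect);

    client.open((err) => {
      if (err) {
        reconnect();
      } else {
        onReady(client); // e.g. call getFileNotificationReceiver() here
      }
    });
  }

  connect();
}

// Demo with a fake client that opens immediately (not the real SDK client).
class FakeClient extends EventEmitter {
  open(done) {
    done(null);
  }
}

const opened = [];
keepConnected(() => new FakeClient(), (client) => opened.push(client), 0);

// Simulate a dropped connection; a second client is created shortly after.
opened[0].emit('disconnect');
```

With the real SDK, the `onReady` callback would be the place to set up the file notification receiver again, since the old receiver belongs to the dropped connection.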

vishnureddy17 commented 3 years ago

@Zachery2008 Glad we could help!