Azure / azure-iot-sdk-csharp

A C# SDK for connecting devices to Microsoft Azure IoT services

[Bug Report] [v2] Opened connection on a device consumes my message #3335

Closed bastyuchenko closed 1 year ago

bastyuchenko commented 1 year ago

Hello, I implemented communication between a device and a backend server. I use the previews/v2 SDK.

My code for a device:

static async Task Main(string[] args)
{
    // Attest the device in DPS using its X.509 certificate.
    X509Certificate2 x509Certificate = Helper.LoadProvisioningCertificate();
    var security = new AuthenticationProviderX509(x509Certificate);

    var provClient = new ProvisioningDeviceClient(
        "global.azure-devices-provisioning.net",
        Helper.IdScope,
        security,
        options: new ProvisioningClientOptions(new ProvisioningClientMqttSettings()));

    var result = await provClient.RegisterAsync();

    // Connect to the assigned IoT Hub over MQTT with the same certificate.
    IAuthenticationMethod auth = new ClientAuthenticationWithX509Certificate(x509Certificate, security.GetRegistrationId());

    var deviceClient = new IotHubDeviceClient(result.AssignedHub, auth, new IotHubClientOptions(new IotHubClientMqttSettings()));
    deviceClient.ConnectionStatusChangeCallback = (ConnectionStatusInfo info) =>
    {
        Console.WriteLine($"Device Status {info.Status}; Change reason: {info.ChangeReason}; Recommended action {info.RecommendedAction}.");
    };

    // Only the connection is opened; no incoming message callback is set.
    await deviceClient.OpenAsync(CancellationToken.None);

    // Keep the process alive.
    await Task.Delay(-1);
}

Device: uses self-signed X509 certificate authentication and the MQTT protocol for communication.
Backend service: uses the AMQP protocol for communication.

Steps to reproduce:

  1. Device: is attested in DPS
  2. Device: connects (create client and open connection) to assigned IoT Hub (from step 1)
  3. Backend service: sends a message
  4. Expected result: the message is stored in IoT Hub and waits until the device starts a listener (deviceClient.SetIncomingMessageCallbackAsync(...)).
     Actual result: the backend service receives message feedback:
    {
      "EnqueuedOnUtc": "2023-05-29T20:49:52.395+00:00",
      "Records": [
        {
          "originalMessageId": "b2db3ca4-6f37-4b27-bd25-5c1d90a61943",
          "deviceGenerationId": "638156339749833913",
          "deviceId": "iothubx509device1",
          "enqueuedTimeUtc": "2023-05-29T20:49:48.2826815+00:00",
          "statusCode": "Success",
          "description": "Success"
        }
      ]
    }

It looks like something on the device side receives messages, but I haven't started any incoming message callback.

Steps to reproduce (another approach):

  1. Backend service: sends a message to the device.
     Result: the message is still stored in IoT Hub (see Azure Portal).
  2. Device: is attested in DPS.
     Result: the message is still stored in IoT Hub (see Azure Portal).
  3. Device: connects to the assigned IoT Hub (from step 2).
     Expected result: the message is still stored in IoT Hub.
     Actual result: the message has disappeared from IoT Hub (see Azure Portal), and the backend service receives message feedback:
    {
      "EnqueuedOnUtc": "2023-05-30T05:19:25.598+00:00",
      "Records": [
        {
          "originalMessageId": "24434c54-1177-428d-9a44-2bd986c95a6e",
          "deviceGenerationId": "638156339749833913",
          "deviceId": "iothubx509device1",
          "enqueuedTimeUtc": "2023-05-30T05:16:05.3227162+00:00",
          "statusCode": "Success",
          "description": "Success"
        }
      ]
    }

It looks like opening the connection consumes the message. I expect the message to be consumed only after I set a callback with deviceClient.SetIncomingMessageCallbackAsync(...).

From my point of view, it's a bug. Imagine a real situation: the backend sends a message to a device, but the device is offline, or it is installed on a vehicle that moves through areas with poor or no mobile internet coverage and constantly reconnects. In that case the sent message will be lost.

As proof that my code otherwise works, the following steps behave as expected:

  1. Device: is attested in DPS
  2. Device: connects (create client and open connection) to assigned IoT Hub (from step 1)
  3. Device: starts listener deviceClient.SetIncomingMessageCallbackAsync(...)
  4. Backend service: sends a message
  5. Device: receives and handles the message in deviceClient.SetIncomingMessageCallbackAsync(...) as expected
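
The working sequence above (open the connection, then register the callback right away) can be sketched roughly as follows. This is a minimal sketch, not the SDK's reference usage: the class name `C2dListenerSketch` and the handler `OnC2dMessageAsync` are hypothetical, the payload is assumed to deserialize to a string, and member names such as `TryGetPayload` and `MessageAcknowledgement.Complete` should be checked against the previews/v2 build in use.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;

// Hypothetical helper class, for illustration only.
public static class C2dListenerSketch
{
    // Opens the connection and registers the incoming message callback
    // immediately afterwards (steps 2 and 3 of the list above).
    public static async Task StartListeningAsync(IotHubDeviceClient deviceClient, CancellationToken ct)
    {
        await deviceClient.OpenAsync(ct);
        await deviceClient.SetIncomingMessageCallbackAsync(OnC2dMessageAsync, ct);
    }

    // Hypothetical handler: prints the payload and completes the message.
    private static Task<MessageAcknowledgement> OnC2dMessageAsync(IncomingMessage message)
    {
        // Assumption: the payload can be deserialized to a string.
        if (message.TryGetPayload(out string payload))
        {
            Console.WriteLine($"Received cloud-to-device message: {payload}");
        }
        else
        {
            Console.WriteLine("Received a cloud-to-device message with an unreadable payload.");
        }

        return Task.FromResult(MessageAcknowledgement.Complete);
    }
}
```

Calling `C2dListenerSketch.StartListeningAsync(deviceClient, CancellationToken.None)` in place of the bare `OpenAsync` call keeps the window between connecting and listening as short as the API allows.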

bastyuchenko commented 1 year ago

Another way to reproduce the issue from Azure Portal:

Success (the message appears in the list in Azure Portal) if the device code (mentioned above) is not running:

  1. Send a message from Azure Portal.
  2. As expected, the message appears in the device identities list.

Fail (the message disappears from the list in Azure Portal) if the device code (mentioned above) is running:

  1. Start a console app with the code from the issue description above.
  2. Send a message from Azure Portal.
  3. The message doesn't appear in the device identities list.

It seems the code running in the console app somehow consumed this message. But how? I've only opened the connection to IoT Hub, with no handler/listener/IncomingMessageCallback functionality in my code.

bastyuchenko commented 1 year ago

There is no such issue if I use AMQP instead of MQTT (IotHubClientAmqpSettings instead of IotHubClientMqttSettings). But I really need to use MQTT, I cannot replace it with AMQP.

tmahmood-microsoft commented 1 year ago

Hi @bastyuchenko, the changes to fix this behavior have been merged. Please let me know if you face any further issues.

bastyuchenko commented 1 year ago

Hi @tmahmood-microsoft , Currently I see a very weird behavior. I'm running the same code mentioned in this topic and

So, I'm going to investigate the issue on my side and get back to you with additional details. Please don't close this issue for a while.

bastyuchenko commented 1 year ago

Hi @tmahmood-microsoft, I pulled the latest version of the previews/v2 branch after your changes and debugged the Azure IoT SDK for .NET from my solution to understand how the SDK consumes messages and to highlight the cases where it consumes them implicitly.

  1. I observe different behavior in different cases.
    • IF my device connects to Azure IoT Hub without calling deviceClient.SetIncomingMessageCallbackAsync(NotNullCallbackMethod), and the incoming message callback has never been set before (during previous connections from my device to IoT Hub), THEN messages are NOT consumed from IoT Hub and NOT saved in the "offline queue". I assume this is because there is no open session for my device's clientId, so the device does not receive any session-related messages. No session could remain from a previous connection.
    • IF my device connects to Azure IoT Hub without calling deviceClient.SetIncomingMessageCallbackAsync(NotNullCallbackMethod), and the incoming message callback was "turned off" (that is, deviceClient.SetIncomingMessageCallbackAsync(null) was called) during previous connections of my device to IoT Hub, THEN messages are NOT consumed from IoT Hub and NOT saved in the "offline queue". I suppose deviceClient.SetIncomingMessageCallbackAsync(null) closes the session, so the behavior has the same cause as in the previous case.
    • IF my device set an incoming message callback and did not disable it during previous connections, and my application then connects to Azure IoT Hub without calling deviceClient.SetIncomingMessageCallbackAsync(NotNullCallbackMethod), THEN messages disappear from IoT Hub and are saved in the SDK's "offline queue" (_messageQueue.Enqueue(receivedCloudToDeviceMessage)). I suppose the new connection reuses the session from the previous connection because CleanSession=false is specified by default.

Thus, there are tricky cases that are not obvious to a developer, and handling them requires the developer to know how the MQTT protocol works.
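
The third case depends on the MQTT session surviving between connections, which is what CleanSession controls. The fragment below only shows where that flag would be set. It assumes `CleanSession` is a settable boolean on `IotHubClientMqttSettings` (this thread only states that it defaults to false), and the class and method names are hypothetical. It is a configuration sketch, not a recommendation: how a clean session affects messages sent while the device was offline should be verified against the IoT Hub MQTT documentation before relying on it.

```csharp
using Microsoft.Azure.Devices.Client;

// Hypothetical helper class, for illustration only.
public static class MqttSessionSketch
{
    // Builds client options with an explicit CleanSession choice instead of
    // relying on the default (false, according to this thread).
    public static IotHubClientOptions BuildOptions(bool cleanSession)
    {
        var mqttSettings = new IotHubClientMqttSettings
        {
            // Assumption: CleanSession is a public settable property.
            CleanSession = cleanSession,
        };

        return new IotHubClientOptions(mqttSettings);
    }
}
```

The result would be passed as the third argument of the `IotHubDeviceClient` constructor, in place of `new IotHubClientOptions(new IotHubClientMqttSettings())` from the issue description.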

bastyuchenko commented 1 year ago

Also, there is a bug described here. I want to understand whether it is an Azure IoT Hub bug or a .NET SDK bug. Why can't the .NET SDK work with Azure IoT Hub if "Feature" is not GWV2?

tmahmood-microsoft commented 1 year ago

Hi @bastyuchenko I appreciate your time in testing different scenarios using MQTT. The scenarios you described are actually correct and that's how the protocol is expected to behave.

You can find the documentation for CleanSession within the SDK here. You can also find a detailed description of how cloud-to-device messages are received using MQTT here.

Also, I really appreciate your feedback on how the current SDK documentation can be insufficient for understanding these tricky cases and will update the documentation accordingly.

tmahmood-microsoft commented 1 year ago

> Also, there is a bug described here. I want to understand is it an Azure IoT Hub bug or a .NET SDK bug? Why .NET SDK cannot work with Azure IoT Hub if "Feature" is not GWV2?

Regarding this bug, I am currently looking into it and will update you on it soon.

tmahmood-microsoft commented 1 year ago

Hi @bastyuchenko, regarding the issue with MQTT message being rejected, I am able to reproduce it in GWv1 but not in GWv2. Are you seeing the same behavior? Also, I have created a ticket with the gateway team to look further into this.

bastyuchenko commented 1 year ago

Hi @tmahmood-microsoft ,

> I am able to reproduce it in GWv1 but not in GWv2. Are you seeing the same behavior?

Yes, the same in my case. The MQTT message is rejected if all 3 conditions are met:

  • a connection is opened from the .NET SDK side
  • there is no open session in Azure IoT Hub for the client I opened the connection for
  • Azure IoT Hub does not have GWv2 in Features. (I don't see GWv1 listed explicitly in the Features of Azure IoT Hub, maybe because it is the default value.)

I described my experience regarding this issue here.

> I have created a ticket with the gateway team to look further into this.

Could you give me a link to this ticket, if possible, so that I can follow any further discussion?

tmahmood-microsoft commented 1 year ago

@bastyuchenko this seems like a possible bug with GWv1. GWv2 is the latest version of the IoT Hub service, and Microsoft is currently in the process of migrating all v1 hubs to v2. Unfortunately, the tickets are created on internal-use-only platforms. I will keep you updated on any progress.

tmahmood-microsoft commented 1 year ago

Hi @bastyuchenko, I have confirmed that this issue is a known bug in GWv1 and has been fixed in GWv2. For now, I would highly recommend switching to GWv2 for any future testing and development. GWv2 has fixes for a few other known bugs in v1, and Microsoft is already in the process of migrating all IoT hubs to GWv2. Please let me know if you have any questions or are facing any more issues.

TYLOGIC commented 5 months ago

Not sure how this is resolved or how others are using the MQTT protocol successfully when a device is offline. Our hub has been upgraded to GWv2 and we are using the 2.0 preview SDK. If the queue has a pending C2D message and the device connects, 90% of the time those messages are lost. The SDK doesn't allow us to set the callback prior to opening the connection.

Please advise. We have an active case open with support trying to resolve this issue.

bastyuchenko commented 5 months ago

Hi @TYLOGIC, did you read my description of the different scenarios above? - https://github.com/Azure/azure-iot-sdk-csharp/issues/3335#issuecomment-1601048849

and this discussion about the root cause why it is implemented in such a way? - https://github.com/Azure/azure-iot-sdk-csharp/pull/3336