Lock Token Exception in MqttTransportHandler.cs

lofi96 commented 3 years ago

For some weeks now we have been getting the following exception from MqttTransportHandler.cs: "Lock token is stale or never existed. The message will be redelivered, please discard this lock token and do not retry operation."

Looking at the code, this exception is thrown whenever the lock token does not start with the current _generationId. So we asked ourselves what causes _generationId to be regenerated. The cause is a new initialization of the MqttTransportHandler class, and we traced this to the following statement: public TransportState State => (TransportState)Volatile.Read(ref _state); (line 77 of MqttTransportHandler)

We looked into the documentation of this method from the System.Threading namespace, but it did not really help.

In our business case, we were running 2 or more programs using the Azure IoT SDK and the MqttTransportHandler class, and this causes a new initialization of the MqttTransportHandler every few seconds. As a result, we cannot complete the message on the Azure IoT Hub and keep receiving the same message again and again. However, we also hit this problem with only one program running.

Please advise how to solve this issue.

I'm also wondering why, when completing a message, it is necessary to check that the MqttTransportHandler is the same one I used to receive the message from the hub.

Illustration of the problem: (image not available)

Thanks for your feedback!

abhipsaMisra commented 3 years ago

While completing a message, the generation ID is checked to ensure that you're completing the message from the client to which it was sent (i.e., if you have 2 different devices receiving messages, you can complete a message intended for device 1 only from device 1).
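To make the check concrete, here is a simplified, hypothetical model of how a transport handler can tie lock tokens to a connection generation. The class and member names (`GenerationLockTokens`, `Issue`, `Reconnect`, `IsCurrent`, `Complete`) are invented for this sketch and are not the SDK's internals; the only behavior borrowed from this thread is that a token that does not start with the current generation ID is rejected as stale.

```csharp
using System;

// Hypothetical model (not the SDK's actual code): each lock token is prefixed
// with the generation ID of the connection that delivered the message.
public sealed class GenerationLockTokens
{
    private string _generationId = Guid.NewGuid().ToString("N");

    // Builds a lock token for a message delivered on the current generation.
    public string Issue(string messageId) => _generationId + ":" + messageId;

    // Simulates the handler being re-initialized: a new generation ID is created,
    // so every token issued before this call becomes stale.
    public void Reconnect() => _generationId = Guid.NewGuid().ToString("N");

    // Accepts only tokens that start with the current generation ID.
    public bool IsCurrent(string lockToken) =>
        lockToken != null && lockToken.StartsWith(_generationId + ":", StringComparison.Ordinal);

    public void Complete(string lockToken)
    {
        if (!IsCurrent(lockToken))
        {
            throw new InvalidOperationException("Lock token is stale or never existed.");
        }
        // A real transport would send the acknowledgement for the message here.
    }
}
```

In this model, a token issued by a different instance (another device) fails the same prefix check as a token issued before a reconnect, which is why re-initializing the handler makes earlier tokens unusable.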

In our business case, we were running 2 or more programs using the Azure IoT SDK and the MqttTransportHandler class, and this causes a new initialization of the MqttTransportHandler every few seconds.

Could you expand a bit on this statement? When you say "2 or more Programs using the Azure IoT SDK", do you mean that all of these programs initialize the same device client instance, or are they different device client instances? The reason I ask is, we never recommend that you initialize multiple instances of the same device client instance, since this will cause the transport layer to fight for the same resources, and will cause the devices to go into a disconnection - connection - disconnection loop.

You've also mentioned that you see this with a single program running - would you be able to collect the sdk logs using these instructions, and share them with us?

vinagesh commented 3 years ago

@lofi96 - Let us know if you have any logs that could help. Also, as Abhipsa mentioned above, we would like to understand more about your scenario and what you are trying to do.

lofi96 commented 3 years ago

Could you expand a bit on this statement? When you say "2 or more Programs using the Azure IoT SDK", do you mean that all of these programs initialize the same device client instance, or are they different device client instances?

What I mean is that we developed a project that receives IoT messages from the Azure IoT Hub. We compiled the project and installed it twice on our customer's server. One installation is connected to the customer's integration IoT Hub, the other to the production hub. So these should be different device client instances.

The reason I ask is, we never recommend that you initialize multiple instances of the same device client instance, since this will cause the transport layer to fight for the same resources, and will cause the devices to go into a disconnection - connection - disconnection loop.

I'm not sure here: do we initialize "multiple instances of the same device client instance" when we run the project multiple times? If so, what would be the best practice here? As mentioned above, the clients are connected to different IoT Hubs.

You've also mentioned that you see this with a single program running - would you be able to collect the sdk logs using these instructions, and share them with us?

Thanks for this! I will implement it and provide the logs as soon as possible!

drwill-ms commented 3 years ago

@lofi96

The key here is that only 1 DeviceClient instance can be active for any single device in a hub at a time.

If 2 or more DeviceClient instances for the same device in a hub are active, they will fight with each other over the connection, the hub will disconnect them, their retry logic will cause reconnection, which will cause hub to disconnect another, and so on in a vicious cycle.
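One way to enforce this within a single process is to create clients through a small cache keyed by hub host name and device ID, so repeated requests for the same identity share one instance while different hubs or devices get their own. `DeviceClientCache` is a hypothetical helper, generic over the client type so it can be tested without a hub; in a real application the factory would be something like `() => DeviceClient.CreateFromConnectionString(connectionString, TransportType.Mqtt)`. It cannot stop a second process (such as a second installation of the same program) from connecting as the same device; that still has to be ruled out by configuration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: hands out at most one client per (hub host name, device ID)
// within this process. TClient would be DeviceClient in a real application.
public sealed class DeviceClientCache<TClient> where TClient : class
{
    private readonly ConcurrentDictionary<(string HubHostName, string DeviceId), Lazy<TClient>> _clients =
        new ConcurrentDictionary<(string HubHostName, string DeviceId), Lazy<TClient>>();

    public TClient GetOrCreate(string hubHostName, string deviceId, Func<TClient> factory)
    {
        // GetOrAdd may build a spare Lazy under contention, but only the stored one is
        // ever evaluated, and ExecutionAndPublication runs the factory at most once.
        Lazy<TClient> lazy = _clients.GetOrAdd(
            (hubHostName, deviceId),
            _ => new Lazy<TClient>(factory, LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }
}
```

Two installations pointed at different hubs resolve to different keys, so each gets its own client and its own device identity.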

abhipsaMisra commented 3 years ago

I'm not sure here: do we initialize "multiple instances of the same device client instance" when we run the project multiple times? If so, what would be the best practice here? As mentioned above, the clients are connected to different IoT Hubs.

It depends on which endpoint your application is connecting to. If you are talking to two different hubs, then you have two different device instances and hence two different device identities. If you connect using SAS tokens, this means you use two different connection strings in your application. This is fine, and you should not see any error. It also means that the C2D messages are sent to two different hubs and received by two different devices, so you should not lose any lock token reference.

The device sdk logs from a single program run should help us figure out why you are losing the lock token reference.

abhipsaMisra commented 3 years ago

@lofi96 Were you able to reproduce this issue and collect the logs?

abhipsaMisra commented 3 years ago

Were you able to collect the logs for this @lofi96 ?

chmarti commented 3 years ago

I'm seeing this problem, but only when calling CompleteAsync / RejectAsync from within a callback set up via SetReceiveMessageHandlerAsync. If I use ReceiveAsync to get messages, I do not get the exception when I call CompleteAsync / RejectAsync.

This works:

```csharp
using Microsoft.Azure.Devices.Client;
IAuthenticationMethod auth = new DeviceAuthenticationWithRegistrySymmetricKey(_params.Id, security.GetPrimaryKey());
var deviceClient = DeviceClient.Create(_params.HubHostName, auth, TransportType.Mqtt);
var sentMessage = await deviceClient.ReceiveAsync(); // polling receive
await deviceClient.CompleteAsync(sentMessage.LockToken); // complete on the same client
```

This throws the exception mentioned in this issue:

```csharp
using Microsoft.Azure.Devices.Client;
IAuthenticationMethod auth = new DeviceAuthenticationWithRegistrySymmetricKey(_params.Id, security.GetPrimaryKey());
var deviceClient = DeviceClient.Create(_params.HubHostName, auth, TransportType.Mqtt);
await deviceClient.SetReceiveMessageHandlerAsync(OnC2dMessageReceived, null); // callback-based receive
```

Here, OnC2dMessageReceived contains only the same call to CompleteAsync(sentMessage.LockToken).
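For comparison, a minimal sketch of the callback pattern, assuming the `Microsoft.Azure.Devices.Client` package (v1.x). The handler completes the `Message` instance that the SDK passes into the callback, on the same client that registered the handler. `CallbackReceiver.RunAsync` is a hypothetical name, and the sketch is not offered as a diagnosis of the report above; it only shows the shape of the pattern.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;

public static class CallbackReceiver
{
    // Registers a C2D callback and keeps the client alive until cancellation is requested.
    public static async Task RunAsync(string connectionString, CancellationToken cancellationToken)
    {
        using DeviceClient deviceClient = DeviceClient.CreateFromConnectionString(connectionString, TransportType.Mqtt);
        await deviceClient.OpenAsync(cancellationToken);

        await deviceClient.SetReceiveMessageHandlerAsync(async (message, userContext) =>
        {
            Console.WriteLine($"Received C2D message: {Encoding.UTF8.GetString(message.GetBytes())}");
            // Complete the Message passed into this callback, using the same client.
            await deviceClient.CompleteAsync(message);
        }, null);

        try
        {
            await Task.Delay(Timeout.Infinite, cancellationToken);
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop receiving.
        }

        await deviceClient.CloseAsync();
    }
}
```

A caller would run it with a device connection string and a `CancellationTokenSource`, for example `await CallbackReceiver.RunAsync(connectionString, cts.Token);`.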

abhipsaMisra commented 3 years ago

@lofi96 - Were you able to reproduce this issue with a single device instance, and collect the sdk logs?

@chmarti - Completing the message within the receive message callback should not throw a lock token lost exception, unless you're referencing an invalid token or attempting to complete the message more than 1 minute after its delivery (lock tokens are valid for 1 minute): https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-messages-c2d#the-cloud-to-device-message-life-cycle If you believe you are following the specified pattern but still see the error, please share a small working sample or SDK logs reproducing the issue.
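Because of that time limit, an application that does slow work before acknowledging can check how long it has held a message before completing it. `LockWindow` below is a hypothetical helper, not an SDK API: it records when a message was received and reports whether a completion attempt is still inside the lock window. The 1-minute duration is the value stated in the comment above; the 5-second safety margin for the network round trip is an assumption.

```csharp
using System;

// Hypothetical helper: tracks how long a C2D message has been held since it was received.
public sealed class LockWindow
{
    // Lock duration stated in this thread; the safety margin is an assumption.
    public static readonly TimeSpan LockDuration = TimeSpan.FromMinutes(1);
    public static readonly TimeSpan SafetyMargin = TimeSpan.FromSeconds(5);

    private readonly DateTimeOffset _receivedAt;

    public LockWindow(DateTimeOffset receivedAt) => _receivedAt = receivedAt;

    // True while the lock token is still expected to be valid, minus the margin.
    public bool CanComplete(DateTimeOffset now) => now - _receivedAt < LockDuration - SafetyMargin;
}
```

In use, create a `LockWindow` with `DateTimeOffset.UtcNow` when the message arrives and check `CanComplete(DateTimeOffset.UtcNow)` before completing; passing the time in explicitly keeps the helper easy to test.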

lofi96 commented 3 years ago

@abhipsaMisra thanks for all your replies! I was busy with other projects and was not able to collect the logs. I will do it within the next few days and provide them to you :)

abhipsaMisra commented 3 years ago

Any update on this @lofi96 ?

lofi96 commented 3 years ago

@abhipsaMisra Thank you for your patience! I have now tested the whole scenario again in more detail and could not reproduce the error from back then. I still got the LockToken error when I connected twice to the same IoT Hub. However, I also sometimes get the exception when there are not 2 connections to a hub. In that case, what would be your best practice for handling it? At the moment I am implementing functionality that recreates the DeviceClient (to get a new generationId) and then retries receiving and completing the message.

abhipsaMisra commented 3 years ago

You do not need to recreate device clients to avoid a LockToken error; instead, you simply need to ensure that:

the message that you're trying to complete was received by the same device.
the message that you're trying to complete was received within the last minute.

Each connection from a device client application to an IoT Hub represents a unique device (unless your application has some sort of a gateway scenario implemented). For this particular device, you should be able to receive and complete C2D messages over MQTT without having to continuously recreate the device clients, barring the device getting into a "disconnected" state. We have a sample here that demonstrates the recommended usage pattern. We also have another sample here that demonstrates the different ways (polling vs callback) that you can receive C2D messages.
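A minimal sketch of that pattern with polling, assuming `Microsoft.Azure.Devices.Client` v1.x: one long-lived client receives and completes messages, and if completion fails, the message is skipped rather than the client being recreated, because the error text says the message will be redelivered. The sketch catches `IotHubException`, the SDK's base exception type, rather than assuming which subclass the MQTT handler throws for this error. `PollingReceiver.RunAsync` and the 30-second receive timeout are assumptions for illustration, not taken from the SDK samples.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;
using Microsoft.Azure.Devices.Client.Exceptions;

public static class PollingReceiver
{
    // One client for the lifetime of the loop: messages are received and completed on the same instance.
    public static async Task RunAsync(DeviceClient deviceClient, CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // Assumed timeout; ReceiveAsync returns null if nothing arrives in time.
            using Message message = await deviceClient.ReceiveAsync(TimeSpan.FromSeconds(30));
            if (message == null)
            {
                continue;
            }

            // ... process the message here ...

            try
            {
                await deviceClient.CompleteAsync(message.LockToken);
            }
            catch (IotHubException ex)
            {
                // Do not recreate the client or retry with this token; per the error text,
                // the message is redelivered and handled on a later receive.
                Console.WriteLine($"Completion failed ({ex.Message}); waiting for redelivery.");
            }
        }
    }
}
```

Catching the base type keeps the sketch simple, but it also swallows unrelated hub errors; a production loop would likely log them separately.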

Let me know if you have any further questions.

abhipsaMisra commented 3 years ago

@lofi96 Are you unblocked on this issue or do you still need some input from us?

abhipsaMisra commented 3 years ago

I am closing this issue due to lack of activity. @lofi96 Feel free to re-open this issue if you still need any inputs from us!

Azure / azure-iot-sdk-csharp, issue #1698