[v2 - bug fix] Queue MQTT C2D messages if callback not set

tmahmood-microsoft commented 1 year ago

Fix for https://github.com/Azure/azure-iot-sdk-csharp/issues/3335

In v2, if incoming message callback has not been set, messages are still received because cleanSession = false (default) and the client subscribes to topic from last session. This causes messages to be received but the callback has not been set and the message is eventually completed, since MQTT messages cannot be rejected or abandoned. This PR adds changes where we queue the received messages if callback has not been set and processes them later when callback has been set.

timtay-microsoft commented 1 year ago

Could we just ask users to set the callback before calling open? Or is this to handle a case other than explicit open?

abhipsaMisra commented 1 year ago

Could we just ask users to set the callback before calling open? Or is this to handle a case other than explicit open?

I don't think the SDK lets you do anything until the client is opened.

timtay-microsoft commented 1 year ago

Could we just ask users to set the callback before calling open? Or is this to handle a case other than explicit open?

I don't think the SDK lets you do anything until the client is opened.

So we would have this same problem with twin/direct method messages, right? I'd prefer if we allow users to set the callbacks before opening the client, but that may be a more involved fix than we need right now. Because of that, I won't push back if this is the approach we take.

Alternatively, we could opt to make cleanSession = true the default behavior. That way users can open their clients, then set their callbacks (which subscribe to the twin/c2d/methods messages) without risk of losing any. It makes things a bit more tedious for users since they'd either have to set their callback after each open call or opt-in to cleanSession=false, but I don't have any other issues with it. Scratch that. Reconnection scenarios would be a nuisance as well

abhipsaMisra commented 1 year ago

This is technically a workaround for service's current behavior. Ideally, a c2d message would have been published with a QoS of 1, and the client would choose to not ack it and service would redeliver them at some frequency. Once the user sets the callback, those messages could then be ack'ed. Since service doesn't queue and redeliver unack'ed messages currently, the client needs to do that.

Alternatively, yeah, we could ask users to set the callback before opening the clients. I think that is a reasonable ask but yeah, it will require a bigger change.

Direct methods inherently require there to be an opened client (or, soon to-be opened client), so it isn't that big of an issue there. Deliver once, if there is no ack then fail immediately. Nothing gets queued. Twin updates aren't persisted either. Hub recommends that you call GetTwin explicitly to retrieve the last known state. So, this isn't really an issue there either.

bastyuchenko commented 1 year ago

Hello, As the issue #3335 creator I would like to share my thoughts:

this PR suggests having an offline queue, but we already have IoT Hub that works as a queue. If I need to have offline queue, I can create it by myself in my solution. I expect that SDK is a wrapper for an existing Azure service but the PR suggests an implementation of its own "message broker" logic with "offline queue" that is limited by MessageQueueSize and is not obvious for developers who will consume this SDK.
set callback before opening connection can be the solution but.... currently it works in the following way

send a few c2d messages to IoT Hub Actual Result: the messages have appeared in IoT Hub
open a connection from a device side Actual Result: the messages were removed
send new c2d messages to IoT Hub - the messages removed only because the connection was opened, and no callback was specified Actual Result: the messages were removed

Let's imagine I want only communicate with Device Twin. I have to open connection to communicate with DeviceTwin. It means, behind-the-scenes, I always open such a 'black hole' that swallows all my telemetry messages only because I've opened connection to access to Device Twin. It's not what I expected, it is "side effect", it is not obvious, it obliges me to set a callback even I don't want to listen IoT Hub messages right now.

Ideally, a c2d message would have been published with a QoS of 1, and the client would choose to not ack it and service would redeliver them at some frequency. Once the user sets the callback, those messages could then be ack'ed. Since service doesn't queue and redeliver unack'ed messages currently, the client needs to do that.

I'm sorry in advance, maybe I didn't understand your point of view correctly but it sounds for me a little bit as an "override of cleanSession property behavior". Moreover, does it mean that an implementation of message feedback callback will become mandatory on service side?

Is it possible to open a connection but do not consume messages from IoT Hub if MQTT is used? BTW, how it works in .NET SDK v1? It looks .NET SDK v1 doesn't have such a problem. I open connection but messages do not disappear from IoT Hub (yes, I know that it is not necessary to open connection before set callback in .NET SDK v1).

Is it possible to check how it implemented in other SDKs (i.e. Python or C SDK). It would be great to have some consistency among SDKs.

abhipsaMisra commented 1 year ago

Thanks for sharing your inputs, @bastyuchenko , we appreciate it.

this PR suggests having an offline queue, but we already have IoT Hub that works as a queue. If I need to have offline queue, I can create it by myself in my solution. I expect that SDK is a wrapper for an existing Azure service but the PR suggests an implementation of its own "message broker" logic with "offline queue" that is limited by MessageQueueSize and is not obvious for developers who will consume this SDK.

IoT Hub is a message queue that will store messages for an offline client if the client had previously connected with CleanSession set to false. It will do so until the client reconnects again with CleanSession set to true. This happens as per contract today. Now, let us consider the subsequent steps for this client:

once the client connects, all pending messages will be delivered to it immediately. IoT Hub will not wait until it subscribes to C2D messages to deliver the offline messages to the client. This is because it has connected with CleanSession set to false.
for the SDK, it can process these messages only if the user can a callback set. Otherwise, it doesn't have anything to do with these messages. Ideally, the client would not ack these messages, and these being sent with QoS 1 they'd eventually get redelivered. This is the bit that doesn't happen today. IoT hub will send the C2D messages immediately on reconnection, and then never resend them (irrespective of if they were ack'ed or not. This is something the hub team is aware of, and they have it on their plan to fix).
since the messages were delivered as soon as the client connected, and they won't be redelivered by the service again, there are two options:
- force the client to subscribe to C2D messages if it wants to be able to process messages sent when offline, OR
- maintain an internal queue with these offline messages that can be delivered once the client subscribes to C2D messages

I agree that both approaches can have their cons:

for approach (1) - you are forced to subscribe to and process C2D messages as soon as the client connects, even if that is not what you want. One point of view is that MQTT spec defines that when clients connect with CleanSession set to false they should be ready to process offline messages as soon as they come online.
for approach (2) - I agree that this implements a client-side queue on top of service's queue. We'd like to avoid this as well, however, with service's current behavior we don't have many more options.

maybe I didn't understand your point of view correctly but it sounds for me a little bit as an "override of cleanSession property behavior". Moreover, does it mean that an implementation of message feedback callback will become mandatory on service side?

Could you explain what you meant by these two statements? I'm not sure if the above explanation satisfies your question here, but if it doesn't, I'd like to understand the question a bit better.

Is it possible to open a connection but do not consume messages from IoT Hub if MQTT is used?

The CleanSession property will direct hub to deliver offline messages as soon as the client reconnects. You can set CleanSession to true and forgo offline message queuing, but there isn't a way to store offline C2D messages and then deliver them after you subscribe to the topic.

BTW, how it works in .NET SDK v1? It looks .NET SDK v1 doesn't have such a problem. I open connection but messages do not disappear from IoT Hub (yes, I know that it is not necessary to open connection before set callback in .NET SDK v1).

The .NET SDK v1 also has this internal message queue in place, similar to the implementation in this PR.

bastyuchenko commented 1 year ago

Thanks for the detailed explanation, @abhipsaMisra

Could you explain what you meant by these two statements? I'm not sure if the above explanation satisfies your question here, but if it doesn't, I'd like to understand the question a bit better.

yes, above explanation fully satisfies my question here.

@tmahmood-microsoft, from https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-messages-c2d

When a device wants to receive a message, the IoT hub locks the message by setting the state to Invisible. This state allows other threads on the device to start receiving other messages.

All messages that are stored in "local offline queue" you implemented in this PR will be marked as Completed in IoT Hub or just stay Invisible (means returns after lock time-out)?

tmahmood-microsoft commented 1 year ago

Hi @bastyuchenko, all the messages that are stored in "local offline queue" implemented in this PR will be marked as Completed in IoT Hub.

tmahmood-microsoft commented 1 year ago

/azp run

azure-pipelines[bot] commented 1 year ago

Azure Pipelines successfully started running 1 pipeline(s).

tmahmood-microsoft commented 1 year ago

/azp run

azure-pipelines[bot] commented 1 year ago

Azure Pipelines successfully started running 1 pipeline(s).

tmahmood-microsoft commented 1 year ago

/azp run

azure-pipelines[bot] commented 1 year ago

Azure Pipelines successfully started running 1 pipeline(s).

Azure / azure-iot-sdk-csharp

[v2 - bug fix] Queue MQTT C2D messages if callback not set #3336