Challenging Task: find a way to propagate command acknowledgement to downstream applications

eclipse-hono / hono

Eclipse Hono™ Project

https://eclipse.dev/hono

Eclipse Public License 2.0

449 stars 137 forks source link

Challenging Task: find a way to propagate command acknowledgement to downstream applications #3585

Open julian-sotec opened 10 months ago

julian-sotec commented 10 months ago

Disclaimer: We are aware that this scenario is covered by the request / response command in hono - but we still think the acknowledgement propagation could be beneficial for downstream applications

MQTT Scenario:

a mqtt connected client connects to the mqtt broker and subscribes on the command topic - c///req/# - with QOS 1
a downstream application sends commands to the device eigther using the messaging infrastructure directly or via an api connected to the command router
the command is a oneway command - not a request / response command

In Hono the command is propagated via the command router to the mqtt adapter which forwards it to the connected device. The standard behaviour of mqtt clients is to acknowledge the command message via puback.

Now the challenging part: As Hono internally does know about the acknowledgement of the command message - it does not share this infromation with downstream applications. As a workarround we listen to the error topic and see if something goes wrong. However we cannot wait forever with this and this is - as I said only a workarround. As part of this issue we have seen cases where a client subscribing with QOS1 to c///req/# did not receive the command - which also raises questions:

Shouldn't there be a retry mechanism that retires sending a command to a device that has subscribed with QOS1?
Wouldn't it beneficial if the acknowledgement of a command message is published on the messaging infrastructure so client applications can return a positive answer?

We see two option of implementing this: 1.) Implement the acknowledgement propagation for both oneway commands and request / response commands Downsight - request / response commands have two ways of acknowledgement 2.) Implement the acknowledgement propagation for oneway commands only. In this way the downstream application can benefit from the ack.

Additional Info: we have read thru this Issue here - https://github.com/eclipse-hono/hono/issues/2273 - and think that there were similar discussions going back and forth when implementing the Command Acknowledgement.

Please let us know your thoughts on this. Any input is welcome.

calohmn commented 10 months ago

We see two option of implementing this: 1.) Implement the acknowledgement propagation for both oneway commands and request / response commands Downsight - request / response commands have two ways of acknowledgement 2.) Implement the acknowledgement propagation for oneway commands only. In this way the downstream application can benefit from the ack.

As we already have negative acknowledgments by means of command-response messages, I could imagine having option 2 implemented - positive ack for one-way commands only, sent via a command-response message with a special Hono content-type.

I would see this as something to be enabled on the one-way command message via a specific header (to be documented in the headers table here).

The documentation should then point out the caveats on how a positive ack can be interpreted, like it is done in the AMQP-based Command & Control API:

An attempt has been made to deliver the command to the device. However, it is unclear if the device has received (or processed) the command.

The device has acknowledged the reception of the command but has not processed the command yet.

The device has received and processed the command.

julian-sotec commented 10 months ago

@calohmn Thanks for your fast feedback. We will come up with a PR referencing this issue.

sophokles73 commented 7 months ago

Having taken a look at #3612 I would like to step back to make sure that we are on the same page regarding the use case and requirements.

My understanding is, that in general, there is not need for this at all, because Hono already provides means to solve the issue, which is the Reques/Response Command where a downstream application sends an application level command to a device and the device sends back an application level response. This is completely independent from the underlying transport protocol(s) being involved. And providing this kind of abstraction has been the main goal of Hono from the very beginning.

Now I also can guess where this issue is coming from: there are some devices that connect via MQTT and subscribe to a particular topic (using QoS1) on which they expect to receive a particular message which they then acknowledge using a PUBACK. I also assume that these devices cannot and/or should not be altered (anymore).

However, for the sake of understanding the requirements here, can we agree that if we were able to change these devices' firmware, then the right thing to do would actually be to have them support a Request/Response Command, i.e. to not only use a PUBACK to acknowledge the command but also to publish a Response message back to Hono?

sophokles73 commented 7 months ago

Some comments to what you posted

As a workaround we listen to the error topic and see if something goes wrong. However we cannot wait forever with this and this is - as I said only a workaround.

How would that be different if your application would now need to wait for a positive acknowledgement? Even when using a Request/Response Command, your application would eventually need to give up (waiting for the response) and consider the exchange to have failed. The problem is that there is a multitude of possible reasons for not getting back what you expect: errors a the transport layer (which is what you are addressing here) are one cause, implementation errors at the application level in the device and/or downstream application are another one.

This brings us to

Shouldn't there be a retry mechanism that retries sending a command to a device that has subscribed with QOS1?

Hono deliberately chose to not implement retries nor deferred delivery of buffered messages because the semantics of doing so would open a whole can of worms that would need to be considered per application. Thus the responsibility for implementing this lies with the application. Hono only provides means to better assess when an exchange could be considered successful or not.

mattkaem commented 7 months ago

Having taken a look at #3612 I would like to step back to make sure that we are on the same page regarding the use case and requirements.

My understanding is, that in general, there is not need for this at all, because Hono already provides means to solve the issue, which is the Reques/Response Command where a downstream application sends an application level command to a device and the device sends back an application level response. This is completely independent from the underlying transport protocol(s) being involved. And providing this kind of abstraction has been the main goal of Hono from the very beginning.

Now I also can guess where this issue is coming from: there are some devices that connect via MQTT and subscribe to a particular topic (using QoS1) on which they expect to receive a particular message which they then acknowledge using a PUBACK. I also assume that these devices cannot and/or should not be altered (anymore).

However, for the sake of understanding the requirements here, can we agree that if we were able to change these devices' firmware, then the right thing to do would actually be to have them support a Request/Response Command, i.e. to not only use a PUBACK to acknowledge the command but also to publish a Response message back to Hono?

As discussed during our recent open hour meeting, your assumption is correct that this initiative is primarily intended for devices whose firmware cannot or should not be altered. If this were not be the case, than supporting the Request/Response Command would definitively be the preferred solution. Therefor I agree with your comment in the related PR #3612 that this should probably be implemented as an extension of the Request/Response Command, rather than the one-way Command. This way the adapter can send the command response on behalf of the device, for devices not capable to do it on their own.

mattkaem commented 7 months ago

Some comments to what you posted

As a workaround we listen to the error topic and see if something goes wrong. However we cannot wait forever with this and this is - as I said only a workaround.

How would that be different if your application would now need to wait for a positive acknowledgement? Even when using a Request/Response Command, your application would eventually need to give up (waiting for the response) and consider the exchange to have failed. The problem is that there is a multitude of possible reasons for not getting back what you expect: errors a the transport layer (which is what you are addressing here) are one cause, implementation errors at the application level in the device and/or downstream application are another one.

This brings us to

Shouldn't there be a retry mechanism that retries sending a command to a device that has subscribed with QOS1?

Hono deliberately chose to not implement retries nor deferred delivery of buffered messages because the semantics of doing so would open a whole can of worms that would need to be considered per application. Thus the responsibility for implementing this lies with the application. Hono only provides means to better assess when an exchange could be considered successful or not.

Yes, in both scenarios, we must establish a maximum waiting period for either the positive or negative (error) acknowledgment. However, from my point of view, the distinction lies in the average waiting times associated with each type of acknowledgment.

With positive acknowledgments, we typically only have to wait a few milliseconds to receive confirmation that the message was successfully received by the device, which should be the norm. We only have to wait the full predefined waiting period in the event of a failure, which should be a rare occurrence.

In contrast, with negative acknowledgments, we must wait the full duration for every successful command.

Furthermore, both methods introduce an element of uncertainty. However, this uncertainty is arguably better placed in the positive acknowledgment scenario. Here, we can be confident of successful messages and only have uncertainty for failed messages (when the predefined waiting time is exceeded).

For negative acknowledgments, it’s the opposite. We can never be entirely sure that a message was successfully received.

sophokles73 commented 6 months ago

@calohmn WDYT?

calohmn commented 6 months ago

For this type of scenario, having devices that only send confirmation on the protocol level (PUBACK for MQTT) and can't be changed, I see the usefulness in providing this acknowledgment to the downstream application, to have at least certainty there that the message has reached the device.

As a workaround we listen to the error topic and see if something goes wrong. However we cannot wait forever with this and this is - as I said only a workaround.

How would that be different if your application would now need to wait for a positive acknowledgement?

As pointed out by @mattkaem above, I also think the average waiting time associated with each type of acknowledgment makes a difference here, the time being shorter when the device sends a positive acknowledgement compared for example with the case that the device is busy and a timeout period has to elapse for the negative response to be sent.

Coming to the implementation of this:

Therefor I agree with your comment in the related PR #3612 that this should probably be implemented as an extension of the Request/Response Command, rather than the one-way Command.

As also mentioned in my PR comment, I think this point is debatable.

From the perspective of the device, such an ack-required command message is a message that the device is not supposed to send a response to (otherwise there would be the device response and the response from the adapter). In that sense, the message is a one-way message for the device. From the perspective of the downstream application, the command message with ack-required is indeed a request/response message, with the response coming from the adapter.

Therefore, I think in the protocol adapter implementation, we should forward such ack-required commands as one-way commands to the device. In the Command & Control API documentation, providing the bigger picture, this ack-required property should in my view then be added to the Send a (Request/Response) Command chapter. The ack-required property would be added there alongside response-required, with either one or the other (XOR) required to be true.

sophokles73 commented 6 months ago

@calohmn just to be sure:

Therefore, I think in the protocol adapter implementation, we should forward such ack-required commands as one-way commands to the device.

still means that the application will send a request/response command, but based on the ack-required property, the protocol adapter will then forward the command as one-way to the device and send back the ack to the application (if successful), right?

That would sound reasonable to me. @mattkaem Would that work for you as well?

calohmn commented 6 months ago

still means that the application will send a request/response command, but based on the ack-required property, the protocol adapter will then forward the command as one-way to the device and send back the ack to the application (if successful), right?

@sophokles73 yes

mattkaem commented 6 months ago

If I understand this correctly, to trigger the adapter ack mechanism the application would need to send a command with response-required = false (or omitted), ack-required = true and a correlation ID. The MQTT adapter would than send this command as a one-way command (without command request ID) to the device and, upon receiving the puback, send a command response on behalf of the device. Is this your understanding as well @calohmn @sophokles73?

mattkaem commented 6 months ago

The ack-required property would be added there alongside response-required, with either one or the other (XOR) required to be true.

FMPOV the two attributes don't need to be necessarily mutually exclusive (XOR). Both attributes could be set to true for example for scenarios where the sending application requires:

Immediate acknowledgement: Confirmation that the device received the command.
Later response: Information from the device after processing the command, which may take time depending on the task.

What are your thoughts @calohmn @sophokles73 ?

sophokles73 commented 6 months ago

My understanding of the adapter's behavior is, When an application sends a request/response command:

`response-required`	`ack-required`	Command cannot be sent to device	Command can be sent to device
`false`	`false`	Command times out	Adapter forwards command as-is to device, App waits for response from device
`true`	`false`	Adapter sends error response to App on behalf of device	Adapter forwards command as-is to device, App waits for response from device
`false`	`true`	Command times out	Adapter forwards command as one-way command to device and sends positive ack to App on behalf of device
`true`	`true`	Adapter sends error response to App on behalf of device	Adapter forwards command as one-way command to device and sends positive ack to App on behalf of device

I currently do not see a reason why the properties would need to be used mutually exclusively.

@calohmn can you provide some more insight into your thinking?

calohmn commented 6 months ago

@sophokles73 Regarding the first 3 rows in the above table, my view differs in the fields marked with (*):

`response-required`	`ack-required`	Command cannot be sent to device	Command can be sent to device
`false`	`false`	(*) No response sent to App	() Adapter forwards command as-is* (i.e. as one-way command) to device, App won't get any response
`true`	`false`	Adapter sends error response to App on behalf of device	Adapter forwards command as-is (i.e. as request/response command) to device, App waits for response from device
`false`	`true`	(*) Adapter sends error response to App on behalf of device	Adapter forwards command as one-way command to device and sends positive ack to App on behalf of device

On response/ack: false, the App will never get any response. On response-required: false, ack-required: true, a delivery failure should be sent as error response to the ~device~ (EDIT:) app, I think. The app is anyway waiting for a kind of delivery status response, so, we might as well send the negative ack here in case the adapter encounters any delivery failure (e.g. also if there is no PUBACK from an MQTT device after the configured sendMessageToDeviceTimeout).

Concerning the response-required: true, ack-required: true case: Forwarding the command as one-way command in this case would go against how I understand the meaning of the existing response-required option. In the docs, we have described it as

response-required - MUST be set with a value of true [for a request/response command], meaning that the device is required to send a response for the command.

With response-required: true, in my view the adapter would still have to forward the command as is - i.e. as request/response command - so that the device knows it should respond. Together with ack-required: true, this would mean, that, if no error occurs, the adapter would send 2 response messages to the application - one ack response and then the response from the device.

`response-required`	`ack-required`	Command cannot be sent to device	Command can be sent to device
`true`	`true`	Adapter sends error response to App on behalf of device	Adapter forwards command as-is (i.e. as request/response command) to device and sends positive ack to App on behalf of device and later sends 2nd response to App with response from device.

Like @mattkaem mentioned above:

Immediate acknowledgement: Confirmation that the device received the command.

Later response: Information from the device after processing the command, which may take time depending on the task.

So, this wouldn't be request-response but rather request-response-response :) (but only in the non-error case). To me, this looks a bit odd and would complicate the handling in the application. It would also be a feature going beyond the intention of this feature request here, which is about addressing scenarios where devices can't be altered to send proper responses.

sophokles73 commented 5 months ago

On response/ack: false, the App will never get any response.

We mean the same thing. You seem to take the perspective of the Command Router while I tried to describe the application's view. In any case. this is the case where no special behavior is expected from Command Router/Adapter. The application waits for a response but never gets one.

On response-required: false, ack-required: true, a delivery failure should be sent as error response to the device, I think.

Assuming that you mean

a delivery failure should be sent as error response to the App

then I can live with that.

sophokles73 commented 5 months ago

With response-required: true, in my view the adapter would still have to forward the command as is - i.e. as request/response command - so that the device knows it should respond.

The whole point of introducing ack-required is to let an App tell Hono that the device is not able to respond on itself. Forwarding the request/response command therefore makes no sense in this case.

@mattkaem WDYT?

mattkaem commented 5 months ago

On response-required: false, ack-required: true, a delivery failure should be sent as error response to the ~device~ (EDIT:) app, I think. The app is anyway waiting for a kind of delivery status response, so, we might as well send the negative ack here in case the adapter encounters any delivery failure (e.g. also if there is no PUBACK from an MQTT device after the configured sendMessageToDeviceTimeout).

I think this definitively makes sense.

The whole point of introducing ack-required is to let an App tell Hono that the device is not able to respond on itself. Forwarding the request/response command therefore makes no sense in this case.

That is true. However, in my opinion it is like @calohmn said:

Forwarding the command as one-way command in this case would go against how I understand the meaning of the existing response-required option.

Additionally, if I am not mistaken, this would result in ack-required = true, response-required = false and ack-required = true, response-required = true having the exact same behavior. FMPOV this would be very unintuitive. So I currently see the following two options:

We extend the planned functionality, allowing the device to send a response in addition to the ack response by the adapter.
We make ack-required and response-required mutually exclusive. In this case we would still need to figure out what the behavior should be if an app where to send a command with response-required and ack-required set to true, regardless.

@calohmn @sophokles73 Do you see any other options or what are your thoughts?

sophokles73 commented 5 months ago

Forwarding the command as one-way command in this case would go against how I understand the meaning of the existing response-required option. In the docs, we have described it as

response-required - MUST be set with a value of true [for a request/response command], meaning that the device is required to send a response for the command.

With response-required: true, in my view the adapter would still have to forward the command as is - i.e. as request/response command - so that the device knows it should respond. Together with ack-required: true, this would mean, that, if no error occurs, the adapter would send 2 response messages to the application - one ack response and then the response from the device.

My understanding of what the docs intend to say is:

response-required - MUST be set with a value of true [for a request/response command], meaning that the App expects to receive a response to its command, even if the device is not connected/available.

The whole thing had been introduced to allow the adapter and/or command router to send a (negative) response on behalf of the device to the Application when using Kafka as the messaging infrastructure. So it definitely was not meant to require that the command is being forwarded to the device under all circumstances.

I therefore still do not see any problem with what I had suggested for the true/true case.

calohmn commented 5 months ago

My understanding of what the docs intend to say is:

response-required - MUST be set with a value of true [for a request/response command], meaning that the App expects to receive a response to its command, even if the device is not connected/available.

My problem with this understanding is, that it introduces a dependency on the ack-required property in terms of the resulting behaviour. With the above understanding, the specific effect of the response-required property alone stays rather abstract. The definition would be: "response-required means the app requires some kind of response - the specifics whether this shall be a proper response message from the device or a response based on the device-ack depend on the ack-required property." But, it's certainly very important for the app to explicitly define that it requires an actual device-response message with payload. With the above approach, this would have to be defined indirectly - setting response-required and not setting ack-required.

The alternative would be: [device-]response-[message-]required - means that the app requests a proper device response message. The adapter will forward the message to the device indicating that it shall return a response message. ack-required - means that the app wants a protocol level ack from the device.

With the latter approach, the [device-]response-[message-]required property would clearly control how the message is forwarded to the device (as request-response message if set or otherwise one-way message). The ack-required defines whether a device-ack will be forwarded back to the app. Both properties can be configured rather independently of each other. The "true/true" combination would also be possible in theory - although I would favor not supporting that in order not to complicate things with a request-response-response pattern.

2. We make ack-required and response-required mutually exclusive. In this case we would still need to figure out what the behavior should be if an app where to send a command with response-required and ack-required set to true, regardless.

When both response-required and ack-required are set to true, the Command.isValid() method could just return false, letting this be handled then like other invalid commands. Specifically, I would forward this error as "BAD_REQUEST" status delivery-failure command-response message to the app.

sophokles73 commented 5 months ago

@calohmn after a discussion during yesterday's Hono Open Hour with @mattkaem I now believe that I understand your point: if an Application sets both ack-required and response-required and the protocol adapter would send a positive ack back to the app while still (successfully) forwarding the command as a request/response command to the device, the device might indeed also send back its response later on. The app would need to handle two responses in this case.

We should prevent this from happening and I therefore also believe that we should make response-required and ack-required mutually exclusive.

@mattkaem In the API documentation we need to properly describe the two different use cases that we want to support, e.g. also providing an example flow.