Allow consumers to acknowledge received messages themselves

NickDarvey commented 6 months ago

Describe the feature

This is #130 as a feature request.

This client library automatically PUBACKs received messages, not allowing consumers (handlers, implementers of on_publish_received) to control ackwowledgement themselves. This shortcoming makes setting QoS 1 on messages published by the server seem mostly pointless. If the client is doing anything useful with the message, there's a chance it will fail.

The current workaround is to do these acknowledgements of success at the application level, which is a shame when it's a feature of the protocol.

Use Case

[There are cases where the] client was unable to either fully process the message or store it for later processing. For example, the client may be acting as a broker for other services on a device or network, and the message is effectively not "received" unless the broker was able to store it or forward it and receive an acknowledgement in turn.

This is important for QoS 1 messages which must not be lost at any point in the chain of applications which process the message. Other mqtt client libraries allow the client application to dictate when the PUBACK is sent. For example, mqtt.js (and therefore the v1 AWS IoT JavaScript Device SDK) passes a callback to the client's handleMessage function which the consumer calls to acknowledge receipt of the message.

By @aquark

Proposed Solution

I have seen libraries which assume synchronous processing let the handler return true or false to indicate if the message should be acknowledged. The drawback of synchronous acknowledgement is that it limits the throughput of receiving messages. E.g. if the listener thread has to wait for a QoS 1 message to be written to disk, this is wasting time it could be processing QoS 0 messages. By allowing messages to be acknowledged asynchronously via a callback, the QoS 1 message can be passed to another thread to persist and acknowledge the message. This is also more convenient for applications which try to do all io asynchronously.

By @aquark

Other Information

No response

Acknowledgements

[ ] I may be able to implement this feature request
[ ] This feature might incur a breaking change

bretambrose commented 6 months ago

This may be like the fourth or so request we've had for this.

I'm sympathetic to the idea of adding bridging support, but have some reservations, primarily with respect to the 311 client.

The 311 client is a brittle, difficult-to-maintain shared-state-with-locking code base. New features are only added when absolutely necessary and I would best describe my feelings when going about doing such work as "reluctance tinged with a hint of dread."

Making things worse, the publish received callbacks in 311 are not structured in a way that would allow us to add bridging in a backwards compatible manner. The callbacks take the topic and payload as parameters; bridging support would require an entirely new callback type signature which in turn would require us to support multiple different callback signatures as well as expose controls to the user for selection. My desire to do this is pretty low, to put it mildly.

On the other hand, the 5 client's callbacks are all structure-wrapped, both at the native level and in all the CRT libraries that bind the client out to other languages. Adding bridging support to the 5 client (and all of its various bindings) would not break existing users nor would it require a new callback type + controls.

The downside is that the 5 client is relatively new and it's likely that existing users interested in this functionality may be using the 311 client and have no desire to upgrade because the two clients have very different public APIs (intentional, as the 311 API contains a number of frustrating design mistakes).

Regardless of which clients we add it to, I could see bridging (if we add it) working as follows:

The publish received callbacks include an additional field/parameter that is an opaque "factory"-like object with a single API that one-time-only returns an AckInvoker object. The client checks the state of the factory after the callback returns and sends the puback if the AckInvoker wasn't created.

The AckInvoker is a first-class object that supports a one-time call that will send the underlying puback for the publish received; users are free to pass it around and keep it as long as necessary. It would have a strong reference to the MQTT client, so failing to use it properly would cause a reference/memory leak. While an AckInvoker for a particular packet id was outstanding, no further ones would be created for that packet id (IoT Core will resend publishes that don't get acked within certain intervals and having multiple AckInvokers alive that refer to the same packet id could lead to all sorts of bad behavior).

NickDarvey commented 6 months ago

The downside is that the 5 client is relatively new and it's likely that existing users interested in this functionality may be using the 311 client and have no desire to upgrade because the two clients have very different public APIs (intentional, as the 311 API contains a number of frustrating design mistakes).

Surely it would be reasonable to add new features like this to the new client only. (FWIW I'm using the 5 client only.)

bretambrose commented 6 months ago

No timeframe or commitment atm, but this request was well-received by the team and I added a backlog entry for it.

awslabs / aws-c-mqtt