Puback iscalled too soon on incoming publish message

tomasol commented 5 years ago

With QoS 1 I expect that when my app crashes while processing incoming message, after my app starts again I will receive the message again. At least that's how I understand "at least once" delivery. After my testing I looked at the code: connection.rs#L546 and realized this is not the case. Puback is sent right after publish packet is received, so there is no chance the app can persist and not lose the packet in case of crash. I believe this is wrong. One way would be to call puback(pkid) only after on_message handler finishes. Other way would be to give user a way to call puback(pkid) explicitly.

tekjar commented 5 years ago

@tomasol When Mqtt protocol says At least once, that doesn't mean mqtt broker has to hold the message until client is done processing the message. I just means that broker has to hold the message until client receives the message. This helps the client to be robust in bad networks. At least that's what I understood.

I feel there is a reason why mqtt is this way

What if your code always panics while handling the message? Broker has to hold the message forever because the user code is wrong?

tekjar commented 5 years ago

Maybe it's debatable why not but it'll be a different protocol. Spec clearly says message arrives at the receiver

http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc398718101

tomasol commented 5 years ago

Thanks for response.

What if your code always panics while handling the message? Broker has to hold the message forever because the user code is wrong?

There is already a window of opportunity for that, e.g. if processing incoming pub packet fails (I mean in any library, not yours in particular). QoS cannot protect you from bugs, so I think that is out of scope.

It should however protect from losing messages due to power/network outage. If the outage happens before puback is sent & received, broker just resends the message. If however the outage happens while I try to persist the data, I may never be able to recover it. There are applications where it does not matter, but most applications would not work correctly under these circumstances. Spec says:

In the QoS 1 delivery protocol, the Receiver MUST respond with a PUBACK Packet containing the Packet Identifier from the incoming PUBLISH Packet, having accepted ownership of the Application Message

IMO this is clear rerefence of owning the data and being responsible for persisting it if needed.

It is even more obvious for QoS2 but let's not get into that. Also, there are mqtt libraries that have persistence for QoS right in their design, e.g. paho java client

tekjar commented 5 years ago

@tomasol Thanks for bringing this up. These are useful insights. I'll try to put my thoughts here both to answer your question and for my clarity in the design of next version I'm currently working on (still private)

There is already a window of opportunity for that, e.g. if processing incoming pub packet fails (I mean in any library, not yours in particular). QoS cannot protect you from bugs, so I think that is out of scope.

Broker doesn't have to hold messages due to bugs in libraries other than rumqtt if we send ack immediately after receiving without waiting for application logic to finish. Broker holding messages because of bugs in mqtt library is a much smaller window than broker holding messages because of bugs in application logic which can be very dynamic depending on the use case. Usually there'll only exist few common client libraries connecting to the broker. But I also kinda think it's not big deal for broker to hold messages for a longer period of time given maximum inflight messages in the broker are honoured.

It should however protect from losing messages due to power/network outage

My understanding is that mqtt as a protocol defines robustness across network outages, not power outages and application crashes. Below notes from the spec makes me arrive at this understanding (especially the second point which is very concrete)

This quality of service ensures that the message arrives at the receiver at least once
The receiver is not required to complete delivery of the Application Message before sending the PUBACK (Below Fig 4.2)
While receiving QoS2, Spec says I can either store the message and send PUBREC or store Packet Id, send message to application and send PUBREC. If I choose to wait for the message to be processed by the application layer before sending PUBREC and I loose power right after I'm done with application logic and before I send PUBREC, broker won't receive PUBREC and will send the Publish again and we end up processing a duplicate message (unless we persist packet ids) which is a clear violation of QoS2

My conclusion after reading the spec is that, Mqtt (protocol) doesn't care about application crashes/reboots or disk persistence is compulsory to implement QoS2. I choose the first option as I don't remember reading anything about disk persistence in the spec and this might not be a viable option for microcontrollers. Please correct me if I'm wrong here.

MUST respond with a PUBACK Packet containing the Packet Identifier from the incoming PUBLISH Packet, having accepted ownership of the Application Message

To me having accepted ownership of the Application Message means that client has received the message and hence broker is free to discard it. Finding out the opinion of other people might help here.

But does that mean clients can't do better? They should. But there'll be cost. As you have pointed, some of the client libraries are implementing persistence layer. With this method, assuming disk write speeds doesn't vary too much, the client doesn't have to wait on application logic to finish. But is this cost of always writing to disk worth it?

There are applications where it does not matter, but most applications would not work correctly under these circumstances

I feel it's the other way around. Lot of client applications will be okay to loose data during power outages/crashes but for some it's not an option

But it's very sensible to improve the robustness of QoS1 receive by providing a compile time feature flag to send acks after processing the message and document it well explaining non concurrent long running taks might fill up inflight message buffer in the broker leading to disconnections.

tomasol commented 5 years ago

Thank you for your response. We can probably agree that the spec leaves some room for interpretation, ultimately it is the application's responsibility to decide what's the correct behavior for its use case.

I feel it's the other way around. Lot of client applications will be okay to loose data during power outages/crashes but for some it's not an option

It's hard to say what percentage of apps tolerate data loss, I just think it would be nice to have the ability to decide when the ack should be sent. Some libraries support persistence, some don't. I like the way the java client handles it - not only is there the mentioned persistence api, but you can use a callback that gets called before puback is sent. Also it is possible to turn on manual acks for even finer control.

Either one of those options would be sufficient and would give apps more robustness and protect them from data loss.

tekjar commented 5 years ago

@tomasol Agreed. I'll leave this issue open and try to cater this in the new design. Maybe not immediately but eventually I'll add the support to send acks after handling

AtherEnergy / rumqtt

Puback iscalled too soon on incoming publish message #94