256dpi / arduino-mqtt

MQTT library for Arduino
MIT License
1.01k stars 232 forks source link

Re-transmit publish message on QoS1 #238

Closed hamidrasyid closed 1 year ago

hamidrasyid commented 3 years ago

Hi, thank you for you awesome library.

I just migrated to your mqtt library since I have to change my device to esp32, previously I use Paho with python in Nano Pi device.

I want to ask about implementation of QoS 1, is there any way when publish was failed bcs of connection (or etc), then the message will try publish later after the connection back online?

I was able implement this in Paho-mqtt python with only declare QoS1 on publish message, and they keep trying publish in loop if something wrong. Or is there any limitation with your library regarding this?

Thanks in advance.

256dpi commented 3 years ago

The client will not attempt to send a message again after loosing the connection as we do not have a persistent store (session) for messages. You need to arrange for that on your side. The publish function returns a boolean indicating if it was successful and in the case of QOS1 if a ack has been received. However, this way a retransmit has a different packet id.

1simone0 commented 3 years ago

Hello @256dpi and thanks also from me for your library!

In the previous answer you talk about a client disconnection which implies persistent session as you say, but what happens if the client just doesn't receive PUBACK packet back. Assuming connection is ok but something went wrong and my client didn't receive back a PUBACK will it try to publish again the message with the same packet ID?

Same question if is the broker to publish a message and doesn't receive PUBACK from client, he will try to publish again with same packet ID. How can I manage that duplicate message? From your Arduino API library I cannot access the packet ID I guess.

Do you think is a good point for an enhancement? I'm not really an MQTT expert and I hope my questions are good and that I am not off track.

BR, Simone

256dpi commented 3 years ago

Since MQTT is running on TCP a packet cannot just go missing as TCP guarantees a stable stream. Therefore, correct MQTT clients only attempt to resend unacked packets when reconnecting. In the case of this library, as there is no local packet store no packets are retransmitted upon reconnection. However, you can use the return value of publish to check if a message has been acked (QoS 1/2) and just send it again if it failed. But, as you noticed, the packet will get a new packet ID and this may cause double delivery if the last ack from the broker to the client went missing. To get 100% correct we would need to implement a local packet store that should even persist packets among reboots. However, I consider that out of scope for this project.

philbowles commented 3 years ago

The client will not attempt to send a message again after loosing the connection as we do not have a persistent store (session) for messages. You need to arrange for that on your side. The publish function returns a boolean indicating if it was successful and in the case of QOS1 if a ack has been received. However, this way a retransmit has a different packet id.

In other words, you do not fully implement MQTT 3.1.1 as that REQUIRES maintaining a stored session and client MUST retransmit failed QoS1 and 2: image

pv3mqtt

pv2mqtt

Thus your code clearly breaches those QoS1 and 2 promises, so not only is is incorrect to claim MQTT 3.1.1 compliance, but QoS 1 and 2 simply do not work and I think this should be made clear in the documentation. QoS is only QoS if it can recover messages - thats what QoS means - you have deliberately designed your code so that this is impossible.

Furthermore your glib " You need to arrange for that on your side" is not even possible for several reasons:

1)

image

The API contains no reference at all to DUP. Therefore how can the user set it? ONLY this library can do that, so delegation of the responsibility to the user - who has no way of doing it - is not only lazy but also prevents Q0S1 from working, even if the session recovery was implemented. Again PUBLISH with qos field set to 1 but without retransmission is absolutely not QoS1, its not anything except "broken".

2) Does the user also receive a message for QoS2 fail? For failure of PUBREC or PUBREL ? He needs to know since if you also delegate this to him, his actions are different depending on which part of the QoS2 handshake fails:

image

Which parts of this does your code do? Unless the answer is "all of it" then QoS2 cannot work either.

3) Of course there are numerous other responsibilities for receiving QoS1 and 2 retransmissions ... in these cases the user would need ther server packetID (totally different and unrelated to the client packetID - mxing the two will cause a serious mess and again QoS1 & 2 failure - I'm assuming you knew the difference between the two?) Where does your code pass ther server packetID to the user?

Theres is SO much more to QoS 1 and 2 than just sticking a 1 or 2 in the packet header... as an absolute minimum it requires:

If any of the above are absent, quite simply QoS 1 and 2 CANNOT work Your library has none of them.

Don't take my word for it, read the specification for yourself to fully understand what is required to implement it: MQTT 3.1.1 spec. Or you can look at a library that implements it fully on ESP8266 and ESP32 (and allows packets incoming and outgoing of up to 1/2 the free heap, and is asynchronous) PangolinMQTT

For any who does not follow all this, you can prove it yourself quite simply: Write some code that does the following:

  1. setCleanSession(false) and connect
  2. subscribe to topic A @ QoS1
  3. publish to topic A @Qos1 - say 1x per second with an increasing ID of your own as the payload
  4. each time you publish, add to a list of "sent IDs" so you know what has gone out
  5. on each incoming message, take the payload ID out of the list

at this point - if nothing goes wrong - the list will always be empty as you will always get the same value back in the message that you sent out

Now break the connection. You may have to do this many times to get it to break at the "right" point which is in between the publish and its corresponding subscribe. When testing my own library, I put a forced disconnect after the publish that happens - say - 5% of the time. You can try randomly stopping your own server if you run e.g. mosquitte , but you will only see the QoS1 fail if you get the timing exactly right

Assume you just sent your ID of 236 when the connection breaks.

The correct behaviour will be that on re-connection (when you should reset your ID to 1) before you even get the chance to send the first ID 1 of the reconnected session, the server will resend the failed PUBLISH for your ID 13 - this is what Qos1 "at least once" promise means. If that promise fails, QoS1 fails. At this point, you will remove it from your list (as you would have done had it succeeded in the previous session) and restore the balance of zero unmatched pairs of pub/sub.

It should look something like this (from testing my own library)

sr1p

IF the above does not happen then Qos1 has failed: MQTT3.1.1 you will have delivered the message to your subscribe function exactly zero times, thereby breaking the "at least once" promise

If at any point after your onMessage function has completed, the number of items in the list is non-zero - this is 100% proof of QoS1 failure. and will look something like this, taken when I was testing yet another library that totally misunderstands and/or fails to implement QoS1 and 2 (but still claims MQTT 3.1.1 compliance :) )

sr2a

Having looked at the code and seen the authors response, I can tell you now, you will never see that incoming restransmit form the server, and if you get your timing right you will see a gradually growing list of ll the IDs that failed QoS1

Anbody who is seriously interested in this quite complex topic may like to read an article I wrote about The challenges of MQTT on embedded systems

People: if you genuinely need QoS1 and 2 you cannot use this library - you will almost certainly lose or duplicate (@ QoS2) data on loss of connection: absolutely the opposite of why you wanted QoS1 or 2 in the first place. I hope this helps anyone who might have beeen having problems.

256dpi commented 1 year ago

Closing in favour of #294.