LiamBindle / MQTT-C

A portable MQTT C client for embedded systems and PCs alike.
https://liambindle.ca/MQTT-C
MIT License
768 stars 270 forks source link

No error in the case a QoS 1 message sending is timed out #119

Open vpetrigo opened 3 years ago

vpetrigo commented 3 years ago

Hello,

I have met an issue that the library right now does not report an error if a message with QoS 1 (that should be valid for QoS 2 as well, but I have not checked that) sending meets command timeout. As we can see there in the case of command timeout __mqtt_send() just marks a message to be resent in the future with no error indication to the caller. I would like to describe a case where it may introduce some problems:

Same effect may be observer with keep-alive pings that have not been confirmed with a PINGRESP from the broker.

Note 1 Right now it seems there are no way to explicitly wait for a message delivery with ACK. Potential workaround there is to call mqtt_sync() after a QoS 1 message send and wait for response_timeout to see whether client.number_of_timeouts increments or not.

Current behaviour example

LiamBindle commented 3 years ago

Hi @vpetrigo, thanks for raising this. I'm not sure if I quite follow though. If I understand correctly, the problem you've identified is MQTT-C won't generate an error code if a QoS>0 message goes unacknowledged. Is that correct?

If so, I believe that's true---no error code will be generated, and unacknowledged QoS>0 messages will just loop through timeout+resending. I'm struggling to see how the client could identify such an even though. Under this event (an unacknownledged QoS>0 message) the client is supposed to continue resending the message until it hears back from the broker. Therefore, if I've understood correctly, I think this event would have to be handled at the application level. Have I understood correctly? If not, could you clarify what error event is missed/what the expected way to handle it is?

vpetrigo commented 3 years ago

Hello @LiamBindle,

MQTT-C won't generate an error code if a QoS>0 message goes unacknowledged. Is that correct? Yes, that is correct. I'd say that keep-alive messages also have such an issue, but we may put off focusing on that.

I try to provide as many details as I can right now:

So, I think it is a good idea to silently push a QoS = 0 messages into client's queue and forget about that, while it should not be the case for QoS > 0 messages. We have explicit configuration to be set - response_timeout. For me it seems that in the case of no response in the given time, the client should return back to a caller and notify with an error instead of silently put them back into client's queue.

Handling of that case may be done on the app side, but it seems as duplicating responsibilities. We already have response_timeout tracking that just only increments the number_of_timeouts counter. The first straightforward way to handle that in the application with the current implementation is:

int rc = mqtt_publish(p_client, ...);
unsigned n_timeouts = p_client->number_of_timeouts;
unsigned start_timestamp = now();

while (now() - start_timestamp < response_timeout && p_client->number_of_timeouts ==n_timeouts)
{
   rc = mqtt_sync(p_client, ...);

   if (rc != MQTT_OK)
   {
      // handle error
   }
}

That leads us to wait for the whole defined response_timeout to be completely sure that a message we published has been delivered. That would increase delays in the app though.

Impact on keep-alive pings That issue also relates to keep-alive pings. Right now we have explicit keep-alive interval we have to define during the client initialization. The issue is that as with QoS 1 there are no explicit error code upon calling mqtt_sync about keep-alive timeout. Usually there is a recommendation specified by MQTT services like AWS IoT, Azure IoT Hub, etc. to specify that as 80% of the keep-alive setting on the server side to avoid too frequent pinging as well as keeping connection alive. In the case I specified in my first post - if a device can deliver messages to a broker, but does not receive back anything, the MQTT client keeps silently put all keep-alive ping messages back in the queue. I have the case with such behaviour and only the internet connection reset helps. If we miss a keep-alive ping delivery, most likely that the broker will disconnect a device. So, it seems to me that there is no need to issue new keep-alives into client's queue. Current observed behaviour - in the case of no responses from the broker, keep-alive messages are created till the TX buffer is full and then mqtt_sync raises an TX_BUFFER_IS_FULL error.

On of the options I see right now:

struct message_context { uint16_t pid; }

mqtt_publish(struct mqtt_client client, const char topic_name, const void application_message, size_t application_message_size, uint8_t publish_flags struct message_context msg_ctx)


- for a QoS > 0 messages call `publish_ack_callback` that may have the same signature `publish_response_callback` when an ACK  is received for the published message:

```c

struct message_context on_the_fly_message;
// ...

void publish_ack_callback(..., uint16_t pid)
{
   if (pid == on_the_fly_message.pid)
   {
       // set `ack_arrive` or notify awaiting task somehow
   }
}

int rc = mqtt_publish(p_client, ..., &on_the_fly_message);
unsigned n_timeouts = p_client->number_of_timeouts;
unsigned start_timestamp = now();

while (now() - start_timestamp < response_timeout && ack_arrived)
{
   rc = mqtt_sync(p_client, ...);

   if (rc != MQTT_ACK_TIMEOUT_ERROR)
   {
      // handle error
   }
}
LiamBindle commented 3 years ago

@vpetrigo I would be in favor of a PR that added:

I think those two would provide the necessary flexibility. What do you think? Would you be able to make a PR for this (or similar)?

LiamBindle commented 3 years ago

Closing due to inactivity. Please feel free to reopen.

arthurkomatsu commented 3 years ago

Up on this as it would be very useful to know if a QOS 1 message timed out.

LiamBindle commented 3 years ago

I'll reopen this.

I'm happy to merge a PR that implements this.

vpetrigo commented 2 years ago

Hello everyone, Sorry for huge delay, have too many things on my plate. Get back to that feature as soon as I have some spare time.

LiamBindle commented 2 years ago

@vpetrigo No worries at all---I totally understand. Same goes for me right now.

When you have something ready, I'm happy to review and merge it.