Closed — harrandt closed this issue 6 months ago
I just had a look at the source code and think there might be a different explanation, but I have not understood it in full.
if self._sock is not None and (now - last_msg_out >= self._keepalive or now - last_msg_in >= self._keepalive):
https://github.com/eclipse/paho.mqtt.python/blob/master/src/paho/mqtt/client.py#L2582
I have a publisher-only client, so according to this, regular publishing keeps `last_msg_out` fresh, but the age of `last_msg_in` will still trigger a PINGREQ once it exceeds the keepalive interval when `_loop()` is called.
So what's written here is actually true:
You may think that sending messages on a regular basis would stop the PING messages but it turns out that the keep alive interval applies to both publish and receive messages.
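The check quoted from the source can be sketched in isolation. This is a simplified, hypothetical reconstruction of the logic (the names `keepalive`, `last_msg_out`, and `last_msg_in` follow the source line, but this is not the actual client code):

```python
import time

def keepalive_due(last_msg_out: float, last_msg_in: float,
                  keepalive: float, now: float) -> bool:
    """Mirror of the quoted condition: a PINGREQ is due as soon as
    EITHER direction has been idle for the keepalive interval."""
    return (now - last_msg_out >= keepalive) or (now - last_msg_in >= keepalive)

# A publisher-only client keeps last_msg_out fresh, but last_msg_in goes
# stale whenever no PUBACK/PINGRESP arrives, so a PINGREQ still fires.
now = time.time()
print(keepalive_due(last_msg_out=now - 5, last_msg_in=now - 70,
                    keepalive=60, now=now))  # True: inbound side is idle
```

This is why publishing on a regular basis does not stop the PING messages: the two idle timers are combined with `or`, so either side going quiet is enough.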
Ok, I think I am getting to the bottom of this: the PUBACKs I receive arrive at a new client instance with no memory of the previous session. So the PUBACKs are probably for message IDs used in the previous session, which may or may not also be in use in the current one.
The client cannot tell which session a PUBACK belongs to, and it starts the message IDs for the current session at 1, counting up.
So if a PUBACK arrives, it could be one from the previous session whose ID happens to also be in use in this session, and it might not be the PUBACK for the current-session message we are actually waiting for.
So we might never get it for the current message, right? So will the current message be resent and receive "its" PUBACK then?
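The ID-collision scenario above can be made concrete with a small, hypothetical sketch (the sets below are illustrative, not the client's real data structures):

```python
# Message IDs restart at 1 for a new client instance, so a PUBACK left
# over from the old session may collide with, or miss, the IDs the new
# session is waiting on.
old_session_inflight = {1, 2, 3}   # mids published before the restart
new_session_inflight = {1}         # the new client reuses mid 1 first

stale_puback_mid = 2               # broker acks an old-session message
# The new client has no record of mid 2, so this ack is ignored ...
print(stale_puback_mid in new_session_inflight)  # False -> ignored

stale_puback_mid = 1               # ... but an old ack for mid 1
# is indistinguishable from the ack for the NEW message with mid 1.
print(stale_puback_mid in new_session_inflight)  # True -> accepted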
Unfortunately this client currently only stores session information in memory. If you create a new `Client`, this info is lost.
This means that, in your example, the new `Client` has no knowledge of the messages sent previously (the client should handle this correctly when using its own reconnection functionality). When the client receives a PUBACK, it will compare the ID to a map of those it has sent and, if it's found, the user code will be notified and the ID removed from the map. If the ID is unknown, then the PUBACK is effectively ignored.
As an aside: the fact that the server is resending all of the PUBACKs upon reconnection is interesting. My reading of the spec is that V3 allows (but does not require) this, whereas V5 does not allow it. The V5 spec states that the server "MUST resend any unacknowledged PUBLISH packets (where QoS > 0) and PUBREL" and that "Clients and Servers MUST NOT resend messages at any other time".
In terms of how this impacts keepalives, I don't believe it should. The timer is reset when any packet is received (regardless of any errors whilst processing it). Looking at the code that handles the PUBACK, it looks like the ACK is just ignored if the ID is not in the dict.
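The behaviour described here (compare the ID against a map of in-flight messages, notify user code if found, otherwise silently drop the ack) can be sketched like this. `inflight` and the `on_publish` callback are illustrative stand-ins, not the client's real internals:

```python
from typing import Callable, Dict

def handle_puback(inflight: Dict[int, str],
                  mid: int,
                  on_publish: Callable[[int], None]) -> None:
    """Sketch of the described PUBACK handling: a known ID completes the
    message and notifies user code; an unknown ID is silently ignored."""
    if mid in inflight:
        del inflight[mid]   # message is now fully acknowledged
        on_publish(mid)     # user callback fires
    # else: unknown mid (e.g. from a previous session) is dropped

acked = []
inflight = {1: "payload-a"}
handle_puback(inflight, 7, acked.append)   # stale mid: ignored
handle_puback(inflight, 1, acked.append)   # current mid: completes
print(acked, inflight)   # [1] {}
```

Note that even an ignored PUBACK still counts as a received packet, so (per the comment above) it should still reset the keepalive timer.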
I can think of one potential cause for the loss of connection (but this requires that the client is receiving only, not publishing). As per the spec:
It is the responsibility of the Client to ensure that the interval between Control Packets being sent does not exceed the Keep Alive value. In the absence of sending any other Control Packets, the Client MUST send a PINGREQ Packet [MQTT-3.1.2-23].
So the following could happen:
1. The client notices (via `_last_msg_out`) that no control packets have been sent within the required timeframe, so it sends a ping.
2. At the next check, the client again notices (via `_last_msg_out`) that no control packets have been sent within the required timeframe; as no response has been received from the previous ping, it drops the connection.
This should not be the case in your situation (as you mention that the client resumes publishing). Unfortunately, without access to logs it's going to be difficult to diagnose further.
Your assumption was correct. We were publishing to a customized MQTT broker and asked the provider about some internals. It is a customized Java Moquette with a stupidly slow synchronous queue, so basically our PINGREQ got queued and delayed behind all the other work that had to go to the (slow) database. That, in combination with "unknown id" PUBACKs from the previous session, led to the issue. But it is not an issue in Paho, so I will close this.
Basically our problem was that the producer was too fast and the consumer too slow because of its blocking DB processing queue.
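One client-side mitigation for this fast-producer/slow-consumer mismatch is to bound how many unacknowledged messages the producer may have outstanding, so it blocks instead of flooding the broker. Here is a minimal stdlib sketch of that idea (paho-mqtt has built-in knobs in the same spirit, `max_inflight_messages_set()` and `max_queued_messages_set()`; the class below is purely illustrative):

```python
import threading

class BoundedPublisher:
    """Sketch: block the (fast) producer until the broker has acknowledged
    enough earlier messages, instead of letting the queue grow unbounded."""

    def __init__(self, max_inflight: int):
        self._slots = threading.Semaphore(max_inflight)

    def publish(self, send) -> None:
        self._slots.acquire()   # blocks when too many messages are unacked
        send()

    def on_puback(self) -> None:
        self._slots.release()   # an ack frees a slot for the producer

sent = []
pub = BoundedPublisher(max_inflight=2)
pub.publish(lambda: sent.append("m1"))
pub.publish(lambda: sent.append("m2"))
pub.on_puback()                          # broker acks m1
pub.publish(lambda: sent.append("m3"))   # slot free again, does not block
print(sent)  # ['m1', 'm2', 'm3']
```

This does not fix a slow broker, but it keeps the client from stacking up work in front of its own PINGREQ.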
I see a strange behaviour using the Paho Python client (paho-mqtt 1.6.1) in combination with the Java Moquette broker.
The broker seems to queue the PINGREQ behind the "old" messages, so it will process the messages from the previous session before answering it. And depending on the keepalive interval, the client will drop the connection, even though there were lots of PUBACKs from the broker.
The spec mentions
So my suspicion is that the Paho client only considers PUBACKs from the current session as Control Packets that reset the keep-alive period.