eclipse / paho.mqtt.python

AWS Quota Exceeded Infinite Retry #760

Closed jamwest closed 2 months ago

jamwest commented 10 months ago

We run a number of IoT devices that use AWS IoT Core as the MQTT message broker.

An issue came up where our devices were using many GB of cellular upload data per day! Luckily this only occurred in specific cases, but it could have been catastrophic for us.

The cause was AWS's quota of 128 KB per message. AWS drops the connection when it receives a message exceeding the quota. The problem is that when you set the QoS to 1 and don't clean the session, paho.mqtt resends any unacknowledged messages once it reconnects. So we were sending messages larger than the limit every ~3 seconds indefinitely (or until the devices were restarted).

We have found a quick workaround: always check the size of a message before trying to send it, and add this to the on_disconnect callback:

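import paho.mqtt.client as paho

def on_disconnect(client, userdata, rc):
    # rc != 0 means the disconnect was unexpected (e.g. the broker dropped us).
    if rc != 0:
        # Workaround: this touches private paho attributes, which may change between releases.
        with client._out_message_mutex:
            if client._out_messages:
                # Drop the oldest queued message, assumed to be the one that was rejected.
                client._out_messages.popitem(last=False)
                print("Removed failing message from queue")

client = paho.Client(...)
client.on_disconnect = on_disconnect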

...
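
The size check mentioned above is not shown in the snippet. A minimal sketch of one way it could look follows, assuming the limit applies to the encoded payload and equals the 128 KB figure quoted in this issue. `MAX_PAYLOAD_BYTES`, `payload_size` and `safe_publish` are hypothetical names for illustration, not part of paho; `client.publish(topic, payload, qos=...)` is the only library call used.

```python
MAX_PAYLOAD_BYTES = 128 * 1024  # assumed quota, taken from the 128 KB figure above


def payload_size(payload):
    """Return the encoded size of a payload in bytes."""
    if payload is None:
        return 0
    if isinstance(payload, str):
        return len(payload.encode("utf-8"))
    if isinstance(payload, (bytes, bytearray)):
        return len(payload)
    # Assumption: numeric payloads are sent as their string form.
    return len(str(payload).encode("utf-8"))


def safe_publish(client, topic, payload, qos=1, limit=MAX_PAYLOAD_BYTES):
    """Publish only if the payload fits within the limit.

    Returns True if the message was handed to the client, False if it was skipped.
    """
    size = payload_size(payload)
    if size > limit:
        # Skipping here keeps the oversized message out of the retry queue entirely.
        print(f"Skipping {size}-byte message on {topic}: exceeds {limit} bytes")
        return False
    client.publish(topic, payload, qos=qos)
    return True
```

Calling `safe_publish(client, topic, payload)` in place of `client.publish(...)` means an oversized message never reaches the outgoing queue, so there is nothing for the client to resend after a reconnect.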

Maybe there is a better way to handle this in the package? It is a gotcha with some potentially dire consequences.

Hopefully this helps someone in the future.

MattBrittan commented 9 months ago

The only real mechanism that MQTT v3 provides to deal with invalid messages is to drop the connection. This means there is no real way for the client to tell what the issue is.

From the MQTT v3.1.1 specification: "If a Server implementation does not authorize a PUBLISH to be performed by a Client; it has no way of informing that Client. It MUST either make a positive acknowledgement, according to the normal QoS rules, or close the Network Connection [MQTT-3.3.5-2]."

MQTT v5 adds the ability to return acknowledgments with Reason Codes indicating that a message has not been accepted. Perhaps consider moving to v5 (I believe IoT Core supports it, and I would guess it returns an acknowledgment with the "Quota exceeded" Reason Code).
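
To illustrate how Reason Codes let a client tell a rejection from a success, here is a small sketch that classifies a PUBACK Reason Code. The numeric values are the PUBACK codes from the MQTT v5 specification, where values of 0x80 and above indicate failure. `PUBACK_REASONS` and `describe_puback` are hypothetical names, and which code IoT Core actually returns for an oversized message is not confirmed in this thread.

```python
# PUBACK Reason Codes as defined by the MQTT v5 specification.
PUBACK_REASONS = {
    0x00: "Success",
    0x10: "No matching subscribers",
    0x80: "Unspecified error",
    0x83: "Implementation specific error",
    0x87: "Not authorized",
    0x90: "Topic Name invalid",
    0x91: "Packet Identifier in use",
    0x97: "Quota exceeded",
    0x99: "Payload format invalid",
}


def describe_puback(reason_code):
    """Return (name, rejected) for a numeric PUBACK Reason Code.

    In MQTT v5, Reason Codes of 0x80 or greater mean the message was not accepted.
    """
    name = PUBACK_REASONS.get(reason_code, f"Unknown (0x{reason_code:02X})")
    rejected = reason_code >= 0x80
    return name, rejected
```

A client that sees `rejected` come back as true could log the message and drop it rather than resend it, which is the decision MQTT v3 gives it no information to make.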

MattBrittan commented 2 months ago

Closing this as I believe I have answered the question.