adafruit / Adafruit_CircuitPython_MiniMQTT

MQTT Client Library for CircuitPython
Other
79 stars 50 forks source link

make sure to read the whole buffer in _sock_exact_recv() #133

Closed vladak closed 1 year ago

vladak commented 1 year ago

This change addresses the problem of returning incomplete buffer from _sock_exact_recv() in CPython.

Tested with:

#!/usr/bin/env python3

import adafruit_minimqtt.adafruit_minimqtt as MQTT
import socket
import ssl
import logging

def connect_hook(client, user_data, result, code):
    print(f"Connect: {user_data} {result} {code}")

def message_hook(client, topic, message):
    print(f"Message: topic='{topic}' message='{message}'")

broker = "test.mosquitto.org"
port = 1883

mqtt_client = MQTT.MQTT(
    broker=broker,
    port=port,
    socket_pool=socket,
    ssl_context=ssl.create_default_context(),
)

logging.basicConfig()
# mqtt_client.enable_logger(logging, log_level=logging.DEBUG)

# mqtt_client.on_connect = connect_hook
mqtt_client.on_message = message_hook

mqtt_client.connect()
mqtt_client.subscribe('#')

while True:
    try:
        mqtt_client.loop(5)
    except UnicodeDecodeError:
        pass

It survived the wild message traffic there for bunch of minutes.

vladak commented 1 year ago

I also used the above test code on QtPy, with the above code adapted for CircuitPython and WiFi, using the changed adafruit_minimqtt module and it ran fine for bunch of minutes. Eventually it got OSError (ETIMEDOUT) however that is probably not related to the changes. I made sure with inserted debug print the 'backward compat' branch in _sock_exact_recv() was taken. At first I thought that branch is used only by CPython, however seeing it is also used on microcontrollers, it seems this change will fix bunch of pre-existing issues. I am amazed how fast the program runs on the QtPy.

dhalbert commented 1 year ago

Eventually it got OSError (ETIMEDOUT) however that is probably not related to the changes.

What do you think this is related to? Do you think the server is doing rate limiting or something?

vladak commented 1 year ago

Eventually it got OSError (ETIMEDOUT) however that is probably not related to the changes.

What do you think this is related to? Do you think the server is doing rate limiting or something?

When I run this test with CPython on my laptop connected to WiFi hotspot on my phone that is using 4G to connect to the Internet, it just continuously runs until interrupted. Given that when the (slightly adapted) test code runs on the QtPy, using WiFi AP that is connected to super stable Ethernet based Internet upstream and considering it uses the same branch in _sock_exact_recv(), I'd attribute this to WiFi flakiness / network delays and/or to the lower layers (CircuitPython / the OS running on the chip). The way I look at this change is that this fixes a problem that allows to uncover more problems underneath.

dhalbert commented 1 year ago

@brentru Could you take a look at this?

brentru commented 1 year ago

@dhalbert is there urgency? I am out today, returning tomorrow.

dhalbert commented 1 year ago

@dhalbert is there urgency? I am out today, returning tomorrow.

NP, this was just hanging for a while.