adafruit / Adafruit_CircuitPython_MiniMQTT

MQTT Client Library for CircuitPython
Other
80 stars 49 forks source link

Memory allocation error on Raspberry Pi Pico W? #131

Closed ilikecake closed 1 year ago

ilikecake commented 1 year ago

I apologize if this is just an error on my part, but I am having trouble getting this library to work with a Raspberry Pi Pico W. Please see this thread for details. It appears that code is attempting to allocate a negative memory size, which is obviously a problem. I don't know enough about the MQTT protocol or debugging Wifi to find the issue.

Is this a bug in the MQTT library, or is my code the problem?

vladak commented 1 year ago

This is curious. It seems that the piece of code that is annotated with variable values in the forum in _wait_for_msg() should only be executed for the PUBLISH message type, i.e. when the server publishes a message to the client. Here, it seems that it is trying to interpret variable payload that is not consistent with the message type. The length 12320 corresponds to bytes 0x30 0x20. Anyway, it is wildly improbable that the server uses such a long topic length.

For connect(), assuming it got the CONNACK message type, it should return from _wait_for_msg() right here: https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/4c0c7706b1aab9301c608ba0241953702b280ee1/adafruit_minimqtt/adafruit_minimqtt.py#L912-L913 i.e. before the code that lead the exception.

Note that the line numbers do not match as I am using the latest code, but you get the idea.

It would make sense to get the value of res[0] and ideally also complete traffic dump for the TCP connection between the MQTT client and the broker when this problem happens. Also, I wonder what kind of software is the MQTT broker running.

Anyhow, the code should be more resilient against internally inconsistent messages.

vladak commented 1 year ago

There is some sort of problem when a MQTT message spans multiple TCP segments - at least in CPython. Only the data from the first segment is read in a single _wait_for_msg() call, even though the correct size is passed to _sock_exact_recv(), the rest of the buffer is filled with zero bytes. The remaining segments of the message payload are read in the next _wait_for_msg() call and are interpreted as MQTT headers, causing all sorts of mayhem.

When connected to test.mosquitto.org and subscribed to the # topic, adafruit_minimqtt does not survive more than couple of seconds.

brentru commented 1 year ago

@vladak Did https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/pull/132 address this problem, can we close the issue?

vladak commented 1 year ago

Nope, #132 merely prevents the problem from propagating to the point where allocation is attempted.

vladak commented 1 year ago

Trying to debug the issue, I can see there are actually 2 functions named _sock_exact_recv in the MQTT class ! Only one of them is being used - the latter defined one.

vladak commented 1 year ago

If CircuitPython on the Raspberry uses backward compatible socket implementation, then the root cause is clear: https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/0e5bb0c5a1e25c339131a9d7699f40f6b39fe190/adafruit_minimqtt/adafruit_minimqtt.py#L990-L993

Unlike the ESP32 part (or the other _sock_exact_recv() implementation that is not used anymore), it does not use the actual number of bytes read into the array. Hence, partial read leads to the observed problem.

vladak commented 1 year ago

There are some preexisting issues like #107 that share the symptoms (negative allocation count), however these seem to happen in the ESP32 branch in _sock_exact_recv().

vladak commented 1 year ago

For the record, the double definition of _sock_exact_recv is detected by mypy:

$ mypy adafruit_minimqtt/adafruit_minimqtt.py 
adafruit_minimqtt/adafruit_minimqtt.py:977: error: Name "_sock_exact_recv" already defined on line 316  [no-redef]
Found 1 error in 1 file (checked 1 source file)

so perhaps running this in Github action for PRs is in order.