Closed ilikecake closed 1 year ago
This is curious. It seems that the piece of code that is annotated with variable values in the forum in _wait_for_msg()
should only be executed for the PUBLISH
message type, i.e. when the server publishes a message to the client. Here, it seems that it is trying to interpret variable payload that is not consistent with the message type. The length 12320 corresponds to bytes 0x30 0x20. Anyway, it is wildly improbable that the server uses such a long topic length.
For connect()
, assuming it got the CONNACK
message type, it should return from _wait_for_msg()
right here: https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/4c0c7706b1aab9301c608ba0241953702b280ee1/adafruit_minimqtt/adafruit_minimqtt.py#L912-L913 i.e. before the code that lead the exception.
Note that the line numbers do not match as I am using the latest code, but you get the idea.
It would make sense to get the value of res[0]
and ideally also complete traffic dump for the TCP connection between the MQTT client and the broker when this problem happens. Also, I wonder what kind of software is the MQTT broker running.
Anyhow, the code should be more resilient against internally inconsistent messages.
There is some sort of problem when a MQTT message spans multiple TCP segments - at least in CPython. Only the data from the first segment is read in a single _wait_for_msg()
call, even though the correct size is passed to _sock_exact_recv()
, the rest of the buffer is filled with zero bytes. The remaining segments of the message payload are read in the next _wait_for_msg()
call and are interpreted as MQTT headers, causing all sorts of mayhem.
When connected to test.mosquitto.org
and subscribed to the #
topic, adafruit_minimqtt does not survive more than couple of seconds.
@vladak Did https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/pull/132 address this problem, can we close the issue?
Nope, #132 merely prevents the problem from propagating to the point where allocation is attempted.
Trying to debug the issue, I can see there are actually 2 functions named _sock_exact_recv
in the MQTT class ! Only one of them is being used - the latter defined one.
If CircuitPython on the Raspberry uses backward compatible socket implementation, then the root cause is clear: https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/0e5bb0c5a1e25c339131a9d7699f40f6b39fe190/adafruit_minimqtt/adafruit_minimqtt.py#L990-L993
Unlike the ESP32 part (or the other _sock_exact_recv()
implementation that is not used anymore), it does not use the actual number of bytes read into the array. Hence, partial read leads to the observed problem.
There are some preexisting issues like #107 that share the symptoms (negative allocation count), however these seem to happen in the ESP32 branch in _sock_exact_recv()
.
For the record, the double definition of _sock_exact_recv
is detected by mypy
:
$ mypy adafruit_minimqtt/adafruit_minimqtt.py
adafruit_minimqtt/adafruit_minimqtt.py:977: error: Name "_sock_exact_recv" already defined on line 316 [no-redef]
Found 1 error in 1 file (checked 1 source file)
so perhaps running this in Github action for PRs is in order.
I apologize if this is just an error on my part, but I am having trouble getting this library to work with a Raspberry Pi Pico W. Please see this thread for details. It appears that code is attempting to allocate a negative memory size, which is obviously a problem. I don't know enough about the MQTT protocol or debugging Wifi to find the issue.
Is this a bug in the MQTT library, or is my code the problem?