adafruit / Adafruit_CircuitPython_MiniMQTT

MQTT Client Library for CircuitPython
Other
80 stars 49 forks source link

Lengthy delays in .loop() #126

Closed jddmartin closed 1 year ago

jddmartin commented 1 year ago

There is a recent commit 48aba3c63fc3384f426bd08ededff7d6c2d4fa98 which causes a lengthy delay in calls to .loop() in one of my test programs, which did not occur prior to this commit.

EDIT: Apologies, I made a mistake in identifying a specific commit. The commit tagged 5.5.2 shows the lengthy delay, and commits tagged 5.5.1 and 5.5.0 do not show the problem.

I am using an ethernet featherwing attached to a rp2040 feather. Running the test code below (a modified version of example programs for minimqtt) before and after commit shows a lengthy (several second) delay introduced in calls to .loop() (the third last line in code). Whereas prior to this commit, loop returns quickly; i.e., on order of specified timeout.

I'm running circuitpython 7.3.3, but I get identical slow behaviour with 8.0.0 and most recent version of minimqtt.


import board
import busio
from digitalio import DigitalInOut

from adafruit_wiznet5k.adafruit_wiznet5k import WIZNET5K
import adafruit_wiznet5k.adafruit_wiznet5k_socket as socket
import adafruit_minimqtt.adafruit_minimqtt as MQTT

print("starting")

cs = DigitalInOut(board.D10)
spi_bus = busio.SPI(board.SCK, MOSI=board.MOSI, MISO=board.MISO)

print("Finished spi")

# Initialize ethernet interface with DHCP
eth = WIZNET5K(spi_bus, cs)

print("My IP address is:", eth.pretty_ip(eth.ip_address))

print("Finished ethernet")

# pylint: disable=unused-argument
def message(client, feed_id, payload):
    # Message function will be called when a subscribed feed has a new value.
    # The feed_id parameter identifies the feed, and the payload parameter has
    # the new value.
    print("Feed {0} received new value: {1}".format(feed_id, payload))

# Initialize MQTT interface with the ethernet interface
MQTT.set_socket(socket, eth)

# Initialize a new MQTT Client object
mqtt_client = MQTT.MQTT(
    broker="192.168.1.200",
    port = 1883,
    is_ssl = False
#    username=secrets["aio_username"],
#    password=secrets["aio_key"],
)

print("trying to connect")
mqtt_client.connect()

i=0
while True:
    mqtt_client.loop(timeout=0.01)
    i += 1
    print(i)
'''
rpavlik commented 1 year ago

I think I am also seeing this, https://discord.com/channels/327254708534116352/537365702651150357/1035672894761619547

dhalbert commented 1 year ago

@calcut Do you have some comments on this apparently new problem?

calcut commented 1 year ago

Hmm, well this bit is new

https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/aec777753afc4460832dc943666de856da41e873/adafruit_minimqtt/adafruit_minimqtt.py#L864-L874

If the mqtt broker keeps providing a response to _wait_for_msg, then it will keep looping for ~10 seconds (_recv_timeout) before giving up.

You could try adding a logger, or at least some print statements to verify if that is what is happening.

calcut commented 1 year ago

have you tried with timeout = 0?

rpavlik commented 1 year ago

I tried with the default timeout of 0 as well as a few various values, though I'm not sure my issue is the same as the original one in this issue description

jddmartin commented 1 year ago

@calcut

Thanks for your response.

Yes, the duration of call is about 10 s (I should have been more precise in my original message).

1) If I set timeout = 0, the .loop() doesn't return at all

2) I have monitored tcp packets between mqtt server and the device. There are only a few packets sent every few iterations of the loop. Does that rule out mqtt broker keeps providing a response to _wait_for_ms?

Unfortunately my knowledge of the MQTT protocol is poor --- I tried to simplify my application code to give minimal working example, but I'm not 100% sure that it is adequate to diagnose problem (for example, I don't subscribe in client in this mwe, so maybe there shouldn't be any communication at all (?). I will try to understand how to add logging/print statements to diagnose, as you suggest.

calcut commented 1 year ago

interesting... I wouldn't call myself an expert either I've really just tested with the Adafruit IO broker, but perhaps other brokers behave differently. Maybe returning some sort of keep-alive or ping response, guessing here.

To debug... my strategy was essentially:

You could try printing all responses received, something like this:

 while True: 
     rc = self._wait_for_msg(timeout) 
     if rc is None: 
         break
     print(f"received: {rc}")
     if time.monotonic() - stamp > self._recv_timeout: 
         if self.logger is not None: 
             self.logger.debug( 
                 f"Loop timed out, message queue not empty after {self._recv_timeout}s" 
             ) 
         print(f"timed out after {self._recv_timeout}s")
         break 
     rcs.append(rc) 
rpavlik commented 1 year ago

I was using a locally-running mosquitto broker, if that helps

jinglemansweep commented 1 year ago

This is definitely a problem for me too. I can't seem to get any combination of minimqtt running on a Matrix Portal M4.

The loop call hangs for several seconds. Even with Asyncio, this blocks other coroutines and prevents updates to the RGB matrix.

This is pretty bad for a device that claims to be IoT friendly and doesn't even have a useable MQTT client.

calcut commented 1 year ago

I'll take another look today. What broker are you connecting to @jinglemansweep ? As I mentioned, I haven't had a problem using with AdafruitIO, so maybe other brokers behave differently.

jinglemansweep commented 1 year ago

I'm actually considering taking a look myself after work commitments today. I'm no expert on MQTT protocol though.

I'm using Mosquitto mainly because I want to integrate with Home Assistant. Having a self hosted broker is a must for my uses unfortunately.

Happy to provide any help if needed.

calcut commented 1 year ago

I've just tested with mosquitto running on a RPi, and also on a Mac. I didn't do any configuration of mosquitto except to turn off the security with

listener 1883 0.0.0.0
allow_anonymous true

I started with the simpletest: https://raw.githubusercontent.com/adafruit/Adafruit_CircuitPython_MiniMQTT/main/examples/minimqtt_simpletest.py

which appears to work fine (assuming the broker ip and port are specified in secrets.py)

I also added this, but didn't observe any delays in the loop() function

while True:
    mqtt_client.loop()
    print("loop done")
    time.sleep(0.5)

That was with ESP32-S2 and Circuitpython 7.3.1 I have a metro M4 somewhere, I'll try with that too.

calcut commented 1 year ago

" I have a metro M4 somewhere, I'll try with that too." Or not! that board doesn't have networking haha

calcut commented 1 year ago

interesting... I wouldn't call myself an expert either I've really just tested with the Adafruit IO broker, but perhaps other brokers behave differently. Maybe returning some sort of keep-alive or ping response, guessing here.

To debug... my strategy was essentially:

* remove the adafruit_minimqtt.mpy file from your CIRCUITPY/lib/adafruit_minimqtt folder (or rename it with a different extension)

* replace it with a .py version, e.g. https://raw.githubusercontent.com/adafruit/Adafruit_CircuitPython_MiniMQTT/aec777753afc4460832dc943666de856da41e873/adafruit_minimqtt/adafruit_minimqtt.py

* add print statements or other modifications

You could try printing all responses received, something like this:

 while True: 
     rc = self._wait_for_msg(timeout) 
     if rc is None: 
         break
     print(f"received: {rc}")
     if time.monotonic() - stamp > self._recv_timeout: 
         if self.logger is not None: 
             self.logger.debug( 
                 f"Loop timed out, message queue not empty after {self._recv_timeout}s" 
             ) 
         print(f"timed out after {self._recv_timeout}s")
         break 
     rcs.append(rc) 

@jinglemansweep Maybe you could try this... the broker may be returning something that causes this loop to continue

calcut commented 1 year ago

Is everyone who is having a problem using non-native networking I wonder?

This function executes quite differently for "ESP32SPI", which will be what the M4 Matrix Portal uses.

https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/aec777753afc4460832dc943666de856da41e873/adafruit_minimqtt/adafruit_minimqtt.py#L976-L1005

I guess it doesn't return None, but I don't have a suitable board to check what it does return.

jinglemansweep commented 1 year ago

I will give this a go later this afternoon. I actually have another Matrix Portal M4 arriving today which I'll use for testing.

calcut commented 1 year ago

great. poking a little deeper, it looks like

_sock_exact_recv should raise OSError(errno.ETIMEDOUT) https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/aec777753afc4460832dc943666de856da41e873/adafruit_minimqtt/adafruit_minimqtt.py#L989

which should be caught by

_wait_for_msg https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/aec777753afc4460832dc943666de856da41e873/adafruit_minimqtt/adafruit_minimqtt.py#L894

and cause the loop() to exit.

If that isn't happening, see if you an figure out what is going on.

jddmartin commented 1 year ago

@calcut thank you for suggestion on diagnostics and sorry for the delay in responding.

When I put in those suggested print statements (8th message in this thread), I get: received: 0
received: 0
received: 0
...
received: 0
received: 0
timed out after 10s

Thanks.

jinglemansweep commented 1 year ago

interesting... I wouldn't call myself an expert either I've really just tested with the Adafruit IO broker, but perhaps other brokers behave differently. Maybe returning some sort of keep-alive or ping response, guessing here.

To debug... my strategy was essentially:

You could try printing all responses received, something like this:


 while True: 
     rc = self._wait_for_msg(timeout) 
     if rc is None: 
         break
     print(f"received: {rc}")
     ...

So, I'm getting 0 instead of None. That's with no messages queued or being received, in its idle waiting state:

received: 0
received: 0
received: 0
<snip>

Changing the above line to:

if rc is None or rc == 0:
    break

... fixes the issue. My RGB matrix updates with a decent framerate, with a mqtt_client.loop(timeout=0.001) call.

I can confirm that I can publish MQTT messages and my application receives them and logs them without blocking the RGB matrix updates:

MANAGER::RUN
MANAGER::LOOP
Subscribed to test/topic with QOS level 1
New message on topic test/topic: HELLLLO
New message on topic test/topic: HELLLLO

I'm using my project for testing: https://github.com/jinglemansweep/matrix-portal-asyncio

I'm happy to submit a PR, but I only have Matrix Portal M4's to test with, so my change could break other boards.

jinglemansweep commented 1 year ago

Actually, looking at _wait_for_msg function:

# Block while we parse the rest of the response
self._sock.settimeout(timeout)
print(res)
if res in [None, b""]:
    # If we get here, it means that there is nothing to be received
    return None

Adding the print statement returns the following:

bytearray(b'\x00')
bytearray(b'\x00')
bytearray(b'\x00')
bytearray(b'\x00')

So, the socket is receiving constant single zero bytes. Perhaps a better fix would be adding 0 to the "nothing received" blacklist?

if res in [None, b"", 0]:
    # If we get here, it means that there is nothing to be received
    return None
jinglemansweep commented 1 year ago

Okay, that doesn't work for some reason. If _wait_for_msg returns None at that stage, it appears to cause too many polls, locking up the RGB matrix and eventually causing a MemoryError:

  File "/lib/adafruit_minimqtt/adafruit_minimqtt.py", line 874, in loop
  MemoryError: memory allocation failed, allocating 32768 bytes
jinglemansweep commented 1 year ago

Okay, my mistake. That line should probably be:

if res in [None, b"", bytearray(b'\x00')]:
    # If we get here, it means that there is nothing to be received
    return None

With just that change, everything appears to be working as expected.

jddmartin commented 1 year ago

The most recent suggestion of @jinglemansweep fixes the issue for me (RP2040 feather, with ethernet feather wing, talking to mosquitto broker on LAN).

calcut commented 1 year ago

nice work @jinglemansweep, you up for doing a PR to fix it?

jinglemansweep commented 1 year ago

Sure. I have a branch ready but didn't have permission to push to it

dhalbert commented 1 year ago

Sure. I have a branch ready but didn't have permission to push to it

This is on your end, you mean? A PR is fine.

jinglemansweep commented 1 year ago

Yes, I usually create a branch, push my changes to my branch and submit a PR to get it merged into master/main

I get permission denied when pushing my branch:

ERROR: Permission to adafruit/Adafruit_CircuitPython_MiniMQTT.git denied to jinglemansweep.
fatal: Could not read from remote repository.
dhalbert commented 1 year ago

Yes, I usually create a branch, push my changes to my branch and submit a PR to get it merged into master/main

I get permission denied when pushing my branch:

ERROR: Permission to adafruit/Adafruit_CircuitPython_MiniMQTT.git denied to jinglemansweep.
fatal: Could not read from remote repository.

Did you fork the repo? If you want to make a branch, do it on your fork; you can't make a branch on the adafruit repo and push it (you could make it only locally, but better to fork). We have a guide here: https://learn.adafruit.com/contribute-to-arduino-with-git-and-github

jinglemansweep commented 1 year ago

Sorry, just realised you use the fork model:

My PR from my fork is here: https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/pull/127

dhalbert commented 1 year ago

Fixed by #127, but could use further diagnosis. Redid issue in #128.