adafruit / Adafruit_CircuitPython_MiniMQTT

MQTT Client Library for CircuitPython

Errors with MQTT CircuitPython 9 Publish #211

Open matthewjerome opened 3 months ago

matthewjerome commented 3 months ago

We have an implementation of MiniMQTT that has been quite stable on CircuitPython 8 but started showing multiple issues when we tested it on CircuitPython 9 (9.0.3 and earlier) with multiple ESP32-S3 boards.

The code base sends a few API requests over HTTP GET/POST using the Adafruit_CircuitPython_Requests library (with no issues), then connects to the MQTT server over SSL and sends sensor observations every few seconds. We tried switching to adafruit_connection_manager for the socket_pool and ssl_context, but this does not seem to fix the issue. We also tried calling reconnect() on errors as described in #194, with no luck. Interestingly, the code sometimes runs fine for an hour or more, but most often it errors within the first few minutes, if not immediately.

We tried various bundles (the *.py source, the prebuilt *.mpy bundle, and self-built *.mpy variations) with no luck and similar results. The bundle dates we tried are 20240402, 20240307 and 20240224.

The most common errors seen (not necessarily together) are:

OSError(11,)
OSError(104,)
MemoryError()
MMQTTException('Repeated connect failures',)

Less common errors (maybe stemming from the common errors above?):

OSError(128,)
<WatchDogTimeout>
gaierror(-2, 'Name or service not known')
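To read the numeric codes above, a small lookup helps. This is a minimal sketch, assuming the ESP32-S3 port reports errno values from ESP-IDF's newlib C library (where 11 is EAGAIN, 104 is ECONNRESET and 128 is ENOTCONN). ESP32_ERRNO_NAMES and describe_error are hypothetical names, not part of MiniMQTT or CircuitPython.

```python
# Hypothetical lookup table. Assumes ESP-IDF's newlib errno numbering
# (for example ENOTCONN is 128 there, but 107 on Linux).
ESP32_ERRNO_NAMES = {
    11: "EAGAIN (resource temporarily unavailable)",
    104: "ECONNRESET (connection reset by peer)",
    128: "ENOTCONN (socket is not connected)",
}


def describe_error(exc):
    """Return a short, human-readable label for an exception."""
    name = type(exc).__name__
    # OSError(11,) style exceptions carry the errno as their first argument
    if isinstance(exc, OSError) and exc.args and isinstance(exc.args[0], int):
        code = exc.args[0]
        label = ESP32_ERRNO_NAMES.get(code)
        if label:
            return "{} {}: {}".format(name, code, label)
    # Fall back to the exception's own message (empty for MemoryError())
    detail = str(exc)
    return name + (": " + detail if detail else "")


# Example: label an error caught around a publish() call
try:
    raise OSError(104)
except OSError as err:
    print(describe_error(err))
```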

Sample code:

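import wifi
import adafruit_connection_manager
import adafruit_minimqtt.adafruit_minimqtt as MQTT

# Get the shared socket pool and SSL context for the Wi-Fi radio
radio = wifi.radio
pool = adafruit_connection_manager.get_radio_socketpool(radio)
ssl_context = adafruit_connection_manager.get_radio_ssl_context(radio)

mqtt = MQTT.MQTT(
    broker=...,
    port=...,
    is_ssl=True,
    socket_pool=pool,
    ssl_context=ssl_context,
    username=...,
    password=...,
)

# topic and message are defined elsewhere in the full application
try:
    mqtt.publish(topic, message)
except Exception:
    # On failure, try to re-establish the broker connection
    mqtt.reconnect()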

Are there any additional debugging steps we can try, or other things we should consider when upgrading to CP9? Thank you!
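One possible refinement of the catch-all except around publish() in the sample above is to catch only network-related errors, log which one occurred, and retry the publish once after reconnecting. This is a minimal sketch: publish_with_reconnect is a hypothetical helper that assumes only that the client has publish() and reconnect() methods, as MiniMQTT's MQTT class does, and the default retry_errors tuple is an assumption to adjust.

```python
def publish_with_reconnect(client, topic, message, retry_errors=(OSError,), log=print):
    """Publish once; if a listed error occurs, reconnect and publish one more time.

    Returns the number of publish attempts used (1 or 2). Errors not in
    retry_errors (for example MemoryError), a failure of reconnect() itself,
    and a failure of the second attempt all propagate to the caller.
    """
    try:
        client.publish(topic, message)
        return 1
    except retry_errors as exc:
        log("publish failed ({!r}), reconnecting".format(exc))
    client.reconnect()
    client.publish(topic, message)
    return 2


# Usage on a board (names from the sample above; MMQTTException is MiniMQTT's error type):
# publish_with_reconnect(mqtt, topic, message, retry_errors=(OSError, MQTT.MMQTTException))
```

Letting MemoryError and a second failure propagate keeps them visible instead of hiding them behind repeated reconnects.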

dhalbert commented 3 months ago

There is a core CircuitPython issue open that may match what you're seeing: https://github.com/adafruit/circuitpython/issues/9123. Feel free to contribute to that issue.

Are you setting CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD in settings.toml? Those enable the web workflow, which connects to Wi-Fi before your code runs.

Clearly there is some new problem, and we have to track it down. There's no reason you can't stay on 8.x.x for now.

matthewjerome commented 3 months ago

Thanks for the quick response and pointing us to this issue Dan.

In our implementation, we are not using the web workflow; we call wifi.radio.connect(ssid=..., password=...) instead. We will add this to the list of things to try.

DJDevon3 commented 2 months ago

I have a weather-related script running on an Adafruit ESP32-S3 Feather with 9.0 that publishes to Adafruit IO. It's been just as stable as on 8.x, but I'm probably catching every possible error now and either ignoring it or resetting the board. It's more work, but it is possible to catch most of those errors with a lot of try/excepts and, as a last resort, call supervisor.reload().

It might be helpful to see how I catch the errors, including the -2 gaierror, in my project: Feather Weather MQTT Touch

In my experience, I got gaierrors in both 8.x and 9.0. The only real way to make it stable is to catch all the errors. That said, I've been working on my script for 2+ years and use it every day at my PC, so I've had plenty of time to refine it.

A lot of the errors come down to Wi-Fi disconnections mid-script, DNS lookup failures, socket errors; the list goes on. The point is that there are a lot of errors and you have to catch them. Many of the time.sleep() calls in my error handling last roughly as long as my Wi-Fi takes to reconnect after it drops or otherwise fails. Your errors and your Wi-Fi may behave differently, so you have to tailor the error handling to your script. There is no silver bullet.
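The pattern described here (wait about as long as the Wi-Fi takes to come back, and fall back to supervisor.reload() when nothing else works) can be sketched as a small retry helper. This is not code from Feather Weather MQTT Touch: retry, its parameters and the delay values are hypothetical placeholders to tune for your own network. sleep and last_resort are parameters so the logic can be exercised off-board; on a board you might pass time.sleep and supervisor.reload.

```python
import time


def retry(action, attempts=3, first_delay=5.0, factor=2.0, retry_errors=(OSError,),
          sleep=time.sleep, last_resort=None, log=print):
    """Call action() until it succeeds, waiting longer after each failure.

    attempts should be at least 1. After the final failure, call last_resort()
    if one was given (e.g. supervisor.reload on CircuitPython); otherwise
    re-raise the last error.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_errors as exc:
            log("attempt {} failed: {!r}".format(attempt, exc))
            if attempt == attempts:
                if last_resort is None:
                    raise
                return last_resort()
            sleep(delay)      # e.g. roughly how long Wi-Fi takes to recover
            delay *= factor   # back off further on each consecutive failure
```

For example, retry(lambda: mqtt.publish(topic, message), last_resort=supervisor.reload), with supervisor imported, would wait 5 s and then 10 s between three attempts before reloading. Growing the delay means repeated failures back off instead of hammering a broker or access point that is still recovering.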

I've been running 9.0 since March 19th, 2024, so about a month in total. Publishing to Adafruit IO with MiniMQTT has been 99.9% solid. I could show my Adafruit IO dashboard as proof, but you'll have to take my word for it. It is possible to stabilize things with error handling.

matthewjerome commented 2 months ago

Thank you @DJDevon3 for sharing your project and experience. I will compare our code and see where we can make some improvements.

mmartinortiz commented 2 months ago

I'm also facing issues when I try to connect an ESP32 running CircuitPython 9.0.4 to an MQTT broker. This is my code:

import os
import time
import ssl
import json
import alarm
import board
import socketpool
import wifi
import adafruit_minimqtt.adafruit_minimqtt as MQTT
import adafruit_bme680
import adafruit_logging as logging

PUBLISH_DELAY = 60
MQTT_TOPIC = "grenhouse"
USE_DEEP_SLEEP = True

# Connect to the Sensor
i2c = board.I2C()
sensor = adafruit_bme680.Adafruit_BME680_I2C(i2c)

wifi.radio.connect(
    os.getenv("CIRCUITPY_WIFI_SSID"), os.getenv("CIRCUITPY_WIFI_PASSWORD")
)
print(f"Connected to {os.getenv('CIRCUITPY_WIFI_SSID')}")
print(f"My IP address: {wifi.radio.ipv4_address}")

# Create a socket pool
pool = socketpool.SocketPool(wifi.radio)

# Set up a MiniMQTT Client
mqtt_client = MQTT.MQTT(
    broker=os.getenv("MQTT_BROKER"),
    port=os.getenv("MQTT_PORT"),
    username=os.getenv("MQTT_USERNAME"),
    password=os.getenv("MQTT_PASSWORD"),
    socket_pool=pool,
    ssl_context=ssl.create_default_context(),
)

mqtt_client.logger = logging.getLogger()

while not mqtt_client.is_connected():
    print(f"Attempting to connect to {mqtt_client.broker}")
    try:
        mqtt_client.connect()
    except Exception as exp:
        print(exp)
        time.sleep(5)

while True:
    temperature = sensor.temperature
    humidity = sensor.humidity
    pressure = sensor.pressure
    gas = sensor.gas

    output = {
        "temperature": temperature,
        "humidity": humidity,
        "pressure": pressure,
        "gas": gas,
    }

    print("Publishing to %s" % MQTT_TOPIC)
    mqtt_client.publish(MQTT_TOPIC, json.dumps(output))

    if USE_DEEP_SLEEP:
        mqtt_client.disconnect()
        pause = alarm.time.TimeAlarm(monotonic_time=time.monotonic() + PUBLISH_DELAY)
        alarm.exit_and_deep_sleep_until_alarms(pause)
    else:
        last_update = time.monotonic()
        while time.monotonic() < last_update + PUBLISH_DELAY:
            mqtt_client.loop()

This is the log:

Connected to WIFI
My IP address: 192.168.1.79
Attempting to connect to 192.168.1.91
69.789: WARNING - Socket error when connecting: Socket already connected to mqtt://192.168.1.91:1883
69.800: WARNING - Socket error when connecting: Socket already connected to mqtt://192.168.1.91:1883
69.812: WARNING - Socket error when connecting: Socket already connected to mqtt://192.168.1.91:1883
69.825: WARNING - Socket error when connecting: Socket already connected to mqtt://192.168.1.91:1883
Repeated connect failures

The logs do not say which line of the _connect() function raised the exception with message "Socket already connected".

How could I debug this further?

justmobilize commented 2 months ago

Since you are setting the logger, I would set it to debug so you can see what it's logging.
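To make that concrete: raising the level of the logger that the code above assigns to mqtt_client.logger lets MiniMQTT's debug messages through. This is a minimal sketch; the make_debug_logger name and the CPython fallback import are illustrative assumptions. MiniMQTT's enable_logger(log_pkg, log_level) method, where available in your library version, performs the same two steps.

```python
try:
    import adafruit_logging as logging  # CircuitPython
except ImportError:
    import logging  # CPython fallback, e.g. when running under Blinka


def make_debug_logger(name="minimqtt"):
    """Return a logger that emits DEBUG-level messages and above."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    return logger


# Usage with the client from the post above:
# mqtt_client.logger = make_debug_logger()
```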

mmartinortiz commented 2 months ago

That got me on track. Thanks

My connection was unauthorised: I had used the same credentials as another MQTT client. I had to add a new user to Home Assistant before my ESP32 device could log values. There are some details about this in this other HA discussion.

Without a logger, I would have expected MiniMQTT to report unauthorised access as an error in its own right, not just as "Repeated connect failures".
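On the protocol side, an MQTT 3.1.1 broker answers CONNECT with a 4-byte CONNACK packet (0x20, remaining length 0x02, a flags byte, then a return code), and return codes 4 (bad user name or password) and 5 (not authorized) cover the unauthorised case. This is a minimal decoder sketched from the 3.1.1 specification. parse_connack and CONNACK_RETURN_CODES are hypothetical names, not MiniMQTT internals, and MQTT 5 reason codes are not covered.

```python
# Return codes defined for CONNACK in MQTT 3.1.1; 6-255 are reserved
CONNACK_RETURN_CODES = {
    0: "connection accepted",
    1: "unacceptable protocol version",
    2: "identifier rejected",
    3: "server unavailable",
    4: "bad user name or password",
    5: "not authorized",
}


def parse_connack(packet):
    """Decode an MQTT 3.1.1 CONNACK packet.

    Returns (session_present, return_code, description).
    Raises ValueError if the bytes are not a well-formed CONNACK.
    """
    # Fixed header: packet type 2 (0x20), remaining length 2
    if len(packet) != 4 or packet[0] != 0x20 or packet[1] != 0x02:
        raise ValueError("not a CONNACK packet: {!r}".format(bytes(packet)))
    session_present = bool(packet[2] & 0x01)  # bit 0 of the acknowledge flags
    code = packet[3]
    description = CONNACK_RETURN_CODES.get(code, "reserved/unknown return code")
    return session_present, code, description
```

Under this scheme, the unauthorised login described above would arrive as return code 4 or 5, depending on how the broker classifies the failure.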

justmobilize commented 2 months ago

Well, glad to get that solved. I can look at auth and see if there's something that can be fixed there.

justmobilize commented 1 month ago

@matthewjerome are you willing to retry? There have been some fixes for the ESP32-S3 in the latest release.