adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
MIT License

sequence of ConnectionError raised from wifi on QtPy eventually leads to MemoryError on import wifi #7448

Open vladak opened 1 year ago

vladak commented 1 year ago

CircuitPython version

Adafruit CircuitPython 8.0.0-beta.6 on 2022-12-21; Adafruit QT Py ESP32S2 with ESP32S2

Code/REPL

import wifi
import supervisor
import time
import traceback
from secrets import secrets

try:
    wifi.radio.connect(secrets["ssid"], secrets["password"], timeout=10)

    # send some data to MQTT broker here, not sure if this matters
except Exception as e:
    print("Code stopped by unhandled exception:")
    print(traceback.format_exception(None, e, e.__traceback__))
    RELOAD_TIME = 10
    print(f"Performing a supervisor reload in {RELOAD_TIME} seconds")
    time.sleep(RELOAD_TIME)
    supervisor.reload()

Behavior

When wifi.radio.connect() fails repeatedly with ConnectionError, a later import wifi eventually raises MemoryError. This looks like a resource leak.

To reproduce this, one would need to simulate WiFi problems somehow (possibly by using a non-existent SSID, although I have not tried this). In my case the sequence of exceptions goes like this:

  1. MiniMQTT publish() fails with MMQTTException: No data received from broker for 10 seconds. I guess this is where the microcontroller's WiFi flakiness sets in. As a result, supervisor.reload() is called.
  2. wifi.radio.connect() fails with ConnectionError: No network with that ssid. Another supervisor.reload() is called. This repeats some 27 times.
  3. MemoryError: Failed to allocate Wifi memory is raised, which finally stops the microcontroller because the exception is not handled in the code (it happens in the global scope, not within main()). The QtPy then starts flashing its LED red twice every 5 seconds.
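Step 3 could be guarded against along the following lines. This is only a sketch of one possible workaround, not something confirmed in this issue: it assumes that a hard reset (for example `microcontroller.reset()` on the board) clears state that survives `supervisor.reload()`. The helper name `import_or_hard_reset` and its parameters are hypothetical; the import and the reset are passed in as callables so that the logic can also be exercised off-device.

```python
def import_or_hard_reset(import_fn, reset_fn, log=print):
    """Call import_fn(); on MemoryError, log the error and call reset_fn().

    Returns whatever import_fn() returns, or None if reset_fn() returns
    (on real hardware a hard reset is not expected to return).
    """
    try:
        return import_fn()
    except MemoryError as e:
        log(f"import failed with MemoryError: {e}; performing hard reset")
        reset_fn()
        return None


# Possible use on the board (assumption: a hard reset frees the leaked memory):
#
#   import microcontroller
#
#   def _import_wifi():
#       import wifi
#       return wifi
#
#   wifi = import_or_hard_reset(_import_wifi, microcontroller.reset)
```

Catching only MemoryError keeps other import failures visible, and it moves the failing import out of the unguarded global scope described in step 3.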

In reality this happens with https://github.com/vladak/shield/blob/master/code.py; however, I don't think the rest of the code matters. Maybe some data needs to be sent over WiFi for it to enter the state where it fails to work even after supervisor.reload().

When this happens, other microcontrollers (ESP32 based Feathers and QtPy's) on the same WiFi network have no trouble communicating with the MQTT broker.

Description

I am filing this issue for the resource leak; however, I suspect there is another problem that leads to the unrecoverable WiFi flakiness. It could be a firmware or HW issue, as I don't see this on an identical (both SW and HW wise) QtPy.

Additional information

See https://github.com/vladak/shield/issues/13 for the complete log of the program and proposed ways to work around this.

dhalbert commented 1 year ago

Though Espressif has fixed and closed a number of storage leak bugs related to repeated esp_wifi_init() and esp_wifi_deinit() calls, it appears there is at least one leak still present: see https://github.com/espressif/esp-idf/issues/8446, particularly https://github.com/espressif/esp-idf/issues/8446#issuecomment-1400780650, which describes a scenario similar to what we do.