adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
MIT License

ESP32-S3 ADC use causes crashes when WiFi in use #9291

Open Timeline8 opened 1 month ago

Timeline8 commented 1 month ago

CircuitPython version

1) Adafruit CircuitPython 9.0.5 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3
2) Adafruit CircuitPython 9.1.0-beta.3 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3
3) Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit Feather ESP32-S3 TFT with ESP32S3

Code/REPL

import gc
import time
import board
import neopixel
from rainbowio import colorwheel
from adafruit_thermistor import Thermistor

led = neopixel.NeoPixel(board.NEOPIXEL, 1, brightness=0.1)

# Setup thermistor for readings
therm7 = Thermistor(board.A0, 10000, 10000, 25, 3695, high_side=True)  # pin, series resistor, nominal thermistor R, nominal temp (C), beta
therm8 = Thermistor(board.A1, 10000, 10000, 25, 3695, high_side=True)

def get_average_temp(pin):
    readings = []

    for _ in range(5):
        reading = pin.temperature
        readings.append(reading)
        time.sleep(0.02)  # 20ms delay between readings

    average_temp_c = sum(readings) / len(readings)  # Average the C reading
    average_temp_f = (average_temp_c * 1.8) + 32  # Convert the average C reading to F

    return average_temp_c, average_temp_f

# main loop
count = 0
while True:
    count += 1
    print(count, f"{gc.mem_alloc()=}")

    average_temp_c, average_temp_f = get_average_temp(therm7)
    print(
        f"   Average therm7 Reading : {average_temp_c:.0f}\u00b0C {average_temp_f:.0f}\u00b0F"
    )

    average_temp_c, average_temp_f = get_average_temp(therm8)
    print(
        f"   Average therm8 Reading : {average_temp_c:.0f}\u00b0C {average_temp_f:.0f}\u00b0F\n\n"
    )

    for x in range(3):
        led.fill(colorwheel((time.monotonic() * 50) % 255))  # change Neopixel color
        time.sleep(1)
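The averaging helper above can be separated from the hardware so it can be tested on a desktop. This is a minimal sketch, not the reporter's code: `read_c` is a hypothetical zero-argument callable that returns one Celsius reading (on the board, something like `lambda: therm7.temperature`), the Fahrenheit conversion uses the standard factor 9/5 (1.8), and samples that raise `ZeroDivisionError` are skipped instead of crashing the loop. That skipping is a design choice for this sketch, not something `adafruit_thermistor` does.

```python
import time


def c_to_f(temp_c):
    """Convert Celsius to Fahrenheit: F = C * 9/5 + 32."""
    return temp_c * 9 / 5 + 32


def average_temp(read_c, samples=5, delay=0.02, sleep=time.sleep):
    """Average several readings from read_c(); return (avg_c, avg_f) or None.

    read_c is a hypothetical zero-argument callable returning degrees C.
    Samples that raise ZeroDivisionError are skipped; if every sample
    fails, None is returned so the caller can decide what to do.
    """
    readings = []
    for _ in range(samples):
        try:
            readings.append(read_c())
        except ZeroDivisionError:
            pass  # the divider math could not handle this raw ADC value
        sleep(delay)  # 20 ms between samples by default
    if not readings:
        return None
    avg_c = sum(readings) / len(readings)
    return avg_c, c_to_f(avg_c)
```

On the board, `result = average_temp(lambda: therm7.temperature)` would take the place of `get_average_temp(therm7)`, with the caller checking for `None` before printing.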

Behavior

Various failures, but the crashes usually have this in common: Mu pops up “Could not find an attached drive”, macOS pops up “Disk Not Ejected Properly”, and Mu has of course closed the serial window, so there is nothing to see. Printing gc.mem_alloc() on each loop in my code shows allocated memory in the 4000-8000 range, so there are no apparent runaway memory issues.

Sometimes the board disconnects and comes back, and the code stays running, but the NeoPixel is steady white, as it is in the REPL. Other times it crashes with 3x yellow blinking (safe mode) and reports that an internal watchdog timer expired.

I have an S2 board on 9.0.4 that has been running this code for many weeks and sending the data to an IO feed, with none of the chronic crashes the S3 boards have.

Description

What follows is the long list of notes I have been taking as I tried different things. But the above, in behavior, is the executive summary. Below is tedious reading. Sorry...

Testing notes:

Waveshare ESP32-S3 Zero running 9.0.5, with libraries updated via Circup, starting from the “code chooser” code discussed here (starting with the 6th post down): https://forums.adafruit.com/viewtopic.php?t=210926

Code I am running (“choosing”) is a dual thermistor reading in a roughly 3+ second long loop that reads two thermistors and then changes the color of the Neopixel 3 times once per second.


The first time, I had no display configured, so I could not see an error, and the NeoPixel wasn’t indicating any activity. Added code to show the serial output on an external display. Reset the board with the reset button. I failed to note whether the drive had remounted itself after the crash and before resetting the board.

Second time, same “crash”. Observed the NeoPixel to be constant white, indicating it was in the REPL; however, the external display showed the code was still running and getting valid thermistor readings. The crash happened around loop 290-300. I believe CIRCUITPY remounted its drive, but I'm not certain. I let it run for a while longer, then reset the board with the reset button.

Third time, it crashed at loop 272, this time stopping with the NeoPixel flashing yellow in three-blink bursts (safe mode). Reopening the Mu serial window showed it had failed due to “Internal watchdog timer expired.” Noted for sure that the CIRCUITPY drive had remounted. Ejected the drive and power-cycled the board by unplugging the USB cable.

Fourth time, it crashed at loop 233 (gc.mem_alloc at 5568). Same as the third run: code stopped, three yellow flashes, and “Internal watchdog timer expired” in the reopened Mu serial window.

Switching gears… Renamed the “code chooser” program away from code.py and made my thermistor code code.py, so it loads and runs directly without the chooser resetting the MCU. Also power-cycled the board.

Different type of crash this time. At loop 53 (mem = 5168). Drive did not unmount and the error in the REPL is

Traceback (most recent call last):
  File "code.py", line 66, in <module>
  File "code.py", line 46, in get_average_temp
  File "adafruit_thermistor.py", line 126, in temperature
  File "adafruit_thermistor.py", line 116, in resistance
ZeroDivisionError: division by zero

Odd. Normally I use 10k resistors with my 10k thermistor but this time I only had 1k resistors on hand. But I wouldn’t think that should matter. Source code for the library doesn’t indicate any restrictions on the resistor range. I believe this failure is just a result of random values when no thermistor is attached.
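A `ZeroDivisionError` is what the voltage-divider math produces when the raw ADC value sits at one end of its range, which fits a floating, unconnected input. A minimal sketch of that math, assuming standard divider wiring (high side: thermistor between the analog pin and 3.3 V, series resistor to ground) and CircuitPython's 16-bit `AnalogIn.value` range of 0 to 65535. The function names are hypothetical, and the formulas are the textbook divider and beta equations, not a copy of `adafruit_thermistor`'s source. It returns `None` rather than dividing by zero when the reading is at a rail.

```python
import math

ADC_MAX = 65535  # AnalogIn.value is scaled to 16 bits in CircuitPython


def thermistor_resistance(raw, series_ohms, high_side=True):
    """Thermistor resistance in ohms from a raw 16-bit reading, or None.

    high_side=True: thermistor pin->Vcc, series resistor pin->GND, so
        raw/ADC_MAX = Rs / (Rt + Rs)  ->  Rt = Rs * (ADC_MAX / raw - 1)
    high_side=False: thermistor pin->GND, series resistor pin->Vcc, so
        raw/ADC_MAX = Rt / (Rt + Rs)  ->  Rt = Rs * raw / (ADC_MAX - raw)
    """
    if raw <= 0 or raw >= ADC_MAX:
        return None  # pin at a rail (open or shorted): no valid resistance
    if high_side:
        return series_ohms * (ADC_MAX / raw - 1)
    return series_ohms * raw / (ADC_MAX - raw)


def beta_temperature_c(r_ohms, nominal_ohms, nominal_c, beta):
    """Beta equation: 1/T = 1/T0 + ln(R/R0)/B, with T in kelvin."""
    t0_k = nominal_c + 273.15
    inv_t = 1 / t0_k + math.log(r_ohms / nominal_ohms) / beta
    return 1 / inv_t - 273.15
```

With the posted parameters, a mid-scale reading (about 32768) gives roughly 10 kΩ and therefore about 25 °C, while a reading of 0 on the high side yields `None` instead of an exception.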

Ran again and it made it to loop 72, but with the same divide-by-zero error. Switched to 10k resistors. Hard reset. Made it to loop 38, and the previously described crash scenario (disk eject & reconnect, safe mode with an “Internal watchdog timer expired” error) is back. Done for the night!

Next day. Backed up the entire Waveshare CIRCUITPY drive. Ran one more time as is. Crashed with the NeoPixel showing steady white (REPL indicator), but the code was still running. Mu and macOS both reported the drive ejected. The board did not remount and Mu doesn’t see it.

Switched to an Adafruit ESP32-S2 Reverse TFT Feather. Copied over all the files that were on the Waveshare. Also verified 9.0.5 and ran Circup to verify all libraries were up to date (all were). Commented out all code that had anything to do with the external display. Thermistors on the breadboard moved from D6 and D7 to A0 and A1. No failures after a few hours.

Switched back to Waveshare and ran as is. Eventually failed with the REPL white neopixel, ejected disk, but kept running. Drive did not remount. Did full reinstall of boot loader then 9.0.5. Copied over backed up files onto the MCU again. Hard power cycle reset. Restarted code. Crashed at cycle 288, 3 yellow blink safe mode and “Internal watchdog timer expired” and drive remounted.

Commented out all thermistor code and just ran the NeoPixel and gc memory allocation. Ran 13908 loops without issue (over 12 hours). Uncommented the thermistor code and restarted the run (hard reset). Made it 457 loops (a little over 20 minutes) and crashed, with the board disconnecting and the TFT fading to black and back in roughly 3-second pulses.

Restarted as is after getting home from work. Got to about 275, white NeoPixel, still running code, and disconnected. Moved it to a power supply connection only (not computer) and restarted. Looks like it crashed the same way with white Neopixel and code still displaying new lines.

Copied same drive contents to S3 TFT Feather running 9.0.4 and started it on the computer (no thermistors connected). S3 TFT Feather crashed, disconnected, reconnected and reports Safe Mode for Internal Watchdog timer expired. Restarted S3 TFT Feather. Dies same way.

Additional information

No response

dhalbert commented 2 weeks ago

> Will this be fixed under the next beta release? 9.1.0-beta.4 or 9.1.1 or whatever the next release will be?

Yes, and it is already fixed in builds with PR9325 in the filename or later. Download from https://adafruit-circuit-python.s3.amazonaws.com/index.html?prefix=bin

Timeline8 commented 2 weeks ago

> You can download the "Absolute Newest" build

Oh ya, totally forgot about that link on those pages. Duh! Thanks.

Timeline8 commented 2 weeks ago

I appreciate everyone's help, but I am still experiencing problems. Back on the Waveshare S3 Zero board, I downloaded adafruit-circuitpython-waveshare_esp32_s3_zero-en_US-20240613-main-PR9325-03e42a8.uf2 and installed it. I did it a couple of times because I was still having problems. The boot_out.txt reads:

Adafruit CircuitPython 9.1.0-beta.3-28-g03e42a8c0c on 2024-06-13; Waveshare ESP32-S3-Zero with ESP32S3 Board ID:waveshare_esp32_s3_zero UID:437BAD9541C4

There is a version one day newer, but it is (name truncated) ...PR9318-ed5591c.uf2, so I didn't try that one, as 9318 is earlier than 9325.

Am I downloading the correct version (the first file name listed above)? And does the boot_out.txt content indicate that it matches the file name?

I ask because, as I play with it, I am experiencing disconnects, resets, and the full-brightness WHITE NeoPixel issue. However, the code either restarts or, in the case of the white NeoPixel, remains running (I can see it on my TFT). I have not experienced any safe mode crashes due to the internal watchdog, so I guess that is something.
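The boot_out.txt question can be checked mechanically: the UF2 file name ends in a short commit hash, and the version string follows the `git describe` form `<tag>-<commits since tag>-g<hash>`. A minimal sketch with hypothetical helper names, meant for desktop Python; it only compares strings and assumes those two naming patterns hold.

```python
HEX_DIGITS = "0123456789abcdef"


def uf2_commit(filename):
    """Short commit hash at the end of a UF2 name, e.g. '...-PR9325-03e42a8.uf2'."""
    if not filename.endswith(".uf2"):
        raise ValueError("not a .uf2 file name")
    return filename[: -len(".uf2")].rsplit("-", 1)[-1]


def describe_commit(version_line):
    """Hash from a git-describe token such as '9.1.0-beta.3-28-g03e42a8c0c'.

    Accepts a whole boot_out.txt line; returns None if no token ends in
    -g<hex>, as with a tagged release build such as '9.0.5'.
    """
    for token in version_line.split():
        parts = token.rsplit("-", 1)
        if len(parts) != 2:
            continue
        tail = parts[1]
        if len(tail) > 1 and tail[0] == "g" and all(c in HEX_DIGITS for c in tail[1:]):
            return tail[1:]
    return None


def build_matches(filename, version_line):
    """True if the running firmware's hash starts with the UF2 file's short hash."""
    running = describe_commit(version_line)
    return running is not None and running.startswith(uf2_commit(filename))
```

With the file name and boot_out.txt line quoted above, `build_matches` returns True, because `03e42a8c0c` begins with `03e42a8`.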

dhalbert commented 2 weeks ago

Sorry to hear that. Which pin are you using, and do you know whether it's an ADC1 or an ADC2 pin? The UF2 you're downloading is correct. PRs (pull requests) are not merged in numerical order, so the order is not significant.

Timeline8 commented 2 weeks ago

D7 & D8, so IO7 & IO8, which go to GPIO7 & GPIO8 of the S3 (fortunately Waveshare kept the numbers the same on the board and the ESP32), which according to the ESP32-S3 datasheet are ADC1 pins.

(cut & pasted from table in datasheet)

ADC1_CH6  GPIO7
ADC1_CH7  GPIO8
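The ADC1/ADC2 question can be answered from the GPIO number alone. A minimal sketch with a hypothetical function name, assuming the ESP32-S3 datasheet layout (GPIO1 to GPIO10 on ADC1_CH0 to CH9, GPIO11 to GPIO20 on ADC2_CH0 to CH9), which agrees with the two entries pasted above; the datasheet remains the authority.

```python
def esp32s3_adc_channel(gpio):
    """Return ('ADC1' or 'ADC2', channel) for an ESP32-S3 GPIO, or None.

    Assumed mapping: GPIO1-10 -> ADC1_CH0-CH9, GPIO11-20 -> ADC2_CH0-CH9.
    Other GPIOs have no ADC function.
    """
    if 1 <= gpio <= 10:
        return ("ADC1", gpio - 1)
    if 11 <= gpio <= 20:
        return ("ADC2", gpio - 11)
    return None


for pin in (7, 8):
    unit, channel = esp32s3_adc_channel(pin)
    print(f"GPIO{pin}: {unit}_CH{channel}")  # GPIO7: ADC1_CH6, GPIO8: ADC1_CH7
```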

My full code that I am running at this moment where I am seeing this is ...

import gc
import time
import board
import neopixel
from random import randint
import busio
import displayio
from fourwire import FourWire
from adafruit_st7789 import ST7789
from adafruit_thermistor import Thermistor

displayio.release_displays()

spi = busio.SPI(clock=board.D1, MOSI=board.D2)
tft_res = board.D3
tft_dc = board.D4
tft_cs = board.D5
tft_blk = board.D6  # TFT's backlight control

display_bus = FourWire(spi, command=tft_dc, chip_select=tft_cs, reset=tft_res)
display = ST7789(
    display_bus,
    width=240,
    height=135,
    rowstart=40,  # (320 - width) / 2
    colstart=53,  # (240 - height) / 2
    rotation=270,
    backlight_pin=tft_blk,
)

# Row and column start above are because ST7789 driver is for 320x240 display
# size so need to center real display in that virtual space.

display.brightness = 1.0  # between 0 and 1

led = neopixel.NeoPixel(board.NEOPIXEL, 1, brightness=0.03)

# Setup thermistor for readings
# pin, resistor, therm, @temp, beta, therm on high side
therm7 = Thermistor(board.D7, 10000, 10000, 25, 3695, high_side=True)
therm8 = Thermistor(board.D8, 10000, 10000, 25, 3695, high_side=True)

def get_average_temp(pin):
    readings = []

    for _ in range(5):
        reading = pin.temperature
        readings.append(reading)
        time.sleep(0.02)  # 20ms delay between readings

    average_temp_c = sum(readings) / len(readings)  # Average the C reading
    average_temp_f = (average_temp_c * 1.8) + 32  # Convert the Ave C Reading to F

    return average_temp_c, average_temp_f

# main loop
count = 0
while True:
    count += 1
    print(count, f"{gc.mem_alloc()=}")

    average_temp_c, average_temp_f = get_average_temp(therm7)
    print(
        f"  therm7 : {average_temp_c:.1f}\u00b0C {average_temp_f:.1f}\u00b0F"
    )

    average_temp_c, average_temp_f = get_average_temp(therm8)
    print(
        f"  therm8 : {average_temp_c:.1f}\u00b0C {average_temp_f:.1f}\u00b0F\n\n"
    )

    led[0] = (randint(0, 255), randint(0, 255), randint(0, 255))

    time.sleep(0.1)
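The `rowstart` and `colstart` comments in the code above are centring arithmetic. A minimal sketch with a hypothetical helper name, assuming (as the code's comment says) that the ST7789 driver addresses a 320x240 frame and the 240x135 panel sits centred in it. The column offset works out to 52.5; rounding up is an assumption chosen here to reproduce the posted value of 53.

```python
import math


def centered_offsets(width, height, frame_long=320, frame_short=240):
    """Return (rowstart, colstart) centring a width x height panel in the frame.

    Mirrors the posted comments: rowstart = (320 - width) / 2 and
    colstart = (240 - height) / 2, each rounded up to a whole pixel.
    """
    rowstart = math.ceil((frame_long - width) / 2)
    colstart = math.ceil((frame_short - height) / 2)
    return rowstart, colstart


print(centered_offsets(240, 135))  # (40, 53)
```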

bill88t commented 1 week ago

Adding this here to note how the bug looks with #9344. (Link goes to the Adafruit Discord server) https://discord.com/channels/327254708534116352/327298996332658690/1252425213765877882

Timeline8 commented 1 week ago

> CONFIG_FREERTOS_UNICORE=y, so that only one core is used: problem seems to go away. That would explain why ESP32-S2, which has only one core, doesn't have the problem.

Circling back to this comment, which was made before you thought the issue was fixed (only for us to find it may not be): is this statement still true? And if so, is this something that can be done through CircuitPython?

I ask because I am trying to decide on a single configuration to use going forward. For my project using S3 modules, it would be great if I could just configure it to use one core, if that hides the problem for now. If not, I may be looking at the Pi Pico W as my core board. The idea is a single design using the same components and configuration (MCU, display, sensors, etc.), so that any later change or update is an easy rollout to all my deployments.