Timeline8 commented 4 weeks ago

CircuitPython version

1) Adafruit CircuitPython 9.0.5 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3
2) Adafruit CircuitPython 9.1.0-beta.3 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3
3) Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit Feather ESP32-S3 TFT with ESP32S3

Code/REPL

import gc
import time
import board
import neopixel
from rainbowio import colorwheel
from adafruit_thermistor import Thermistor

led = neopixel.NeoPixel(board.NEOPIXEL, 1, brightness=0.1)

# Setup thermistor for readings
therm7 = Thermistor(board.A0, 10000, 10000, 25, 3695, high_side=True)  # pin, resistor, nom_thermistor, nom_temp
therm8 = Thermistor(board.A1, 10000, 10000, 25, 3695, high_side=True)

def get_average_temp(pin):
    readings = []

    for _ in range(5):
        reading = pin.temperature
        readings.append(reading)
        time.sleep(0.02)  # 20ms delay between readings

    average_temp_c = sum(readings) / len(readings)  # Average the C reading
    average_temp_f = (average_temp_c * 1.9) + 32  # Convert the Ave C Reading to F

    return average_temp_c, average_temp_f

# main loop
count = 0
while True:
    count += 1
    print(count, f"{gc.mem_alloc()=}")

    average_temp_c, average_temp_f = get_average_temp(therm7)
    print(
        f"   Average therm7 Reading : {average_temp_c:.0f}\u00b0C {average_temp_f:.0f}\u00b0F"
    )

    average_temp_c, average_temp_f = get_average_temp(therm8)
    print(
        f"   Average therm8 Reading : {average_temp_c:.0f}\u00b0C {average_temp_f:.0f}\u00b0F\n\n"
    )

    for x in range(3):
        led.fill(colorwheel((time.monotonic() * 50) % 255))  # change Neopixel color
        time.sleep(1)

Behavior

Various failures but usually crashes share in common: MU pops up “Could not find an attached drive”, Mac OS pops up “Disk Not Ejected Properly”, MU of course has closed the serial window so nothing to see. Printing gc.mem_allocat() with each loop in my code shows allocated memorial in the 4000-8000 range so no apparent run away memory issues.

Sometimes the board will disconnect, come back, code stays running but the Neopixel is steady white like it is in the REPL. Other times it crashed with 3x yellow blinking (Safe mode) and reports an internal watchdog timer expired.

I have an S2 board that is on 9.0.4 and has been running this code for many weeks and sending the data to an IO feed. No chronic crashed like the S3 boards.

Description

What follows is the long list of notes I have been taking as I tried different things. But the above, in behavior, is the executive summary. Below is tedious reading. Sorry...

Testing notes:

Waveshare ESP32-S3 Zero running 9.0.5 and libraries updated via Circup is starting with the “code chooser” code discussed here https://forums.adafruit.com/viewtopic.php?t=210926 starting with the 6th post down.

Code I am running (“choosing”) is a dual thermistor reading in a roughly 3+ second long loop that reads two thermistors and then changes the color of the Neopixel 3 times once per second.

Crashes share in common: MU pops up “Count not find an attached drive”, Mac OS pops up “Disk Not Ejected Properly”, MU of course has closed the serial window so nothing to see. Printing gc.mem_allocat() with each loop in my code shows allocated memorial in the 4000-8000 range so no apparent run away memory issues.

First time I had no display configured so I could not see an error and the Neopixel wasn’t indicating any activity. Added code to display serial window on external display. Reset board with reset button. I failed to note if the drive had reloaded itself after the crash and before reseting the board.

Second time same “crash”. Observed the Neopixel to be constant white indicating it was in the REPL, however the external display showed the code was still running and getting valid thermistor readings. Crashed happened around 290-300 loops. CIRCUITPU remounted its drive I believe but not certain. I let it run for a while longer then reset the board with the reset button.

Third time crashed at loop 272, this time stopping and Neopixel flashing yellow in three blink bursts (safe mode). Reopening MU serial window, failed due to ”Internal watchdog timer expired.” Noted for sure that the CIRCUITPY drive had remounted. Ejected drive and power cycled the board by unplugging USB cable.

Fourth time crashed at loop 233 (gc.mem_alloc at 5568). Same as third run with code stopped, three yellow flashes, and “Internal watchdog timer expired” in the reopened MU serial window.

Switching gears… Renamed the “code chooser” program from code.py and made my thermistor code code.py so it will load and run directly without the chooser reseting the MCU. Also power cycle reset the board.

Different type of crash this time. At loop 53 (mem = 5168). Drive did not unmount and the error in the REPL is

Traceback (most recent call last): File "code.py", line 66, in File "code.py", line 46, in get_average_temp File "adafruit_thermistor.py", line 126, in temperature File "adafruit_thermistor.py", line 116, in resistance ZeroDivisionError: division by zero

Odd. Normally I use 10k resistors with my 10k thermistor but this time I only had 1k resistors on hand. But I wouldn’t think that should matter. Source code for the library doesn’t indicate any restrictions on the resistor range. I believe this failure is just a result of random values when no thermistor is attached.

Ran again and it made it to run 72 but same divide by zero error. Switched to 10k resistors. Hard reset. Made it to run 38, with the previously described crash scenario (disk eject & reconnect, safe mode with an “Internal watchdog timer expired” error) is back. Done for the night!

Next day. Backed up entire Waveshare CIRCUITPY drive. Ran one more time as is. Crashed with the Neopixel showing steady white (REPL indicator) but code was still running. MU and Mac OS both reported drive ejected. Board did not remount and MU doesn’t see it.

Adafruit REV TFT S2 Feather. Copied over all the files that were on the Waveshare. Also verified 9.0.5 and ran Circus to verify all libraries were up to date (all were). Commented out all code that had anything to do with the external display. Thermistors on breadboard changed from D6 and D7 to A0 and A1. No failures after a few hours.

Switched back to Waveshare and ran as is. Eventually failed with the REPL white neopixel, ejected disk, but kept running. Drive did not remount. Did full reinstall of boot loader then 9.0.5. Copied over backed up files onto the MCU again. Hard power cycle reset. Restarted code. Crashed at cycle 288, 3 yellow blink safe mode and “Internal watchdog timer expired” and drive remounted.

Commented out all thermistor stuff and just ran the neopixel and gc memory allocation. Ran 13908 loops without issue (over 12 hours). Uncommented thermistor code and restarted the run (hard reset). Made it 457 loops (a little over 20 minutes) and crashed with the board disconnecting and the TFT fade to black and back in about 3 second pulses.

Restarted as is after getting home from work. Got to about 275, white NeoPixel, still running code, and disconnected. Moved it to a power supply connection only (not computer) and restarted. Looks like it crashed the same way with white Neopixel and code still displaying new lines.

Copied same drive contents to S3 TFT Feather running 9.0.4 and started it on the computer (no thermistors connected). S3 TFT Feather crashed, disconnected, reconnected and reports Safe Mode for Internal Watchdog timer expired. Restarted S3 TFT Feather. Dies same way.

Additional information

No response

dhalbert commented 4 weeks ago

adafruit_thermistor is quite simple. It uses analogio.AnalogIn. I'm thinking there is an ADC problem.

Timeline8 commented 3 weeks ago

Additional observation this morning. Put the S3 TFT feather on to a breadboard so I could hook up physical thermistors. Only change to the code is the resistors I used are in a 1k SIP, so the setup code for the two thermistors changed from 10k to 1k. Crashed in the 194th loop with a divide by zero error that I have been seeing periodically. If I simply try a CTRL-D or a Save in MU to soft boot the code immediately crashes on the first loop. Repeated CTRL-D multiple times in a row to verify. Restarting the code via soft boot does not clear out of memory whatever is causing this failure. Hitting the boards RST button does restart the code properly where it went over 400 loops then went to safe mode with the internal watchdog timer expired error. I have pasted the REPL below of the div by zero message in the 194th loop and then a subsequent Save (soft boot) in MU.

I did notice on my S2 running this code (also have it running on a Pico W without issue), I was taking samples for my averaging function every 50ms rather than the 20ms posted here. Changed my S3 TFT feather to 50ms, but still crashes with the WD timer expired Safe Made message.

194 gc.mem_alloc()=5504
   Average therm7 Reading : 23°C 75°F
Traceback (most recent call last):
  File "code.py", line 40, in <module>
  File "code.py", line 19, in get_average_temp
  File "adafruit_thermistor.py", line 126, in temperature
  File "adafruit_thermistor.py", line 116, in resistance
ZeroDivisionError: division by zero

Code done running.

Press any key to enter the REPL. Use CTRL-D to reload.

Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit Feather ESP32-S3 TFT with ESP32S3
>>> [D
... 
>>> 
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
1 gc.mem_alloc()=4768
Traceback (most recent call last):
  File "code.py", line 35, in <module>
  File "code.py", line 19, in get_average_temp
  File "adafruit_thermistor.py", line 126, in temperature
  File "adafruit_thermistor.py", line 116, in resistance
ZeroDivisionError: division by zero

Code done running.

jepler commented 3 weeks ago

The division by zero error is:

        if self.high_side:
            # Thermistor connected from analog input to high logic level.
            reading = self.pin.value / 64
            reading = (1023 * self.series_resistor) / reading

if the analog pin value is 0, then it divides by 0.

Timeline8 commented 3 weeks ago

Which is what I thought might be happening when I was running with no hardware connected to the pins where random noise at the open pins could result in a zero reading. This is why I then made sure to add the thermistors & resistors this morning so there would be no way there should be 0V at the pin and I still experienced the divide by zero fault. It ran the loop 193 times with valid room temperature mid/upper 70°Fs readings for both thermistors then 3 seconds later on loop 194 read the first thermistor then failed reading the second thermistor. Doesn't seem like a hardware problem with my simple circuit

But that is kind of besides the point. I can not get this code to run for any length of time without some sort of crash on an ESP32-S3 where as the same code on an S2 and on a Pico W run none stop day in and day out for weeks now. With the S3 usually it is going into safe mode with an internal WD timer expired failure with the board disconnecting and sometimes reconnecting, sometimes the divide by zero, and still other times with the board disconnecting with the Neopixel going steady white but the code still running where I can see the measurements being reported every 3 seconds on the TFT but obviously the neopixel part of the code doing nothing anymore.

One last test. I removed my thermistors and replaced them with a second 1k SIP so that the A0 and A1 pins are getting a straight fixed resistor voltage divider verified with my meter that both pin are at 1.65V. Reran code as is (so temp reported is 94°C 211°F since I didn't change the thermistor line from 10k@25°C). It managed to run over a half hour before it bombed out with the "Internal watchdog timer expired." safe mode message. At least no divide by zero.

dhalbert commented 3 weeks ago

Just to be clear, you are getting things like the watchdog crash or divide by zero on 9.1.0-beta.3 as well as 9.0.x, right? We moved to a new version of ESP-IDF for 9.1.0, and I want to make sure that version still has the issue.

Timeline8 commented 3 weeks ago

I am fairly sure I did but re-reading my tedious notes above, I see I don't have an entry stating that. However I did note I ran the beta in the forum discussion. But I will double check tonight as the Waveshare should still be running the 9.1.0-beta.3. On the Adafruit S3 TFT I only ran that one on 9.0.4 as I thought it might be the version on the two S2s I have running this code for months without issue but checking one of the S2s, it is at 9.0.3. I can upgrade the S3 TFT tonight to 9.1.0-beta.3 and retest it as well. And I got a notice yesterday that my backorder from Digikey for the QtPy S3 I ordered has shipped, so when I get it I will load 9.1.0-beta.3 on that one as well and see how it acts (my S3 REV TFT is sadly still on backorder). I will report everything I find.

Do you think there is any value loading 9.0.3 onto any of the S3 boards since the S2 boards run fine with it? Or do you think this is an S3 specific issue and we should only be looking upward and onward with current versions only?

dhalbert commented 3 weeks ago

Do you think there is any value loading 9.0.3 onto any of the S3 boards since the S2 boards run fine with it? Or do you think this is an S3 specific issue and we should only be looking upward and onward with current versions only?

Based on your testing, I think this is an S3-specific issue. S3 with 9.1.0-beta.3 is the only test still to do, I would say. If that's a fix, great, otherwise I will set up an S3 board and let it run for hours.

Timeline8 commented 3 weeks ago

2024-06-04

Waveshare - verified it was still on the beta that I last ran it. From boot_out.txt… Adafruit CircuitPython 9.1.0-beta.3 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3

Running with TFT to see code running, using 2x 1k SIP packages to create voltage divider at pins D7 & D8 to simulate the thermistors and verified with a meter that each pin had 1.64V at each pin while running..

Looped just over 100 times (~ 5 minutes) before the board disconnected from my iMac, MU and MacOS both reported disk ejection, Neopixel changed to steady white, and board did not reconnect (MU serial window closed and icon show no board connected). However the TFT shows the code continuing to run normally and as I type this is at loop 180.

S3 TFT - Ran “circup update --all” first to get the suggested link for the newest version of CP and also updated the libraries. Downloaded the beta and loaded it. From boot_out.txt… Adafruit CircuitPython 9.1.0-beta.3 on 2024-05-22; Adafruit Feather ESP32-S3 TFT with ESP32S3

Hard reset board (pulled USB cable). Reduced code from the Waveshare because the external TFT code not required. Same 2x 1k SIP resistors to create resistor divider voltage at A0 and A1. Verified 1.57V at each pin while running.

Ran for 45+ minutes and last I checked it was over 700 loops. I came back a little later and it had disconnected, but had then reconnected and restarted. The loop was up to 170+ on the TFT. Neopixel was acting normally per the code.

CTRL-C, ejected board, power cycled it, and restarted. So far this second run it is behaving itself and has made it over an hour and at loop 1385.

Timeline8 commented 3 weeks ago

And to add to the end of the last post, the S3 feather sometween about 11pm and 2am disconnected and reconnected 4 times and finally stopped in Safe Model with an "Internal watchdog timer expired".

dhalbert commented 3 weeks ago

On either board, do you have a settings.toml with CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD? That will connect to the wifi network. In other words, wifi will be active even if not mentioned in the test program.

It certainly sounds like the boards are hard-crashing resetting spontaneously. I will try a very simple test that simply reads as fast as possible from the ADC.

Timeline8 commented 3 weeks ago

Yes, I still have the WiFi settings auto connecting like that. We did talk about disabling that over at the forums but I don't think I tried it. So that is now on my to-do list for tonight: Run both the Waveshare S3 and Feather TFT S3 without the .toml file.

If there is a conflict between the two (ADC and WiFi), that will be disappointing for me since I need both for my application, so I won't be able to simply not use WiFi. But the problem can't be fixed later if we don't narrow it down to a root cause, so I will try it sans WiFi.

Also just received my QtPy S3 today. So if the other two crash relatively quickly with the WiFi off, I can try the QtPy. While I wouldn't think the model board matters as they are all S3, I have noticed that the Waveshare is good about crashing sooner while the Feather likes to wait until later.

dhalbert commented 3 weeks ago

I was just testing on a QT Py ESP32-S3, with the very simple test program below. With CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD commented out in settings.toml, it runs indefinitely, with millions of conversions. With them not commented out (so web set up in advance) , it crashes hard almost immediately. I am using board.A2, which uses ADC1 on the QT Py S3.

So now I know how to reproduce this. It is strange because ADC2 is supposed to be shared with WiFi, not ADC1. But maybe something is interrupting the conversion in some bad way.

No need for you to test further at this point. Thanks for persevering through this.

Test program with two 1kohm resistors forming a 3.3/2 voltage divider connected to pin A2:

import analogio
import board

a2 = analogio.AnalogIn(board.A2)

count = 0
while True:
    count += 1
    if count % 100000 == 0:
        print("count", count)
    v = a2.value
    if v < 32000 or v > 33000:
        print(count, v)

dhalbert commented 3 weeks ago

Testing with A0, which is an ADC2 pin, I also get crashes rather quickly, and sometimes get safe-mode "Internal watchdog timer expired".

Timeline8 commented 3 weeks ago

Hi Dan, Thank you very much for confirming this. Since I first reported this on the forums and after days of replies there and here, since no one seemed to be interesting in actually running the code to see if it could be reproduced, I was starting to wonder if it was me and everyone was just being polite by not telling me I'm the idiot. ;)

You and I actually had a conversation on the forums on the Wifi vs ADC about two months ago because I had read about the possible interference between the two and was concerned if I should be making a point to use ADC1 to avoid that. Regardless, it looks like your testing shows this is a different issue since you got the crash on both ADC1 & 2.

I guess for now I will have to proceed with my projects targeting S2 boards. The project I am slowly developing as I learn CP, and add features to as I go, depends on both ADC and WiFi for IO Feeds.

dhalbert commented 3 weeks ago

@Timeline8 Rest assured we were interested, but if we can delegate some testing to eliminate possibilities, then we try that. (We have all too many bugs to look at :slightly_smiling_face:). For instance, was displayio involved or not? And I was hoping it was really fixed in 9.1.0-beta.3, since we'd upgraded the underlying Espressif software (ESP-IDF) in that release. I also spent time looking for similar reports in the ESP-IDF repo, but could not find any.

I hope we can figure this out soon, because broken ADC's when wifi is in use is a pretty serious limitation. As your testing indicates, the S2 boards could be a substitute. If you are interested in more precise temperature readings, then you could use external I2C ADC breakout board. If you are measuring ambient air temperature (not liquids), then one of the I2C temperature breakouts (there are many) could be used. But the easiest would be to fix ESP32-S3, of course.

bill88t commented 3 weeks ago

I just ran into this with the cardputer. Spamming the adc with wifi connected produces instant resets, watchdog safemodes and hangups.

I just finished the battery driver, which uses IO10 of the ESP32-S3. When connected to a network, after about 30 seconds it does one of the following:

Resets (NO SAFEMODE, actual reset).
hangs up (neopixel went white, not a color I use, and stayed there, unresponsive).
Watchdog safemode.

The polling rate was 30 samples/s.

The code accounts for None reads.

bill88t commented 3 weeks ago

Attempted this patch, which reduces the points of failure:

--- a/ports/espressif/common-hal/analogio/AnalogIn.c
+++ b/ports/espressif/common-hal/analogio/AnalogIn.c
@@ -115,10 +115,10 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) {
     #endif

     uint32_t adc_reading = 0;
-    size_t sample_count = 0;
+    int sample_count = 0;
     // Multisampling
     esp_err_t ret = ESP_OK;
-    for (int i = 0; i < NO_OF_SAMPLES; i++) {
+    while (sample_count < NO_OF_SAMPLES) {
         int raw;
         ret = adc_oneshot_read(adc_handle, channel, &raw);
         if (ret != ESP_OK) {
@@ -127,9 +127,6 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) {
         adc_reading += raw;
         sample_count += 1;
     }
-    if (sample_count == 0) {
-        raise_esp_error(ret);
-    }
     adc_reading /= sample_count;

     // This corrects non-linear regions of the ADC range with a LUT, so it's a better reading than raw

It didn't work. I think the error is in esp-idf.

I will attempt to create a minimal esp-idf application implementing oneshot adc and wifi.

bill88t commented 3 weeks ago

Yep, fun stuff:

I (21616) EXAMPLE: ADC1 Channel[3] Raw Data: 562
I (21616) EXAMPLE: ADC1 Channel[3] Cali Voltage: 481 mV
I (21626) EXAMPLE: ADC2 Channel[0] Raw Data: 537
E (21626) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (21626) task_wdt:  - IDLE0 (CPU 0)
E (21626) task_wdt: Tasks currently running:
E (21626) task_wdt: CPU 0: main
E (21626) task_wdt: CPU 1: IDLE1
E (21626) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x4200B48F:0x3FC93680 0x4200B8AC:0x3FC936A0 0x40377331:0x3FC936D0 0x42007582:0x3FC992B0 0x42008447:0x3FC992E0 0x42006FB1:0x3FC99300 0x42006B22:0x3FC99320 0x4200F58A:0x3FC99340 0x4200EDF5:0x3FC99360 0x4200EE52:0x3FC99380 0x4200F491:0x3FC993B0 0x420134C3:0x3FC993E0 0x42012E86:0x3FC99400 0x4200F601:0x3FC99720 0x4201B431:0x3FC99750 0x403804C9:0x3FC99780 0x42008C26:0x3FC997D0 0x4201AC97:0x3FC99840 0x4037AD1D:0x3FC99870

0;32mI (21626) EXAMPLE: ADC2 Channel[0] Cali Voltage: 467 mV
I (21706) EXAMPLE: ADC1 Channel[2] Raw Data: 732
I (21716) EXAMPLE: ADC1 Channel[2] Cali Voltage: 620 mV
I (21716) EXAMPLE: ADC1 Channel[3] Raw Data: 543

Stock idf 5.2.1, untouched oneshot adc example.. I didn't even setup wifi in it.. The nvs partition may have connected it automatically, idk.

jepler commented 3 weeks ago

bill88t commented 3 weeks ago

Using a YD-ESP32-S3 which is a dual usb-C board for this, which is excellent for debugging and built some debug builds.

import wifi, board, analogio;wifi.radio.connect("SSID", "PASSWD");a=analogio.AnalogIn(board.GPIO10)
while True:
    a.value

This reliably crashes it. Maximum 15s.

Decoded backtrace of debug build (clean tree, current master):

0x4037cc7a: ram_chip_i2c_readReg at ??:?
0x40378aa0: regi2c_ctrl_write_reg_mask at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/esp_hw_support/regi2c_ctrl.c:46
0x420a6761: adc_ll_calibration_init at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/hal/esp32s3/include/hal/adc_ll.h:790
 (inlined by) adc_hal_calibration_init at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/hal/adc_hal_common.c:92
0x420a09db: adc_oneshot_read at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/esp_adc/adc_oneshot.c:174
0x42042937: common_hal_analogio_analogin_get_value at /home/bill88t/git/circuitpython/ports/espressif/common-hal/analogio/AnalogIn.c:123
0x42039455: analogio_analogin_obj_get_value at /home/bill88t/git/circuitpython/ports/espressif/../../shared-bindings/analogio/AnalogIn.c:101
0x42014f0a: fun_builtin_1_call at /home/bill88t/git/circuitpython/ports/espressif/../../py/objfun.c:68
0x4200eaf5: mp_call_function_n_kw at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:725
0x4200ecdf: mp_convert_member_lookup at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1183
0x4200ee19: mp_load_method_maybe at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1253
0x4200ee2e: mp_load_method at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1262
0x4200eeed: mp_load_attr at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1071
0x4201f549: mp_execute_bytecode at /home/bill88t/git/circuitpython/ports/espressif/../../py/vm.c:437
0x420150e1: fun_bc_call at /home/bill88t/git/circuitpython/ports/espressif/../../py/objfun.c:273
0x4200eaf5: mp_call_function_n_kw at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:725
0x4200eb0a: mp_call_function_0 at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:699
0x4206e056: parse_compile_execute at /home/bill88t/git/circuitpython/ports/espressif/../../shared/runtime/pyexec.c:152
0x4206e45d: pyexec_friendly_repl at /home/bill88t/git/circuitpython/ports/espressif/../../shared/runtime/pyexec.c:748
0x4202491b: run_repl at /home/bill88t/git/circuitpython/ports/espressif/../../main.c:946
0x42024f67: main at /home/bill88t/git/circuitpython/ports/espressif/../../main.c:1084 (discriminator 1)
0x42026cca: app_main at /home/bill88t/git/circuitpython/ports/espressif/supervisor/port.c:503
0x4215a854: main_task at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/freertos/app_startup.c:208

For this crash, the reason was: Guru Meditation Error: Core 1 panic'ed (Interrupt wdt timeout on CPU1).

possibly related? espressif/esp-idf#12466

I feel like it is. I think memory corruption takes place.

I only sometimes get a coredump. Sometimes usb just dies and debug serial, 5 seconds after usb has died, says that:

I (54623) wifi:bcn_timeout,ap_probe_send_start
W (54625) CP wifi: event 21 0x21
I (57131) wifi:ap_probe_send over, resett wifi status to disassoc
I (57132) wifi:state: run -> init (c800)
I (57133) wifi:pm stop, total sleep time: lu us / lu us

I (57136) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
W (57145) CP wifi: disconnected
W (57146) CP wifi: reason 200 0xc8
I (57149) CP wifi: Retrying connect. 4 retries remaining
W (59593) CP wifi: disconnected
W (59593) CP wifi: reason 201 0xc9

As if it didn't crash.

Timeline8 commented 3 weeks ago

@dhalbert, no problem on me doing some of the upfront testing. I know from my own experiences learning CircuitPython and also reading other people pleas for help, 99% of the time it is user error. So no harm on pushing back a bit on the user to kind of "prove it".

As for my project, it is for aquarium monitoring and eventually some controls later. Therefore I am sensing water temperature. Thermistors are well suited for this. Simple to use, and easy to waterproof if needed. I would love to see the build in ADC of the S3 back in track again, but the S2 is just as capable (don't need dual cores to read a temperature once every five minutes) so I still have a path forward.

bablokb commented 3 weeks ago

@Timeline8: have you thought about using DS18B20-sensors? They are available in a waterproof enclosure. And they are easy to use.

bill88t commented 3 weeks ago

I have good news and bad news.

Good news: I fixed the watchdog timer crash. Bad news: It's 2 bugs. It now 100% does the weird crash, where usb dies, but the core keeps running.

I don't have a backtrace to work with now. Also with my improvements adc make it 100x faster, crashing in less than a second.

I basically moved all the init and deinit stuff where it should be. The constant calibration data init was triggering the watchdog crash.

Patch

```diff diff --git a/ports/espressif/common-hal/analogio/AnalogIn.c b/ports/espressif/common-hal/analogio/AnalogIn.c index 9b1b14f07c..264de46a2a 100644 --- a/ports/espressif/common-hal/analogio/AnalogIn.c +++ b/ports/espressif/common-hal/analogio/AnalogIn.c @@ -53,74 +53,80 @@ void common_hal_analogio_analogin_construct(analogio_analogin_obj_t *self, // Turn off the pull-up as soon as we know the pin will be used for analog reads, // since it may take a while for the voltage to stabilize if the input is high-impedance. gpio_pullup_dis(pin->number); -} - -bool common_hal_analogio_analogin_deinited(analogio_analogin_obj_t *self) { - return self->pin == NULL; -} -void common_hal_analogio_analogin_deinit(analogio_analogin_obj_t *self) { - if (common_hal_analogio_analogin_deinited(self)) { - return; - } - reset_pin_number(self->pin->number); - self->pin = NULL; -} - -uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) { - adc_oneshot_unit_handle_t adc_handle; adc_oneshot_unit_init_cfg_t adc_config = { .unit_id = self->pin->adc_index, .ulp_mode = ADC_ULP_MODE_DISABLE }; - CHECK_ESP_RESULT(adc_oneshot_new_unit(&adc_config, &adc_handle)); + CHECK_ESP_RESULT(adc_oneshot_new_unit(&adc_config, &(self->adc_handle))); adc_oneshot_chan_cfg_t channel_config = { .atten = ATTENUATION, .bitwidth = DATA_WIDTH }; - adc_channel_t channel = (adc_channel_t)self->pin->adc_channel; - adc_oneshot_config_channel(adc_handle, channel, &channel_config); + self->channel = (adc_channel_t)self->pin->adc_channel; + adc_oneshot_config_channel(self->adc_handle, self->channel, &channel_config); adc_cali_scheme_ver_t supported_schemes; adc_cali_check_scheme(&supported_schemes); - adc_cali_scheme_ver_t calibration_scheme = 0; - adc_cali_handle_t calibration; + self->calibration_scheme = 0; #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED - adc_cali_curve_fitting_config_t config = { - .unit_id = self->pin->adc_index, - .chan = channel, - .atten = ATTENUATION, - .bitwidth = DATA_WIDTH - }; - if (adc_cali_create_scheme_curve_fitting(&config, &calibration) == ESP_OK) { - calibration_scheme = ADC_CALI_SCHEME_VER_CURVE_FITTING; + self->config.unit_id = self->pin->adc_index; + self->config.chan = self->channel; + self->config.atten = ATTENUATION; + self->config.bitwidth = DATA_WIDTH; + if (adc_cali_create_scheme_curve_fitting(&(self->config), &(self->calibration)) == ESP_OK) { + self->calibration_scheme = ADC_CALI_SCHEME_VER_CURVE_FITTING; } #endif #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED - if (calibration_scheme == 0) { - adc_cali_line_fitting_config_t config = { - .unit_id = self->pin->adc_index, - .atten = ATTENUATION, - .bitwidth = DATA_WIDTH, + if (self->calibration_scheme == 0) { + self->config.unit_id = self->pin->adc_index; + self->config.atten = ATTENUATION; + self->config.bitwidth = DATA_WIDTH; #ifdef CONFIG_IDF_TARGET_ESP32 - .default_vref = DEFAULT_VREF, + self->config.default_vref = DEFAULT_VREF; #endif }; - if (adc_cali_create_scheme_line_fitting(&config, &calibration) == ESP_OK) { - calibration_scheme = ADC_CALI_SCHEME_VER_LINE_FITTING; + if (adc_cali_create_scheme_line_fitting(&(self->config), &(self->calibration)) == ESP_OK) { + self->calibration_scheme = ADC_CALI_SCHEME_VER_LINE_FITTING; } } #endif +} + +bool common_hal_analogio_analogin_deinited(analogio_analogin_obj_t *self) { + return self->pin == NULL; +} +void common_hal_analogio_analogin_deinit(analogio_analogin_obj_t *self) { + if (common_hal_analogio_analogin_deinited(self)) { + return; + } + #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED + if (self->calibration_scheme == ADC_CALI_SCHEME_VER_CURVE_FITTING) { + adc_cali_delete_scheme_curve_fitting(self->calibration); + } + #endif + #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED + if (self->calibration_scheme == ADC_CALI_SCHEME_VER_LINE_FITTING) { + adc_cali_delete_scheme_line_fitting(self->calibration); + } + #endif + adc_oneshot_del_unit(self->adc_handle); + reset_pin_number(self->pin->number); + self->pin = NULL; +} + +uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) { uint32_t adc_reading = 0; size_t sample_count = 0; // Multisampling esp_err_t ret = ESP_OK; for (int i = 0; i < NO_OF_SAMPLES; i++) { int raw; - ret = adc_oneshot_read(adc_handle, channel, &raw); + ret = adc_oneshot_read(self->adc_handle, self->channel, &raw); if (ret != ESP_OK) { continue; } @@ -134,20 +140,8 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) { // This corrects non-linear regions of the ADC range with a LUT, so it's a better reading than raw int voltage; - adc_cali_raw_to_voltage(calibration, adc_reading, &voltage); - + adc_cali_raw_to_voltage(self->calibration, adc_reading, &voltage); - #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED - if (calibration_scheme == ADC_CALI_SCHEME_VER_CURVE_FITTING) { - adc_cali_delete_scheme_curve_fitting(calibration); - } - #endif - #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED - if (calibration_scheme == ADC_CALI_SCHEME_VER_LINE_FITTING) { - adc_cali_delete_scheme_line_fitting(calibration); - } - #endif - adc_oneshot_del_unit(adc_handle); return voltage * ((1 << 16) - 1) / 3300; } diff --git a/ports/espressif/common-hal/analogio/AnalogIn.h b/ports/espressif/common-hal/analogio/AnalogIn.h index 944039bec2..460543a1a8 100644 --- a/ports/espressif/common-hal/analogio/AnalogIn.h +++ b/ports/espressif/common-hal/analogio/AnalogIn.h @@ -12,8 +12,25 @@ #include "FreeRTOS.h" #include "freertos/semphr.h" #include "py/obj.h" +#include "adc_cali_schemes.h" +#include "esp_adc/adc_oneshot.h" +#include "esp_adc/adc_cali.h" +#include "driver/gpio.h" +#include "hal/adc_types.h" + typedef struct { mp_obj_base_t base; const mcu_pin_obj_t *pin; + adc_oneshot_unit_handle_t adc_handle; + adc_channel_t channel; + #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED + adc_cali_curve_fitting_config_t config; + #endif + #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED + adc_cali_line_fitting_config_t config; + #endif + adc_cali_scheme_ver_t calibration_scheme; + adc_cali_handle_t calibration; + } analogio_analogin_obj_t; ``` (`git apply` over master)

bill88t commented 3 weeks ago

Doing the adc sampling into the core also doesn't fix this. There is nothing that can be done from the CircuitPython side to workaround this.

This needs to be fixed from ESP-IDF.

Here is my experimental patch with an updated ADC api that internalizes the sampling. (So that python doesn't loop over adc reads)

Patch

```diff diff --git a/locale/circuitpython.pot b/locale/circuitpython.pot index 0e3878bc3d..407d3713a9 100644 --- a/locale/circuitpython.pot +++ b/locale/circuitpython.pot @@ -113,7 +113,7 @@ msgstr "" #: ports/raspberrypi/common-hal/rp2pio/StateMachine.c #: ports/raspberrypi/common-hal/usb_host/Port.c #: shared-bindings/digitalio/DigitalInOut.c -#: shared-bindings/microcontroller/Pin.c +#: shared-bindings/microcontroller/Pin.c shared-module/max3421e/Max3421E.c msgid "%q in use" msgstr "" @@ -1114,10 +1114,12 @@ msgstr "" msgid "Input/output error" msgstr "" +#: ports/espressif/common-hal/_bleio/__init__.c #: ports/nordic/common-hal/_bleio/__init__.c msgid "Insufficient authentication" msgstr "" +#: ports/espressif/common-hal/_bleio/__init__.c #: ports/nordic/common-hal/_bleio/__init__.c msgid "Insufficient encryption" msgstr "" @@ -1154,6 +1156,7 @@ msgstr "" #: ports/atmel-samd/common-hal/alarm/pin/PinAlarm.c #: ports/atmel-samd/common-hal/countio/Counter.c #: ports/atmel-samd/common-hal/frequencyio/FrequencyIn.c +#: ports/atmel-samd/common-hal/max3421e/Max3421E.c #: ports/atmel-samd/common-hal/ps2io/Ps2.c #: ports/atmel-samd/common-hal/pulseio/PulseIn.c #: ports/atmel-samd/common-hal/rotaryio/IncrementalEncoder.c @@ -2139,6 +2142,7 @@ msgstr "" msgid "Unknown BLE error: %d" msgstr "" +#: ports/espressif/common-hal/max3421e/Max3421E.c #: ports/raspberrypi/common-hal/wifi/__init__.c #, c-format msgid "Unknown error code %d" @@ -3810,7 +3814,7 @@ msgstr "" msgid "pop from empty %q" msgstr "" -#: shared-bindings/socketpool/Socket.c shared-bindings/ssl/SSLSocket.c +#: shared-bindings/socketpool/Socket.c msgid "port must be >= 0" msgstr "" @@ -4229,6 +4233,10 @@ msgstr "" msgid "usecols keyword must be specified" msgstr "" +#: shared-bindings/analogio/AnalogIn.c +msgid "value is out of bounds" +msgstr "" + #: py/objint.c #, c-format msgid "value must fit in %d byte(s)" diff --git a/ports/espressif/common-hal/analogio/AnalogIn.c b/ports/espressif/common-hal/analogio/AnalogIn.c index 9b1b14f07c..7ab808b527 100644 --- a/ports/espressif/common-hal/analogio/AnalogIn.c +++ b/ports/espressif/common-hal/analogio/AnalogIn.c @@ -22,7 +22,6 @@ #include #define DEFAULT_VREF 1100 -#define NO_OF_SAMPLES 2 #define ATTENUATION ADC_ATTEN_DB_11 #if defined(CONFIG_IDF_TARGET_ESP32) #define DATA_WIDTH ADC_BITWIDTH_12 @@ -43,7 +42,7 @@ #endif void common_hal_analogio_analogin_construct(analogio_analogin_obj_t *self, - const mcu_pin_obj_t *pin) { + const mcu_pin_obj_t *pin, uint8_t sample_size) { if (pin->adc_index == NO_ADC || pin->adc_channel == NO_ADC_CHANNEL) { raise_ValueError_invalid_pin(); } @@ -53,74 +52,82 @@ void common_hal_analogio_analogin_construct(analogio_analogin_obj_t *self, // Turn off the pull-up as soon as we know the pin will be used for analog reads, // since it may take a while for the voltage to stabilize if the input is high-impedance. gpio_pullup_dis(pin->number); -} - -bool common_hal_analogio_analogin_deinited(analogio_analogin_obj_t *self) { - return self->pin == NULL; -} - -void common_hal_analogio_analogin_deinit(analogio_analogin_obj_t *self) { - if (common_hal_analogio_analogin_deinited(self)) { - return; - } - reset_pin_number(self->pin->number); - self->pin = NULL; -} -uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) { - adc_oneshot_unit_handle_t adc_handle; + self->sample_size = sample_size; adc_oneshot_unit_init_cfg_t adc_config = { .unit_id = self->pin->adc_index, .ulp_mode = ADC_ULP_MODE_DISABLE }; - CHECK_ESP_RESULT(adc_oneshot_new_unit(&adc_config, &adc_handle)); + CHECK_ESP_RESULT(adc_oneshot_new_unit(&adc_config, &(self->adc_handle))); adc_oneshot_chan_cfg_t channel_config = { .atten = ATTENUATION, .bitwidth = DATA_WIDTH }; - adc_channel_t channel = (adc_channel_t)self->pin->adc_channel; - adc_oneshot_config_channel(adc_handle, channel, &channel_config); + self->channel = (adc_channel_t)self->pin->adc_channel; + adc_oneshot_config_channel(self->adc_handle, self->channel, &channel_config); adc_cali_scheme_ver_t supported_schemes; adc_cali_check_scheme(&supported_schemes); - adc_cali_scheme_ver_t calibration_scheme = 0; - adc_cali_handle_t calibration; + self->calibration_scheme = 0; #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED - adc_cali_curve_fitting_config_t config = { - .unit_id = self->pin->adc_index, - .chan = channel, - .atten = ATTENUATION, - .bitwidth = DATA_WIDTH - }; - if (adc_cali_create_scheme_curve_fitting(&config, &calibration) == ESP_OK) { - calibration_scheme = ADC_CALI_SCHEME_VER_CURVE_FITTING; + self->config.unit_id = self->pin->adc_index; + self->config.chan = self->channel; + self->config.atten = ATTENUATION; + self->config.bitwidth = DATA_WIDTH; + if (adc_cali_create_scheme_curve_fitting(&(self->config), &(self->calibration)) == ESP_OK) { + self->calibration_scheme = ADC_CALI_SCHEME_VER_CURVE_FITTING; } #endif #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED - if (calibration_scheme == 0) { - adc_cali_line_fitting_config_t config = { - .unit_id = self->pin->adc_index, - .atten = ATTENUATION, - .bitwidth = DATA_WIDTH, - #ifdef CONFIG_IDF_TARGET_ESP32 - .default_vref = DEFAULT_VREF, - #endif - }; - if (adc_cali_create_scheme_line_fitting(&config, &calibration) == ESP_OK) { - calibration_scheme = ADC_CALI_SCHEME_VER_LINE_FITTING; - } + if (self->calibration_scheme == 0) { + self->config.unit_id = self->pin->adc_index; + self->config.atten = ATTENUATION; + self->config.bitwidth = DATA_WIDTH; + #ifdef CONFIG_IDF_TARGET_ESP32 + self->config.default_vref = DEFAULT_VREF; + #endif + } + ; + if (adc_cali_create_scheme_line_fitting(&(self->config), &(self->calibration)) == ESP_OK) { + self->calibration_scheme = ADC_CALI_SCHEME_VER_LINE_FITTING; + } +} + #endif +} + +bool common_hal_analogio_analogin_deinited(analogio_analogin_obj_t *self) { + return self->pin == NULL; +} + +void common_hal_analogio_analogin_deinit(analogio_analogin_obj_t *self) { + if (common_hal_analogio_analogin_deinited(self)) { + return; + } + #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED + if (self->calibration_scheme == ADC_CALI_SCHEME_VER_CURVE_FITTING) { + adc_cali_delete_scheme_curve_fitting(self->calibration); + } + #endif + #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED + if (self->calibration_scheme == ADC_CALI_SCHEME_VER_LINE_FITTING) { + adc_cali_delete_scheme_line_fitting(self->calibration); } #endif + adc_oneshot_del_unit(self->adc_handle); + reset_pin_number(self->pin->number); + self->pin = NULL; +} +uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) { uint32_t adc_reading = 0; size_t sample_count = 0; // Multisampling esp_err_t ret = ESP_OK; - for (int i = 0; i < NO_OF_SAMPLES; i++) { + for (int i = 0; i < self->sample_size; i++) { int raw; - ret = adc_oneshot_read(adc_handle, channel, &raw); + ret = adc_oneshot_read(self->adc_handle, self->channel, &raw); if (ret != ESP_OK) { continue; } @@ -134,20 +141,8 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) { // This corrects non-linear regions of the ADC range with a LUT, so it's a better reading than raw int voltage; - adc_cali_raw_to_voltage(calibration, adc_reading, &voltage); - + adc_cali_raw_to_voltage(self->calibration, adc_reading, &voltage); - #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED - if (calibration_scheme == ADC_CALI_SCHEME_VER_CURVE_FITTING) { - adc_cali_delete_scheme_curve_fitting(calibration); - } - #endif - #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED - if (calibration_scheme == ADC_CALI_SCHEME_VER_LINE_FITTING) { - adc_cali_delete_scheme_line_fitting(calibration); - } - #endif - adc_oneshot_del_unit(adc_handle); return voltage * ((1 << 16) - 1) / 3300; } diff --git a/ports/espressif/common-hal/analogio/AnalogIn.h b/ports/espressif/common-hal/analogio/AnalogIn.h index 944039bec2..a2a8c453d8 100644 --- a/ports/espressif/common-hal/analogio/AnalogIn.h +++ b/ports/espressif/common-hal/analogio/AnalogIn.h @@ -12,8 +12,27 @@ #include "FreeRTOS.h" #include "freertos/semphr.h" #include "py/obj.h" +#include "adc_cali_schemes.h" +#include "esp_adc/adc_oneshot.h" +#include "esp_adc/adc_cali.h" +#include "driver/gpio.h" +#include "hal/adc_types.h" + typedef struct { mp_obj_base_t base; const mcu_pin_obj_t *pin; + uint8_t sample_size; + adc_oneshot_unit_handle_t adc_handle; + adc_channel_t channel; + #if defined(ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_CURVE_FITTING_SUPPORTED + adc_cali_curve_fitting_config_t config; + #endif + #if defined(ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED) && ADC_CALI_SCHEME_LINE_FITTING_SUPPORTED + adc_cali_line_fitting_config_t config; + #endif + adc_cali_scheme_ver_t calibration_scheme; + adc_cali_handle_t calibration; + uint8_t samples; + } analogio_analogin_obj_t; diff --git a/shared-bindings/analogio/AnalogIn.c b/shared-bindings/analogio/AnalogIn.c index f56ce600c2..e120f16dfd 100644 --- a/shared-bindings/analogio/AnalogIn.c +++ b/shared-bindings/analogio/AnalogIn.c @@ -46,19 +46,28 @@ MP_WEAK const mcu_pin_obj_t *common_hal_analogio_analogin_validate_pin(mp_obj_t //| when you read a value. You can retry the read. //| """ //| ... -static mp_obj_t analogio_analogin_make_new(const mp_obj_type_t *type, - mp_uint_t n_args, size_t n_kw, const mp_obj_t *args) { - // check number of arguments - mp_arg_check_num(n_args, n_kw, 1, 1, false); - - // 1st argument is the pin - const mcu_pin_obj_t *pin = common_hal_analogio_analogin_validate_pin(args[0]); +static mp_obj_t analogio_analogin_make_new(const mp_obj_type_t *type, size_t n_args, size_t n_kw, const mp_obj_t *args) { + enum { ARG_pin, ARG_samples }; + static const mp_arg_t allowed_args[] = { + { MP_QSTR_pin, MP_ARG_REQUIRED | MP_ARG_OBJ, }, + { MP_QSTR_samples, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = 2} }, + }; + mp_arg_val_t parsed_args[MP_ARRAY_SIZE(allowed_args)]; + mp_arg_parse_all_kw_array(n_args, n_kw, args, MP_ARRAY_SIZE(allowed_args), allowed_args, parsed_args); + const mcu_pin_obj_t *pin = common_hal_analogio_analogin_validate_pin(parsed_args[ARG_pin].u_obj); + mp_int_t samples = parsed_args[ARG_samples].u_int; + if (samples <= 0 || samples >= 255) { + mp_raise_ValueError(MP_ERROR_TEXT("value is out of bounds")); + } + uint8_t sample_size = samples; analogio_analogin_obj_t *self = mp_obj_malloc(analogio_analogin_obj_t, &analogio_analogin_type); - common_hal_analogio_analogin_construct(self, pin); + common_hal_analogio_analogin_construct(self, pin, sample_size); return MP_OBJ_FROM_PTR(self); } + + //| def deinit(self) -> None: //| """Turn off the AnalogIn and release the pin for other use.""" //| ... diff --git a/shared-bindings/analogio/AnalogIn.h b/shared-bindings/analogio/AnalogIn.h index 303ef030bf..6f9a278213 100644 --- a/shared-bindings/analogio/AnalogIn.h +++ b/shared-bindings/analogio/AnalogIn.h @@ -12,7 +12,7 @@ extern const mp_obj_type_t analogio_analogin_type; const mcu_pin_obj_t *common_hal_analogio_analogin_validate_pin(mp_obj_t obj); -void common_hal_analogio_analogin_construct(analogio_analogin_obj_t *self, const mcu_pin_obj_t *pin); +void common_hal_analogio_analogin_construct(analogio_analogin_obj_t *self, const mcu_pin_obj_t *pin, uint8_t sample_size); void common_hal_analogio_analogin_deinit(analogio_analogin_obj_t *self); bool common_hal_analogio_analogin_deinited(analogio_analogin_obj_t *self); uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self); ```

dhalbert commented 3 weeks ago

@bill88t Can you submit the state saving as a PR? Did you find any other reports of an ESP-IDF bug, in addition to what @jepler found?

bill88t commented 3 weeks ago

9315

No I did not find any more ESP-IDF bugs matching this. Though, I didn't spend much time on the issue tracker.

dhalbert commented 3 weeks ago

EDIT: WRONG (see below):

~~New very interesting discovery:~~

The quick crash on ADC1 pins only happens when the board is a 4MB flash / 2MB PSRAM board. I tried an 8/0 QT Py ESP32-S3 and it does not crash. @Timeline8's boards are all 4/2, and I was testing on a QT Py ESP32-S3 4/2 when I reproduced the problem.

bill88t commented 3 weeks ago

Cardputer is 8/0 and crashes just fine? The YD-ESP32-S3 which is 16/8, also explodes to pieces.

Maybe some other factor is at play? Chip revision, temperature?

jepler commented 3 weeks ago

so the commonality is no PSRAM .. what if you make a no-psram build on one of the 4/2s, does the problem go away? if so, maybe there's some object that needs to be located in main RAM not PSRAM?

EDIT by @dhalbert: my testing was wrong

dhalbert commented 3 weeks ago

Cardputer is 8/0 and crashes just fine?

Are you testing an ADC1 or an ADC2 pin on the Cardputer?

I will do further testing on various boards tomorrow.

bill88t commented 3 weeks ago

Are you testing an ADC1 or an ADC2 pin on the Cardputer?

BAT_ADC is wired to IO10 which is ADC1_CH9.

Leaving this here for quick reference. Screenshot_20240607-112932_ONLYOFFICE Documents

bill88t commented 3 weeks ago

My current working theory is that there is some critical period of time where wifi and adc may try to write to memory concurrently somehow.

Adc can trigger this scenario from the calibration data init and from .value.

It may be that slower, /2 (esp32-s3r2) memories trigger this a lot more often.

dhalbert commented 3 weeks ago

My 4/2 vs 8/0 distinction was wrong. I had neglected to uncomment CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD on the ESP32-S3 8/0 board. When I enabled initial wifi on the 8/0 board, it crashed in the same way. I edited the posts above to indicate that.

Timeline8 commented 3 weeks ago

@bablokb , Yes I am aware of the DS18B20 but I just happen to have a LOT of thermistors on hand that are waterproof, so that is one reason. No matter how cheap I can find DS18B20s, no one is going to beat FREE. 😉 While the DS18B20 are fairly simple to use once you get them setup, even they are no match for the dead-simple lowly thermistor. Plus I have seen enough reports of less than reputable supplies (counterfeits?) of DS18B20 being on the market that gives me pause.

But fundamentally, the ADC on the S3 should work. I hate having to work around a function that is promoted as one of the features of a product only to find out it doesn't work. I have to imagine there must be a lot of users out there with various ADC application running S3 MCUs getting nuisance failures with no idea why. I hope it doesn't turn them off from ever using the ESP32 line again if they perceive them as unreliable.

dhalbert commented 3 weeks ago

Some more debugging:

Often USB disconnects, but the program is still running, if it doesn't hit a watchdog reset. I verified this by blinking the LED periodically and by writing a count to UART. Another interesting thing is that when it disconnects from USB, the UART output, both on the UART I'm writing to and the debug UART, become gibberish, as if the speed has changed.

bill88t commented 3 weeks ago

This hasn't happened to me. And I have run it 100 times by now.

Memory corruption most certainly. I think it starts executing arbitrary code. Ofc producing different random results every time.

USB-JTAG would be a blessing if it worked right now. I still don't have a jtag dongle for normal jtag, so yea.. 👍

dhalbert commented 3 weeks ago

I think it starts executing arbitrary code.

That's not what I'm seeing (using a Metro ESP32-S3)

import analogio
import busio
import board
import digitalio

led = digitalio.DigitalInOut(board.LED)
led.switch_to_output()

analog = analogio.AnalogIn(board.A5)

uart = busio.UART(tx=board.TX, baudrate=115200)

count = 0
while True:
    count += 1
    if count % 100000 == 0:
        print("count", count)
    if count % 100 == 0:
        led.value = not led.value
    uart.write(str(count).encode())
    uart.write(b' ')
    v = analog.value

The LED keeps blinking sometimes, other times it eventually restarts in safe mode.

bill88t commented 3 weeks ago

With my example:

while True:
    adc.value

It either halts or safemodes. I should note, I only tested from REPL.

If proc'ed by my os and the "halt" scenario occurs, the neopixel, instead of heartbeating in 2 different blues, it becomes 255-value white (not the REPL white). USB in this scenario is disconnected.

Also I have already determined the watchdog timeout is cause of the calibration init functions. Under the adc pr, it doesn't happen, since those only run once during init.

Timeline8 commented 3 weeks ago

If proc'ed by my os and the "halt" scenario occurs, the neopixel, instead of heartbeating in 2 different blues, it becomes 255-value white (not the REPL white). USB in this scenario is disconnected.

I would see this mode when the failure was not reporting an internal watchdog timer expired. When I had the TFT display hooked up or running my S3 TFT Feather, the board disconnects from USB, the Neopixel turns solid white, and no reconnection of the board to USB. However the TFT shows the code continuing to run and display my thermistor based temperatures. Note that in my code I was cycling through Neopixel colors 3 times between thermistor readings so it is interesting that the crash takes over the Neopixel leaving it white but otherwise the code is still running.

dhalbert commented 3 weeks ago

Fails on 8.2.7 as well.

bill88t commented 3 weeks ago

ESP32-S3 TFT Feather example:

from time import sleep
import board, analogio, digitalio
from neopixel_write import neopixel_write
import pwmio

state_a = bytearray([0, 8, 0])
state_b = bytearray([0, 0, 8])

adc = analogio.AnalogIn(board.D10)
pwm = pwmio.PWMOut(board.A4)
pwm.duty_cycle = 32767 # 50%
nx = digitalio.DigitalInOut(board.NEOPIXEL)
nxp = digitalio.DigitalInOut(board.NEOPIXEL_POWER)
nx.switch_to_output()
nxp.switch_to_output()
nxp.value = 1

print("Ripping!")

while True:
    neopixel_write(nx, state_a)
    for _ in range(30):
        adc.value
    neopixel_write(nx, state_b)

This example produces a 500Hz signal on board.A4. When the "halt" white-neopixel scenario occurs, the pwm signal frequency changes from 500Hz to ~192Hz indicating that the system clock breaks down. The duty cycle somehow manages to remain at 50%. This happens regardless of the power source, be it USB or Li-Po.

This at the very least confirms a few things:

A crash does not happen in the "halt" scenario.
The code keeps running at some capacity (We don't reach the "Code done running").
Safe mode doesn't and won't kick in.

From reading: https://docs.espressif.com/projects/esp-idf/en/v5.2.2/esp32/api-reference/peripherals/adc_oneshot.html and https://docs.espressif.com/projects/esp-idf/en/v5.2.2/esp32/api-reference/peripherals/clk_tree.html we can tell some clock source is used for adc, but nowhere I can find on the idf does it actually touch the clocks.

I read components/esp_adc/adc_continuous.c and components/esp_adc/adc_oneshot.c.

I think it may be best to switch to the continuous api. In the same style as it's now, initializing the ADC during get_value and getting a bunch of values.

dhalbert commented 3 weeks ago

A few more tests: CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_160=y instead of 240 MHz: no help CONFIG_ADC_ONESHOT_CTRL_FUNC_IN_IRAM=y: no help CONFIG_FREERTOS_UNICORE=y, so that only one core is used: problem seems to go away. That would explain why ESP32-S2,which has only one core, doesn't have the problem.

bill88t commented 2 weeks ago

Then this also means that it's not some lock-not-being-updated issue, but that whatever function runs on the other core is not thread safe and causes the crash.

And that "whatever function" is most certainly a wifi function. (Since this happens when connected) Somewhere an rtc_spinlock is missing.

bill88t commented 2 weeks ago

I went through all the files containing the string rtc_spinlock and extended the range of the critical code. I also added it in a lot of places.

For the moment I assume all the spinlocks work as intended, across cores. All of the test were performed with clean builds.

Nothing fixed it. Not even switching to portENTER_CRITICAL_SAFE. No wifi code explicitly spinlocks rtc, but instead leaves components/esp_hw_support/adc_share_hw_ctrl.c to do the adc stuff.

I don't think I can find the actual fix.

I will test how viable is CONFIG_FREERTOS_UNICORE=y.

bill88t commented 2 weeks ago

CONFIG_FREERTOS_UNICORE has literally 0 downsides as far as I can tell.

Measured a 1.1% perf difference on a division benchmark, with the unicore build being the faster one. There was no other differences across the builds. (Other than that indeed, the crashes went away.)

So I think, at least temporarily, we should go with that, and open an upstream ESP-IDF issue to have it resolved properly. (I can go do it if you want me to.)

dhalbert commented 2 weeks ago

Here are some other apparently related issues. These are about running the ADC tight loops, it appears, that cause watchdog errors. The original test in this issue does 20ms waits between reads, but still has problems. https://github.com/espressif/esp-idf/issues/12466 (as mentioned by @jepler) https://github.com/espressif/esp-idf/issues/8753 https://github.com/espressif/arduino-esp32/issues/6549 (same author as 8753)

dhalbert commented 2 weeks ago

I think we may have good news! I tested the Metro ESP32-S3 build from PR #9325 (Espressif BLE), but upgrades ESP-IDF to 5.2.2. My test program no longer fails. I put back 9.0.5 and it fails again as usual. This is very encouraging!

There isn't something completely obvious in the ESP-IDF release notes that points out a fix for this specific problem, but I'll take it :slightly_smiling_face: .

bill88t commented 2 weeks ago

This fixes the temperature thing too. It's a complete fix as far as I can tell. Will leave my cardputer running watch -n 0.01 sensors all day. I merged it downstream to the adc_paranoia branch and it played nicely with massive sample sizes too (65500 samples per poll).

Timeline8 commented 2 weeks ago

Will this be fixed under the next beta release? 9.1.0-beta.4 or 9.1.1 or whatever the next release will be?

bill88t commented 2 weeks ago

You can download the "Absolute Newest" build from the downloads page for your board. That will have the fix, along with any future 9.1.x releases.

adafruit / circuitpython

ESP32-S3 ADC use causes crashes when WiFi in use #9291

CircuitPython version

Code/REPL

Behavior

Description

Additional information

9315