adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

CircuitPython HARD FAULTs crashes hard after several hours from async/await loops #7926

Open chilipeppr opened 1 year ago

chilipeppr commented 1 year ago

CircuitPython version

When running this simple code, which creates several async/await tasks that run concurrently, CircuitPython hard-faults (HARD_FAULT) at random intervals, roughly every several hours.

I have tested this on CircuitPython 8.0.5 and 8.1.0 and they both act similarly.

Code/REPL

# main_test_why_circuitpython_hardfaults.py
# This file will just run a simple asyncio infinite loop to see if that causes
# a hard fault after 24 hours.

import asyncio
import time
from adafruit_datetime import datetime as adatetime

class Dashboard:
    def __init__(self):

        print("Initting Dashboard")

        # Create several infinite loop tasks that loop every 0.1 to 0.6 seconds
        self.loop_task1 = asyncio.create_task(self.asyncTaskLoop("task1", 100, 0.1))
        self.loop_task2 = asyncio.create_task(self.asyncTaskLoop("task2", 50, 0.2))
        self.loop_task3 = asyncio.create_task(self.asyncTaskLoop("task3", 100, 0.1))
        self.loop_task4 = asyncio.create_task(self.asyncTaskLoop("task4", 20, 0.5))
        self.loop_task5 = asyncio.create_task(self.asyncTaskLoop("task5", 17, 0.6))
        self.loop_task6 = asyncio.create_task(self.asyncTaskLoop("task6", 100, 0.1))
        self.loop_task7 = asyncio.create_task(self.asyncTaskLoop("task7", 50, 0.2))
        self.loop_task8 = asyncio.create_task(self.asyncTaskLoop("task8", 100, 0.1))
        self.loop_task9 = asyncio.create_task(self.asyncTaskLoop("task9", 20, 0.5))
        self.loop_task10 = asyncio.create_task(self.asyncTaskLoop("task10", 17, 0.6))

    async def asyncTaskLoop(self, taskName, howOftenTellUsYouLooped, loopDelay):
        print("{}: Starting infinite loop".format(taskName))

        ctr = 0
        startTime = time.monotonic()

        while True:
            await asyncio.sleep(loopDelay) # don't forget the await

            ctr += 1
            if ctr > howOftenTellUsYouLooped:

                print("{} {}: Looped {} times waiting {}s on each loop. Should have taken {}s, but took {}s".format(
                    adatetime.now(), taskName, howOftenTellUsYouLooped, loopDelay, 
                    loopDelay*howOftenTellUsYouLooped, 
                    time.monotonic() - startTime))
                ctr = 0
                startTime = time.monotonic()

async def main():

    d = Dashboard()

    await asyncio.gather(
        d.loop_task1, d.loop_task2, d.loop_task3, d.loop_task4, d.loop_task5,
        d.loop_task6, d.loop_task7, d.loop_task8, d.loop_task9, d.loop_task10
    )  # Don't forget the await!

asyncio.run(main())

Behavior

You get a standard output roughly every 10 seconds as the script runs.

Initting Dashboard
task1: Starting infinite loop
task2: Starting infinite loop
task3: Starting infinite loop
task4: Starting infinite loop
task5: Starting infinite loop
task6: Starting infinite loop
task7: Starting infinite loop
task8: Starting infinite loop
task9: Starting infinite loop
task10: Starting infinite loop
2023-05-02 06:20:27 task1: Looped 100 times waiting 0.1s on each loop. Should have taken 10.0s, but took 10.0s
2023-05-02 06:20:27 task3: Looped 100 times waiting 0.1s on each loop. Should have taken 10.0s, but took 10.0234s
2023-05-02 06:20:27 task6: Looped 100 times waiting 0.1s on each loop. Should have taken 10.0s, but took 10.0234s
2023-05-02 06:20:27 task8: Looped 100 times waiting 0.1s on each loop. Should have taken 10.0s, but took 10.0313s
2023-05-02 06:20:28 task2: Looped 50 times waiting 0.2s on each loop. Should have taken 10.0s, but took 10.1563s
2023-05-02 06:20:28 task7: Looped 50 times waiting 0.2s on each loop. Should have taken 10.0s, but took 10.1563s
2023-05-02 06:20:28 task4: Looped 20 times waiting 0.5s on each loop. Should have taken 10.0s, but took 10.5234s
2023-05-02 06:20:28 task9: Looped 20 times waiting 0.5s on each loop. Should have taken 10.0s, but took 10.5234s
2023-05-02 06:20:28 task5: Looped 17 times waiting 0.6s on each loop. Should have taken 10.2s, but took 10.7813s
2023-05-02 06:20:28 task10: Looped 17 times waiting 0.6s on each loop. Should have taken 10.2s, but took 10.7969s

Eventually you'll get a hard crash at random intervals. Usually this is around 6 to 12 hours. It happens more often if you add more tasks.

From safemode.py. Kill switch off. 2023-05-01 21:45:06, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT
From main.py. Kill switch off. 2023-05-01 21:44:54, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE
From safemode.py. Kill switch off. 2023-05-02 00:18:08, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT
From main.py. Kill switch off. 2023-05-02 00:18:03, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE

To track these hard faults, I have safemode.py log the reboot and then reboot the device back into normal operation. I also log all normal boots from main.py, which then hands control over to the actual code posted at the top of this bug report.
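
For anyone who wants to measure the time between faults from a log like the one above, here is a minimal sketch for desktop Python. It assumes every line follows the `"{} {}, {}, {}, {}"` layout written by `logToFile` and that the prefix itself contains no `", "`. The names `parse_reboot_line` and `hard_fault_times` are made up for this example and are not part of the original code.

```python
from datetime import datetime

# Sample lines copied from the log above.
LOG = [
    "From safemode.py. Kill switch off. 2023-05-01 21:45:06, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT",
    "From main.py. Kill switch off. 2023-05-01 21:44:54, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE",
    "From safemode.py. Kill switch off. 2023-05-02 00:18:08, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT",
    "From main.py. Kill switch off. 2023-05-02 00:18:03, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE",
]


def parse_reboot_line(line):
    """Split one log line into prefix, timestamp and the three reasons."""
    head, run_reason, reset_reason, safe_mode_reason = line.strip().split(", ")
    # The timestamp is the last two space-separated tokens of the first field.
    prefix, date_part, time_part = head.rsplit(" ", 2)
    stamp = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")
    return {
        "prefix": prefix,
        "time": stamp,
        "run_reason": run_reason,
        "reset_reason": reset_reason,
        "safe_mode_reason": safe_mode_reason,
    }


def hard_fault_times(lines):
    """Return the timestamps of all entries logged with a HARD_FAULT reason."""
    entries = [parse_reboot_line(line) for line in lines]
    return [e["time"] for e in entries if e["safe_mode_reason"].endswith("HARD_FAULT")]


faults = hard_fault_times(LOG)
gaps = [later - earlier for earlier, later in zip(faults, faults[1:])]
print(gaps[0])  # 2:33:02
```

The safemode.py timestamps come from a clock that was not re-synced over NTP on that boot, so the gaps are approximate.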

main.py

# This is the file that CircuitPython on the ESP32-S2 loads first.
# We watch for a kill switch, i.e. a button on GPIO1, and if it's on we
# don't proceed with running the main program.

from killswitch.killswitch import KillSwitch
from reboot.r import RebootReason

def main():

    # Log the reboot reason
    r = RebootReason()

    ks = KillSwitch()

    if ks.isKillSwitchOn():
        print("Kill switch is on, so exiting code.")

        r.logToFileWithWifi("From main.py. Kill switch on.")

        # Create our display hardware object so we spit out our IP address to display
        import display.display
        d = display.display.Display()

        # Dump log file
        r.dumpLogFile()

    else:
        print("Kill switch is off, so proceeding with running massive main code...")

        r.logToFileWithWifi("From main.py. Kill switch off.")

        # Create our display hardware object so we spit out our IP address to display
        import display.display
        d = display.display.Display()

        # Dump log file
        r.dumpLogFile()

        # import main_kitchensink
        import main_test_why_circuitpython_hardfaults

main()

safemode.py

# This is the file that CircuitPython on the ESP32-S2 loads first
# after booting into safemode. We're going to just log that we went into safemode
# and then do a full reboot back to normal mode. So, we're basically overriding this
# safety measure as we're in production and we know we have good code. So, I don't
# want to get the Marble Run into a state where it's not responding. I have seen
# reboots to safe mode every several hours and I'm not sure why, so just handle this by rebooting.

from killswitch.killswitch import KillSwitch
from reboot.r import RebootReason

def main():

    # Log the reboot reason
    r = RebootReason()

    ks = KillSwitch()

    if ks.isKillSwitchOn():
        print("Kill switch is on, so exiting code.")

        r.logToFile("From safemode.py. Kill switch on.")

        # Create our display hardware object so we spit out our IP address to display
        import display.display
        d = display.display.Display()

        # Dump log file
        r.dumpLogFile()

    else:
        print("Kill switch is not on, so proceeding with rebooting to normal mode...")

        r.logToFile("From safemode.py. Kill switch off.")

        # Create our display hardware object so we spit out our IP address to display
        import display.display
        d = display.display.Display()

        # Dump log file
        r.dumpLogFile()

        r.rebootToNormalMode()

main()

reboot/r.py

"""Look up the reason for the last boot and log it to a file."""

import supervisor
import microcontroller
from adafruit_datetime import datetime as adatetime
import storage

class RebootReason:

    def __init__(self) -> None:

        print("Running Reboot Reason")

        self.fileNameLog = "rebootreason.txt"

    def checkReason(self):

        print("Supervisor.runtime.run_reason:", supervisor.runtime.run_reason)
        # print("Supervisor.runtime.autoreload:", supervisor.runtime.autoreload)
        print("Supervisor.runtime.usb_connected:", supervisor.runtime.usb_connected)
        print("Microcontroller.Processor.reset_reason:", microcontroller.Processor.reset_reason)
        print("Microcontroller.cpu.reset_reason:",microcontroller.cpu.reset_reason)
        print("supervisor.runtime.safe_mode_reason:", supervisor.runtime.safe_mode_reason)

    def logToFile(self, prefix):

        # Remount so CircuitPython can write to the drive
        storage.remount("/", readonly=False)

        # The with-block closes the file even if the write raises
        with open(self.fileNameLog, "a") as f:
            f.write(
                # "{} {}, supervisor.runtime.run_reason:{}, microcontroller.cpu.reset_reason:{}, supervisor.runtime.safe_mode_reason:{}\n".format(
                "{} {}, {}, {}, {}\n".format(
                    prefix,
                    adatetime.now(),
                    supervisor.runtime.run_reason,
                    microcontroller.cpu.reset_reason,
                    supervisor.runtime.safe_mode_reason,
                )
            )
            f.flush()

    def logToFileWithWifi(self, prefix):

        # Reset the datetime on the ESP32 from NTP since we have a network connection
        import rtc
        import socketpool
        import wifi
        import adafruit_ntp
        pool = socketpool.SocketPool(wifi.radio)
        ntp = adafruit_ntp.NTP(pool, tz_offset=-6 )

        # NOTE: This changes the system time so make sure you aren't assuming that time
        # doesn't jump.
        rtc.RTC().datetime = ntp.datetime

        # Do normal logging with correct timestamp
        self.logToFile(prefix)

    def dumpLogFile(self):

        with open(self.fileNameLog, "r") as f:
            for line in f:
                print(line)

    def d(self):

        self.dumpLogFile()

    def rebootToUf2Mode(self):

        microcontroller.on_next_reset(microcontroller.RunMode.UF2)
        microcontroller.reset()

    def rebootToSafeMode(self):

        microcontroller.on_next_reset(microcontroller.RunMode.SAFE_MODE)
        microcontroller.reset()

    def rebootToNormalMode(self):

        microcontroller.on_next_reset(microcontroller.RunMode.NORMAL)
        microcontroller.reset()

    def rebootToBootLoaderMode(self):

        microcontroller.on_next_reset(microcontroller.RunMode.BOOTLOADER)
        microcontroller.reset()

And this is the super simple killswitch.py which you really don't need, but in case you read the code above you'd want to see this basic class.

# This class handles the kill switch on the Marble Run
# The user gets to toggle the kill switch and if it's off, the main code
# runs, but if it's on then the main.py code just exits so you can debug
#
# We are literally just doing one test here on boot. So this is very
# straightforward code.

import board
import digitalio

class KillSwitch:

    def __init__(self):

        print("Initting Kill Switch library...")
        self._pinKillSwitchPin = board.IO1

        self._pinKillSwitch = digitalio.DigitalInOut(self._pinKillSwitchPin)
        self._pinKillSwitch.switch_to_input(pull=digitalio.Pull.UP)

    def isKillSwitchOn(self):

        val = not self._pinKillSwitch.value
        print("KillSwitch:", val)
        return val    

anecdata commented 1 year ago

Wemos S2 Mini?

chilipeppr commented 1 year ago

Yes.

RetiredWizard commented 1 year ago

I increased the number of asyncio tasks to 30 and started your test program on a UM Feather S2 and an Adafruit Feather ESP32-S2 about 13 hours ago. They haven't crashed yet, but I'll let them run overnight. I'm assuming that adding the extra tasks didn't mess up the test. Unfortunately, I don't have a Wemos S2 Mini to test on. Hopefully it's not hardware or port specific.
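
Scaling the test to 30 tasks is easier if the tasks are created in a loop rather than as ten separate attributes. The sketch below is one way to do that, not the code actually used in this thread. `task_loop`, `main`, `settings` and `max_reports` are illustrative names, and `max_reports` exists only so the demo terminates; pass `None` to loop forever as the original script does. It is written for desktop Python and has not been run on a board.

```python
import asyncio
import time

# (report_every, delay) pairs taken from the original test script.
ORIGINAL_SETTINGS = [(100, 0.1), (50, 0.2), (100, 0.1), (20, 0.5), (17, 0.6)]


async def task_loop(name, report_every, delay, max_reports=None):
    """Sleep `delay` seconds per loop and print a report every `report_every` loops."""
    reports = 0
    count = 0
    start = time.monotonic()
    while max_reports is None or reports < max_reports:
        await asyncio.sleep(delay)
        count += 1
        if count >= report_every:
            elapsed = time.monotonic() - start
            print("{}: looped {} times in {:.3f}s".format(name, count, elapsed))
            reports += 1
            count = 0
            start = time.monotonic()
    return reports


async def main(n_tasks=30, settings=ORIGINAL_SETTINGS, max_reports=None):
    """Start n_tasks loops, cycling through the given settings, and wait for all."""
    tasks = []
    for i in range(n_tasks):
        report_every, delay = settings[i % len(settings)]
        name = "task{}".format(i + 1)
        tasks.append(asyncio.create_task(task_loop(name, report_every, delay, max_reports)))
    return await asyncio.gather(*tasks)


# Short demo with tiny delays so it finishes quickly.
# For a real soak test, call asyncio.run(main()) and let it run.
results = asyncio.run(main(n_tasks=30, settings=[(5, 0.01), (3, 0.02)], max_reports=1))
print(len(results), "tasks finished")
```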

I did notice that the Lolin S2 boards are among the relatively few S2 boards that have CIRCUITPY_ESP_FLASH_FREQ set to 80m in mpconfigboard.mk. Your test script doesn't access flash so that's probably not relevant though.

chilipeppr commented 1 year ago

I was actually using a Lolin S2 Mini, not a Wemos S2 Mini, but aren't they basically the same thing?

I've kept this running since I posted, and I got a HARD_FAULT again a couple of hours ago. My log is below. I also sometimes get WATCHDOG as the reason for a reboot. So my period between HARD_FAULTs is about 22 hours.

I'm building a Marble Run for the local school's STEM lab to inspire the kids, and it needs to run 24x7 so they can hit the button whenever they want, all day, to launch the marbles down a huge track. I was noticing the hard faults on an ongoing basis and initially thought it was my code, but it really seems like it's just the OS doing it.

From safemode.py. Kill switch off. 2023-05-01 21:45:06, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT
From main.py. Kill switch off. 2023-05-01 21:44:54, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE
From safemode.py. Kill switch off. 2023-05-02 00:18:08, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT
From main.py. Kill switch off. 2023-05-02 00:18:03, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE
From main.py. Kill switch off. 2023-05-02 06:07:34, supervisor.RunReason.AUTO_RELOAD, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE
From main.py. Kill switch off. 2023-05-02 06:20:17, supervisor.RunReason.REPL_RELOAD, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE
From safemode.py. Kill switch off. 2023-05-02 11:00:43, supervisor.RunReason.STARTUP, microcontroller.ResetReason.WATCHDOG, supervisor.SafeModeReason.WATCHDOG
From main.py. Kill switch off. 2023-05-02 10:59:46, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE
From safemode.py. Kill switch off. 2023-05-02 18:22:22, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT
From main.py. Kill switch off. 2023-05-02 18:23:10, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE

RetiredWizard commented 1 year ago

Yeah, I think they are the same board. Hopefully I'll see a crash by morning. If not, maybe I'll try loading the Lolin firmware on one of them and see if I can reproduce the crash that way.

chilipeppr commented 1 year ago

Here's my boot_out.txt, but keep in mind I saw the same thing on 8.0.5.

Adafruit CircuitPython 8.1.0-beta.1 on 2023-03-30; S2Mini with ESP32S2-S2FN4R2
Board ID:lolin_s2_mini
UID:487F307D7D25

RetiredWizard commented 1 year ago

I had a bit of a glitch overnight last night which killed my terminal sessions, so from yesterday's test runs all I know is that neither board crashed after about 13 hours.

I started the tests up again this morning but this time, I loaded the Lolin S2 mini firmware up on one of the boards first. The test program has been running again for about 13 hours. I'll check them again in the morning and hopefully at least one of them will have hard faulted.

chilipeppr commented 1 year ago

Sounds good. Yeah, I left mine running today again too, but I actually had accidentally left it in the REPL, so missed out on my test today as well.

chilipeppr commented 1 year ago

Had a HARD_FAULT again last night.

The timestamps are odd. Safe mode got hit at 3:33 AM. Then it rebooted into normal mode, which should be about 10 seconds later, but the clock says 3:27 AM. I set the RTC from NTP when booting in normal mode, so it just makes me think the clock runs fast on the ESP32. Is it possible a fast clock throws stuff off over time and that's what causes a hard fault?

From safemode.py. Kill switch off. 2023-05-04 03:33:58, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.HARD_FAULT
From main.py. Kill switch off. 2023-05-04 03:27:43, supervisor.RunReason.STARTUP, microcontroller.ResetReason.SOFTWARE, supervisor.SafeModeReason.NONE
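
The size of the clock offset can be worked out from the two timestamps above. This is only a rough sketch: the 10-second reboot time is the estimate given in the comment, not a measurement, and the variable names are made up for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Logged by safemode.py from the free-running RTC (no NTP sync on that boot).
safemode_stamp = datetime.strptime("2023-05-04 03:33:58", FMT)
# Logged by main.py right after the RTC was set from NTP.
normal_stamp = datetime.strptime("2023-05-04 03:27:43", FMT)

# How far ahead the uncorrected clock appears to be, in seconds.
apparent_lead = (safemode_stamp - normal_stamp).total_seconds()

# Assumption: the normal boot really happened about 10 s after the safe-mode boot.
REBOOT_SECONDS = 10
estimated_lead = apparent_lead + REBOOT_SECONDS

print(apparent_lead)   # 375.0
print(estimated_lead)  # 385.0
```

That puts the uncorrected clock a little over six minutes ahead. The log does not say when the RTC was last synced before the crash, so a drift rate cannot be derived from these two lines alone.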

RetiredWizard commented 1 year ago

No luck for me, 25 hours, 2 boards, 30 asyncio loops and no faults yet. I'll keep them running but I'm wondering if it's something with the specific board. Do you have just one of the Lolin boards?

I've gone ahead and ordered one, which should be here in about 10 days, but I also seem to remember that there was an issue with different manufacturers of these boards using different parts that behaved differently. Hopefully I've just been lucky and I'll be able to reproduce this on one of these S2 boards and then eventually do some debugging. 🍀

chilipeppr commented 1 year ago

I have tested this on two different Lolin S2 Minis and saw the same issue, but I could try this on other ESP32s now that you mention it could be specific to the version of the chip they used.

chilipeppr commented 1 year ago

Do you think it has anything to do with me activating the display?

RetiredWizard commented 1 year ago

Well, I'm not doing anything with the display. I'd suggest you run the same test you have posted above. I don't think just having the display physically attached should be an issue if the software doesn't address it.

Another thought, how are you powering the board?

chilipeppr commented 1 year ago

I do have the display regurgitating the standard output, so code would be executing in those display classes. I just commented out that part of the code so I'm ONLY testing the async tasks. If this doesn't crash on me then it would have to be the display causing this. If it does crash, one other idea is that I am using the Web Workflow to see the serial output. Perhaps it's the Wifi classes causing it.

As for powering the board, I'm just using a normal USB-C wall wart for a Raspberry Pi that's powering the ESP32-S2, so plenty of amps coming out of that power supply as it's a 3.5A one I had lying around.

I'll let you know what I see over the next 24 hours.

RetiredWizard commented 1 year ago

I've switched my monitoring to the web workflow serial terminal as well....

RetiredWizard commented 1 year ago

The UM Feather S2 has now been running 52 hours without crashing. Somewhere between 25 and 52 hours, the Adafruit Feather S2 with the Lolin S2 Mini firmware crashed, but I didn't capture any information as a terminal wasn't connected at the time. Before I build a debug image and try to capture a traceback, maybe I'll try a Lolin build with CIRCUITPY_ESP_FLASH_FREQ set to 40m.

chilipeppr commented 1 year ago

I had another crash last night. I did not run any display code. So, this was a clean run of just the asyncio.

RetiredWizard commented 1 year ago

The UM Feather S2 has now been running for 6 days without crashing. I restarted the Adafruit Feather S2 running the Lolin firmware 3 days ago and it hasn't crashed either.

The only time I've seen a crash is when the terminal has been disconnected so I decided to use your logging routine and see if I could recreate the crashes without the serial terminal session. It turns out your code won't run under 8.0.5 because the safemode.py feature isn't implemented in the 8.0.5 line yet.

I'll rebuild my tests using the 8.1.0 build and see if I can reproduce the issue there.
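
Since the logging setup depends on safemode.py, it may help to check the firmware version before relying on it. The sketch below pulls the version out of a boot_out.txt style line such as the one posted earlier. The 8.1.0 minimum is an assumption based on this thread (8.0.5 lacks the feature and the 8.1.0 build is being used instead), and the function names are hypothetical.

```python
import re

# Assumption drawn from this thread: safemode.py is not available in the 8.0.x line.
SAFEMODE_MIN_VERSION = (8, 1, 0)


def circuitpython_version(boot_out_line):
    """Extract (major, minor, patch) from a boot_out.txt version line."""
    match = re.search(r"CircuitPython (\d+)\.(\d+)\.(\d+)", boot_out_line)
    if match is None:
        raise ValueError("no CircuitPython version found in line")
    return tuple(int(part) for part in match.groups())


def supports_safemode_py(boot_out_line):
    """True if the version is at or above the assumed minimum for safemode.py."""
    return circuitpython_version(boot_out_line) >= SAFEMODE_MIN_VERSION


# First line of the boot_out.txt posted earlier in this thread.
line = "Adafruit CircuitPython 8.1.0-beta.1 on 2023-03-30; S2Mini with ESP32S2-S2FN4R2"
print(circuitpython_version(line))  # (8, 1, 0)
print(supports_safemode_py(line))   # True
```

Pre-release suffixes such as `-beta.1` are ignored here, so a beta of 8.1.0 is treated the same as the final release.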

chilipeppr commented 1 year ago

Ok, that's promising to me actually. It means there is not something deep in the bowels of CircuitPython or ESP-IDF causing this. Maybe I should just change my board and then I won't have crashes anymore. Or perhaps I'll try to slow down that frequency on the Flash chip like you were commenting on a while ago.

RetiredWizard commented 1 year ago

I've been running my two S2 boards for another couple days without any luck reproducing the issue.

The Lolin board I ordered came in. Unfortunately, it turned out to be one of the likely counterfeit devices (https://forums.adafruit.com/viewtopic.php?t=197737), based on the silk screen and lack of PSRAM. I was able to build a custom CircuitPython binary which started the REPL; however, I couldn't run the test script for more than 15 to 20 minutes, and the REPL/Web Access interface would periodically hang for short periods of time.

I ordered the board through Walmart so it should be easy to return but I'm not sure it's worth re-ordering another board directly from China as it will take over a month to get here.

chilipeppr commented 1 year ago

Interesting. I wonder if mine are counterfeit as well. I got them on Amazon, so not really sure. Either way, I was able to just deal with this problem by auto-rebooting on crash, having it run safemode.py, logging the error, doing another reboot to get into main.py and then proceed from there. I still see a reboot roughly every 12 hours or so, but it hasn't been a problem for my Marble Run project for a STEM lab at a high school. The kids are having plenty of fun with the final working circuit board that drives the marble elevator. So, all is good for now with the workarounds!

RetiredWizard commented 1 year ago

I'm glad you got something working :grin:. If you decide to test at the 40M flash speed let me know how it goes, but it looks to me like the issue is specific to the Lolin board so I don't think I can do much more at this point.

I doubt your boards are counterfeit, as the standard CircuitPython UF2s won't boot on a counterfeit board.