adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

USB disconnects with asyncio program that has exception in task #6706

Closed jepler closed 2 years ago

jepler commented 2 years ago

CircuitPython version

Adafruit CircuitPython 8.0.0-alpha.1-96-g3bff36685-dirty on 2022-08-06; Raspberry Pi Pico with rp2040

Code/REPL

# SPDX-FileCopyrightText: 2022 Dan Halbert for Adafruit Industries
#
# SPDX-License-Identifier: MIT

import asyncio
import board
import digitalio

async def blink(pin, interval, count):  # Don't forget the async!
    with digitalio.DigitalInOut(pin) as led:
        led.switch_to_output(value=False)
        for _ in range(count):
            led.value = True
            await asyncio.sleep(interval)  # Don't forget the await!
            led.value = False
            await asyncio.sleep(interval)  # Don't forget the await!

async def crash():
    1/0

async def main():  # Don't forget the async!
    led_task = asyncio.create_task(blink(board.LED, 0.25, 10))
    crash_task = asyncio.create_task(crash())
    await asyncio.gather(led_task, crash_task)  # Don't forget the await!
    print("done")

asyncio.run(main())

Behavior

A traceback is printed, the LED blinks 10 times because the led_task keeps running, and then the device becomes unresponsive (eventual USB disconnect):

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "/lib/asyncio/core.py", line 214, in run_until_complete
  File "code.py", line 19, in crash
ZeroDivisionError: division by zero

[tio 11:47:17] Disconnected

Note that before the 10 blinks have completed, it's possible to ctrl-c and get to a working repl.

Take care when using this code; you may need to know how to start your device in safe mode so that it doesn't just end up in a cycle of freezing.

Description

Situations like this frequently occur during the development of asyncio programs, so it'd be nice if the whole device didn't crash.

Additional information

No response

jepler commented 2 years ago

asyncio version is

__version__ = "0.5.12"

Specifying return_exceptions=True in the gather call makes no difference.
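
For reference, this is roughly what the gather call looks like with that flag, and what desktop CPython does with it. This is a minimal sketch meant to run on CPython, not on the affected board; ok() is a hypothetical stand-in for the blink task. On the affected CircuitPython build the hang still occurred, as reported above.

```python
import asyncio

async def ok():
    # Hypothetical stand-in for the blink task: yields once, then finishes.
    await asyncio.sleep(0)
    return "ok"

async def crash():
    1 / 0

async def main():
    # With return_exceptions=True, gather() places exception objects in the
    # result list instead of raising the first one to the caller.
    return await asyncio.gather(ok(), crash(), return_exceptions=True)

results = asyncio.run(main())
print(results)
```

On CPython the second entry of results is the ZeroDivisionError instance rather than a raised exception.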

jepler commented 2 years ago

Using the pure-Python implementation of Task and TaskQueue, the problem does not occur.

Neradoc commented 2 years ago

I was curious because I have been testing cancelling tasks (catching asyncio.CancelledError, after some trial and error) and did not notice that issue. Then again, it was with never-ending loops. Anyway, I did some tests and...

This causes the issue:

async def crash():
    raise Exception("oops")

But this doesn't:

async def crash():
    raise BaseException("oops")

Instead it leads to:

Traceback (most recent call last):
  File "code.py", line 102, in <module>
  File "asyncio/core.py", line 235, in run
  File "asyncio/core.py", line 189, in run_until_complete
  File "code.py", line 93, in crash
BaseException: oops

Code done running.

Press any key to enter the REPL. Use CTRL-D to reload.
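
One possible reading of the difference, not confirmed in this thread: a run loop that catches Exception from a task records the error and keeps going, while a BaseException is not caught and propagates straight out of the loop. The toy driver below is a hypothetical model of that distinction only; run_toy is not the asyncio library's code.

```python
def run_toy(coro):
    """Toy model (hypothetical): step a coroutine once, as a scheduler might."""
    try:
        coro.send(None)
    except StopIteration:
        return "finished"
    except Exception as exc:
        # Exception subclasses are caught here, so the loop would carry on.
        return "caught " + type(exc).__name__
    return "suspended"

async def crash_exception():
    raise Exception("oops")

async def crash_base():
    raise BaseException("oops")

outcome_exception = run_toy(crash_exception())

try:
    outcome_base = run_toy(crash_base())
except BaseException as exc:
    # BaseException is not matched by "except Exception", so it escapes run_toy.
    outcome_base = "propagated " + type(exc).__name__

print(outcome_exception)
print(outcome_base)
```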

jepler commented 2 years ago

When the problem occurs, task_iternext is called while the task "is done" but data is mp_rom_none.

STATIC mp_obj_t task_iternext(mp_obj_t self_in) {
    mp_obj_task_t *self = MP_OBJ_TO_PTR(self_in);
    if (TASK_IS_DONE(self)) {
        // Task finished, raise return value to caller so it can continue.
        nlr_raise(self->data);

This value (a special constant 0x6, which is not a valid pointer) ends up being checked as follows in py/vm.c:

            if (mp_obj_is_subclass_fast(MP_OBJ_FROM_PTR(((mp_obj_base_t*)nlr.ret_val)->type), MP_OBJ_FROM_PTR(&mp_type_StopIteration))) {

However, it's not valid to cast 0x6 to an mp_obj_base_t pointer and dereference it.
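
To illustrate why 0x6 cannot be treated as a pointer, here is a small Python model of a tagged machine word. It assumes an object encoding in which heap pointers are aligned so that their low two bits are zero, and immediate constants carry tag bits there instead. The heap dictionary, the address 0x20001000, and the helper names are all hypothetical.

```python
NONE_WORD = 0x6  # the value reported in this thread for the "none" constant

def is_heap_pointer(word):
    # Assumption: real heap objects are aligned, so the low two bits are 0.
    return (word & 0b11) == 0

def type_of(word, heap):
    # A checked version of "read the type field through the pointer".
    if not is_heap_pointer(word):
        raise TypeError("word 0x%x is not a heap pointer" % word)
    return heap[word]["type"]

# Hypothetical heap with one exception object at a made-up address.
heap = {0x20001000: {"type": "ZeroDivisionError"}}

print(type_of(0x20001000, heap))
print(is_heap_pointer(NONE_WORD))
```

The unchecked C cast skips the is_heap_pointer step, so it reads a type field from whatever happens to live near address 0x6.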

Comparing the Python-level behavior with and without the native task, I saw that with the native task the line ts[i] = await ts[i] in gather never returns. Instead, the exception is printed on the terminal and .. I'm not sure what happens. Bad things.

I tried "mixing and matching" the native and python-coded Task and TaskQueue, but this combination leads to a different failure. While all crashes in the core are in some sense interesting, this is probably not worth pursuing further:

Assertion 'node->child == NULL && node->next == NULL' failed, at file pairheap.h:88

when using native TaskQueue but Python Task.

jepler commented 2 years ago

It looks like we need some form of 90aaf2dbef657e5afb8855a42d26093c3ef2a38d in our asyncio library.

jepler commented 2 years ago

With other changes, this test segfaults on the Unix port, which may make the problem easier to debug:

$ MICROPYPATH=frozen/Adafruit_CircuitPython_asyncio/:frozen/Adafruit_CircuitPython_Ticks/ ./ports/unix/micropython-coverage asyncio_crash1.py 
Traceback (most recent call last):
  File "frozen/Adafruit_CircuitPython_asyncio//asyncio/core.py", line 241, in run_until_complete
  File "asyncio_crash1.py", line 13, in crash
ZeroDivisionError: division by zero
Segmentation fault

asyncio_crash1.py:

# SPDX-FileCopyrightText: 2022 Dan Halbert for Adafruit Industries
#
# SPDX-License-Identifier: MIT

import asyncio

async def blink(pin, interval, count):  # Don't forget the async!
    for _ in range(count):
        await asyncio.sleep(interval)  # Don't forget the await!
        await asyncio.sleep(interval)  # Don't forget the await!

async def crash():
    1/0

async def main():  # Don't forget the async!
    led_task = asyncio.create_task(blink(None, 0.25, 2))
    crash_task = asyncio.create_task(crash())
    await asyncio.gather(led_task, crash_task)  # Don't forget the await!
    print("done")

asyncio.run(main())

jepler commented 2 years ago

Also relevant is the core change: https://github.com/micropython/micropython/pull/8929

This would have prevented the problem from being a hardfault / segmentation fault / hang.
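
The linked core change is not reproduced here. For comparison only, this is how desktop CPython behaves when code tries to raise a stored value that is None: the interpreter rejects it with a TypeError instead of crashing. DoneTask is a hypothetical stand-in, not the native task type.

```python
class DoneTask:
    """Hypothetical stand-in for a finished task whose stored data is None."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return self

    def __next__(self):
        # Same shape as "raise the stored value to the caller".
        raise self.data

try:
    next(DoneTask(None))
    outcome = "no error"
except TypeError:
    # CPython refuses to raise an object that is not an exception.
    outcome = "TypeError"

print(outcome)
```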