adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

ItsyBitsy M4 Express w/ CP5.3.0: bignum(?) code prevents USB from functioning (mount and REPL access) #2949

Closed derhexenmeister closed 4 years ago

derhexenmeister commented 4 years ago

I posted about this here ("CircuitPython program prevents drive from mounting") and mentioned it on Discord, but I am not sure that the right developers have seen it.

Warning: before anyone tries this, please make sure that you have a backup of your CIRCUITPY drive and that you know how to run a flash eraser specific to your CircuitPython board, e.g. https://learn.adafruit.com/adafruit-feather-m4-express-atsamd51/troubleshooting

I am using a Mac running 10.14.6 connected to an ItsyBitsy M4 Express running "Adafruit CircuitPython 5.3.0 on 2020-04-29; Adafruit ItsyBitsy M4 Express with samd51g19"

I found that if I put this simple program into code.py, then my ItsyBitsy will not mount as a drive. To recover I had to enter bootloader mode, erase the drive, and then reload CircuitPython. Maybe there's a better way to recover, or there's something unique about my setup which causes this. I suspect that the bignum code is not giving time to other code involved in handling the USB subsystem. (If I had a hacked version of CircuitPython which didn't load code.py, then that could be loaded to enable a mount and cleanup.)

print("The waiting game")
i = 0
while True:
    print("Looping {}".format(i))
    i = i + 1
    a = 2**(65536*2)

dhalbert commented 4 years ago

Try doing a slow double-click. There is a 700ms interval when the on-board DotStar is yellow. If you click during that time, CircuitPython will restart in safe mode and will not run code.py. 700ms is too short for you to see it and react in time, but try a slow double-click and you may hit it.

Of course, we need to figure out what's wrong here in any case.

derhexenmeister commented 4 years ago

That's a great tip - I had no idea that there was a safe mode, but do see it on Troubleshooting now that you mention it. It did allow me to mount CIRCUITPY and recover.

DavePutz commented 4 years ago

Just as a point of info: I tested the sample script on a PyPortal (also a SAMD51; I don't have an ItsyBitsy M4 Express) running CP 5.3.0 on a Windows 10 machine, and saw no issues with accessing CIRCUITPY while the loop was running.

derhexenmeister commented 4 years ago

@DavePutz - can you try that, but reboot the board so that the code runs during startup?

DavePutz commented 4 years ago

Yes, that is how I ran my test. I see the 'looping' messages on the screen, and CIRCUITPY is mounted and accessible.

derhexenmeister commented 4 years ago

@DavePutz - ok thanks, just wanted to make sure. On 5/20 "geekguy" tried this on an ItsyBitsy M4, and said "I just tried your code on my ItsyBitsy M4 and it froze the board up. I can not connect to the REPL." Sounds like good news that the PyPortal doesn't have this issue, and it also uses an M4.

dhalbert commented 4 years ago

I would check whether it has to do with the operating system, since @DavePutz's test was on Windows, but @derhexenmeister found the problem on macOS.

derhexenmeister commented 4 years ago

I just tried on a Dell Precision 5530 running Windows 10 Pro (10.0.18363) and it complained "USB device not recognized". CIRCUITPY did not mount, and the virtual com port (USB) was not available.

dhalbert commented 4 years ago

@derhexenmeister I think it's unlikely this has anything to do with it, but what is the version of the bootloader? (See INFO_UF2.TXT after double-clicking.)

derhexenmeister commented 4 years ago

I believe that I have the latest version (there was a recent recommendation on the forum):

UF2 Bootloader v3.9.0 SFHWRO
Model: ItsyBitsy M4 Express
Board-ID: SAMD51G19A-Itsy-v0

dhalbert commented 4 years ago

I found an easier way to break USB (at least CDC), from the REPL:

Adafruit CircuitPython 5.4.0-beta.1-36-g25d5f2cfc on 2020-06-19; Adafruit Metro M4 Express with samd51j19
>>> 10**40000

FATAL: read zero bytes from port
term_exitfunc: reset failed for dev UNKNOWN: Input/output error

DavePutz commented 4 years ago

This is not a new issue; I tested as far back as CP 5.0 with similar results. The issue seems to be that when a compute-intensive instruction is being run (and the 'a = 2**(65536*2)' takes about 4 seconds) there are no breaks to run any background tasks. As a result, the USB handshakes and commands take a very long time to complete. The only calls to run_background() happen when we move to the next line in the while loop. Maybe we are getting one command through at a time? Should there be some way to have an interrupt in order to check background tasks more often? I believe that the PyPortal did not see this issue because it has displayio involved, which seems to affect the timing.
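A rough way to check the "about 4 seconds" figure on a given board is to time the expression directly. This is only a sketch: the helper name `time_big_power` is made up for illustration, and the elapsed time will differ widely between boards and desktop Python.

```python
import time

def time_big_power(exponent):
    """Return (elapsed_seconds, value) for computing 2 ** exponent once."""
    start = time.monotonic()
    value = 2 ** exponent          # the single long-running operation
    elapsed = time.monotonic() - start
    return elapsed, value

# Same operand size as the loop in the original report.
elapsed, value = time_big_power(65536 * 2)
print("2**(65536*2) took {:.3f} s".format(elapsed))
```

The value is returned but deliberately not printed, since converting a number this large to a string is a separate, slow operation.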

dhalbert commented 4 years ago

The longint operations are single virtual machine opcodes; the background task running does not happen during that time. That's the basic problem. We could sprinkle calls to run background tasks through the longint code, but we need to make sure that does not violate some assumptions the VM has about opcode atomicity.
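The "single opcode" point can be illustrated on desktop CPython, which ships a `dis` module (CircuitPython does not, and its bytecode format is different, so treat this as an analogy rather than a view of the CircuitPython VM). The helper names `power` and `count_power_opcodes` are hypothetical.

```python
import dis

def power(base, exponent):
    # Operands are variables so the compiler cannot fold the result
    # into a constant at compile time.
    return base ** exponent

def count_power_opcodes(func):
    """Count the bytecode instructions that perform '**' in func."""
    count = 0
    for ins in dis.get_instructions(func):
        # Python < 3.11 names the opcode BINARY_POWER;
        # 3.11+ uses a generic BINARY_OP whose argrepr is "**".
        if ins.opname == "BINARY_POWER":
            count += 1
        elif ins.opname == "BINARY_OP" and ins.argrepr == "**":
            count += 1
    return count

print(count_power_opcodes(power))
```

However large the operands are, the whole exponentiation is dispatched as one instruction, so any hook that only runs between instructions cannot fire until it finishes.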

DavePutz commented 4 years ago

Would it be possible (or reasonable) to use a timer at the start of such operations to provide interrupts where background tasks could be run?

On Sun, Jul 12, 2020 at 5:58 PM Dan Halbert notifications@github.com wrote:

The longint operations are single virtual machine opcodes; the background task running does not happen during that time. That's the basic problem. We could sprinkle calls to run background tasks through the longint code, but we need to make sure that does not violate some assumptions the VM has about opcode atomicity.

dhalbert commented 4 years ago

Background tasks need to run at a "safe" time, when there isn't a storage operation in progress, etc. It is probably possible to add calls to check for background tasks before the storage allocations in the longint code.

Though it's not good that the longint code locks out background tasks, it would be good to know the use case for doing longint operations with huge operands. When you first found this, were you trying to compute something, or was it just to see how well the longints work?
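For scripts that really do need a huge power, one script-level idea is to do the exponentiation as a Python loop of smaller multiplications, so the VM passes through many opcodes and loop iterations instead of one. This is a sketch only: `pow_in_steps` is a hypothetical helper, and nothing in this thread confirms that it avoids the USB problem, since each individual multiplication of large operands is still a single opcode.

```python
def pow_in_steps(base, exponent):
    """Compute base ** exponent (exponent >= 0) by square-and-multiply.

    Each multiplication is its own bytecode operation inside a while
    loop, rather than one long-running '**' operation.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exponent:
        if exponent & 1:
            result = result * base   # multiply in the current bit
        exponent >>= 1
        if exponent:
            base = base * base       # square for the next bit
    return result

a = pow_in_steps(2, 65536 * 2)
```

The last few squarings still dominate the run time, so any benefit would come from shortening the longest uninterrupted stretch, not from reducing total work.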

kevinjwalters commented 4 years ago

BTW, not that relevant to this issue but on the reacting within 700ms front, do you mean 70ms? One of my learn guides featured testing this using a CPX. Just under 200ms is generally achievable if you're very focussed and finger is resting on the button. Sub 250ms isn't too difficult and that includes a 74 year old friend in a noisy cafe. Graphs here: https://learn.adafruit.com/circuit-playground-bluefruit-quick-draw-duo/reaction-timer-results

dhalbert commented 4 years ago

"BTW, not that relevant to this issue but on the reacting within 700ms front, do you mean 70ms? One of my learn guides featured testing this using a CPX. Just under 200ms is generally achievable if you're very focussed and finger is resting on the button. Sub 250ms isn't too difficult and that includes a 74 year old friend in a noisy cafe. Graphs here: https://learn.adafruit.com/circuit-playground-bluefruit-quick-draw-duo/reaction-timer-results"

I'm not sure what you mean about 70ms. There is a 700ms wait, during which the status RGB LED is yellow. If you click reset during this time, you will enter safe mode.

For me, the easiest way to do this is not to try to wait for the yellow, but simply to double-click slowly enough that the second click is in the 700ms window.

derhexenmeister commented 4 years ago

"When you first found this, were you trying to compute something, or was it just to see how well the longints work?"

I was trying to test a tool which is very similar to Ampy, by trying out various types of "user" code to stress the part of the tool which breaks back into the REPL. I wanted to see what timeouts etcetera made sense.
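The kind of "break back into the REPL" step described here might look like the sketch below. It is not the code of Ampy or of the tool being tested: `break_into_repl`, its parameters, and the timeout values are all hypothetical. The `port` argument is assumed to be any object with `write(bytes)` and `read(n)` methods, which is the shape a pyserial `Serial` object opened with a short read timeout provides.

```python
import time

CTRL_C = b"\x03"   # keyboard interrupt byte sent to the REPL
PROMPT = b">>> "   # REPL prompt to wait for

def break_into_repl(port, timeout_s=5.0, retry_s=0.5):
    """Send Ctrl-C repeatedly until the prompt is seen or timeout_s passes.

    Returns True if the prompt appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    seen = b""
    next_send = 0.0
    while time.monotonic() < deadline:
        now = time.monotonic()
        if now >= next_send:
            port.write(CTRL_C)           # re-send in case the first was lost
            next_send = now + retry_s
        # Keep only a short tail so a prompt split across reads is still found.
        seen = (seen + port.read(64))[-256:]
        if PROMPT in seen:
            return True
    return False
```

With user code like the loop in this issue, the board may never answer, so the function returns False at the deadline. Choosing `timeout_s` is exactly the tuning question raised above.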