adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
Other
4.09k stars 1.21k forks source link

REPL Access is Lost After Awhile #3588

Closed robertgallup closed 3 years ago

robertgallup commented 3 years ago

On the Circuit Playground Express, after running a CircuitPython program, access to the REPL is lost. Ctrl+C stops execution and the serial connection remains, but there's no way to regain the REPL prompt. Sometimes restarting the serial console (on the PC side) unlocks the REPL, or resetting the CPX board.

dhalbert commented 3 years ago

Could you give more details?

What version of CircuitPython are you using? Do you notice this in general, or with a particular program? If the latter, could you supply the program? What kind of host computer are you using, and how are you connecting via serial (mu, screen, Tera Term, etc.)?

FoamyGuy commented 3 years ago

I have spoken with this user on Discord a bit as well. I can fill in some of that information.

They are using a CPX with 6.0.0-rc.0 I am not sure about the code they are using.

I did reproduce this issue once this morning though, my device was running simpletest script for a color sensor (that was not connected, was testing for someone else).

I haven't been able to nail down the exact steps to replicate it consistently though. I've switched to no code.py and a simple time.monotonic() printing loop and checked back in a few times but so far I have been able to access the REPL successfully all except for the one time this morning.

In my case: Host PC was Win7, serial connection was via Mu.

I will try to fish up the color sensor example I had running, though I don't think there is anything "special" about it, to the best of my knowledge it would have been sitting on whatever exception is thrown when the sensor isn't found

Edit: this was the script running (or more likely crashed) on my device at the time I saw this occur: https://github.com/adafruit/Adafruit_CircuitPython_AS7341/blob/main/examples/as7341_simpletest.py

robertgallup commented 3 years ago

The problem exists on CircuitPython v6.0.0-rc0. Did not exist on v5.3.1. It happens with this very simple program:

While True:
    pass

It's occurred using Screen in iTerm on MacOS 10.15.7. Also occurred using the Atom language-circuitpython package (that's where I first saw it).

I'm testing on an itsybitsy M4 right now to see if it's the same.

FoamyGuy commented 3 years ago

I had this happen again with this code running on the CPX:

import time
while True:
    #print(time.monotonic())
    time.sleep(1)

Interestingly, it ran for a long time with the print statement un-commented before this and never got into this state during that time. I'll try to see if I can narrow down whether the printing is really making a difference or if it was just coincidence.

robertgallup commented 3 years ago

So, the problem arises for me on the CPX between 1h 00m and 2h 37m. Will try 1h 45m next.

The missing REPL occurs with the original program:

while True:
    pass

And, I just confirmed the behavior is the same at 2h 37m with this code:

import time
while True:
    print ("Still here!")
    time.sleep(10)

Last part of the output from the later when I pressed CTRL+C was:

Still here!
Still here!
Still here!
Still here!
Still here!
Traceback (most recent call last):
  File "main.py", line 4, in <module>
KeyboardInterrupt:
robertgallup commented 3 years ago

After 3 hours running, my ItsyBitsy M4 has the same problem, the REPL is unresponsive.

Code is:

import time
while True:
    print ("Still here!")
    time.sleep(10)

After CTRL+C final output is:

Still here!
Still here!
Traceback (most recent call last):
  File "main.py", line 4, in <module>
KeyboardInterrupt:

Also, on the ItsyBitsy, after CTRL+C, the dotstar blink pattern is: long-green, long-red, 4 x short-blue.

When I eject the ItsyBitsy, the dotstar returns to solid green.

dhalbert commented 3 years ago

Looking over recent PR's, I am wondering if this has anything to do with #3546, maybe in the overflow handling. Maybe we could use a bisect to check.

robertgallup commented 3 years ago

It looks like #3546 was merged 9 days ago? As a quick guess, maybe try the version just before the #3546 merge and see if it's 'good' or 'bad' and go from there? (are past builds kept anywhere? I'm happy to help pin this down, but I'm not set up to build CircuitPython)

FoamyGuy commented 3 years ago

@robertgallup all of the builds are archived on S3. Here is the link for CPX English US builds: https://adafruit-circuit-python.s3.amazonaws.com/index.html?prefix=bin/circuitplayground_express/en_US/

If you go to circuitpython.org/downloads and find your board then look for the "Browse S3" button it will lead you to the page for that device.

dhalbert commented 3 years ago

https://adafruit-circuit-python.s3.amazonaws.com/bin/circuitplayground_express/en_US/adafruit-circuitpython-circuitplayground_express-en_US-20201013-3d21eec.uf2 is the merge commit for #3546. The one just before that is https://adafruit-circuit-python.s3.amazonaws.com/bin/circuitplayground_express/en_US/adafruit-circuitpython-circuitplayground_express-en_US-20201013-a010dc3.uf2.

If you are willing to do a little testing on these, that would be great! Thanks.

dhalbert commented 3 years ago

The way we'd normally test this is to do a "bisect". We'd pick a known bad and a known good build. Then we pick a build halfway in between, and test that. If it's bad, we then pick another build halfway between the second bad one and the good one, and repeat. Each time we conquer and divide the possibilities in half. Git has a way of doing this automatically. Unfortunately since it takes several hours to determine good/bad, this might get pretty tedious. So I made a stab at a possible place to start.

robertgallup commented 3 years ago

Excellent, thanks! I was just going looking for that build. I'll see what I can find out. I'm using my own 'bisect' to find out the minimum test time (currently between 1h and 2h 30m :-) )

kevinjwalters commented 3 years ago

Is this a duplicate of #2686 perhaps with a change that increases the frequency it occurs at?

robertgallup commented 3 years ago

Perhaps. Was there ever a resolution to that issue? FWIW, my issue doesn't seem to be impacted by amount of serial output.

tannewt commented 3 years ago

@hathach You'll want to read through this too.

kevinjwalters commented 3 years ago

I've just had a CLUE running 6.0.0-rc.0 stop sending output to USB serial console but I can still type commands and see them on the CLUE's screen and they execute. This is different to #2686. print output also does not go to the USB serial console. I can disconnect and reconnect to this but it does not fix it, it's still in same state where i can execute stuff blind on USB serial console or less blind if I look at the screen!

This is on windows 8.1, btw

robertgallup commented 3 years ago

@dhalbert It looks like your hunch might be correct!

TL;DR My testing suggests the #3546 commit did introduce the problem. The previous commit passes and the #3546 commit fails.

The Full Version:

I spent yesterday confirming (unintentionally) that this issue doesn't exist in v5.3.1. :-)

This morning I made additional progress and found that with the following program, the issue is reproducible after 50m (maybe less):

import time
print ("Here...")
while True:
    t = time.monotonic()
    hours = int(t) // 3600
    t -= hours * 3600
    minutes = int(t) // 60
    t -= minutes * 60
    seconds = int(t)
    print(f"Still here: {hours}h {minutes}m {seconds}s")
    time.sleep(10)

The results are below. I first confirmed that the test failed on v6.0.0-rc.0. Then, I confirmed that the commit prior to #3546 passed. Then confirmed the #3546 commit failed. And, finally, I confirmed that the prior commit succeeded again. Not an iron-clad case, but it is suspicious.

################################################################################
# FAIL - Adafruit CircuitPython 6.0.0-rc.0 on 2020-10-16
################################################################################

Adafruit CircuitPython 6.0.0-rc.0 on 2020-10-16; Adafruit CircuitPlayground Express with samd21g18

Still here: 0h 48m 22s
Still here: 0h 48m 32s
Still here: 0h 48m 42s
Still here: 0h 48m 52s
Still here: 0h 49m 2s
Traceback (most recent call last):
  File "main.py", line 11, in <module>
KeyboardInterrupt:

Code done running. Waiting for reload.

<unresponsive to CTRL-C or CTRL+D>

################################################################################
# PASS - Adafruit CircuitPython 6.0.0-beta.2-100-ga010dc35f on 2020-10-13
################################################################################

Still here: 0h 50m 32s
Still here: 0h 50m 42s
Still here: 0h 50m 52s
Still here: 0h 51m 2s
Still here: 0h 51m 12s
Still here: 0h 51m 22s
Traceback (most recent call last):
  File "main.py", line 11, in <module>
KeyboardInterrupt:

Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 6.0.0-beta.2-100-ga010dc35f on 2020-10-13; Adafruit CircuitPlayground Express with samd21g18
>>>

################################################################################
# FAIL - Adafruit CircuitPython 6.0.0-beta.2-103-g3d21eec3c on 2020-10-13
################################################################################

Adafruit CircuitPython 6.0.0-beta.2-103-g3d21eec3c on 2020-10-13; Adafruit CircuitPlayground Express with samd21g18
>>>
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
Here...
Still here: 0h 0m 19s

...

Still here: 0h 49m 1s
Still here: 0h 49m 11s
Still here: 0h 49m 21s
Still here: 0h 49m 31s
Still here: 0h 49m 41s
Still here: 0h 49m 51s
Still here: 0h 50m 1s
Still here: 0h 50m 11s
Still here: 0h 50m 21s
Still here: 0h 50m 31s

<unresponsive to CTRL+C, CTRL+D >

################################################################################
# PASS - Adafruit CircuitPython 6.0.0-beta.2-100-ga010dc35f on 2020-10-13
################################################################################

Still here: 0h 49m 54s
Still here: 0h 50m 4s
Still here: 0h 50m 14s
Still here: 0h 50m 24s
Still here: 0h 50m 34s
Traceback (most recent call last):
  File "main.py", line 11, in <module>
KeyboardInterrupt:

Code done running. Waiting for reload.
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 6.0.0-beta.2-100-ga010dc35f on 2020-10-13; Adafruit CircuitPlayground Express with samd21g18
>>>
tannewt commented 3 years ago

@robertgallup Any chance you have a debugger you can connect to an M0? That will tell us where it is when it is hung.

robertgallup commented 3 years ago

I'm not experienced with debuggers, but I have one for Cypress (miniprog3) that looks like it supports jtag. Do you happen to know if this would work for this with generic software? Otherwise, I think I can get ahold of a j-link (mini edu?) by Monday-ish. And, I'm always happy to learn something new.

dhalbert commented 3 years ago

It's not hanging, though, right? It's just that REPL output is no longer being sent back on USB (but is showing up on the display), is that right?

robertgallup commented 3 years ago

It's not a hang, per se. The REPL is not responsive after I stop the program with CTRL+C. But, if I didn't interrupt, it appears the program would continue running and writing to serial. It might be interesting to see with the debugger what was going on after I did the first keyboard interrupt.

robertgallup commented 3 years ago

@dhalbert from your comment, "...but is showing up on the display", is there a way for me to see the REPL output on a display (OLED, etc.) rather than over serial?

dhalbert commented 3 years ago

I mistakenly thought you were running this on a PyPortal or CLUE or similar board with an integral display (which shows the REPL).

cwalther commented 3 years ago

Regarding debuggers, if you happen to have a Raspberry Pi, I’m currently using one (3B+) to debug CircuitPython on a SAMD51, no extra hardware needed. Can expand if you are interested.

Regarding displays, if you have an external one you can connect, you should be able to get terminalio on the ItsyBitsy M4. Maybe even on the CPX, it looks like there’s a circuitplayground_express_displayio firmware.

jerryneedell commented 3 years ago

@cwalther please expand! Just a link to instructions would be great.

cwalther commented 3 years ago

I was gathering information from a bunch of places, as far as I remember these were the most helpful ones: https://learn.adafruit.com/programming-microcontrollers-using-openocd-on-raspberry-pi https://learn.adafruit.com/debugging-the-samd21-with-gdb https://iosoft.blog/2019/01/28/raspberry-pi-openocd/

robertgallup commented 3 years ago

@cwalther Interesting, thanks. I'll have to check it out. Also, @dhalbert, your misunderstanding was fortuitous. Thanks to your comment I remembered I have a TFT Gizmo I can connect to my CPX! And, it looks like with the displayio build it displays the REPL... :-)

robertgallup commented 3 years ago

A few final data points for now. I connected the TFT Gizmo to the CPX and found the following:

  1. Monitoring the CPX program from a serial console, the REPL becomes unresponsive consistent with the other cases.

  2. Not connected to a serial console and running the program while monitoring output on the TFT Gizmo, I can connect a serial monitor after 50m and break out to the REPL no problem.

  3. Without a program running (i.e. no main.py/code.py), there's no problem entering the REPL after 50m.

dhalbert commented 3 years ago

I ran this on Ubuntu 20.04 on a Metro M0 Express with 6.0.0-rc.0 for 6500 seconds, and was able to ctrl-C it at that point:

>>> while True:
...     print(int(time.monotonic()))
...     time.sleep(10)
...     

I could try again with a CPX, but I'm wondering if I should try this on another OS.

@robertgallup This failed for you on MacOS after 50 minutes. During that time did the Mac go to sleep, or were you keeping it awake?

kevinjwalters commented 3 years ago

@dhalbert Is there a possiblity the USB physical socket and/or other devices in use have an effect through their presence or from traffic? USB gives the appearance of a point to point topology with a desktop/laptop making a star with nothing shared but I'm wondering if there's more to it?

robertgallup commented 3 years ago

@dhalbert I've just run a couple of tests with all timeouts turned off and they both passed. I'm running one final test with my normal timeouts back on to see if I can get it to fail again.

robertgallup commented 3 years ago

@dhalbert With sleep back on, both the CPX and a Trinket M0 failed. This seemed to be the case even if I was sitting there working during the time which should have prevented any sleep.

On the other hand, I turned the screen saver and sleep off and left a Trinket M0 on all night (5-1/2 hours). This morning I could break out and access the REPL with no problem...

Now, this just in, I started the Trinket again without resetting it, restored sleep/screen saver and the Trinket was unresponsive after 30 minutes. It was happily sending serial output, but would not accept any input (i.e. CTRL+C wouldn't work). When I reset the device, all was working again.

tannewt commented 3 years ago

@robertgallup So this only happens when the code has a time.sleep() call in it?

robertgallup commented 3 years ago

@tannewt I haven't tested recently, but the original test case didn't include time.sleep():

while True:
    pass

EDIT: I've tested that this simple case eventually locks on my Trinket M0.

dhalbert commented 3 years ago

I am able to reproduce this problem, without waiting a long time, by putting the host computer into sleep mode. I reproduced this on both Windows and Linux.

Simple test program:

# Prints elapsed number of seconds, every 10 seconds.
import time
while True:
    print (int(time.monotonic()))
    time.sleep(10)

Example with 6.0.0-rc.0:

$ repl  # an alias I have to use picocom
82
92
102
[--here, I put the host computer to sleep, and then waited a bit to wake it up]
172
[--typed ctrl-c here]
Traceback (most recent call last):
  File "code.py", line 4, in <module>
KeyboardInterrupt: 

Code done running. Waiting for reload.
[--hangs at this point, does not take input]

By comparison, here's 5.3.1:

 repl
12
22
[--put host computer to sleep here, then waited a bit and woke it back up]
62
[--typed ctrl-c here]
Traceback (most recent call last):
  File "code.py", line 4, in <module>
KeyboardInterrupt: 

Code done running. Waiting for reload.
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

Press any key to enter the REPL. Use CTRL-D to reload.
[--pressed a key]
>>> 

@tannewt suggested a possible lead on fixing this to me yesterday, and I'll investigate that.

robertgallup commented 3 years ago

@dhalbert I don't know if this is useful, but I just did a manual bisect on S3 builds using your protocol (with a 30 second sleep time). The build that seems to have broken is:

https://adafruit-circuit-python.s3.amazonaws.com/bin/circuitplayground_express/en_US/adafruit-circuitpython-circuitplayground_express-en_US-20200918-749cbe1.uf2

The build just before is:

https://adafruit-circuit-python.s3.amazonaws.com/bin/circuitplayground_express/en_US/adafruit-circuitpython-circuitplayground_express-en_US-20200917-9cf9441.uf2

dhalbert commented 3 years ago

I think I have a fix: see PR #3624. Build artifacts are available here https://github.com/adafruit/circuitpython/runs/1334933429?check_suite_focus=true if you want to test.

robertgallup commented 3 years ago

@dhalbert #3624 has been working great for me so far!

EDIT: I've had this build running on my CPX for over 22,400 seconds with many sleeps with no issues. :-)

dhalbert commented 3 years ago

Fixed by #3624

kevinjwalters commented 3 years ago

Is this a possible or likely fix for #2686 too?

dhalbert commented 3 years ago

Is this a possible or likely fix for #2686 too?

If the host computer went to sleep during the long interval mentioned in #2686, then yes, this should fix that too.

dhalbert commented 3 years ago

Is this a possible or likely fix for #2686 too?

If the host computer went to sleep during the long interval mentioned in #2686, then yes, this should fix that too.

I take that back. #2686 talks about loss of output. In the test above, output is not lost. When the sleeping computer wakes up, output continues to the terminal program. The issue was that CircuitPython didn't realize that the computer reconnected, so it did not go back to the REPL after a ctrl-C.