USB output only starts after main.py finishes (RP2040)

urish commented 2 years ago

CircuitPython version

- Adafruit CircuitPython 7.0.0-beta.0 on 2021-08-24; Raspberry Pi Pico with rp2040
- Adafruit CircuitPython 6.3.0 on 2021-06-01; Raspberry Pi Pico with rp2040

(tested on both versions)

Code/REPL

print("Hello, Pi Pico!")

Behavior

I'm working on adding CircuitPython support to my Raspberry Pi Pico simulator on Wokwi.com.

The USB Serial output seems to initialize only after main.py finishes running, so the user can't see any output (or enter any input, such as Ctrl+C) as long as main.py is running. Once main.py finishes running, the USB Serial console initializes and then the user is dropped into the REPL and can interact with it.

With MicroPython, when using USB console, it seems to be able to initialize the USB Serial while main.py in running. So the very first prints do not show up, but then after ~16 milliseconds prints starting to show.

Here's a bit more complex example demonstrating the issue. The code is running and blinking the LED, but the Serial prints do not show up, as the Serial USB does not initialize until after main() finishes:

https://wokwi.com/arduino/projects/307811252339475011

Ideally, I think the best user experience for the simulator is to start executing main.py only when the Serial console has fully initialized. This way, the user can also see the output in case of any syntax errors in main.py. Also, I'd love to be able to use the official Pi Pico builds out of the box, without having to maintain a separate build for running in the simulator.

I'd appreciate any relevant input or pointers how to look into this and provide CircuitPython users with the best possible user experience.

Description

No response

Additional information

No response

dhalbert commented 2 years ago

The behavior you describe does not happen on a real Pi Pico when running CircuitPython. USB is started when code.py starts.

Do you by chance have both a main.py and a code.py? Both are accepted: code.pyis preferred for CircuitPython. (The accepted files are code.txt, code.py, main.txt, main.py, searched for in that order.)

Could you describe how you emulate the Pi Pico? Can you trace the USB activity?

urish commented 2 years ago

Thanks for the speedy response and for trying to reproduce this on the real Pi Pico!

In this case, I only have main.py (also tried now with code, getting same behavior):

>>> import os
>>> os.listdir('/')
['main.py', 'diagram.json', 'boot_out.txt']

The Pi Pico is emulated using an open source library that I created for the purpose. I can trace USB activity, as well as connect with GDB to debug the CP code that is running inside the simulation.

What I'm currently seeing:

CP enables the USB peripheral before main.py starts executing
The simulator sends a USB reset
CP acks the reset (by setting the SIE_BUS_RESET bit in the SIE_STATUS register)
The simulator sends a Set Device Address setup packet
Then main.py (or code.py) starts running
main.py finishes running
Only now CP acknowledges the device address setup packet (by writing to the EP0_IN_BUFFER_CONTROL register), and the USB exchange continues (e.g. reading the descriptors, setting the device configuration, etc.)

Any ideas how to move forward from here?

dhalbert commented 2 years ago

So if main.py is an infinite loop, you never see the EP0_IN_BUFFER_CONTROL write?

Both MicroPython and CircuitPython use TinyUSB as the USB stack. CPy calls tud_task() to handle pending USB work in the background. We check for background tasks at safe points in the VM interpreter via RUN_BACKGROUND_TASKS, aka supervisor_run_background_tasks_if_tick(). We see whether at least one 1/1024sec tick has elapsed.

You would probably want to watch for the execution of various routines in supervisor/shared/usb.c and tick.c. See usb_background() and usb_irq_handler() in particular in the former and supervisor_run_background_tasks_if_tick() in the latter.

My weak guesses are that:

something about your interrupt handling is not properly propagating the USB interrupts, so they only get handled when things are idle, when main.py has completed.
something about your timekeeping is not doing the tick handling that allows the background tasks to run.

urish commented 2 years ago

So if main.py is an infinite loop, you never see the EP0_IN_BUFFER_CONTROL write?

Correct

You would probably want to watch for the execution of various routines in supervisor/shared/usb.c and tick.c. See usb_background() and usb_irq_handler() in particular

This was a very good pointer, thanks!

So here's what I'm observing:

dcd_init() is called, and registers dcd_rp2040_irq() as IRQ handler for USB.
USB peripheral is enabled, and then the reset sequence happens.
The Set Device Address setup packet is sent and picked up by dcd_rp2040_irq(), which does not seem to ack it.
At some point in the future, post_usb_init() is called and sets up the usb_irq_handler() as the IRQ handler for USB. The handler does not get called.
Eventually, usb_background() is called after main.py finishes executing. At that point, it picks up the USB initialization process:

Breakpoint 1, usb_background () at ../../supervisor/shared/usb/usb.c:105
105     void usb_background(void) {
(gdb) where
#0  usb_background () at ../../supervisor/shared/usb/usb.c:105
#1  0x1001c5e0 in stop_mp () at ../../main.c:191
#2  0x1001c600 in cleanup_after_vm (heap=heap@entry=0x20005820 <allocations+8>)       
    at ../../main.c:235
#3  0x1001c850 in run_code_py (safe_mode=safe_mode@entry=NO_SAFE_MODE)
    at ../../main.c:310
#4  0x1001c9f8 in main () at ../../main.c:608
(gdb) c

Furthermore, if I change post_usb_init() to manually call usb_irq_handler(); once, everything works as expected - USB Serial is initialized within ~64ms from the point main.py starts running. This is what the full function looks like with the change:

void post_usb_init(void) {
    irq_set_enabled(USBCTRL_IRQ, false);

    irq_handler_t usb_handler = irq_get_exclusive_handler(USBCTRL_IRQ);
    if (usb_handler) {
        irq_remove_handler(USBCTRL_IRQ, usb_handler);
    }
    irq_set_exclusive_handler(USBCTRL_IRQ, usb_irq_handler);

    irq_set_enabled(USBCTRL_IRQ, true);

    usb_irq_handler();
}

I guess this is not a good solution, but at least it helps to narrow down where the problem might be. Any suggestions how to move forward with this?

tannewt commented 2 years ago

This sounds like a TinyUSB issue. dcd_init() shouldn't setup the interrupt handler. That should be left to the client. @hathach thoughts?

CircuitPython intercepts the interrupt so that it can queue up the USB background task. By missing the interrupt, it misses the queued work.

urish commented 2 years ago

Also, assuming that we find a way to fix or work around the issue, there's still some time between when main.py starts running and USB Serial is up.

This means that the user is not going to see the first few prints, and more importantly, the output in case of a syntax error. I'm wondering if there is anything we can do to delay running main.py until after USB Serial is ready? Perhaps something in boot.py?

dhalbert commented 2 years ago

Yes, there is a race between code.py and USB coming up. USB is not initialized in the boot.py because you can make changes to USB there.

I do see this race when using tio, which reconnects immediately when it sees that USB is up. A single print("Hello") does not show output after a hard reset, but does after a soft-reload (ctrl-D). We did not see this as an issue before because it was not possible to manually get a connection that fast. However, tio and your simulator reconnect immediately.

But I don't see anything in MicroPython that waits for serial to come up either, so I'm not sure why you don't see the problem there. It also does tud_usb_init(), and it does not output anything to USB CDC unless tud_cdc_connected() is true.

We could wait a short time for tud_cdc_connected() to be true, but we don't really know that it will ever be true, so we need to time out. I've seen USB enumeration take up to 5 seconds in some cases, and I'm not sure we want to wait that long to run code.py.

hathach commented 2 years ago

This sounds like a TinyUSB issue. dcd_init() shouldn't setup the interrupt handler. That should be left to the client. @hathach thoughts?

CircuitPython intercepts the interrupt so that it can queue up the USB background task. By missing the interrupt, it misses the queued work.

It is originally from raspberry pi team, I remembered we have an discussion with them before pico is released and result in API to get and set irq handler API. It shouldn't be an issue with current implementation of post_usb(), irq is disabled and reinstalled with cpython irq. The race condition is probably on other issue.

urish commented 2 years ago

But I don't see anything in MicroPython that waits for serial to come up either, so I'm not sure why you don't see the problem there. It also does tud_usb_init(), and it does not output anything to USB CDC unless tud_cdc_connected() is true.

With MicroPython I do see that problem, so the default is to use the UART console and not the USB CDC while running in the simulator.

We could wait a short time for tud_cdc_connected() to be true, but we don't really know that it will ever be true, so we need to time out. I've seen USB enumeration take up to 5 seconds in some cases, and I'm not sure we want to wait that long to run code.py.

Where would you add this wait? One alternative I can think about it to have a way to indicate to CircuitPython that we want this wait, e.g. some file on the filesystem or a flag set by boot.py. This way, users who don't need to see the initial main.py output won't be affected by the change.

tannewt commented 2 years ago

It shouldn't be an issue with current implementation of post_usb(), irq is disabled and reinstalled with cpython irq. The race condition is probably on other issue.

It is an issue for CircuitPython as-implemented because an IRQ occurs outside of CircuitPython and therefore we don't queue up a background task to handle the resulting events. A simple fix would be to queue up the usb background function after we switch to our own IRQ.

I'd still like to emphasize that IRQ settings are done outside of the USB peripheral, just like clocks, and should be handled outside of TinyUSB as well.

Where would you add this wait? One alternative I can think about it to have a way to indicate to CircuitPython that we want this wait, e.g. some file on the filesystem or a flag set by boot.py. This way, users who don't need to see the initial main.py output won't be affected by the change.

Seeing the initial main.py output isn't critical because a reload will start things over again and show all output. Does your simulator do a complete reset each time?

Like Dan said, we can't wait for USB because we don't know if we'll ever enumerate.

urish commented 2 years ago

Does your simulator do a complete reset each time?

Yes. It first creates a fat file system with the user's code, writes it to the flash, and then starts the simulation.

I'd really appreciate any kind of solution / workaround that will let the user see the output from the very beginning. One workaround that comes to mind (which doesn't require any change in CP) would be to write code.py into _code.py (or similar), and then when the user starts the simulation, wait until we see the prompt and have the simulator inject import _code. There are probably some better ways, I'm open for suggestions...

tannewt commented 2 years ago

Why not do USB mass storage instead? If you have USB CDC already then it seems you are most of the way there already.

hathach commented 2 years ago

It is an issue for CircuitPython as-implemented because an IRQ occurs outside of CircuitPython and therefore we don't queue up a background task to handle the resulting events. A simple fix would be to queue up the usb background function after we switch to our own IRQ.

I am well aware of the background task, and post_usb_init() did removed the installed ISR in the tud_init() and installed its own. There should be no issues for circuipython with current implementation.

urish commented 2 years ago

Why not do USB mass storage instead? If you have USB CDC already then it seems you are most of the way there already.

Thanks for the suggestion! This is an interesting direction, I'll take a look at what it takes to use the mass storage to write code.py after the USB CDC is ready. One thing I'm worried about is that it may slow down the start up time (which is already much slower in comparison with MP, I still have to investigate why), but the only way to find out will is to give it a go...

tannewt commented 2 years ago

One thing I'm worried about is that it may slow down the start up time (which is already much slower in comparison with MP, I still have to investigate why), but the only way to find out will is to give it a go...

We delay for 1 second at power-on start up to give the user a chance to enter safe mode: https://github.com/adafruit/circuitpython/blob/main/supervisor/shared/safe_mode.c#L83 We're ok with that because start up usually only happens once. (It won't delay when waking from a deep sleep.)

urish commented 2 years ago

We delay for 1 second at power-on start up to give the user a chance to enter safe mode: https://github.com/adafruit/circuitpython/blob/main/supervisor/shared/safe_mode.c#L83 We're ok with that because start up usually only happens once. (It won't delay when waking from a deep sleep.)

So far, from what I can see, there's no way for the simulator to skip this wait unless:

The simulator detects that the code is in a tight loop, e.g. repeatedly reading supervisor_ticks_ms64() - I assume this eventually reads the hardware timer value, and then advance the simulated time by one second to skip the loop.
There's some way to figure out the memory address of the value returned by port_get_saved_word() and then the simulator can initialize it with some value, e.g. SAFE_MODE_DATA_GUARD | (MANUAL_SAFE_MODE << 8), and always enable safe mode. Not sure if this is useful, though - as I don't see a way to exit safe mode without restarting one more time.

If you have some other thoughts how to skip this startup delay when running in the simulation, I'd love to hear them!

dhalbert commented 2 years ago

If you are trying to do a faithful board simulation, I think there are times when the user will want the 1-second delay to be able to enter safe mode. Can someone click on the reset button on the simulated board? Right now, you create the filesystem each time, but if eventually you make it persist, the user may want to enter safe mode, to, say, avoid boot.py.

And right now do you do soft-restart when the user types ctrl-D? Then it would be same as a real board.

urish commented 2 years ago

If you are trying to do a faithful board simulation, I think there are times when the user will want the 1-second delay to be able to enter safe mode. Can someone click on the reset button on the simulated board? Right now, you create the filesystem each time, but if eventually you make it persist, the user may want to enter safe mode, to, say, avoid boot.py.

For the online simulator, there's no way to click on the reset button at the moment. I envision most users will get a link to some project, the click the "play" button, see how it works, and start playing around with the code. So time to start can have a notable impact on the user experience. To show what I mean, try running these two examples:

current setup: https://wokwi.com/arduino/projects/308170222209073730
skipping the initial delay: https://wokwi.com/arduino/projects/308170222209073730?circuitpythonhack=1

The ?circuitpythonhack is a combination of the two methods from my previous comment: whenever you read from the timer, the simulator looks at 0x20040000, where the port saved word resides. If it has a non-zero value, the simulator will fast-forward the time by adding 100ms to the timer value whenever the program reads it. This essentially skips the delay in wait_for_safe_mode_reset(). I can live with this workaround, as it doesn't seem too fragile, but perhaps we can still come up with a better solution?

And right now do you do soft-restart when the user types ctrl-D? Then it would be same as a real board.

Yes, it works just like on the real board. Why?

dhalbert commented 2 years ago

And right now do you do soft-restart when the user types ctrl-D? Then it would be same as a real board.

Yes, it works just like on the real board. Why?

I just mean that the "play" button could start from a hard reset, or it could just write code.py, as happens on physical hardware now.

The main reasons now for using boot.py are to customize the presented USB and to change the remount the filesystem to allow read/write. Is the simulator able to simulate HID devices, etc.?

urish commented 2 years ago

Is the simulator able to simulate HID devices, etc.?

Nope, at least until someone asks for it. Remounting the FS read/write could come up with some use cases (e.g. users who prototype a data logger and want to experiment with code that writes to the FS).

Writing code.py when you press on play could be a possible solution. What I'm still concerned about is the first-time experience: users who open a link to a simulation project will still have to wait for the safe boot detection code to finish.

One workaround would be to take a snapshot of the SRAM / peripheral state after USB has connected and the REPL has started, and to store it alongside with each CP version. Then, as soon as the page loads, we load the snapshot and start from there, skipping all the init code. There isn't any snapshot support in the simulator at the moment, but I can look what it will take to implement it.

TBH, I haven't released the CircuitPython simulation yet, so I'm just speculating on how users might use it/what they'll care about. If it makes more sense, we can wait a few weeks until I collect some user feedback and then come back to discuss if/how to optimize things.

tannewt commented 2 years ago

I envision most users will get a link to some project, the click the "play" button, see how it works, and start playing around with the code.

What if you don't have the play button and just start the simulator immediately? That way it'd be like plugging in a CP board. Are you able to blink the on board LED? That'd make it clearer that something is happening.

Can this simulator be run headless? I'd love to use it for library CI to track memory use.

dhalbert commented 2 years ago

If there is some delay, you could have a little UI delay countdown like a circle filling its edge (like the Apple download/install circle in the App store) that takes one second to complete. That would make it obvious that there was something to wait for.

urish commented 2 years ago

Can this simulator be run headless? I'd love to use it for library CI to track memory use.

Yes, I made a small demo repo for you it runs the simulator from the command line inside a Node.js process:

https://github.com/wokwi/rp2040js-circuitpython

It's just the bare minimum that you need to get the Pi Pico CircuitPython build running in your terminal - if you need any help customizing it to run on the CI, I'd love to help!

tannewt commented 2 years ago

Starred! Thank you!

My main use for it would be measuring import sizes of libraries so we'd need a way to dump memory state at a certain point. Something similar to the memory analysis I did a while back.

urish commented 2 years ago

I haven't watch the whole thing, but from skimming it seems like you use GDB to take the memory dump. There's a GDB Interface built into the simulator, so I can change the demo repo to expose that on a certain port. Would that do the trick, or do prefer a different method for dumping the memory?

tannewt commented 2 years ago

Neat! That should be enough. We can likely automate talking to it from python where we can also do the analysis. It's not the top of my todo list so it may be a while until I look deeply into it.

urish commented 2 years ago

Neat! That should be enough.

Done! If you need any help with automating it, I'm at your service.

urish commented 2 years ago

Closing the issue, as the original problem has now been fixed. CircuitPython 7.0 (RC 1) is working pretty well in the simulation, it can even do SSD1306 output.

For reference, this is what I ended up doing to make the complete output of code.py available from the very beginning. It all happens automatically when the user click on the "Play" button:

Write an empty filesystem
Start the RP2040 simulation
Wait for USB CDC to connect
Write all the files to the filesystem
Send Ctrl+D over CDC to reload code.py

Seems to be pretty robust!

There's still some one issue with the performance: it looks like all the delays (e.g. time.sleep()) are done with busy waits, which really slows down the simulation. I'll open a different issue for this.

adafruit / circuitpython