adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

multicore access on Raspberry Pi Pico #4106

Open mlewus opened 3 years ago

mlewus commented 3 years ago

The Pi Pico has 2 physical cores, but only one core is usable in CircuitPython. MicroPython has limited multicore functionality when used with the Pico, allowing the user to start a separate task while passing arguments in the task call, like this:

import _thread

def mytask(pin, delay):
    # bang away at a pin or whatever
    pass

# call with (GP2 stands for a pin object defined elsewhere):
_thread.start_new_thread(mytask, (GP2, 0.2))

mytask runs independently of the main MCU core, and keeps running until it returns or the MCU resets.

Is this planned for inclusion in CircuitPython?

tannewt commented 3 years ago

It isn't a short-term goal. We have many other APIs to implement before we do multi-core.

I'm considering an API to facilitate running native code on the second core instead of additional Python. What are you using it to do?

mlewus commented 3 years ago

Not sure how relevant this is to CircuitPython, but I did some multicore testing with MicroPython on a Pico back in February. I found that global variables take about 1 ms to update between cores. The only element that updates faster is a lock. Locks must be interrupt driven, as they update very fast, on the order of a few microseconds.
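
For readers unfamiliar with the pattern described here, a minimal sketch of sharing a global between two threads under a lock follows. It uses only `_thread.start_new_thread` and `_thread.allocate_lock`, which exist in both MicroPython and desktop Python. The names `counter`, `worker`, and `run_demo` are made up for illustration, and the sketch makes no claim about the timings reported above.

```python
import _thread

counter = 0                      # shared global, touched by both threads
lock = _thread.allocate_lock()   # guards every access to `counter`
done = _thread.allocate_lock()   # held until the worker has finished


def worker(n):
    """Increment the shared counter n times, then signal completion."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1
    done.release()


def run_demo(n=1000):
    """Start one worker thread, increment alongside it, return the total."""
    global counter
    counter = 0
    done.acquire()                       # the worker releases this at the end
    _thread.start_new_thread(worker, (n,))
    for _ in range(n):
        with lock:
            counter += 1
    done.acquire()                       # blocks until the worker has released
    done.release()
    return counter
```

Calling `run_demo(500)` returns the combined count from both threads. Without the `with lock:` guard, the two increments could interleave and lose updates.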

Red-M commented 3 years ago

I'd really like this for a project I'm building to allow the 2nd core to handle a bunch of additional IO tasks while the first core is on input detection.

mlewus commented 3 years ago

My limited understanding is that CircuitPython is, or started out as, a fork of MicroPython. MicroPython supports multitasking, locks, and inter-task globals on the Pico. Can a CircuitPython developer comment as to the difficulty of implementing this in CircuitPython?

tannewt commented 3 years ago

> My limited understanding is that CircuitPython is, or started out as, a fork of MicroPython. MicroPython supports multitasking, locks, and inter-task globals on the Pico. Can a CircuitPython developer comment as to the difficulty of implementing this in CircuitPython?

It's really hard to say because none of us have looked into it.

Right now we're based on MicroPython ~1.9.4 (https://github.com/micropython/micropython/commit/25ae98f07cb3c4488cb955403dfe56b8bb8db6f0) which is a year or two old. I think upstream has made some concurrency improvements since then so the first step I'd take is to merge in newer MicroPython. There is an outstanding PR to do that here: #4280

Once that is done, I'd try enabling the settings in CircuitPython to see if they work. You'd also need to look at the MP changes for the RP2040's second core. That'll likely expose many areas where there are concurrency issues that need to be fixed.

Red-M commented 3 years ago

Personally, I'd prefer if CircuitPython offered "more difficult" to use features such as _thread because I really just want to get this done and I know how to handle myself with concurrency and memory access.

I'd really want Python on the 2nd core because frankly all my IO tasks are laid out with CircuitPython libraries, so if I had to write C for it I might just move to MicroPython instead and re-implement the libraries I'm currently using.

mlewus commented 3 years ago

The problem for me is that using CircuitPython on the Pico means throwing away half of the machine. I don't really see C++ on the 2nd core as a solution. Most people who use CircuitPython do so for the advantages that it offers over C++. I'm not sure how the decision making works as to which features get to the top of the list, but it would be great to see this get prioritized. I've been using MicroPython, but that means rewriting a lot of the hardware drivers that already exist in CircuitPython. So that is not ideal.

Red-M commented 3 years ago

Would this then be better suited to being tracked along/after 7.x has a release so that there is more control around concurrency?

ladyada commented 3 years ago

@Red-M if it's just for keymatrix management, it would probably be better to adapt _gamepad to be a matrix scanning helper...? then it isn't multi-core dependent and could run on other chips :)

Red-M commented 3 years ago

@ladyada Thanks for the reply, but I've gotten the keymatrix poll/scan down to just interrupt pin polling, which is way faster. I'm making a full-sized keyboard with RGB and other features, so I'm really wanting access to the other core with all the IO libraries.

ladyada commented 3 years ago

if the keymatrix was handled in the background - what other IO would you need?

Red-M commented 3 years ago

dotstar and displayio.

ladyada commented 3 years ago

kk - displayio is also handled in the background with DMA and dotstars are near instantaneous - you can drive them at 8 MHz. i'm not convinced you'd gain anything with dual core support and it's not going to be added to circuitpython soon

mlewus commented 3 years ago

Frankly I think that's a little short-sighted. But it's your football. Closing this issue.

ladyada commented 3 years ago

reopening because it's a valid issue. for some purposes - a keyboard, for example - we really do think it does not need a dual core. we are trying to help get projects done and not have to wait for something that does not exist!

Red-M commented 3 years ago

I have to agree with @mlewus. You can't just wait for another cycle to get more feedback to handle other tasks; near instant isn't nearly as fast as real time.

ladyada commented 3 years ago

in python the garbage collector can run at any time - there's no such thing as true guaranteed real time task handling in interpreted python, which is why you'd have to do it in C.

i'd like to find a solution to the problem, however the problem isn't clear yet - USB tasks go out every 1 ms, so it's not 'instant' either. some really specific numbers or complete example code would be useful. if it cannot be measured, we cannot improve it :)
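
One way to produce the kind of numbers asked for here is to time each pass of the loop in question. The following is a minimal sketch, not an established benchmark: `measure_loop_jitter` and its `work` argument are hypothetical names, and it assumes `time.monotonic_ns()` is available (it is in desktop Python, and in CircuitPython on boards built with long integer support).

```python
import time


def measure_loop_jitter(work, iterations=1000):
    """Call work() repeatedly and report per-call timing in microseconds.

    Returns (min_us, mean_us, max_us). A large gap between mean and max
    suggests occasional stalls, for example from garbage collection.
    """
    shortest = None
    longest = 0
    total = 0
    for _ in range(iterations):
        start = time.monotonic_ns()
        work()
        elapsed = time.monotonic_ns() - start
        total += elapsed
        if shortest is None or elapsed < shortest:
            shortest = elapsed
        if elapsed > longest:
            longest = elapsed
    return shortest / 1000, total / iterations / 1000, longest / 1000
```

Passing the body of a scan or update loop as `work` and posting the three numbers gives a concrete basis for comparing approaches.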

dhalbert commented 3 years ago

The adafruit-blinka Python library, which is mostly used for Linux, emulates many of the native modules provided by CircuitPython. But besides supporting Linux, it also has a MicroPython wrapper. So you can still use CircuitPython libraries, on top of MicroPython, if you do indeed want to take advantage of something only MicroPython provides.

dhalbert commented 3 years ago

Some keyboard projects done in CircuitPython: https://github.com/KMKfw/kmk_firmware https://github.com/tannewt/ckd63 (4 years old)

If you'd like to discuss this in more detail, this issue is probably not the place. Discord or the forums are good.

mlewus commented 3 years ago

@dhalbert, I'll take a look at adafruit-blinka and continue this in a more appropriate forum. Thanks for your help!

Red-M commented 3 years ago

@dhalbert Just letting you know and others in case they see this issue but that suggestion doesn't work due to adafruit/Adafruit_Python_PlatformDetect#144.

EDIT: @ladyada For giggles I tried using gamepad (_gamepad doesn't exist), and my usage of pins from the MCP IO expanders means that gamepad reliably hard crashes the Pico. It's so bad that I can't even get tracebacks. I'm guessing it's something to do with the supervisor not being able to access the pins over the I2C bus.

ladyada commented 3 years ago

yeah you cannot use gamepad (yet) because it doesn't support an expander. blinka is great but will not add hid support to micropython-on-pico. if you need help getting your project optimized you can come by discord

tannewt commented 3 years ago

> The problem for me is that using CircuitPython on the Pico means throwing away half of the machine. I don't really see C++ on the 2nd core as a solution. Most people who use CircuitPython do so for the advantages that it offers over C++. I'm not sure how the decision making works as to which features get to the top of the list, but it would be great to see this get prioritized. I've been using MicroPython, but that means rewriting a lot of the hardware drivers that already exist in CircuitPython. So that is not ideal.

The RP2040 isn't our only platform and the second core isn't the only thing we can't do that the hardware supports. It's all a question of prioritization and Adafruit-funded folks are prioritizing other work. (We discuss this in places like this issue and the weekly meeting.) We'd love to have threading and multicore support but it's just not clear to us that the benefit is worth the time cost. I always prioritize PRs so no one outside of Adafruit would be blocked on implementing this.

I'd encourage you to try CircuitPython for your specific project. There may be other ways to optimize the code so that you can do what you want from CircuitPython even with just one core. (I know @Red-M got their keyboard code going by optimizing I2C transactions when scanning.)

mlewus commented 3 years ago

@tannewt I completely understand that Adafruit has limited resources and has to make the best use of them to support business & community goals. I think Adafruit is a great example of a well-run tech company. It's the reason that I buy products from you whenever I can.

Red-M commented 3 years ago

I understand that there is limited time and resources for concurrency, but more and more devices/MCUs are providing more than 1 core.

It would be great to have this in general, not just for the RP2040, for various reasons and use cases. I'd honestly prefer some planning go into getting concurrency as a whole (if it hasn't been done yet) because more devices in the future will have a multicore offering.

Offering even just async tasks to be placed onto other cores that share the same IO would be excellent, as much as I'd prefer to do threading workloads.

EDIT: A Cython-like superset language would be excellent for this exact kind of problem, to allow Python code to run on the 2nd core as well.

mlewus commented 3 years ago

@Red-M, a week ago I would have agreed with you. But one of the Adafruit developers brought up a valid point. Which is, they support 100s of boards with circuitpython. Only a few of those are multi-core. Given the vastly greater scope of requested functionality with respect to available development labor, I understand why this has not been done.

Red-M commented 3 years ago

@mlewus I'm not saying it isn't going to be a lot of work but some form of concurrency planning should be done soon as more MCUs are going to come out with more than 1 core.

sirkha commented 3 years ago

Adding to the application conversation, I have a Feather RP2040 with a Pimoroni Enviro+ wing, an AMG8833 wing, an Adalogger wing, and a PoE wing all attached to a single tripler that also has an IR transceiver and rotary encoder on it. I would like to be able to use the 2nd core to process the analog audio from the mic on the Enviro+ and possibly stream it over the network to do voice command recognition while the first core handles the rest of the sensors and UI.

mlewus commented 3 years ago

@Red-M, yes of course you're right. I think we're going to see more low-end multi-core microcontrollers. And there are some tasks that are too complex for a PIO but for which you really don't want to be interrupted.

john-- commented 3 years ago

> @Red-M, yes of course you're right. I think we're going to see more low-end multi-core microcontrollers. And there are some tasks that are too complex for a PIO but for which you really don't want to be interrupted.

Agreed. I'm happy to see multi-core support is being considered!

nmorse commented 3 years ago

up-vote for native code on the second processor, I have a project that requires 4 rotary-encoders and presents a HID interface.

Possibly this is an example of an app that needs dual-cores: circuitpython on Proc0, native (driver) code on Proc1?

ladyada commented 3 years ago

@nmorse did you try circuitpython as is? 4 rotary encoders and HID is not processor intensive.

nmorse commented 3 years ago

I will try that, (on this project) up till now I had been doing all 'C-sdk', thanks

razzededge commented 2 years ago

I also would vote for this - even some kind of port/wrapper for _thread from MicroPython would be nice.

SchroedersKater commented 2 years ago

Another vote from me

mlewus commented 2 years ago

My FWIW if this is to be prioritized: We need to show concrete instances of a real project, preferably with publicly posted code, that can't be made to work with the current CircuitPython but could be done with multiple core support. Or, ones that could be made to perform meaningfully better with multiple core support.

So far there have been a lot of what-if kinds of requests here, including one of mine, but none that meet that standard. Adafruit has limited resources and many demands on them. If we want those resources applied to this feature, we need to describe a better use case than has been shown so far. Or, live without it until either one of us takes it on ourselves or more processors come out with multiple cores, which would provide more use cases for this code.

ladyada commented 2 years ago

hiya folks, let's keep comments positive so they can't be misread - we're here to solve issues, sometimes we can solve issues without waiting for the circuitpython team to do a lot of work.

we recently added asyncio, working with micropython team so that our codebases and libraries could be compatible. there is work happening with adding more advanced features: circuitpython has a lot of async stuff going on and as we gain more dual core processors, there'll be more work on it.

we want to make sure that what we write is well-supported and usable. adding use cases will help us greatly to make sure we have coverage - we think we captured the most common async/irq needs so far and there will be more!

Red-M commented 2 years ago

I think if you want to meet coverage of end-user usecases, you might want to look at how Python handles that in their unit tests.

EDIT: Most of the frustration from users is that MicroPython doesn't have a good or widespread USB stack to work with (a limited number of boards and supported modes of operation; MicroPython has gotten vastly better in both respects over the 2 years I've been interested in this space, but it still has a ways to go to match CircuitPython for USB support). But CircuitPython's lack of _thread and IRQ support/callbacks makes a lot of logic and integration harder, because most of that has to be supported by the HALs (or similar structures) instead. That makes things much harder for someone coming into the microcontroller space who wants to get firmware made quickly.

Also noting from this comment as an update:

> (I know @Red-M got their keyboard code going by optimizing I2C transactions when scanning.)

I've since moved away from those MCP IO expanders because they were too slow (even in the faster SPI mode that I added to the library to support the SPI variant of those chips). I now use 2 controllers, with 1 controller being optional, because a single Pico doesn't have enough pins for my goals. The lack of software support for the 2nd core on the Pico has made my PCBs more complex.

I say this to acknowledge that the work being done for Python on microcontrollers is in a good state, but I'd love to have more advanced methods to interact with hardware.

rmilby13 commented 2 years ago

@ladyada Use Case: I would like to use one core for monitoring a Loconet (from Digitrax) communication channel and a DCC reception (using PIO for the low level work) and use the other core for managing 16 GPIO (servos/buttons/current block detection/IR sensors, maybe other uses later), updating a NeoPixel bus, I2C communication (RFID, EEPROM, display, etc.) and possibly a bit more. I want to keep the Loconet and DCC on their own core to help prevent data loss in the PIO I/O buffers. I'm just getting started now, so I haven't tried to load everything on one core yet. It may handle it, but as the RP2040 has dual cores I would like to leverage one just for communications and one for "local work".

bluelasers commented 2 years ago

@rmilby13 That is a good idea. However, I would recommend you look into using two RP2040s. Do you want to create a repo with discussions enabled where we could chat about it? If you want to use a single RP2040 this gets more complicated. There is a lot to talk about and they do not like that stuff here. I honestly am not sure they will get around to this, since it's the only multicore MCU they support. Any meaningful API for building this would need their support, so that is too bad. They could make some money on this, but I doubt they want the baggage.

tomasinouk commented 2 years ago

Hi, I came here looking for a solution to a problem where an RP2040 communicates with LTE/WiFi to send data. At this time pretty much any library is using time.sleep, meaning that while I am sending data over UART the code is blocking. The device does not react to buttons, etc. My understanding is that if one thread were used for offloading data, another thread could be used to keep the device "operational". I am happy to be educated on a different approach to resolve this scenario. Thank you
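
One single-core approach to this scenario is cooperative multitasking with `asyncio`, where a waiting task yields instead of blocking. The sketch below is illustrative only: `send_data` and `poll_buttons` are hypothetical stand-ins for a real transfer and real button handling, and it only helps if the code doing the waiting uses `await asyncio.sleep(...)`. A library that calls `time.sleep` internally would still block everything.

```python
import asyncio

events = []  # records what happened, in order, so the interleaving is visible


async def send_data(chunks, delay):
    """Stand-in for a slow modem/UART transfer (hypothetical)."""
    for chunk in chunks:
        events.append("sent " + chunk)
        await asyncio.sleep(delay)   # yields to other tasks instead of blocking


async def poll_buttons(count, interval):
    """Stand-in for checking buttons while the transfer is in progress."""
    for i in range(count):
        events.append("poll " + str(i))
        await asyncio.sleep(interval)


async def main():
    # Run both coroutines concurrently on one core.
    await asyncio.gather(
        send_data(["a", "b"], 0.05),
        poll_buttons(4, 0.01),
    )


asyncio.run(main())
```

Inspecting `events` afterwards shows button polls recorded between the two sends, which is the behavior a blocking `time.sleep` would prevent.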

AdaRoseCannon commented 2 years ago

The use case I have is very superficial but I want to run animations on the macropad screen, whilst having the LEDs fade smoothly. Unfortunately if I am writing big changes to the screen the rate the LEDs update slows down and it looks a little off.

So I want to run my displayio things in the other thread so that the standard keyboard features can run independently, such as detecting keypresses and LED animations.

agrabbs commented 2 years ago

Possible use case: we have a sensor logger application that reads from a 6-DoF sensor and writes to the onboard flash of the RP2040. We poll the sensor as many times as possible in a span of 1 to 10 seconds and have tried various methods of saving the data:

  1. One sensor poll, one IO write (or append): Slowest, with the highest sensor data loss (more than ~50%).
  2. Poll sensor into array and write at end: Fastest, but the RP2040 runs out of RAM after 2-3 seconds, so we cannot run longer than that.
  3. Poll sensors into array and write to file at X interval: Fast, but we lose about 150 sensor data points during each IO write.

We would love to see a library that allowed utilization of the second core so we can offload the IO write to another core while the sensor data collection continues.

ladyada commented 2 years ago

@agrabbs you are not CPU bound, you are flash-memory-speed bound - writing to any flash memory requires erasing blocks, which will take a long time AND the same internal flash memory is being read from for the RP2040 to run.

this is very hard to do. ideally you can buffer into an array - you should have ~ 100k available. or, FRAM is the easiest external-memory solution as it has instantaneous writes. ESP32-Sx with PSRAM might be reasonable because you get SRAM speed but 2MB of space
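
A minimal sketch of the "buffer into an array" suggestion: preallocate one `bytearray` and pack each reading into it, so the sampling loop does no per-sample allocation. The sample layout (six signed 16-bit values) and the `SampleBuffer` class are assumptions made for illustration, not taken from the application described above. At 12 bytes per sample, 100 kB would hold roughly 8,300 samples.

```python
import struct

SAMPLE_FORMAT = "<6h"                          # six signed 16-bit axes (assumed layout)
SAMPLE_SIZE = struct.calcsize(SAMPLE_FORMAT)   # 12 bytes per sample


class SampleBuffer:
    """Fixed-size, preallocated store for packed sensor samples."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray(capacity * SAMPLE_SIZE)  # allocated once, up front
        self.count = 0

    def append(self, sample):
        """Pack one 6-value sample; return False when the buffer is full."""
        if self.count >= self.capacity:
            return False
        struct.pack_into(SAMPLE_FORMAT, self.data, self.count * SAMPLE_SIZE, *sample)
        self.count += 1
        return True

    def get(self, index):
        """Unpack and return sample number `index` as a tuple."""
        return struct.unpack_from(SAMPLE_FORMAT, self.data, index * SAMPLE_SIZE)

    def used_bytes(self):
        """Return a view of the filled portion, ready to write in one call."""
        return memoryview(self.data)[: self.count * SAMPLE_SIZE]
```

The view returned by `used_bytes()` could be written to a file or external memory in a single call once sampling has finished.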

RyanDam commented 2 years ago

I'm building a small step sequencer + synthesizer using a Raspberry Pi Pico. This project involves keypad+encoder, screen and audio output. Currently, everything is running on just 1 core and sometimes the audio glitches because of heavy UI rendering. It would be best if I could utilize the second core for just audio processing while the first core is dedicated to handling user input and UI rendering. By utilizing the second core, maybe more advanced audio processing would be achievable.

bradanlane commented 2 years ago

My use case is mostly an upvote of @AdaRoseCannon's example:

Display update in one thread while “everything else” is in another thread.

(I know of a MicroPython project where they went through machinations to get this working, but it made a world of difference in the end result.)

bradanlane commented 2 years ago

I have another use case which may be niche

Charlieplexing requires an infinite loop to keep the LEDs updated. It would be convenient to run this on a second core. It would require some form of shared memory or communications to choose which LEDs need to be active.

jepler commented 2 years ago

It may be possible to do charlieplexing with features now in CircuitPython, without multicore. It's possible to repeatedly send a buffer to a PIO peripheral forever with background_write, which opens a ton of interesting possibilities.

There's an example on the Learn system to use this to drive a 4-digit 7-segment LED directly.

In an example I coded but didn't demonstrate in the guide, I combined the LED segment driving code with the advanced PWM code and controlled the brightness of segments individually: https://github.com/adafruit/Adafruit_CircuitPython_PIOASM/blob/main/examples/pioasm_7seg_fader.py

I think the same structure would adapt readily to charlieplexed LEDs.
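
To make the idea concrete, here is a rough sketch of how a repeating charlieplex buffer might be built in plain Python. It only computes the data: for each lit LED, which pins are outputs and which output is high. The frame layout, the function names, and the limit of 8 pins in `pack_frames` are all assumptions made for illustration. The PIO program that would consume such a buffer is not shown and would have to be written to match.

```python
def charlieplex_frames(num_pins, lit):
    """Build one (direction_mask, level_mask) pair per lit LED.

    `lit` is a list of (anode_pin, cathode_pin) index pairs. In each frame
    only the anode and cathode are outputs (direction bit = 1); the anode is
    driven high (level bit = 1), the cathode low, and all other pins float.
    """
    frames = []
    for anode, cathode in lit:
        if anode == cathode or not (0 <= anode < num_pins and 0 <= cathode < num_pins):
            raise ValueError("invalid LED pin pair")
        direction = (1 << anode) | (1 << cathode)
        level = 1 << anode
        frames.append((direction, level))
    return frames


def pack_frames(frames):
    """Flatten frames into bytes: direction byte, then level byte, per frame.

    Assumes at most 8 pins so each mask fits in one byte.
    """
    out = bytearray()
    for direction, level in frames:
        out.append(direction)
        out.append(level)
    return bytes(out)
```

Changing which LEDs are active then amounts to rebuilding the buffer from a new `lit` list, which is the "shared memory" role a second core would otherwise need.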

deshipu commented 2 years ago

The _pew module does LED matrix updating without either a second core or PIO, as it does it on the SAMD21. The matrix happens to not be charlieplexed, but I have successfully used variants of the same code with a charlieplexed matrix as well.

Rybec commented 1 year ago

My project is also audio synthesis. Timing is critical, and I'm trying to find a way to do it with the PIO, but the FIFO is extremely small for buffering audio, and with CircuitPython I don't have the option of triggering an interrupt to run the generation code as needed. I haven't written any code yet, so I'm not sure exactly how limited I'm going to be yet, but the ideal situation would be to be able to use the second core to generate the audio and the first to handle user input and the display.

Interestingly, while I was looking for a way to just run Python code on the second core, native C would be even better for this particular project, because timing would be far easier to manage, and speed wouldn't be as much of an issue. (Also, I have some basic audio synth software written in C already...) Long term, I would love to see both Python and C/C++ as options for this. The ability to run native code directly from CircuitPython on the RP2040 would open up a ton of options, but not every multi-threaded project needs that.

But yeah, I would really like to write an audio synth using the RP2040, and it would be ideal if it was a "second core" module that just uses the second core so that the first is free for whatever else. And I'm not sure how feasible this is with just the one core.

I do have some things I can try. PyRTOS might be able to manage scheduling in a way that isn't too heavy handed. (I added service routines a while back that might be ideal for this...) I can use a simpler timing loop algorithm. I might be able to use asyncio, but it is designed for I/O bound tasks, and audio synthesis is CPU bound. I don't think any of these solutions would allow the synthesizer code to be released as a solid standalone synth module though, since they would be locked into a serial timing algorithm that any user would have to work within. (And I really would like to turn the synth portion specifically into an open source project, but I don't want to force the user to adhere to a restrictive scheduling regimen.)
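
As an illustration of the "simpler timing loop" option, the sketch below alternates between generating one fixed-size chunk of samples and running one step of UI work. The sample rate, chunk size, wavetable approach, and the names `fill_chunk` and `run` are assumptions for the example. How the buffer reaches an audio output is not shown, and `freq` is assumed to be below `SAMPLE_RATE`.

```python
import array
import math

SAMPLE_RATE = 22050      # Hz (assumed)
CHUNK = 256              # samples generated per loop pass (assumed)
TABLE_SIZE = 256

# One cycle of a sine wave, precomputed so the hot loop avoids math.sin().
SINE = array.array("h", [int(32767 * math.sin(2 * math.pi * i / TABLE_SIZE))
                         for i in range(TABLE_SIZE)])


def fill_chunk(buf, phase, freq):
    """Fill buf with samples of a `freq` Hz sine; return the updated phase.

    `phase` is a position in the wavetable (a float in [0, TABLE_SIZE)).
    """
    step = freq * TABLE_SIZE / SAMPLE_RATE
    for i in range(len(buf)):
        buf[i] = SINE[int(phase)]
        phase += step
        if phase >= TABLE_SIZE:
            phase -= TABLE_SIZE
    return phase


def run(passes, freq, handle_ui):
    """Alternate between generating one chunk and one UI step."""
    buf = array.array("h", [0] * CHUNK)
    phase = 0.0
    for _ in range(passes):
        phase = fill_chunk(buf, phase, freq)
        handle_ui()          # must finish before the previous chunk has played out
    return buf, phase
```

The constraint noted in the comment on `handle_ui()` is exactly the restrictive scheduling regimen described above: every caller has to keep its own work short enough to fit between chunks.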

Anyhow, I figured if you are looking for applications where this kind of feature would be critical, I've got one! I can always go with MicroPython instead, but if I'm going to write a full on synth module, I would really rather do it for CircuitPython.