Makuna / NeoPixelBus

An Arduino NeoPixel support library supporting a large variety of individually addressable LEDs. Please refer to the Wiki for more details. Please use the GitHub Discussions to ask questions as the GitHub Issues feature is used for bug tracking.
GNU Lesser General Public License v3.0

ESP32-S3, ESP32-C3: investigate new method to use "General-purpose DMA" units #598

Open softhack007 opened 1 year ago

softhack007 commented 1 year ago

Is your feature request related to a problem? Please describe.

On ESP32-S3, NeoPixelBus does not yet support "I2S" DMA methods, and only 4 RMT channels are possible. In comparison, "classic ESP32" allows up to 10 independent WS2812b or SK6812 LED channels, while the new MCUs only allow for 4 hardware driver channels.

It seems that -C3 and -S3 have additional "Generic DMA units" (see below), so my hope is that these could be used to drive more LED strips independently.

Describe the solution you'd like

Additional methods for ESP32-S3 (and maybe -C3) that give us a few more hardware-accelerated buses.

Describe alternatives you've considered

In principle, all LEDs could be chained together on a single bus; however, in WLED we have observed that performance drops with more than 500 LEDs on a single driver channel. We can't use I2S#0, because Soundreactive WLED needs it for I2S microphone input.

Additional context

https://docs.espressif.com/projects/esp-idf/en/release-v5.0/esp32/hw-reference/chip-series-comparison.html#id5

[image]

Makuna commented 1 year ago

It looks like the DMA, while general purpose, still has to be assigned to a hardware peripheral. I have yet to find a way to DMA directly to IO pins.

cherrydev commented 1 year ago

So, I've tested the GDMA support for the RMT channels on the S3 with ESP-IDF v5 and it works wonderfully. During an asynchronous transmission on a single RMT channel, CPU usage drops from 18% with PIO (256-byte buffer, seemingly the maximum supported for PIO) to about 0.01% with DMA (with a 1 KB buffer). I haven't done glitch testing yet, but my belief is that with practically zero interrupt contention, the chances of glitches should be negligible, barring bugs in the ESP drivers or hardware.

Using the new RMT driver, which has on-the-fly encoding support, also means there is nearly zero memory overhead beyond whatever buffer you're using (if any) to store pixel data, apart from the DMA buffer per channel (about 1 KB). This is as opposed to the I2S method, which requires an additional 12 bytes for each pixel, which can add up.

And while the S3 only has 4 RMT TX channels, I don't believe there's anything stopping you from multiplexing a single channel over multiple pins sequentially (unless changing pins is extraordinarily expensive for some reason). If you were targeting 50 FPS, you'd have a theoretical max of about 650 LEDs per DMA channel (800 kHz / 8 bits / 3 colors / 50 Hz ≈ 666), but you could split that up among 2 or 3 pins by changing pins. And driving 4 DMA channels like this could give you 2600 LEDs over 8, 12 or even 16 pins.

I don't use NeoPixelBus myself (at least not yet) but I thought I'd share my findings.
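For anyone who wants to redo the per-channel estimate with other numbers, here is a minimal sketch of the arithmetic. The function names `maxLedsPerChannel` and `ledsPerPin` are made up for illustration, and the calculation deliberately ignores the reset/latch gap between frames, so real limits will be somewhat lower.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on LEDs one output can refresh per frame.
// Simplification (assumption): the reset/latch gap between frames is ignored.
constexpr uint32_t maxLedsPerChannel(uint32_t bitRateHz,
                                     uint32_t bitsPerLed,
                                     uint32_t framesPerSecond) {
    return bitRateHz / (bitsPerLed * framesPerSecond);
}

// If one channel is switched between several pins in sequence, the same
// per-frame budget is shared by those pins.
uint32_t ledsPerPin(uint32_t ledsPerChannel, uint32_t pinsPerChannel) {
    assert(pinsPerChannel > 0);
    return ledsPerChannel / pinsPerChannel;
}

// 800 kHz, 8 bits x 3 colors = 24 bits per LED, 50 frames per second.
static_assert(maxLedsPerChannel(800000, 24, 50) == 666,
              "matches the estimate in the comment");
```

With these inputs, four channels give 4 x 666 = 2664 LEDs in total, which the comment rounds down to about 2600. Splitting one channel over 3 pins leaves 222 LEDs per pin.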

Makuna commented 1 year ago

@cherrydev There is already RMT support in this library, and it uses the translate feature. The RMT has a very limited and specific DMA buffer size on legacy ESP32 (unsure of the newest variants) and thus the translate was required so that the core RMT support could share/manage this limited resource across several uses of RMT channels (NeoPixel and/or IR remote) at the same time. I helped drive that feature.

DMA is a means for hardware peripherals to access memory without the use of the CPU and constant ISRs (load and translate ISRs). As soon as you require a user-provided translate, this requires the CPU and will not need the DMA between this library and the peripheral. So, it is no longer a DMA method. I don't believe the ESP platforms support Programmable IO like the RP2040 (assembly programs that run in the IO peripheral, not on the CPU) and thus would require the CPU for translate.

This issue is specifically about using the DMA methods that may have been exposed due to engineering changes in how DMA is used in the new variants and the peripherals that can use it.

There have been versions/branches of this library that directly use the RMT DMA and ignore the ability to share it. They support only a very limited strip size due to the limited memory, and they require the data to be pre-translated and thus much larger. Again, if it's DMA, it requires native data format and thus no translate. This would effectively remove all interrupts (other than the emptied DMA buffer). But it sounds like you are stating DMA direct wouldn't require this; so, I am unsure how it would translate and be DMA at the same time.
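To make "pre-translated and thus much larger" concrete, here is a rough sketch of what pre-translation can look like. It assumes a 4-step encoding in which every data bit is expanded to four output bits (`1000` for a 0, `1110` for a 1). That encoding and the function name `preTranslate` are assumptions for illustration only, not something stated in this thread, though a 4x expansion is consistent with the "12 bytes for each pixel" figure mentioned above for a 3-byte pixel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pre-translation: expand each data bit into a 4-bit pulse
// pattern so a peripheral can clock the buffer out with no CPU involvement.
// Assumed patterns: data bit 0 -> 1000 (0x8), data bit 1 -> 1110 (0xE).
std::vector<uint8_t> preTranslate(const std::vector<uint8_t>& pixelBytes) {
    std::vector<uint8_t> out;
    out.reserve(pixelBytes.size() * 4);  // 4 output bytes per input byte
    for (uint8_t value : pixelBytes) {
        // Two data bits fill one output byte, most significant bit first.
        for (int shift = 6; shift >= 0; shift -= 2) {
            const uint8_t hi = ((value >> (shift + 1)) & 1) ? 0xE : 0x8;
            const uint8_t lo = ((value >> shift) & 1) ? 0xE : 0x8;
            out.push_back(static_cast<uint8_t>((hi << 4) | lo));
        }
    }
    return out;
}
```

Under this assumption a 3-byte pixel becomes 12 bytes, so the DMA-ready buffer costs four times the raw pixel data. That is the memory cost an on-the-fly translate avoids, in exchange for involving the CPU.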

arrowcircle commented 1 year ago

Any updates on this?

Makuna commented 1 year ago

I haven't looked at it in a while; S3/C3 already support RMT using DMA.

I mentioned RMT is already supported. @cherrydev mentioned testing GDMA to RMT, and I don't know if that is really different, as no link to code was provided.

I mentioned above that I could find nothing on GDMA to pins, as the table in the attached image hints at. If someone finds a reference to this, then I can proceed.

softhack007 commented 1 year ago

Hi, I've started this topic without knowing much of what I was talking about, mainly hoping that there would be a way to have more LED output pins on -S3.

It seems that only a little information is available about the GDMA feature.

https://www.espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf

[two screenshots attached]

Makuna commented 4 months ago

https://github.com/adafruit/Adafruit_NeoPXL8/blob/master/Adafruit_NeoPXL8.cpp Demonstrates the LED peripheral with DMA.