FastLED / FastLED

The FastLED library for colored LED animation on Arduino. Please direct questions/requests for help to the FastLED Reddit community: http://fastled.io/r We'd like to use github "issues" just for tracking library bugs / enhancements.
http://fastled.io
MIT License
6.5k stars 1.65k forks source link

Inefficient SPI Data Transmission for APA102 / SK9822 LEDs on ESP32 #1609

Open coryking opened 8 months ago

coryking commented 8 months ago

Summary

When using FastLED with ESP32 to drive APA102 and SK9822 LED strips, significant inefficiencies have been observed in SPI data transmission. The implementation appears to conduct separate DMA transactions for each byte of data, introducing noticeable gaps between 8-bit bursts. This behavior considerably diminishes the effective data transmission rate, closely mirroring the performance limitations characteristic of older one-wire LED models like the WS2812b.

Environment

Observations

Expected vs. Actual Behavior

Expected: For high-speed SPI communication, especially at higher data rates, FastLED should utilize continuous DMA transfers for the entire data packet of LED strip data, minimizing CPU overhead and maximizing throughput for quick and efficient updates.

SDS00004 This is what I'm calling the "inner byte gap". It's roughly 1.7 or so microseconds. It transmits 8 bits, takes a break, transmits 8 more bits... etc.

Actual: The implementation seems to trigger individual DMA transactions for every byte sent to the APA102 and SK9822 LEDs, leading to unnecessary overhead, reduced transmission rate, and increased update times, particularly for longer LED strips.

Investigation and Documentation

This issue aims to document and investigate the SPI data transmission inefficiency for APA102 and SK9822 LEDs on the ESP32 using FastLED. The goal is to explore this issue further, propose optimizations, and prevent duplication of efforts across multiple issues or pull requests.

SDS00002 The above image shows how long it took to transmit a 30 LED strip. Roughly 230uS at a "20MHz" setting in fastLED.

Additionally, this concern echoes a comment from @SCarter858 on Dec 27, 2023, which detailed similar inefficiencies and questioned the actual utilization of DMA for SPI communication within FastLED's ESP32 HW SPI support.

Additional Context

Note that this being caused by separate DMA transactions is only hypothetical. It might not be using DMA at all!

This inefficiency not only affects APA102 LEDs but also their SK9822 counterparts, which are often used interchangeably in projects. The potential lack of full DMA utilization for SPI transmission significantly throttles the performance capabilities of these chips, making high-density LED installations as slow as older one-wire LED protocols like the WS2801. An optimized implementation would greatly enhance performance, leveraging the ESP32's capabilities for LED projects using FastLED.

coryking commented 8 months ago

Here is some results when I used a completely different APA102 driver I found online: DMA Findings.md

As you can see from the oscilloscope readings in the linked document, when you use DMA it looks nothing like the ones in this ticket. Plus the speeds are significantly faster (though they appear to be "rounded" for reasons I don't fully understand.

coryking commented 8 months ago

So I need everybody's opinion on how to provide the "SPIOutput" class the number of LEDs / bytes required for a DMA buffer.

In order to get DMA working we need to allocate a buffer on the heap but to do so we need the size of the buffer. I think the best place for this is in the init() function. The problem is these "SPIOutput" classes do not know how many LEDs they are supposed to be driving, thus I have no clue how big of a buffer to allocate. So I need to get the number of LED's plumbed in and probably the number of "extra" bytes needed for start and end frames.

The problem is these SPI handlers are all supposed to look like "SPIOutput" even though they are all completely different classes. So I suspect if I make a change to init() to pass in the number of leds (or perhaps just bytes) I'll have to change all the other SPIOutput classes as well (which may not be a bad idea because any of the other SPI classes that do DMA need the same info).

So, in short:

  1. Is SPIOutput the correct place to manage the DMA buffer?
  2. What is the best way to provide SPIOutput with the size of said DMA buffer?
  3. If it is the SPIOutput::init() function like I plan, are we cool with updating all the other SPIOutputs to have the same signature?
  4. Should it get the number of LEDs or the number of bytes? I'm leaning toward the number of bytes, as I suspect that SPIOutput is a "bit pusher" and isn't supposed to care about the various LED protocols.

Pinging @samguyer and @kriegsman for their thoughts as well as @Jueff & @sheaivey who looks to have done some DMA stuff for a different platform.

To avoid getting blocked, if I don't hear anything in the next few days I'm gonna proceed with doing this in the ESP32SPIOutput class and passing the number of bytes / leds through the init() function.

coryking commented 8 months ago

Also I should add I'm going to borrow quite liberally from the led_strip_spi component from the UncleRus esp-if-lib component library. I'll make sure to adhere to its MIT License when pushing my changes.

zackees commented 3 months ago

Any update on this?

zackees commented 2 weeks ago

Make sure that you have #define FASTLED_ALL_PINS_HARDWARE_SPI before the #include "FastLED.h" part to make sure you are not doing software bit banging.

Okay so i took at look at src/platforms/esp/32/fastspi_esp32.h and indeed we are using the 8-bit spi write and just iterating through the pixels. From the documentation of ESP32, these spi transactions need to grab a lock once per byte. So if true, that would explain the pattern of inefficient transmission you are seeing - it's basically blocked on a lock.

Say you have 1k leds, that's going to translate into 3k lock/unlocks to write all that data out.

Looking through the docs I see that esp32 now provides void spiWritePixelsNL(spi_t *spi, const void *data_in, uint32_t len); as a new experimental API. Apparently, an entire buffer can now be sent to the SPI device in one shot and the lock will hold until the pixels have been flushed out through the spi device. I'd have to look more closely to see if 2 or more SPI devices can operate in parallel in FastLED.

Looking at src/platforms/esp/32/fastspi_esp32.h at the function template <uint8_t FLAGS, class D, EOrder RGB_ORDER> __attribute__((noinline)) void writePixels(PixelController<RGB_ORDER> pixels, void* context);

I see that we are writing each r,g,b byte out, then we are doing our advancedData() and stepDithering() operations after each pixel write. So sending an undefined block of data out to the spi in it's entirely won't work unless we supply a side buffer. However, we do do this for the WS2812 and other clockless data types exactly for this reason.

So the solution here is either create a side buffer that lives for the duration of the esp32 spi implementation... OR allocate a giant block of leds and send K batches for N leds. For example the batch size could be 256 pixels. This would mean that the inefficiency that you see would be cut to 1/256 of what it is now, which would almost be identical to full performance mode where everything is sent all at once.

zackees commented 2 weeks ago

I'm adding a new feature for bulk transfer through the SPI device on ESP32. You can enable it via #define FASTLED_ESP32_SPI_BULK_TRANSFER 1 before the inclusion of FastLED.

It will go out with 3.9.4. Right now it's just in master. I've compile tested, but haven't run it yet. If anyone tests it, let me know the results.

coryking commented 1 week ago

Thanks! I haven't had time to look into this for quite some time, unfortunately life events got in the way!

On Wed, Nov 13, 2024 at 6:36 PM Zachary Vorhies @.***> wrote:

I'm adding a new feature for bulk transfer through the SPI device on ESP32. You can enable it via #define FASTLED_ESP32_SPI_BULK_TRANSFER 1 before the inclusion of FastLED.

It will go out with 3.9.4. Right now it's just in master. I've compile tested, but haven't run it yet.

— Reply to this email directly, view it on GitHub https://github.com/FastLED/FastLED/issues/1609#issuecomment-2475290886, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEAPKWXIT5F6SSLI52YZI32AQEBPAVCNFSM6AAAAABEPFR7GCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINZVGI4TAOBYGY . You are receiving this because you authored the thread.Message ID: @.***>