FastLED / FastLED

The FastLED library for colored LED animation on Arduino. Please direct questions/requests for help to the FastLED Reddit community: http://fastled.io/r We'd like to use github "issues" just for tracking library bugs / enhancements.
http://fastled.io
MIT License
6.36k stars 1.62k forks source link

blend8 maths is wrong #1633

Open notzed opened 3 months ago

notzed commented 3 months ago

Using real numbers this is the actual calculation desired, where m=[0.0, 1.0] is the interpolation factor.

result = round(a + (b - a) * m)                          (1)

But your blend function uses this:

result = (  A*(255-M) + A + B*M + B      ) / 256

But where did "B*M + B" even come from? It seems to have no mathematical basis.

The root of the problem is that with a scaling range of M=[0,255] with an 8 bit shift the actual range is M = [0,(255/256)] = [0.0,0.996] rather than m = [0.0,1.0]. One easy fix is to simply allow M the range [0,256].

A more serious problem is that it's truncating the result rather than rounding so for small values of A and B the interpolation is very poor. e.g. If A=0, B=1, the result is mostly 0. i.e. for M=[0,254] you will get 0, and M=255 you will get 1. I've only played with 1 led strip but this makes the already bad low intensity performance even worse.

This calculation will not reach the full range for some values if M=[0,255], but it will be smoother for all of them:

result = (((A << 8) + (B - A) * M + 0x80) >> 8           (2)

A more accurate fixed point version of (1) requires more bits and a couple more steps:

result = ((A << 16) + (B - A) * M * 257 + 0x8000) >> 16  (3)

Here (M 257) is an 8.8 fixed point approximation of (M 256 / 255) to scale M=[0,255] to the range [0,256] and 0x8000 is half of 1<<16 for rounding.

robertlipe commented 1 month ago

Just as a casual human reader of that function, I agree with your analysis. (I'm also not certain the comments match the code, but that could be either due to the density of both or my own density :-) ) Maybe it's optimized for color blending over the more common lay description. This looks like a tortured translation of some ASM code - and lib8tion is full of that. Worse, the implementation seems likely to prevent a modern optimizer from combining the expressions before a strength reduction and certainly from implenting a fused multiply/(negative) addition on it..

Code needs to be right before it's made "fast". Why is this not just the classic ( A amountOfB ) + ( B ( 1 – amountOfB ) ) / 256 ?

This library really needs some automated testing and benchmarking. I'm not sure that all the assumptions for AVR-style code exactly hold if you're running this on a modern vector/SIMD-enabled system. I hope it gets it in some way that's just not obvious.

In summary, just here from the sidelines, I agree that this function (and probably some neighbors) is worth some serioius pencil-whipping.

zackees commented 2 weeks ago

Unit testing this type of code is interesting.

I just went through this yesterday with another project and my notes are pretty much this: you have to have a sub folder of pure portable code which includes no arduino.h, esp, avr etc headers.

This pure portable code can then be unit tested with your favorite framework.

The next question is how to deploy the gcc compiler framework so it works on mac/win/linux without sudo access, as system package managers need this.

The answer to this surprisingly is to use the zig C++ compiler and install it via pip, which doesn't require elevated privileges.


As for the blend function, a lot of this stuff was written when the Arduino uno was all the rage. The strategy we have is that we have upgraded these little functors with a better version for 32 bit chips but still use the old version for avr chips, controlled by a define.

I've done a lot of work to make FastLED as easy as possible to hack on it. Now you can just clone the repo and then open it up in VSCode (+platformio) and hit compile / upload.

zackees commented 2 weeks ago

@notzed was this fix suggestion hypothetical or did you use it in code and verify that it works correctly?

robertlipe commented 2 weeks ago

Platformio has a framework to run unit tests on the device and/or natively. I've not used it, But I know it's there. Gtest.looks like a familiar option. https://docs.platformio.org/en/latest/advanced/unit-testing/index.html

I get that testing the hardware part is hard, but unit testing (and benchmarking) the libation stuff would have good value and a lovely place for someone new to the project to help out. Testing math (pure). functions should be straight forward.

Running tests from another language on another compiler hmore than zero value, but a lot of potential for distraction in chasing shiny things. I wouldn't encourage that.

On Mon, Aug 26, 2024, 11:30 AM Zachary Vorhies @.***> wrote:

@notzed https://github.com/notzed was this fix suggestion hypothetical or did you use it in code and verify that it works correctly?

— Reply to this email directly, view it on GitHub https://github.com/FastLED/FastLED/issues/1633#issuecomment-2310604011, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD37E72XYEGXQOLWZCBTZTNJZTAVCNFSM6AAAAABJHOHQ3SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJQGYYDIMBRGE . You are receiving this because you commented.Message ID: @.***>

zackees commented 2 weeks ago

Platformio has a framework to run unit tests on the device and/or natively.

I've used native. Anything including "Arduino.h" will fail to compile. So unless you make a fake "Arduino.h", the value is debatable on exactly what you are testing, especially since you have to verify your expectations by capturing the stdout from the device and checking string values.

Hence the reason having a lib of just pure functions / classes not calling into any platform stuff. You get to use fancy expectation stuff in the testing frameworks and possibly get to use gdb to step through the code while it executes.

What would be better is to use qemu to fake an esp32 and read back the printfs. I've looked that qemu for esp32 and there is a github action to run it. The downside is that it wants partition.csv and a lot of other stuff that is abstracted away from platformio when using the arduino core. It could probably work with esp-idf framework for platformio, though.

robertlipe commented 2 weeks ago

Native might be convenient for dev and test, but it's not really the strategic trophy to capture anyway. Sure, for testing qadd8 in a template that works on a 64-bit core (like the common $6 RISC-V parts) natively to verify that sign extension works might be nice, the real value is running it on the board you actually care about, so going through the pain of building up a test .bin and shipping it to Real Hardware to run is probably worth it - especially since you'll eventually want to test things more ambitious than libation. This is how we each did testing in a previous life in a similar case, our cross targets were just mobile devices.

There probably IS value in testing under Qemu, too, but outside of libation so much of what FastLED does is so hardware and timing dependent that you're never going to get that simulated correctly. It's not like a missed DMA interrupt that causes the 800'th LED to flicker is something you're going to catch with Qemu.

To split the difference, there's Wokwi. Here's a few non-trivial FastLED examples: https://wokwi.com/projects/295021961569894920 and https://wokwi.com/projects/289631971594732040 You can build up sims with different boards/configurations/strips. I've seen some evidence that they're using an externally compiled FastLED that may not REALLY be obeying the sizing of leds[] as cranking up pixel dimensions above some magic number just breaks things. I'm also not totally convinced that their strips and their DMA/RMT things are actually doing signal and bus-cycle level simulation here - they may be peeking into leds[] themselves and divining what LEDs to simulate 'on'. That could probably be most easily verified by selectively breaking the provided copy of FastLED. If you can remove the RMT or SPI layers and it still works, then you know the cake is a strategic lie.

They really want to sell subscriptions to Wokwi for the advanced features like being able to upload the bins right from your CI. (Someone's gotta pay for those compute cycles.) They support a load of the same SOCs that FastLED tries to care about, like many of the ESP32's, Pico, STM32, etc.

I'd hope that FastLED and Wokwi could justify a subscription or partnership for some kind of CI solution if you're into that kind of thing. They have GitHub CI actions https://docs.wokwi.com/wokwi-ci/getting-started and PlatformIO integration https://docs.wokwi.com/vscode/getting-started#platform-io-examples in their paid plans. It's not crazy expensive https://wokwi.com/club?ref=docs_club and for a project like "yours" could help automate a ton of testing across different boards. (Heck, just automating that it builds everywhere would be helpful.)

But I probably wouldn't rathole on that for this specific problem. If a GTEST framework that includes Arduino.h (that's a sentece fragment I never expected to type...) can be built by Platformio and shipped to the board that just runs a load of assertions in a pass/fail manner ( https://google.github.io/googletest/reference/assertions.html) on real hardware, there's pretty real value in that. I'll disclaim that I've not tried to actually use it, but if it's like most PlatformIO features, I'd kind of expect alligators just under the visible water. Unit testing pure functions like most of the libation stuff should be dead easy and worth reaching. Testing race conditions in DMA interrupts and flickering is just a different problem than confirming that a saturating add didn't sign extend in a dumb place.

I'm currently in overcommit, so I can't contribute much code here, but it's an area I've given some thought to and can cheerlead/coach if needed.

This might be a good "call for volunteers" project. The skills needed (PlatformIO unit testing for cross dev if not cmake/esp-idf, though the PlatformIO approach probably gets you further for automating STM32 and others, too.) aren't really FastLED internals or register bashing, so maybe the pool of volunteers is larger...or at least different.

On Mon, Aug 26, 2024 at 11:51 PM Zachary Vorhies @.***> wrote:

Platformio has a framework to run unit tests on the device and/or natively.

I've used native. Anything including "Arduino.h" will fail to compile. So unless you make a fake "Arduino.h", the value is debatable on exactly what you are testing, especially since you have to verify your expectations by capturing the stdout from the device and checking string values.

Hence the reason having a lib of just pure functions / classes not calling into any platform stuff. You get to use fancy expectation stuff in the testing frameworks and possibly get to use gdb to step through the code while it executes.

What would be better is to use qemu to fake an esp32 and read back the printfs. I've looked that qemu for esp32 and there is a github action to run it. The downside is that it wants partition.csv and a lot of other stuff that is abstracted away from platformio when using the arduino core. It could probably work with esp-idf framework for platformio, though.

— Reply to this email directly, view it on GitHub https://github.com/FastLED/FastLED/issues/1633#issuecomment-2311567062, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD3Z6LHW5O3UACFHJ5RLZTQATXAVCNFSM6AAAAABJHOHQ3SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJRGU3DOMBWGI . You are receiving this because you commented.Message ID: @.***>

zackees commented 2 weeks ago

Testing race conditions in DMA interrupts and flickering is just a different problem than confirming that a saturating add didn't sign extend in a dumb place.

It's doable with a custom github runner that tunnels back to a real machine. I've done it before with android. The great thing is once you break out of the VM and into real hardware you can pretty much do whatever you want. For example, the WS2812 protocol running on the ESP32 could be read back either via another micro or some USB data signal sniffer to verify that no timings are missed.

robertlipe commented 2 weeks ago

Indeed, and I've given that non-trivial thought for a couple of my other projects. Just attaching a $7 logic analyzer to the DOUT pin and capturing and decoding via SIGROK has a lot going for it. I just haven't put fingers on keyboards and molten metal on copper to make it happen.

It's an interesting problem, for sure, but a very different scope (ha!) than just running DCHECKS that qadd8 didn't return 257 and such. I'd be terrified to try to integrate things like the vectorization extensions to modern processors in the existing code just because it's ifdefed all to hell and I couldn't even compile all the paths.

Back to this PR, though. An actual PR to be reviewed and integrated would likely be helpful to the Zach-like maintainers with bonus points given that it's tested in some way - even if that's a handful of compile-time static_assert(something == true) tests (they could be runtime asserts, I suppose, as long as they're somewhere that gets run once on startup and without much overhead) that failed before and passed after would likely clear the runway to land this change.

TBC: Zach's a maintainer and I'm not. I'm just running my mouth, but that's where I'd set the bar for my own projects. I wouldn't block it upon building a framework of cross-dev unit tests and required logic analyzers, but some kind of regression "this used to fail and now it doesn't" runtime test sure seems like a good way to help keep the code moving forward and never backward. It'll also help catch assumptions that work on your ESP|STM|Teensy|Pi that fails on some other combination.

On Tue, Aug 27, 2024 at 1:26 AM Zachary Vorhies @.***> wrote:

Testing race conditions in DMA interrupts and flickering is just a different problem than confirming that a saturating add didn't sign extend in a dumb place.

It's doable with a custom github runner that tunnels back to a real machine. I've done it before with android. The great thing is once you break out of the VM and into real hardware you can pretty much do whatever you want. For example, the WS2812 protocol running on the ESP32 could be read back either via another micro or some USB data signal sniffer to verify that no timings are missed.

— Reply to this email directly, view it on GitHub https://github.com/FastLED/FastLED/issues/1633#issuecomment-2311672989, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD3ZVNEDHZBD2JYHNDSLZTQLY7AVCNFSM6AAAAABJHOHQ3SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJRGY3TEOJYHE . You are receiving this because you commented.Message ID: @.***>