Open DedeHai opened 1 month ago
@Brandon502 please check if this is better or worse
@DedeHai With the latest commit I am see way more holes compared to your first version. 24x24 has holes and large matrices have quite a few. I struggled finding holes on your first version.
Wavesin pattern looks pretty nice with your method of reducing center writes instead of jumping.
Wavesin pattern looks pretty nice with your method of reducing center writes instead of jumping.
how do you get the full preview? peek is always reduced for me.
how do you get the full preview? peek is always reduced for me.
You need to change the "max preview pixels" in the code: https://github.com/Aircoookie/WLED/blob/7deea9eb75e651b19c04217c5fabde3f544c542b/wled00/ws.cpp#L172
well that was easy. I have a few more ideas I want to test, so this will come in handy. I have it working at small sizes but need to test larger ones. Changes should fix the missing pixel issues hopefully on all resolutions while still reducing pixels drawn.
I spent way too much time checking different options ;) Bottom line is:
So there always will be a trade-off between looks and performance. btw: I wrote a code snippet (thx ChatGPT) to evaluate different algorithms over all sizes, printing out number of pixels drawn, number of zero pixels and number of 'dead loops' (no pixels drawn) for any square size. Current work in progress: optimize for 8x8, 16x16, 32x32 as these are most common and use more generic parameters that work on any other size. If all goes well, there should be up to 20% speed boost. Let me know if you want me to put that test-code here.
If you'd be using Bresenham's algorithm your gaps would've been known in advance. Just a thought.
I don't think Bresenhams algorithm can solve the issue. I briefly looked at it in the beginning and while it is really good at efficiently drawing a line, it does not solve the over-drawing issue, the prevention of which is the actual cause for the gaps: it is easy to just fill the whole matrix, with lines that doe not leave gaps, but that leads to something like a 3.5x overdraw. Current algorithm brings it down to about 2.2x. If my latest Ideas work, it will bring it down to 1.9x.
As we already know, painting a pixel is expensive. Iterating through an array of "painted" pixels is not. Deducting coordinates/angles from unpainted pixels (usually not that many of them) is straightforward. Why paint 1.9x when you can do it only once? The expense would be RAM for an array of "painted" pixels and a few "reverse" calculations of "unpainted" pixels.
if having an array as a "buffer", it would even be easier to pre-calculate a map. This was suggested earlier by @Brandon502 but it eats quite a lot of ram, especially if doing a large matrix, like 32x32 and larger. If ram can be expended, this would by far be the best and fastest solution. If matrix size is restricted to 256x256, a byte array would be sufficient. Edit: thinking about it, a map may also not be so easy to define. maybe need to explain, why it is a bit complicated to not overpaint: the function gets handed a single color value, not knowing anything about the future and the past it needs to convert a single point in a 1D array into a polar coordinate line. so where to start painting is the question that is unsolved. With just a 'stupid' line drawing algorithm for example, the center pixel gets painted many many times (I counted over 80 times at 32x32). Now if you skip painting it, which 1D position actually is priviledged to paint it? and what if that pixel is not set by the FX, will it stay black or does another one 'inherit' ? and how to know all that just by getting passed a color and a 1D position... Edit2: to efficiently translate, it would actually need to be an array containing variable length arrays as for each 1D coordinate, there are a certain number of X/Y coordinates to be painted and that number is not constant. But I am sure, someone already thought of a good solution as mapping to cartesian coordinates is an every-day kind of problem.
Why not just use "mean" value of the surrounding pixels? The visual "error" would be the least. Which is more important than "actual" color rendered at the missing pixel. Getting "mean" value may be expensive (9x call to getPixelColorXY) if no other measures are taken. I am just tossing ideas. I have not tried anything mentioned above.
EDIT: Memory requirements are 4kB for 64x64 matrix if using unpacked bool
values. If using packed bits (at the expense of extracting each bit of information) that's only 512 bytes.
using packed bits to keep track of painted pixels had passed my mind at one point. thanks for the ideas, that may get the best of both worlds: live painting with no over-painting with minimal use of ram. I am just asking myself, how to handle such a buffer. It should be dynamic so heap or stack? and how to prevent fragmentation with many segments using this?
4k might be a lot for stack but 512 bytes is viable as it adds no overhead. If you are talking about 128x128 matrix (largest possible ATM without code change) that's still 2k which is somewhat safe to put on stack. But I would not go beyond 2k.
It should be dynamic so heap or stack?
I think it might become tricky to get this buffer into the stack - the stack is cleared when you leave the function - so it might be needed to already allocate the buffer in strip.service() and find a way to pass down the pointer until we arrive in segment::setPixelColor.
The other problem is you might need "alloca". Because a definition like bytes drawn[seg.width * seg.height / 8]
will go to the heap (dynamically sized array) while bytes drawn[2048]
will go into the stack.
I reworked it with a different idea. Uses Bersenham's algorithm and doesn't use memory. Haven't redid getPixelColor or found the perfect amount of rays per size yet. Can likely optimize quite a bit more since my C++ isn't great.
64x64 using 36 rays. Can see a bit of banding. 64x64 using 72 rays. Seems decent, needs to be testing on physical setup.
My fork is from MM so these numbers are based off of that. But original pinwheel on MM build esp32_4MB_S 64x64 8 pins 512 ws2812b ran at 20 loops/s. 72 rays with new method is at 25 loops/s. Edit: Pixels for comparison is 26 loops/s. Bar is 30 loops/s
Haven't found any holes. Can likely improve this a bit further.
Haven't found any holes. Can likely improve this a bit further.
@Brandon502 The MM version of drawLine has an additional option to skip parts of the line (added for the new GEQ3D effect). Maybe this could be useful to skip inner parts of a line (like the previous odd-rays approach).
Should be easy to port to upstream https://github.com/MoonModules/WLED/blob/98bdbd1eb2e3c6a07a17a7f966cf1ec0cb83804b/wled00/FX_2Dfcn.cpp#L783-L795
Haven't found any holes. Can likely improve this a bit further.
@Brandon502 The MM version of drawLine has an additional option to skip parts of the line (added for the new GEQ3D effect). Maybe this could be useful to skip inner parts of a line (like the previous odd-rays approach).
Should be easy to port to upstream https://github.com/MoonModules/WLED/blob/98bdbd1eb2e3c6a07a17a7f966cf1ec0cb83804b/wled00/FX_2Dfcn.cpp#L783-L795
I took Bresenham's algorithm from drawLine. Right now a single ray does not overwrite any pixels. But each ray slightly overlaps the next. Since rays are much thicker a lot of the effects look much better in my opinion. And getPixelColor is likely much more accurate. Prev Noise4: New Noise4: Wavesins no longer has the moirè pattern though.
BTW & FYI if you'd fork AC and add MM as a remote, you'd have no problem of pulling and pushing into both (provided you have write access) or submitting PR into both without merge issues.
As for the development (and thinking) I am happy that it has turned this way. BTW the banding above is from a "still shot" it may be invisible when moving.
As an optimisation for drawing: if you can determine the length of a ray you can draw shorter rays for those that overlap.
@softhack007 et. al. I will steal this post to urge you to test 0_15 ABL so we can release b6 and soon after RC or 0.15 Then we can start merging these improvements into beta of 0.15.1
I took Bresenham's algorithm from drawLine. Right now a single ray does not overwrite any pixels. But each ray slightly overlaps the next. Since rays are much thicker a lot of the effects look much better in my opinion. And getPixelColor is likely much more accurate.
Looks really good and may likelly to be the better approach - got any code to share? I would like to compare it my current state
I took Bresenham's algorithm from drawLine. Right now a single ray does not overwrite any pixels. But each ray slightly overlaps the next. Since rays are much thicker a lot of the effects look much better in my opinion. And getPixelColor is likely much more accurate.
Looks really good and may likelly to be the better approach - got any code to share? I would like to compare it my current state
I pushed it on this branch. https://github.com/Brandon502/WLED/commit/761e2d7fc98681a54d0d0829b9e055d20c7e9295 still really rough, getPixelColor isn't working correctly on all sizes.
@Brandon502 I tested your 'rough' code and after a few fixes it runs (I had to increase listLen
).
There is some room for optimizations but all in all it works pretty well!
What I like about it is the "filling between lines" which really lets you dial in the number of angular steps without adding any holes. Also it avoids the "odd ray skipping" that is required in my updated version to gain speed. While your raw version is a bit slower than my optimized one, it looks much better on some effects (as your screenshots show). I think the Bresenham algorithmi with gap filling is the way to go. It makes tweaking much easier: my verison has 4 or 5 parameters to dial in.
Also it allows very easy optimization for ESP8266 by just lowering the angle steps for example.
I will analyze your code in more detail and integrate it into my updated code and then add some optimizations and push to this PR for testing.
Edit:
Theser are the numbers I get for "overdraw":
Pixels drawn with 32x32 matrix (1024 pixels):
Vanilla 0.15: 2344
My version (Optimized for 32x32 so best case): 1804
Bresenham, 60 rays: 2054
I am sorry I never bothered with Pinwheel expansion but I see floating point math now that I took a peek (at @Brandon502 implementation).
When drawing lines, you always know the starting and ending pixels (as you are drawing from center pixel to the edge pixel). So the maximum number of lines is known in advance (2xW + 2xH). It is true that this may produce uneven angle between them but IMO that is a negligible error that may not greatly affect the output. Another optimisation is, that, since all lines start at the same point in center, every other line can be half the length (as does the current pinwheel expansion in 0_15). You could further reduce the length on very large matrices IMO.
Just a thought.
you are right about all these points. the floating points will not influence the speed (and also: I will replace most of them with fixed point). The "reducing every second line" is something that leads to ugly artefacts if effects draw every second pixel, as many do. Not reducing the rays is one of the advantages. I will look into tracking drawn pixels as previously discussed but I am still looking for a highly efficient way to map the x/y coordinates to a bit field: if those calculateions take more than a few CPU cycles it may not speed up the process at all. i.e. it must be way faster than doing a setPixelColor() call. So doing something like: `#define BIT_INDEX(x, y, width) ((y) * (width) + (x)) / 32
(not really good code, just as a quick example of what I mean)` may already be too many computations... I would like something that is quicker for line drawing, where you can start at a base value (x0/y0) calculate the index and the mask and then just do additions and mask-shifts as x and y progress (not sure it is feasible).
@DedeHai Do you have your newest version uploaded anywhere? Would like to see the code and test the look/performance. Weird that lineLen needed changing, what size broke it? I originally had extra space, but it never was used so got rid of it and it seemed fine.
Last night I added center jumping and slightly change the angle of consecutive rays which gives a decent boost, but made gPC much worse. https://github.com/Brandon502/WLED/commit/2c7fe518f88eabd60adc34f696a93cc248d0d97f. @blazoncek the jump wasn't able to be half length but somewhere between 1/3 - 1/4, maybe that can be improved. Here's draw counts for 64x64. 4096 pixels. 72 rays. I can get around 28 loops/s now. | Description | count |
---|---|---|
Current MM pinwheel | 10948 | |
New pinwheel | 6417 | |
With jump | 5949 | |
Reduce overlap with jump | 5252 |
I tested a very small amount of rays just to see speed. Turns out it is much worse which I don't understand. The draw count shot up for some reason when I expected way fewer writes.
So gPC I think is impossible to get perfect on rectangle setups. The short side is so squished that odd and even rays constantly overwrite which breaks the effects that rely on it like DJ Light/Freqmatrix. The current pinwheel is also not working with these on certain rectangles. @softhack007 MM block and circle also seem broken. Would it be possible to just store all 72 color values somewhere and have gPC retrieve there? <300 bytes to store, but each segment would require its' own. Ideally it would only need space if gPC is being called on an effect.
Quickly tested the writes for 32x32. 60 rays - 1751 writes 45 rays - 1561 writes - 99 loops/s using 256 leds/pin
Do you have your newest version uploaded anywhere?
Doh, I commited it to this PR but forgot to push. Will do it within the hour.
Weird that lineLen needed changing, what size broke it?
I am testing on S3, 32x32 and it took me quite a while to figure out, what the problem was. increasing the size seemed to fix it, I did not yet check if it was something else I might have changed at the same time.
the jump wasn't able to be half length but somewhere between 1/3 - 1/4
I tested adaptive jumping, depending on size and position, something like jump = i % maxXY/8
which removes some overdraw but not as much as a 1/3 every other ray.
I tested a very small amount of rays just to see speed. Turns out it is much worse which I don't understand. The draw count shot up for some reason when I expected way fewer writes.
I think that is still a bug in your code somewhere, initially saw that too, it dropped to 7FPS
The current pinwheel is also not working with these on certain rectangles. @softhack007 MM block and circle also seem broken.
@Brandon502 yes MM block and MM circle are somewhat special, because they support "virtual strip" mode so it can get complicated to find a good reference pixel for gPC. I'll need to discuss with @ewoudwijma who created the features initially. Thanks for the hint.
Would it be possible to just store all 72 color values somewhere and have gPC retrieve there?
Yes I think that would be the right approach - an extra buffer that's aligned with the pinwheel "length" will remove all these problems with gPC. We could also reduce program size a bit because we won't need to reproduce the drawing algo in gPC any more.
3 * length
bytes does not seem too much on esp32, not even with the HUB75 driver that's extremely memory hungry. I can try something this weekend - should be enough to add an extra CRGB* to the segment class, and allocate the buffer at the first sPC with pinwheel mapping. Segment copy/move constructors might be a challenge.
Not sure about 8266 - but 8266 is not a good idea for large leds count any way (24x24 = 576 pixels), so buffering should be possible there, too.
I have working code using a 1bit map to check if a pixel has been drawn. it is fast!. I have it working on a single line draw only and it does not look too good unfortunately. But I almost get full FPS (compared to other mappings) even when drawing hundreds of lines. Need to adapt it back to the two-line-fill-between approach, I hope that is possible without too much modification. If I get it working, I think this will be the way to go.
I have working code using a 1bit map to check if a pixel has been drawn. it is fast!. I have it working on a single line draw only and it does not look too good unfortunately. But I almost get full FPS (compared to other mappings) even when drawing hundreds of lines. Need to adapt it back to the two-line-fill-between approach, I hope that is possible without too much modification. If I get it working, I think this will be the way to go.
I quickly added a bit field into mine and just used the get and set bit functions I used in Game of Life in MM, so it probably isn't the fastest method. But now it draws exactly 4096 on 64x64 with no loops/s improvement Edit: maybe +1 loop/s.
Regarding getPixelColor()
: You only need to sample edges of the matrix as those represent end of each ray and a ray is drawn with the same color.
As mentioned above that's only 2H + 2W pixels when using approach I proposed.
As mentioned above that's only 2H + 2W pixels when using approach I proposed
Hi @blazoncek, its true that this the maximum resolution we'd get from gPC without extra buffering. And it's basically what the current code implements (but based on angles).
However, there is a "twist" here (literally) - if you take the edges of the panel as "length" and you draw a line from the center to each edge point, that's not a "pinwheel" anymore. The reason is that angles are not distributed uniformly with this method - angles will be squeezed at the corners, while being wider apart at the center cross. Unfortunately, the human eye is very un-forgiving about such inaccuracies. It means that some overdrawing is inevitable to achieve evenly distributed angles AND not miss any pixels in the corners.
we've tried a few ways without buffer, and they all failed with "scrolling" 1D effects (i.e. most audioreactive 1D effects):
so we ended up repeating the "ray casting" algo in gPC, but this is still not working for all resolutions.
That's why my conclusion was - let's spend a buch of bytes that will ensure gPC will always return the pixel color that was assigned to exactly that ray (=angle) with sPC.
Then we can concentrate on optimizing the sPC part, because pinwheel gPC does not depend on the drawing method any more.
I think I got it now, may need some fin-tuning. Only tested on S3. No more floats (sin16/cos16 can later be replaced by my new version) Optimized for speed as much as possible. Putting the bit mapping to fix overdrawing on the stack leads to crashes on larger sizes, maybe someone knows why (write access fault) on heap it works.
@DedeHai I may have fixed overdraw issue and the bit array may not be needed. 64x64 I had about 100 overdraws. Rectangles will likely have more than squares but need to test it more. Will also use your optimized calculations when I'm able to test later today. If I'm lucky gPC may work correctly but I'm not hopeful.
If I'm lucky gPC may work correctly but I'm not hopeful.
Hi @Brandon502, in case you can't get gPC to work in all cases, I'd say don't spend too much time with it. To me it still looks like adding a dedicated pixel buffer for pinwheel (and maybe other mapping modes) is the way to go. We can address that topic in a new PR.
I merged @DedeHai's optimizations in my branch and tweaked a few things. Removed bitmap it should no longer be needed. Added redraw avoidance which reduces redrawing a ton. Only 32 extra draws on 64x64 rainbow. This varies slightly on effects and panel shape. Here's latest loop/s on esp32. Not the biggest improvement from a few days ago, but no more floats or overdrawing may help other chips.
Size | MM | Latest |
---|---|---|
64x64 | 20 | 28 |
32x32 | 83 | 113 |
I changed dynamic ray numbers to static for testing on an actual panel. Latest changes here: https://github.com/Brandon502/WLED/tree/pinwheel2 @softhack007 this should be good to test on your setup when you have time. Hopefully no holes appear due to rounding differences. If 72 rays is too few for 64x64 you can increase to 90 or swap to Dedhai's dynamic function which I left in.
@Brandon502 the calculation improvements I made (and the avoiding floats) will mostly affect ESP32C3 and ESP8266 as they do floats in software. In your latest approach: couldchanging the ray width (or delta angle between the lines) from currently one full step to something a little smaller achieve something similar? What I do not like about the approach tracking rays left and right: it only works, if the rays are drawn in consecutive order (which a lot but not all FX do). FX "Two Dots" for example jumps back and forth when drawing. Would it work, if you always skip drawing pixels of the first line, just drawing the second + infill? On the other hand: the bit-mapping has the disadvantage, that each ray can only be drawn once. if there is FX (not sure there are any) that rely on overdrawing, they will not work. Did you compare the looks of my latest version and your approach? Does your latest version also work well on smaller sizes and odd numbered width/height? What I like about the variant using a bit-map: it is very insensitive to changing the number of rays and in general "always works". If ray width is increased from what I currently have, I think also the gPC should work. In my gPC test, the ray width was too small, the edge colors would sometimes overlap. The drawback of having to add a buffer is mostly to heap fragmentation, not sure how problematic that is (a small buffer is created before a segment is drawn, deleted afterwards).
The drawback of having to add a buffer is mostly to heap fragmentation, not sure how problematic that is (a small buffer is created before a segment is drawn, deleted afterwards).
Well heap fragmentation is generally problematic, but it really depends on the size of blocks and on the frequency of malloc/free. We have several features that contribute to fragmentation. My esp32 often becomes unstable (random crash) when max used head goes above 80%. So I'd recommend if you can do something against fragmentation (like holding a block as long as possible, or creating memory pools like effect data) then it's worth a try.
In your latest approach: couldchanging the ray width (or delta angle between the lines) from currently one full step to something a little smaller achieve something similar?
I mainly tested on 64x64 but overdraw seems pretty dependent on how many rays there are. 72 seems perfect, hopefully it doesn't have banding on actual display. But once you put 100+ rays overdraw is minimized. I tried 720 as an extreme test and it didn't overdraw much.
What I do not like about the approach tracking rays left and right: it only works, if the rays are drawn in consecutive order (which a lot but not all FX do). FX "Two Dots" for example jumps back and forth when drawing. Would it work, if you always skip drawing pixels of the first line, just drawing the second + infill?
I tried second + infill and it was worse. I think two dots doesn't necessarily over draw since the rays are apart there is no overlapping. They just might be slightly wider if there are no surrounding rays if you consider that overdraw. I tested on a lot of effects and haven't see any crazy draw numbers, although bitmap would obviously always be perfect.
Did you compare the looks of my latest version and your approach? Does your latest version also work well on smaller sizes and odd numbered width/height? What I like about the variant using a bit-map: it is very insensitive to changing the number of rays and in general "always works".
I compared a few effects between our different versions and I can't really say one looks better or worse. They're probably not identical but they both look good to me. Every size seems to work on mine on esp32, can't speak for others chips.
If ray width is increased from what I currently have, I think also the gPC should work. In my gPC test, the ray width was too small, the edge colors would sometimes overlap.
When I tried your version DJ Lights doesn't work on 32x32 and 64x64 only sizes I tried. I lowered your step factor to 1 which works on square matrices, but rectangles easily breaks it. I don't think gPC is ever going to work on rectangles unless you have giant bands.
Here is probably the best way to visualize the differences. Main differences is with bitmap there are thicker lines on axis and the center isn't symmetrical. 72 rays no bitmap: 64*1.6 102 rays? with bitmap: 72 rays with bitmap:
Edit: Found the best ray options for no bitmap method and decreased over drawing further. On square matrices only one extra draw for effects that draw 0-n or reverse.
On the other hand: the bit-mapping has the disadvantage, that each ray can only be drawn once. if there is FX (not sure there are any) that rely on overdrawing, they will not work.
Went through almost all the effects and there were quite a few that did draw multiple times in the same frame which breaks the bitmap approach. Also did weird things on my version before fixing.
@blazoncek Not sure if this is a known bug, but effects like scanner and lighthouse don't always hit the edge of the strip. This affects all expand options but is really noticeable on pinwheel since the edge in going across the center.
@Brandon502 did you note which FX draw multiple times?
Some effects are weird! Android is one of them. 😄 I did work on Scanner but I do not remember Lighthouse. If they do not hit the edge always then that's a flaw, otherwise it is a feature. IMO not all effects will look good in every expansion.
Some effects are weird! Android is one of them.
This was one. Effects that draw the background as a palette then draw over that, Chase and Sparkle I believe, there were a few others I'm forgetting.
I did work on Scanner but I do not remember Lighthouse. If they do not hit the edge always then that's a flaw, otherwise it is a feature.
Pretty sure it is a flaw. Some loops are completely fine, then one will skip the last/first pixel.
Tested on a 128x64 HUB75 display and can confirm no holes and 30+ fps on rainbow. Low ray count looks similar to high ray count on gradient type effects. The main down side of lower ray count is movement is not quite as smooth, but the upside is effects like bouncing balls look much better.
https://github.com/user-attachments/assets/f17579b5-fee5-4e95-9286-ce0183a63f6b
Left super high ray count (720) Right lower ray count (72)
@Brandon502 can a ray count be user selectable?
@Brandon502 can a ray count be user selectable?
I don't know how to add to the UI, but that would definitely work. Just need to have a minimum of 4. All values above that work, multiples of 8 are the most efficient in terms of drawing.
@Brandon502 looks to me like your latest approach is the best compromise between overdraw and FX compatibility. Having overdraw for some FX is acceptable as its still better than the current 0.15 variant. Is your repo up to date? How can we make sure there are no holes on any selected ray number on any size? Users also use things like 8x256 or 7x33 for example. Manually checking all combos is not feasible.
@Brandon502 looks to me like your latest approach is the best compromise between overdraw and FX compatibility. Having overdraw for some FX is acceptable as its still better than the current 0.15 variant. Is your repo up to date? How can we make sure there are no holes on any selected ray number on any size? Users also use things like 8x256 or 7x33 for example. Manually checking all combos is not feasible.
Just pushed one last change. It seems your setPinwheelParameters function doesn't always distribute angles between each step to completely wrap around the matrix in certain cases. Likely due to rounding errors. This didn't pop up until I made ray count dynamic to vW/vH, capped at 90 there was never a gap.
Here's an example of 128x64 with 136 rays.
The final ray doesn't quite wrap back around to 0 leaving certain pixels off. This can get better/worse with different ray counts. I made a change to ensure the final ray always wraps back to 0, you may know a better way to make sure all angles are used. This method can create a noticeable wider band at the end if using a super high ray count (user adjustable if implemented). With that change I'm not seeing any holes on weird matrix sizes like 127x57 / 128x5.
Edit: Found a better fix. Just needed to round base angle and everything is working better.
Thx @Brandon502, this looks really good. I did notice that gap in one of my versions and solved it by simply increasing the angle a little, your fix is a better approach. @softhack007 the current version from Brandon would benefit from a pre-draw color array you proposed for gPC as drawing could always be done starting at 0 and moving up, many checks could be simplified or removed. So I would like to tackle that as well, then do a squash and have it all in a new PR. To create the color array, the same approach I used for the bit-map could be used (create before calling FX functions, delete after), preferably on stack but I have seen crashes, so we need to limit the maximum number of rays. I don't know at what buffer-size it will run into trouble, especially ESP8266 may be problematic. A buffer of 128 colors would be 512 bytes which may still work. Any suggestions on how to do it? Can free stack-size be checked? Putting it in heap works for sure but there is the fragmentation problem.