Lokathor / wide

A crate to help you go wide. By which I mean use SIMD stuff.
https://docs.rs/wide
zlib License

feature request: u8x4 #81

Closed: LoganDark closed this issue 3 years ago

LoganDark commented 3 years ago

ARGB colors

Lokathor commented 3 years ago

This doesn't match up with any hardware-supported SIMD. It's too small, only 32 bits, while the smallest common SIMD registers are 128 bits.

You could use something like u8x16 and process four pixels at a time though.
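A rough sketch of the "four pixels per 16 lanes" idea, written with plain `[u8; 16]` arrays as a stand-in for a 16-lane byte vector. The helper names `pack_four` and `saturating_add_lanes` are made up for illustration and are not part of the `wide` API; a real SIMD type would do the lane-wise add in one instruction instead of a loop.

```rust
/// Flattens four 4-byte ARGB pixels into one 16-byte block
/// (hypothetical helper; stands in for loading a 16-lane u8 vector).
fn pack_four(pixels: [[u8; 4]; 4]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, px) in pixels.iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(px);
    }
    out
}

/// Lane-wise saturating add over 16 bytes (hypothetical helper).
/// Each lane is clamped at 255 instead of wrapping around.
fn saturating_add_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}
```

Passing an addend block such as `pack_four([[0, 10, 10, 10]; 4])` brightens R, G, and B by 10 for all four pixels while leaving the alpha lane untouched.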

LoganDark commented 3 years ago

You don't have f32x16, which I would need because I do conversions between sRGB and linear RGB.

Both u8x4 and f32x16 are supported by the unstable stdsimd feature, though.

After comparing results, I found out that my attempt at SIMD was both flawed (wrong results) and slower than the non-SIMD approach.

Lokathor commented 3 years ago

f32x16 is only backed by hardware on CPUs with AVX-512. Otherwise you'll just get emulated results.

Generally for color manipulations, you need to pick how many pixels you want to handle at once (usually 4 or 8), then "transpose" the channels so that instead of 4 colors or 8 colors, you have one SIMD value per channel (RGB or RGBA), and it holds that channel for all the pixels (e.g. all the red channels). Then you can perform the color ops. At the end, you re-transpose the values back into their standard byte form.

There's actually a brief example of this in the tests/ folder, https://github.com/Lokathor/wide/blob/main/tests/t_usefulness.rs
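For readers who want the shape of the transpose approach without opening the test file, here is a minimal plain-Rust sketch. It is not the code from `t_usefulness.rs`: the type aliases and function names are hypothetical, the pixel layout is assumed to be `[r, g, b, a]`, and each `[f32; 4]` only stands in for a 4-lane float vector such as `f32x4`.

```rust
/// Four RGBA pixels in "array of structs" form: one [r, g, b, a] per pixel.
type Pixels = [[u8; 4]; 4];
/// The same data in "struct of arrays" form: one 4-lane array per channel.
/// Each [f32; 4] is a stand-in for a 4-lane SIMD float vector.
type Channels = [[f32; 4]; 4];

/// Transposes bytes into per-channel lanes, normalized to 0.0..=1.0.
fn transpose_to_channels(pixels: Pixels) -> Channels {
    let mut ch = [[0.0f32; 4]; 4];
    for (p, px) in pixels.iter().enumerate() {
        for (c, &byte) in px.iter().enumerate() {
            ch[c][p] = f32::from(byte) / 255.0;
        }
    }
    ch
}

/// Example color op: scale the R, G, B lanes and leave alpha alone.
/// With real SIMD this would be one multiply per channel.
fn scale_rgb(mut ch: Channels, factor: f32) -> Channels {
    for lane in ch.iter_mut().take(3) {
        for v in lane.iter_mut() {
            *v *= factor;
        }
    }
    ch
}

/// Transposes per-channel lanes back into byte pixels, clamping and rounding.
fn transpose_to_pixels(ch: Channels) -> Pixels {
    let mut pixels = [[0u8; 4]; 4];
    for (c, lane) in ch.iter().enumerate() {
        for (p, &v) in lane.iter().enumerate() {
            pixels[p][c] = (v.clamp(0.0, 1.0) * 255.0).round() as u8;
        }
    }
    pixels
}
```

The point of the layout is that every arithmetic step touches one channel for all four pixels at once, so an sRGB-to-linear conversion would slot in where `scale_rgb` sits.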

LoganDark commented 3 years ago

Oh well, that's a shame. My CPU supports AVX2 but not AVX-512...

I think for now I'll stick to non-SIMD stuff, but thank you for the info. I'll make sure to try that if I try SIMD again.

LoganDark commented 3 years ago

Also, here's a cool trick (probably just for non-SIMD stuff though): instead of bit shifting, you can pack with u32::from_be_bytes([a, r, g, b]). To unpack a u32, do the opposite with to_be_bytes().
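A small sketch of that trick next to the shift-based version it replaces. The function names are made up for illustration; `u32::from_be_bytes` and `u32::to_be_bytes` are the standard library methods, and the byte order assumed here is A in the top byte, B in the bottom byte.

```rust
/// Packs ARGB into a u32 with A in the most significant byte.
fn pack_argb(a: u8, r: u8, g: u8, b: u8) -> u32 {
    u32::from_be_bytes([a, r, g, b])
}

/// Unpacks a u32 back into [a, r, g, b].
fn unpack_argb(argb: u32) -> [u8; 4] {
    argb.to_be_bytes()
}

/// The equivalent shift-and-or version, for comparison.
fn pack_argb_shifts(a: u8, r: u8, g: u8, b: u8) -> u32 {
    (u32::from(a) << 24) | (u32::from(r) << 16) | (u32::from(g) << 8) | u32::from(b)
}
```

Both packing functions produce the same value regardless of the host's endianness, since `from_be_bytes` defines the result in terms of byte significance rather than memory layout.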