Closed LoganDark closed 3 years ago
This doesn't match up with any hardware supported SIMD. It's too small, only 32 bits.
You could use something like u8x16 and process four pixels at a time though.
You don't have f32x16, which I would need because I do conversions between sRGB and linear RGB.
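For reference, a minimal scalar sketch of the sRGB/linear conversion mentioned here, using the standard piecewise sRGB transfer function. The function names `srgb_to_linear` and `linear_to_srgb` are illustrative and not taken from any crate; channels are assumed to be normalized to 0.0..=1.0.

```rust
/// Decode one sRGB-encoded channel (0.0..=1.0) into linear light.
/// Hypothetical helper name, standard sRGB piecewise curve.
fn srgb_to_linear(c: f32) -> f32 {
    if c <= 0.04045 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// Encode one linear-light channel (0.0..=1.0) back into sRGB.
/// Hypothetical helper name, inverse of the function above.
fn linear_to_srgb(c: f32) -> f32 {
    if c <= 0.0031308 {
        c * 12.92
    } else {
        1.055 * c.powf(1.0 / 2.4) - 0.055
    }
}
```

The branch and the `powf` call are what make this awkward to vectorize naively, which is probably why float lanes are needed for it.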
Both u8x4 and f32x16 are supported by the unstable stdsimd feature, though.
After comparing results, I found out that my attempt at SIMD was both flawed (wrong results), and slower than the non-SIMD approach.
f32x16 is only available in hardware on AVX-512 CPUs. Otherwise you'll just get emulated results.
Generally for color manipulations, you pick how many pixels you want to handle at once (usually 4 or 8), then "transpose" the channels so that instead of 4 or 8 colors, you have one SIMD value per channel (RGB or RGBA), and it holds that channel for all the pixels (e.g. all the red channels). Then you can perform the color ops. At the end, you re-transpose the values back into their standard byte form.
There's actually a brief example of this in the tests/ folder, https://github.com/Lokathor/wide/blob/main/tests/t_usefulness.rs
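To make the transpose idea concrete, here is a rough sketch in plain Rust. It is not the code from the linked test, and it does not use the wide crate: the `Lanes` alias is a plain `[f32; 4]` standing in for a real SIMD type such as f32x4, and `transpose_in`, `transpose_out`, and `darken` are hypothetical names. Pixels are assumed to be packed as 0xAARRGGBB.

```rust
/// Four lanes of one channel. A plain array standing in for a SIMD
/// type such as f32x4 (assumption, so the sketch needs no crates).
type Lanes = [f32; 4];

/// Split four packed 0xAARRGGBB pixels into one `Lanes` per channel,
/// ordered [a, r, g, b], with each value normalized to 0.0..=1.0.
fn transpose_in(pixels: [u32; 4]) -> [Lanes; 4] {
    let mut channels = [[0.0f32; 4]; 4];
    for (i, px) in pixels.iter().enumerate() {
        for (ch, byte) in px.to_be_bytes().iter().enumerate() {
            channels[ch][i] = *byte as f32 / 255.0;
        }
    }
    channels
}

/// Inverse of `transpose_in`: gather the channels back into packed pixels.
fn transpose_out(channels: [Lanes; 4]) -> [u32; 4] {
    let mut pixels = [0u32; 4];
    for i in 0..4 {
        let mut bytes = [0u8; 4];
        for ch in 0..4 {
            bytes[ch] = (channels[ch][i].clamp(0.0, 1.0) * 255.0).round() as u8;
        }
        pixels[i] = u32::from_be_bytes(bytes);
    }
    pixels
}

/// Example color op: halve R, G, and B for four pixels at once,
/// leaving alpha untouched. With a real SIMD type, each inner loop
/// would be a single multiply on the whole channel.
fn darken(pixels: [u32; 4]) -> [u32; 4] {
    let mut channels = transpose_in(pixels);
    for lanes in channels.iter_mut().skip(1) {
        for v in lanes.iter_mut() {
            *v *= 0.5;
        }
    }
    transpose_out(channels)
}
```

The point of the layout is that the color op only ever touches whole channels, so the same instruction applies to every lane.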
Oh well, that's a shame. My CPU supports AVX2 but not AVX-512...
I think for now I'll stick to non-SIMD stuff, but thank you for the info, I'll make sure to try that if I try SIMD again
Also, here's a cool trick (probably just for non-SIMD stuff though): instead of bit shifting, you can use u32::from_be_bytes([a, r, g, b]). Do the opposite, to_be_bytes, to unpack a u32.
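A small sketch of that trick next to the shift-based version it replaces. `u32::from_be_bytes` and `u32::to_be_bytes` are standard library methods; the wrapper names `pack_argb`, `unpack_argb`, and `pack_argb_shifts` are only illustrative.

```rust
/// Pack ARGB bytes into 0xAARRGGBB without any shifting.
fn pack_argb(a: u8, r: u8, g: u8, b: u8) -> u32 {
    u32::from_be_bytes([a, r, g, b])
}

/// Unpack 0xAARRGGBB into [a, r, g, b].
fn unpack_argb(color: u32) -> [u8; 4] {
    color.to_be_bytes()
}

/// The equivalent shift-based packing, for comparison.
fn pack_argb_shifts(a: u8, r: u8, g: u8, b: u8) -> u32 {
    (a as u32) << 24 | (r as u32) << 16 | (g as u32) << 8 | (b as u32)
}
```

Because the byte order is spelled out in the method name, the result is the same on little-endian and big-endian targets.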
ARGB colors