Lokathor / wide

A crate to help you go wide. By which I mean use SIMD stuff.
https://docs.rs/wide
zlib License

SSE2 support in i32x8/u32x8 #76

Closed by RazrFalcon 3 years ago

RazrFalcon commented 3 years ago

I'm trying to switch from f32x4 to f32x8 in my project, which is fairly straightforward thanks to wide's fallback mechanism. But I also use a lot of i32x8/u32x8, which means that on SSE2-only targets I'm stuck with the scalar fallback, which is very slow.

Is there a reason why i32x8/u32x8 don't support an SSE2 fallback? f32x8 also uses two m128 values, but it requires only SSE2 and not SSSE3. I can try implementing an SSE2 fallback for i32x8/u32x8 if you're interested.

Lokathor commented 3 years ago

No reason, it just wasn't implemented that way. I'm open to a PR that adds fallback cases for when there's SSE2 but no AVX2.
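
For illustration, here is a rough sketch of what such a fallback could look like: an 8-lane `i32` add done as two 128-bit SSE2 halves when AVX2 isn't available, with a scalar path for everything else. The function name `add_i32x8` and the plain `[i32; 8]` representation are assumptions made for this example, not wide's actual internals; only the `core::arch` intrinsics are real.

```rust
/// Hypothetical helper (not part of wide): adds two 8-lane i32 vectors.
/// SSE2 path: process the 8 lanes as two 128-bit halves of 4 lanes each.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
pub fn add_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    use core::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};

    let mut out = [0i32; 8];
    // SAFETY: SSE2 is guaranteed by the cfg above. Each pointer covers four
    // valid i32 values, and the unaligned load/store intrinsics are used.
    unsafe {
        for half in 0..2 {
            let off = half * 4;
            let va = _mm_loadu_si128(a.as_ptr().add(off) as *const __m128i);
            let vb = _mm_loadu_si128(b.as_ptr().add(off) as *const __m128i);
            // Lane-wise wrapping 32-bit add on this half.
            let sum = _mm_add_epi32(va, vb);
            _mm_storeu_si128(out.as_mut_ptr().add(off) as *mut __m128i, sum);
        }
    }
    out
}

/// Scalar path for targets without SSE2, matching the wrapping behaviour
/// of the SIMD add.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
pub fn add_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

A real implementation would presumably store the two halves inside the vector type itself rather than loading from an array on every call, and would need the same treatment for the other operations (subtraction, comparisons, shifts, and so on).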