I'm trying to switch from f32x4 to f32x8 in my project, which is fairly straightforward thanks to wide's fallback mechanism. But I also use a lot of i32x8/u32x8, and on SSE2 those fall back to scalar code, which is very slow.
Is there a reason why i32x8/u32x8 don't support an SSE2 fallback? f32x8 also uses two m128, but requires only SSE2 and not SSSE3.
I can try implementing SSE2 fallback for i32x8/u32x8 if you're interested.
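
To make the idea concrete, here is a rough sketch of what I have in mind. It is not wide's actual code: the `I32x8` type, its `lo`/`hi` fields, and the method names are all made up for illustration. It stores eight lanes as two `__m128i` halves and uses only SSE2 intrinsics, with a plain array path on other architectures so the snippet still compiles there.

```rust
// Hypothetical sketch, NOT wide's real implementation: an 8-lane i32
// vector stored as two 128-bit halves, using SSE2 intrinsics only.
#[cfg(target_arch = "x86_64")]
mod imp {
    use core::arch::x86_64::*;

    #[derive(Clone, Copy)]
    pub struct I32x8 {
        lo: __m128i, // lanes 0..4
        hi: __m128i, // lanes 4..8
    }

    #[allow(unused_unsafe)]
    impl I32x8 {
        pub fn new(v: [i32; 8]) -> Self {
            // SAFETY: SSE2 is part of the x86_64 baseline, the loads are
            // unaligned loads, and both pointers stay inside `v`.
            unsafe {
                Self {
                    lo: _mm_loadu_si128(v.as_ptr() as *const __m128i),
                    hi: _mm_loadu_si128(v.as_ptr().add(4) as *const __m128i),
                }
            }
        }

        /// Lane-wise wrapping addition, one SSE2 add per half.
        pub fn add(self, rhs: Self) -> Self {
            unsafe {
                Self {
                    lo: _mm_add_epi32(self.lo, rhs.lo),
                    hi: _mm_add_epi32(self.hi, rhs.hi),
                }
            }
        }

        pub fn to_array(self) -> [i32; 8] {
            let mut out = [0i32; 8];
            // SAFETY: unaligned stores into an 8-element array.
            unsafe {
                _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, self.lo);
                _mm_storeu_si128(out.as_mut_ptr().add(4) as *mut __m128i, self.hi);
            }
            out
        }
    }
}

// Plain array path for non-x86_64 targets, with the same wrapping semantics.
#[cfg(not(target_arch = "x86_64"))]
mod imp {
    #[derive(Clone, Copy)]
    pub struct I32x8([i32; 8]);

    impl I32x8 {
        pub fn new(v: [i32; 8]) -> Self {
            Self(v)
        }

        pub fn add(self, rhs: Self) -> Self {
            let mut out = [0i32; 8];
            for i in 0..8 {
                out[i] = self.0[i].wrapping_add(rhs.0[i]);
            }
            Self(out)
        }

        pub fn to_array(self) -> [i32; 8] {
            self.0
        }
    }
}

pub use imp::I32x8;
```

Add, sub, and the bitwise ops map directly onto SSE2 this way. Some integer operations, such as `_mm_mullo_epi32` and `_mm_min_epi32`/`_mm_max_epi32`, are SSE4.1 and would need an emulated path on plain SSE2.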