Lokathor / wide

A crate to help you go wide. By which I mean use SIMD stuff.
https://docs.rs/wide
zlib License

AVX version of `u32x8::cmp_lt` refers to cmp_eq_mask_i32_m256i #120

Closed (remyoudompheng closed 1 year ago)

remyoudompheng commented 1 year ago

The AVX2 branch of `u32x8::cmp_lt` calls `cmp_eq_mask_i32_m256i`, which is an equality comparison. This looks incorrect and makes it return different results from the SSE2 implementation.

  pub fn cmp_lt(self, rhs: Self) -> Self {
    pick! {
      if #[cfg(target_feature="avx2")] {
        // Reported bug: this builds an equality mask, not a less-than mask.
        Self { avx2: cmp_eq_mask_i32_m256i(self.avx2, rhs.avx2 ) }
      } else if #[cfg(target_feature="sse2")] {
        // SSE2 path: less-than on each 128-bit half (an i32 comparison).
        Self {
          sse0: cmp_lt_mask_i32_m128i(self.sse0,rhs.sse0),
          sse1: cmp_lt_mask_i32_m128i(self.sse1,rhs.sse1),
        }

https://github.com/Lokathor/wide/blob/v0.7.5/src/u32x8_.rs#L350
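To make the mismatch concrete, here is a minimal scalar sketch of what the two branches compute per lane. The function names `cmp_lt_ref` and `cmp_eq_ref` are hypothetical and not part of the crate, and the all-ones/all-zeros lane mask convention is assumed here rather than taken from the thread.

```rust
/// Hypothetical scalar reference for a lane-wise unsigned less-than.
/// Assumed convention: a lane is all ones when the test holds, else zero.
fn cmp_lt_ref(a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        if a[i] < b[i] {
            out[i] = u32::MAX;
        }
    }
    out
}

/// Hypothetical scalar reference for a lane-wise equality test,
/// which is what the AVX2 branch quoted above ends up computing.
fn cmp_eq_ref(a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        if a[i] == b[i] {
            out[i] = u32::MAX;
        }
    }
    out
}
```

With `a = [0, 1, 2, 3, 4, 5, 6, 7]` and every lane of `b` set to 4, the less-than reference sets the first four lanes while the equality reference sets only lane 4, so the two code paths cannot agree in general.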

Lokathor commented 1 year ago

Ah, yeah that's definitely incorrect. I'll see if I can fix it soon, or I could accept a PR for it.

Lokathor commented 1 year ago

It looks like there aren't SIMD less-than ops for u32 in `safe_arch`, which probably means x86_64 doesn't have them at all? We'd have to always use a for loop, I guess.
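For what it's worth, SSE2 and AVX2 do only offer signed 32-bit integer compares, but a common workaround avoids the loop: XOR both operands with `0x8000_0000` so that unsigned order maps onto signed order, then use the signed compare. The sketch below is only an illustration of that trick, not necessarily what the crate ended up doing. The function names are hypothetical, and the SIMD half calls the `std::arch` intrinsics directly rather than `safe_arch`.

```rust
/// Portable model of the trick: flipping the top bit maps unsigned order
/// onto signed order, so a signed compare gives the unsigned answer.
fn lt_u32_via_signed(a: u32, b: u32) -> bool {
    ((a ^ 0x8000_0000) as i32) < ((b ^ 0x8000_0000) as i32)
}

/// Hypothetical 4-lane SSE2 version of the same idea.
/// Returns all ones in each lane where a < b (unsigned), else zero.
#[cfg(target_arch = "x86_64")]
fn cmp_lt_u32x4_sse2(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    use std::arch::x86_64::*;
    let mut out = [0u32; 4];
    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned
    // loads and store each touch exactly 16 bytes of a 4-element u32 array.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // Bias with the sign bit so the signed compare orders lanes as unsigned.
        let bias = _mm_set1_epi32(i32::MIN);
        let mask = _mm_cmplt_epi32(_mm_xor_si128(va, bias), _mm_xor_si128(vb, bias));
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, mask);
    }
    out
}
```

The same bias also matters for the SSE2 branch quoted earlier: a plain signed compare on `u32` data gives the wrong answer whenever exactly one operand has its top bit set, for example `1 < u32::MAX`.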