When manually implementing ops that simdeez does not include, it would be nice to specialize based on the Simd type. For instance, you could check whether the Simd input is of a given type, and if so, transmute_copy its underlying value to the concrete vector type and call architecture-specific intrinsics on it.
For example, hardware gather is available in AVX2, but not in SSE2 or SSE4.1. I could write a specialized implementation like so, letting the compiler optimize out the constant branch and the copies:
pub fn gather_32<S: Simd>(arr: &[i32], indices: S::Vi32) -> S::Vi32 {
    if /* indices is a __m256i */ {
        unsafe {
            let indices = std::mem::transmute_copy::<_, __m256i>(&indices.underlying_value());
            // SCALE is the byte stride applied to each index, so i32 elements need 4, not 1.
            let gathered = _mm256_i32gather_epi32::<4>(arr.as_ptr(), indices);
            return S::Vi32::from_underlying_value(std::mem::transmute_copy::<_, <S::Vi32 as SimdConsts>::UnderlyingType>(&gathered));
        }
    }
    // fallback implementation
    let width = S::Vi32::WIDTH;
    let mut dst = S::Vi32::zeroes();
    for i in 0..width {
        dst[i] = arr[indices[i] as usize];
    }
    dst
}
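For what it's worth, the type check itself can probably be written in stable Rust today with a TypeId comparison, which the compiler can usually fold to a constant after monomorphization. Below is a minimal sketch of that idea, not simdeez's API: the Lanes trait and the Wide8 and Scalar1 types are hypothetical stand-ins for the library's vector types, and gather_wide8 uses a plain loop where real code would call the AVX2 intrinsic.

```rust
use std::any::TypeId;
use std::mem::transmute_copy;

/// Hypothetical stand-in for a simdeez vector type (not the library's real trait).
trait Lanes: Copy + 'static {
    const WIDTH: usize;
    fn zeroes() -> Self;
    fn get(&self, i: usize) -> i32;
    fn set(&mut self, i: usize, v: i32);
}

/// Hypothetical 8-lane type standing in for a 256-bit vector.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Wide8([i32; 8]);

/// Hypothetical 1-lane type standing in for the scalar fallback.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Scalar1([i32; 1]);

impl Lanes for Wide8 {
    const WIDTH: usize = 8;
    fn zeroes() -> Self { Wide8([0; 8]) }
    fn get(&self, i: usize) -> i32 { self.0[i] }
    fn set(&mut self, i: usize, v: i32) { self.0[i] = v; }
}

impl Lanes for Scalar1 {
    const WIDTH: usize = 1;
    fn zeroes() -> Self { Scalar1([0; 1]) }
    fn get(&self, i: usize) -> i32 { self.0[i] }
    fn set(&mut self, i: usize, v: i32) { self.0[i] = v; }
}

/// True when A and B are the same concrete type.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

/// Specialized path; real code would call the AVX2 gather intrinsic here.
fn gather_wide8(arr: &[i32], idx: Wide8) -> Wide8 {
    let mut out = [0i32; 8];
    for (o, &i) in out.iter_mut().zip(idx.0.iter()) {
        *o = arr[i as usize];
    }
    Wide8(out)
}

fn gather<V: Lanes>(arr: &[i32], indices: V) -> V {
    if same_type::<V, Wide8>() {
        // SAFETY: the TypeId check proves V and Wide8 are the same type,
        // so both copies are identity conversions.
        let idx = unsafe { transmute_copy::<V, Wide8>(&indices) };
        let out = gather_wide8(arr, idx);
        return unsafe { transmute_copy::<Wide8, V>(&out) };
    }
    // Generic fallback: one bounds-checked load per lane.
    let mut dst = V::zeroes();
    for i in 0..V::WIDTH {
        dst.set(i, arr[indices.get(i) as usize]);
    }
    dst
}
```

The downside is that TypeId needs a 'static bound on the vector type and the unsafe copies rely entirely on the check being right, which is why a supported hook in the library would still be nicer.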