arduano / simdeez

easy simd
MIT License

Best way to create mask #14

Closed AndreaCatania closed 1 year ago

AndreaCatania commented 4 years ago

After a few SIMD operations, I need to store the result.

I'm using a mask, created in this way:

const ON: i32 = 1_i32 << 31;
const OFF: i32 = 0;

let used_elements = right_matrix.columns - other_c;
let mut mask_array = [ON; 8];
mask_array.iter_mut().skip(used_elements).for_each(|v| *v = OFF);
let mask = S::load_epi32(&mask_array[0]);

I've searched for a way to create the mask using a SIMD API, but I didn't find anything.

What's the correct way of creating it?
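One common way to avoid rebuilding the array with a loop on every call is a sliding window over a constant table. The sketch below is plain Rust with no SIMD calls; `MASK_TABLE` and `tail_mask` are hypothetical names used for illustration and are not part of simdeez.

```rust
/// Lane value that marks an element as "keep": only the sign bit is set,
/// matching the `ON` constant from the snippet above.
const ON: i32 = 1_i32 << 31;
/// Lane value that marks an element as "skip".
const OFF: i32 = 0;

/// Eight `ON` lanes followed by eight `OFF` lanes. Every 8-wide window
/// into this table is a mask with some number of leading `ON` lanes.
static MASK_TABLE: [i32; 16] = [
    ON, ON, ON, ON, ON, ON, ON, ON,
    OFF, OFF, OFF, OFF, OFF, OFF, OFF, OFF,
];

/// Returns 8 lanes whose first `used` entries are `ON` and the rest `OFF`.
/// Values of `used` above 8 are clamped to 8.
/// (Hypothetical helper, not a simdeez function.)
fn tail_mask(used: usize) -> &'static [i32] {
    let used = used.min(8);
    // Starting the window later in the table shifts more `OFF` lanes in.
    &MASK_TABLE[8 - used..16 - used]
}
```

The first element of the returned window could then be handed to the same kind of load call used above. Because a window can start at any 4-byte offset, an unaligned load would be needed; whether simdeez exposes one under a name like `loadu_epi32` is an assumption to check against the crate docs.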

jackmott commented 4 years ago

Is this something you would know how to do using raw SIMD intrinsics, but you don't see a way to do it with simdeez? I'm not totally sure what you need yet. It's possible the "blendv" functions will be useful for selecting a result based on the mask, or the various bitwise operations for creating one.
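To make the two ideas concrete, here is a minimal sketch using raw SSE2 intrinsics from `std::arch` rather than simdeez. It builds the mask with a compare against a lane-index vector, then does a blend-style select with bitwise operations so that only the wanted lanes are overwritten. `store_first_lanes` is a hypothetical helper name, the 4-lane width is chosen only because SSE2 is always available on x86_64, and the scalar fallback exists so the example compiles on other targets.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Copies the first `used` lanes of `src` into `dst` and leaves the
/// remaining lanes of `dst` untouched. (Hypothetical helper.)
#[cfg(target_arch = "x86_64")]
fn store_first_lanes(dst: &mut [i32; 4], src: &[i32; 4], used: i32) {
    // SAFETY: SSE2 is part of the x86_64 baseline, both arrays hold
    // exactly four i32 lanes, and only unaligned loads/stores are used.
    unsafe {
        let lane_index = _mm_setr_epi32(0, 1, 2, 3);
        // Lane i becomes all ones when `used > i`, otherwise all zeros.
        let mask = _mm_cmpgt_epi32(_mm_set1_epi32(used), lane_index);
        let new = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        let old = _mm_loadu_si128(dst.as_ptr() as *const __m128i);
        // Bitwise select: (mask & new) | (!mask & old).
        let blended = _mm_or_si128(
            _mm_and_si128(mask, new),
            _mm_andnot_si128(mask, old),
        );
        _mm_storeu_si128(dst.as_mut_ptr() as *mut __m128i, blended);
    }
}

/// Scalar fallback with the same behavior for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
fn store_first_lanes(dst: &mut [i32; 4], src: &[i32; 4], used: i32) {
    for i in 0..4 {
        if (i as i32) < used {
            dst[i] = src[i];
        }
    }
}
```

Calling `store_first_lanes(&mut dst, &src, 3)` would overwrite lanes 0 through 2 of `dst` and keep lane 3. The compare produces all-ones lanes rather than a sign-bit-only value like `ON`, which works for both bitwise selects and sign-bit-driven blends.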

AndreaCatania commented 4 years ago

Basically I need to create a mask, to avoid touching unwanted data.

I don't know how to do it using raw SIMD, and the only way I found is by loading a vector, as you can see in the first post.

So the question is: what is the best way of creating a mask?

rdaum commented 1 year ago

If I get what you're asking, @AndreaCatania, the x86 intrinsics at least give you, e.g., _mm_movemask_epi8 (SSE2) and _mm256_movemask_epi8 (AVX2). But 8-bit vectors are not supported in simdeez at this point, so I'm not sure this is possible with simdeez.

The movemask intrinsic "returns a mask of the most significant bit of each element in a."
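As a small illustration of that description, the sketch below compares 16 bytes against a needle and collapses the comparison result into an integer with `_mm_movemask_epi8`. Note that this goes from a vector to a scalar bitmask, which is the reverse of building a vector mask for a store. `match_bits` is a hypothetical name, and the scalar fallback is only there so the example compiles off x86_64.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns an integer whose bit i is set when `bytes[i] == needle`.
/// (Hypothetical helper for illustration.)
#[cfg(target_arch = "x86_64")]
fn match_bits(bytes: &[u8; 16], needle: u8) -> u32 {
    // SAFETY: SSE2 is part of the x86_64 baseline and `bytes` holds
    // exactly 16 bytes, read with an unaligned load.
    unsafe {
        let data = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        // Matching bytes become 0xFF (most significant bit set), others 0x00.
        let eq = _mm_cmpeq_epi8(data, _mm_set1_epi8(needle as i8));
        // Gather the most significant bit of each byte into bits 0..16.
        _mm_movemask_epi8(eq) as u32
    }
}

/// Scalar fallback with the same result for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
fn match_bits(bytes: &[u8; 16], needle: u8) -> u32 {
    bytes
        .iter()
        .enumerate()
        .fold(0, |acc, (i, &b)| acc | (((b == needle) as u32) << i))
}
```

With the needle present at indices 1 and 5, the result has exactly bits 1 and 5 set, so `trailing_zeros()` on it gives the index of the first match.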

ARM NEON is more complicated. There is no equivalent to movemask, and you have to jump through hoops.

You can see an example of how I've used them (SSE2 + bitmaps, and NEON) here: https://github.com/rdaum/rart-rs/blob/c3b12abe42433d8d820ee9f34efe438ac81828a9/rart/src/utils/u8_keys.rs. These are raw intrinsics; I haven't been able to use simdeez yet because of the missing 8-bit vector support.

And you can see the NEON version is fairly convoluted and not really worth it; a linear scan ends up being just as fast or faster, at least for small vectors.

FWIW, AVX-512 gives a lot more bitmask operations, but those intrinsics are not available outside of nightly Rust at this point, and processor support is generally very spotty.

rdaum commented 1 year ago

Actually, looking at the simdeez code, I see get_mask on i8: https://github.com/arduano/simdeez/blob/master/src/ops/i8.rs#L527

And he even includes an implementation for NEON. So probably what you want to do is feasible.

AndreaCatania commented 1 year ago

Thx for the reply and info! Though, it was so long ago that I had completely forgotten about this :) I'm going to close this issue. I hope it will be helpful to someone else! 🤙