Lokathor / wide

A crate to help you go wide. By which I mean use SIMD stuff.
https://docs.rs/wide
zlib License
251 stars 22 forks source link

Combination of `i8x16::move_mask()` and `.trailing_zeros()` is a footgun #165

Open whitequark opened 2 weeks ago

whitequark commented 2 weeks ago

For example, consider this code, extracted from one of my libraries:

for &group in samples.array_chunks::<16>() {
    let mask = predicate(i8x16::new(group));
    offset += mask.move_mask().trailing_zeros() as usize;
    if mask.any() { break }
}
offset

The intent here is to move offset by the fractional part of the group that matched the predicate. Unfortunately, if the predicate didn't match at all, the moved mask would have 32 zeroes, throwing off the pointer.

(I know about iter.remainder()--in this particular application it can't be used because by the time the remainder is examined, it is already advanced past the group being matched by the predicate.)

Making i8x16::move_mask() be an u16 would solve this problem, but won't work for i32x4. Adding leading_zeros() and trailing_zeros() to the SIMD types, or perhaps boolean mask types from #43, would also solve it.

Lokathor commented 2 weeks ago

I wouldn't have come up with a loop like that, I think. Though, I see your point about how things can get mixed up if you write it that way.

Given the amount of time I have to work on wide lately (not much), I'd prefer to avoid solutions that are breaking changes in the library.

I'm not totally sure that I understand the proposed loop as well. If there's any elements matching the predicate, you add some to the offset and then break the loop immediately? So the only time you don't break the loop is if "no elements match", so wouldn't the move_mask of a "no elements match" always be a 0? So can't you make it an if-else with adding a constant 16 per loop that continues, and then add move_mask when things don't continue?

whitequark commented 2 weeks ago

So can't you make it an if-else with adding a constant 16 per loop that continues, and then add move_mask when things don't continue?

I can! I'm reporting this as an UI issue, not a correctness issue. Feel free to close if this isn't wanted.

Lokathor commented 1 week ago

It can stay open, but basically it's sorta "needs design" I guess.