Open whitequark opened 2 weeks ago
I wouldn't have come up with a loop like that, I think. Though, I see your point about how things can get mixed up if you write it that way.
Given the amount of time I have to work on wide
lately (not much), I'd prefer to avoid solutions that are breaking changes in the library.
I'm not totally sure that I understand the proposed loop as well. If there's any elements matching the predicate, you add some to the offset and then break the loop immediately? So the only time you don't break the loop is if "no elements match", so wouldn't the move_mask of a "no elements match" always be a 0? So can't you make it an if-else
with adding a constant 16 per loop that continues, and then add move_mask when things don't continue?
So can't you make it an
if-else
with adding a constant 16 per loop that continues, and then add move_mask when things don't continue?
I can! I'm reporting this as an UI issue, not a correctness issue. Feel free to close if this isn't wanted.
It can stay open, but basically it's sorta "needs design" I guess.
For example, consider this code, extracted from one of my libraries:
The intent here is to move
offset
by the fractional part of the group that matched the predicate. Unfortunately, if the predicate didn't match at all, the moved mask would have 32 zeroes, throwing off the pointer.(I know about
iter.remainder()
--in this particular application it can't be used because by the time the remainder is examined, it is already advanced past the group being matched by the predicate.)Making
i8x16::move_mask()
be anu16
would solve this problem, but won't work fori32x4
. Addingleading_zeros()
andtrailing_zeros()
to the SIMD types, or perhaps boolean mask types from #43, would also solve it.