Quuxplusone opened 4 years ago
Attached newtest.ll (652 bytes, text/plain): IR in question
The PSRAD-with-immediate instruction only supports an 8-bit shift amount, so we masked 268435456 (0x10000000) to 8 bits and got 0. The same issue exists for the shift-left and shift-right MMX builtins. We do better with SSE shifts.
I'm inclined to fix this by just clamping the shift amount to 31 for all three shift kinds. We could do better for the left/right shifts and set the result to 0 explicitly, but I'm not sure we care to optimize MMX that much.
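To make the failure concrete, here is a minimal Python model (an illustration, not LLVM code) of the intended per-lane PSRAD semantics. It assumes a single 32-bit lane and that any count above 31 behaves like 31, which is how the hardware fills the lane with the sign bit. `psrad_lane` and `to_signed32` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def psrad_lane(value: int, count: int) -> int:
    """Intended semantics of PSRAD on one 32-bit lane (simplified model):
    an arithmetic right shift where any count above 31 fills the lane
    with copies of the sign bit."""
    shift = min(count, 31)  # shifting by 31 already yields all sign bits
    return (to_signed32(value) >> shift) & MASK32

count = 268435456              # 0x10000000, the shift amount from newtest.ll
naive_imm = count & 0xFF       # keeping only the low 8 bits gives 0, i.e. "no shift"
clamped_imm = min(count, 31)   # the clamp proposed above

for lane in (0x80000000, 0x12345678):
    print(hex(lane),
          hex(psrad_lane(lane, count)),        # intended result
          hex(psrad_lane(lane, naive_imm)),    # result after masking to 8 bits
          hex(psrad_lane(lane, clamped_imm)))  # result after clamping to 31
```

With the masked immediate both lanes come back unchanged, while the clamped immediate matches the intended result (all sign bits).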
(In reply to Craig Topper from comment #1)
> The PSRAD with immediate instruction only supports 8-bits worth of shift so
> we masked 268435456 to 8-bits and got 0. The same issue exists for shift
> left and shift right for the MMX builtin. We do better with SSE shifts.
>
> I'm inclined to fix this by just clamping the shift amount to 31 for all
> three shift amounts. We could do better for left/right shifts and the result
> to 0 explicitly, but I'm not sure we care to optimize MMX that well.
How tricky would it be to match what we do for SSE for out-of-range shift amounts?
Not that tricky. We need to clamp for psrai. For the other two, we can emit an i32 zero and an X86ISD node to move the i32 into an MMX register. I think that will pattern-match to the PXOR zero idiom.
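The SSE-style handling described above can be sketched as a decision function. This is a hypothetical model, not actual X86 lowering code: `lower_mmx_shift_imm`, `ShiftImm` and `ZeroVector` are made-up names, and the shift amount is assumed to be treated as an unsigned 32-bit value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShiftImm:
    """Emit the shift instruction with this in-range immediate."""
    imm: int

@dataclass(frozen=True)
class ZeroVector:
    """Emit an i32 zero moved into an MMX register (hoped to become PXOR)."""

def lower_mmx_shift_imm(kind: str, count: int, elt_bits: int):
    """Hypothetical lowering rule for MMX shift-by-immediate intrinsics.

    kind is 'sra', 'srl' or 'shl'; elt_bits is the element width (16, 32, 64).
    """
    count &= 0xFFFFFFFF              # assumption: i32 amount read as unsigned
    if count < elt_bits:
        return ShiftImm(count)       # in range: use the amount as-is
    if kind == "sra":
        return ShiftImm(elt_bits - 1)  # arithmetic: clamp, giving all sign bits
    if kind in ("srl", "shl"):
        return ZeroVector()          # logical: every bit is shifted out
    raise ValueError(f"unknown shift kind: {kind}")

print(lower_mmx_shift_imm("sra", 268435456, 32))
print(lower_mmx_shift_imm("srl", 268435456, 32))
```

Only the arithmetic shift needs a real instruction for out-of-range amounts; the logical shifts fold to a constant, which is what lets the zero idiom take over.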
I've committed the simple clamp to 255 for all 8 affected intrinsics in 641d2e5232b423a7dd81afac94dd3db4412a4971. Using 255 avoids needing to decode the element size from the intrinsic.
Simon, do you think I should optimize it to produce 0s when possible? Or should we start working on MMX->SSE?
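Why a clamp to 255 needs no element size: 255 is the largest value the 8-bit immediate can hold and it exceeds every MMX element width (16, 32, 64), so any clamped amount is still out of range exactly when the original was. A brute-force check under a simplified per-lane model follows; `shift_lane`, `clamp_to_imm8` and the `SHIFTS` table are hypothetical, and the eight (kind, width) pairs are assumed to correspond to the 8 affected intrinsics (MMX has no 64-bit arithmetic shift).

```python
def shift_lane(kind: str, value: int, count: int, bits: int) -> int:
    """Reference semantics for one lane: counts >= bits shift every bit out
    (logical shifts give 0, arithmetic shifts give all sign bits)."""
    mask = (1 << bits) - 1
    value &= mask
    if kind == "shl":
        return (value << count) & mask if count < bits else 0
    if kind == "srl":
        return value >> count if count < bits else 0
    if kind == "sra":
        signed = value - (1 << bits) if value >> (bits - 1) else value
        return (signed >> min(count, bits - 1)) & mask
    raise ValueError(f"unknown shift kind: {kind}")

def clamp_to_imm8(count: int) -> int:
    """Clamp an unsigned 32-bit shift amount into the 8-bit immediate range."""
    return min(count & 0xFFFFFFFF, 255)

# (kind, element width) pairs; there is no 64-bit arithmetic shift in MMX.
SHIFTS = [("shl", 16), ("shl", 32), ("shl", 64),
          ("srl", 16), ("srl", 32), ("srl", 64),
          ("sra", 16), ("sra", 32)]

def clamp_is_equivalent() -> bool:
    """Check that clamping to 255 never changes the result for sample inputs."""
    for kind, bits in SHIFTS:
        values = (0, 1, 0x5A, 1 << (bits - 1), (1 << bits) - 1)
        for count in (0, 1, bits - 1, bits, 200, 255, 256, 268435456):
            for value in values:
                if shift_lane(kind, value, count, bits) != \
                        shift_lane(kind, value, clamp_to_imm8(count), bits):
                    return False
    return True

print(clamp_is_equivalent())
# A fixed clamp to 31 would not be size-agnostic: it changes 64-bit shifts.
print(shift_lane("shl", 1, 40, 64) == shift_lane("shl", 1, min(40, 31), 64))
```

The second check illustrates the trade-off: a width-specific clamp would require decoding the element size from the intrinsic, while 255 works for all of them.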
(In reply to Craig Topper from comment #4)
> I've committed the simple clamp to 255 for all 8 affected intrinsics in
> 641d2e5232b423a7dd81afac94dd3db4412a4971. Using 255 avoids needing to decode
> the element size from the intrinsic.
>
> Simon, do you think I should optimize it to produce 0s when possible? Or
> should we start working on MMX->SSE
Returning 0s for out-of-range logical shifts should be a relatively small
change; making a dent in [Bug #42320] is likely to be a much bigger job.