Open Quuxplusone opened 12 years ago
Attached bad.ll
(3592 bytes, application/octet-stream): examples
I see two possible optimizations:
First, the code below is a no-op. A simple InstCombine optimization should eliminate it.
%iter_val42 = add <8 x i32> %smear_counter41, <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
Second, reducing vectors to scalars requires a more sophisticated optimization, especially in cases where the vector goes into a PHI node. I also encountered a similar problem in user code.
I ended up writing some code to do the transformation to scalar code in ispc; the relevant stuff is here: https://github.com/ispc/ispc/blob/master/llvmutil.cpp#L1392-1508. (BSD license, so feel free to grab anything useful.)
(In reply to Nadav Rotem from comment #1)
> I see two possible optimizations:
>
> The code below is NO-OP. A simple InstCombine optimization should eliminate
> it.
>
> %iter_val42 = add <8 x i32> %smear_counter41, <i32 0, i32 undef, i32 undef,
> i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
6+ years later...
This pattern (and many similar splat-with-undef patterns) is now folded in IR. In
this case, instsimplify can kill the add.
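For reference, a minimal sketch of that fold. The function names `@before` and `@after` are hypothetical and only wrap the instruction from the report; `@after` is the hand-written expected result, not captured tool output. The fold is legal because an undef lane may be refined to any value, including 0, so every lane of the add can return the corresponding lane of the operand.

```llvm
; Before: lane 0 adds 0, lanes 1-7 add undef.
define <8 x i32> @before(<8 x i32> %smear_counter41) {
  %iter_val42 = add <8 x i32> %smear_counter41, <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  ret <8 x i32> %iter_val42
}

; After (expected, hand-written): treating the undef lanes as 0 makes the
; constant a zero vector, so the add folds to its first operand.
define <8 x i32> @after(<8 x i32> %smear_counter41) {
  ret <8 x i32> %smear_counter41
}
```

On a recent LLVM, something like `opt -passes=instsimplify -S` on the first function should be enough to check this, though I have not recorded the exact output here.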
> Second, reducing vectors to scalars requires a more sophisticated
> optimization, especially in cases where the vector goes into a PHI node. I
> also encountered a similar problem in user code.
This still doesn't happen. We have several scalarization folds for
extractelement in instcombine, but not one that would hoist the extract ahead
of a phi.
Current codegen: https://godbolt.org/z/coNFgE
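To make the missing fold concrete, here is a hand-written sketch of what hoisting an extract ahead of a phi could look like. This is not taken from bad.ll; the function names, block names, and value names are all hypothetical, and I have not verified what any particular pass does with this exact input.

```llvm
; Before: the phi merges two full vectors, but only lane 0 is ever read.
define i32 @extract_after_phi(i1 %c, <8 x i32> %a, <8 x i32> %b) {
entry:
  br i1 %c, label %then, label %else

then:
  br label %merge

else:
  br label %merge

merge:
  %v = phi <8 x i32> [ %a, %then ], [ %b, %else ]
  %x = extractelement <8 x i32> %v, i32 0
  ret i32 %x
}

; After (desired, hand-written): the extract is hoisted into each
; predecessor, so the phi becomes scalar.
define i32 @extract_before_phi(i1 %c, <8 x i32> %a, <8 x i32> %b) {
entry:
  br i1 %c, label %then, label %else

then:
  %a0 = extractelement <8 x i32> %a, i32 0
  br label %merge

else:
  %b0 = extractelement <8 x i32> %b, i32 0
  br label %merge

merge:
  %x = phi i32 [ %a0, %then ], [ %b0, %else ]
  ret i32 %x
}
```

The rewrite trades one extract for one per predecessor, so it is most clearly a win when the phi has no other users and the vector values can then be scalarized or removed further up.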