Open georgemitenkov opened 3 years ago
This is good, we now have half as many instructions! However, we still observe some redundancy:
%78 = load double, double* %10, align 8
%.splatinsert4 = insertelement <4 x double> undef, double %78, i32 0
%.splat5 = shufflevector <4 x double> %.splatinsert4, <4 x double> undef, <4 x i32> zeroinitializer
I suspect that LLVM does not pick this up because the code is vectorised and uses more complicated vector instructions like shufflevector. If we look at what generated this code, we see that it is actually loop invariant:
%78 = mech->dt // constant for all loop iterations
By adopting some form of loop-invariant code motion at the AST level (or by changing the way the kernel is constructed), we would be able to remove the duplicated code blocks.
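To make the idea concrete, here is a minimal Python sketch of such a hoisting step. It is not the actual AST or visitor API of this project: the statement representation, the function name `hoist_invariants`, and the variable names are all hypothetical. It assumes every name is assigned at most once in the loop body and that nothing written inside the loop aliases the hoisted loads (as is the case for `mech->dt`).

```python
from typing import List, Set, Tuple

# Hypothetical toy representation: a statement is (target, operands).
Stmt = Tuple[str, Tuple[str, ...]]


def hoist_invariants(body: List[Stmt], loop_vars: Set[str]) -> Tuple[List[Stmt], List[Stmt]]:
    """Split a loop body into (preheader, remaining_body).

    A statement is treated as loop invariant when none of its operands is a
    loop variable and every operand is either defined outside the loop or
    defined by another invariant statement.
    Assumption: each target is assigned once and loads have no aliasing stores.
    """
    defined_in_loop = {target for target, _ in body}
    invariant: Set[str] = set()

    # Iterate to a fixed point so chains like dt -> dt_splat are both found.
    changed = True
    while changed:
        changed = False
        for target, operands in body:
            if target in invariant:
                continue
            if all(
                op not in loop_vars and (op not in defined_in_loop or op in invariant)
                for op in operands
            ):
                invariant.add(target)
                changed = True

    # Keep the original relative order in both lists.
    preheader = [s for s in body if s[0] in invariant]
    remaining = [s for s in body if s[0] not in invariant]
    return preheader, remaining


# Toy version of the kernel body: dt and its broadcast do not depend on i.
example_body: List[Stmt] = [
    ("dt", ("mech",)),            # dt = mech->dt
    ("dt_splat", ("dt",)),        # broadcast dt to all 4 lanes
    ("v", ("mech", "i")),         # per-iteration vector load
    ("rhs", ("v", "dt_splat")),   # per-iteration compute
]
example_preheader, example_loop = hoist_invariants(example_body, {"i"})
```

With this, the load of `dt` and its splat end up in the preheader and are emitted once instead of once per duplicated block. A real pass would also need to check for stores that may alias the hoisted loads.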
%78 = mech->dt // constant for all loop iterations
Yeah, this dt would be trivial to avoid. I will do this in the next PR.
This is a placeholder for discussion. I am not 100% up-to-date with what optimisations are done on the AST level, but catching up :)
Consider the following kernel that is vectorised with a vector width of 4:
The corresponding LLVM IR is
Running
opt <llvm_file> -O3 -S -o -
gives