Open regehr opened 7 months ago
LLVM is buggy. The generated assembly is:
src:
and w8, w2, #0x10
strb w1, [x0, w8, uxtw]
ret
It may store a byte OOB where the original code did not.
ah.... thanks Nuno, we missed that. they're not going to be happy about this since fixing it will add instructions, unless the fix is to make OOB InsertElement instructions UB.
I'll leave this issue open for the performance problem, but of course close if you want
we need to improve perf for this:
define void @src(ptr %0, i8 zeroext %1, i32 %2) {
%4 = load <16 x i8>, ptr %0, align 16
%5 = and i32 %2, 15
%6 = insertelement <16 x i8> %4, i8 %1, i32 %5
store <16 x i8> %6, ptr %0, align 16
ret void
}
define void @tgt(ptr nocapture writeonly %0, i8 %1, i32 %2) {
%a2_2 = and i32 %2, 15
%a2_3 = zext nneg i32 %a2_2 to i64
%x3 = getelementptr i8, ptr %0, i64 %a2_3
store i8 %1, ptr %x3, align 1
ret void
}
Essentially the refinement query uses a forall-quantified idx to go through the memories. It should be unrolled in small (all?) cases.
here's an interesting test case that @tanmaytirpankar found:
it exposes two issues. first, if the insertelement index is OOB, the result is a vector full of poison. however, on the ARM side, we get something like:
of course at this level, an OOB store is UB. so we don't have a refinement, and moreover, I don't know that we can easily fix this in the general case, as we can for other target-side UB like divide by zero and shift-past-bitwidth.
but that's not all! if we force the index to be in bounds like this:
then of course we don't have any problems with UB, but we still can't prove refinement. here's a src-tgt pair that doesn't seem too difficult, but it runs for 99 minutes and then I get "ERROR: SMT Error: smt tactic failed to show goal to be sat/unsat memout":