Open Quuxplusone opened 3 years ago
ping!
Could you perhaps disable these on SystemZ? That way it would allow you to set up the bot (and avoid further regressions).
Yes I'd suggest the same, let's mark this test as XFAIL on system-Z and add the bot, so that we avoid further regressions/failures on this platform.
@Jonas: Did you look into setting it up such? A current change (https://reviews.llvm.org/D107896) could effect it (created for different reason and not focused on this) but without build bot I don't think we'd notice a difference.
Sorry for the delay on this... we will try to set up the build-bot soon, while
also trying to understand this test failure.
I found that with this function (taking from the failing test):
func @main() {
// CHECK: inf
%inf = constant 0x7f800000 : f32
%exp_inf = math.exp %inf : f32
vector.print %exp_inf : f32
return
}
I get '0' printed, but if I disable the inliner I get 'inf' as expected:
mlir-opt tc.mlir -test-math-polynomial-approximation -convert-vector-to-llvm -
convert-std-to-llvm | bin/mlir-cpu-runner -e main -entry-point-result=void -O0 -
shared-libs=lib/libmlir_c_runner_utils.so -shared-
libs=lib/libmlir_runner_utils.\
so
0
bin/mlir-opt tc.mlir -test-math-polynomial-approximation -convert-vector-to-
llvm -convert-std-to-llvm | bin/mlir-cpu-runner -e main -entry-point-
result=void -O0 -shared-libs=lib/libmlir_c_runner_utils.so -shared-
libs=lib/libmlir_runner_ut\
ils.so -inline-threshold=0
inf
This is what I see the inliner doing:
define void @main() !dbg !3 {
%1 = call float @llvm.floor.f32(float 0x7FF0000000000000), !dbg !7
%2 = fmul float %1, 0x3FE62E4300000000, !dbg !9
%3 = fsub float 0x7FF0000000000000, %2, !dbg !10
%4 = fmul float %3, %3, !dbg !11
%5 = fmul float %4, %4, !dbg !12
%6 = call float @llvm.fma.f32(float 1.000000e+00, float %3, float 1.000000e+00), !dbg !13
%7 = call float @llvm.fma.f32(float 0x3FC5993C80000000, float %3, float 0x3FDFFB2B40000000), !dbg !14
%8 = call float @llvm.fma.f32(float 0x3F8AEAFAC0000000, float %3, float 0x3FA2C8FC60000000), !dbg !15
%9 = call float @llvm.fma.f32(float %7, float %4, float %6), !dbg !16
%10 = call float @llvm.fma.f32(float %8, float %5, float %9), !dbg !17
%11 = fptosi float %1 to i32, !dbg !18
%12 = add i32 %11, 127, !dbg !19
%13 = shl i32 %12, 23, !dbg !20
%14 = bitcast i32 %13 to float, !dbg !21
%15 = fmul float %10, %14, !dbg !22
%16 = icmp sle i32 %11, 127, !dbg !23
%17 = icmp sge i32 %11, -127, !dbg !24
%18 = and i1 %16, %17, !dbg !25
%19 = select i1 %18, float %15, float 0x7FF0000000000000, !dbg !26
call void @printF32(float %19), !dbg !27
call void @printNewline(), !dbg !28
ret void, !dbg !29
}
define void @_mlir_main(i8** %0) {
call void @main()
ret void
}
=>
*** IR Dump After Function Integration/Inlining (inline) ***
Printing <null> Function
*** IR Dump After Function Integration/Inlining (inline) ***define void @main()
!dbg !3 {
%1 = call float @llvm.floor.f32(float 0x7FF0000000000000), !dbg !7
%2 = fmul float %1, 0x3FE62E4300000000, !dbg !9
%3 = fsub float 0x7FF0000000000000, %2, !dbg !10
%4 = fmul float %3, %3, !dbg !11
%5 = fmul float %4, %4, !dbg !12
%6 = call float @llvm.fma.f32(float 1.000000e+00, float %3, float 1.000000e+00), !dbg !13
%7 = call float @llvm.fma.f32(float 0x3FC5993C80000000, float %3, float 0x3FDFFB2B40000000), !dbg !14
%8 = call float @llvm.fma.f32(float 0x3F8AEAFAC0000000, float %3, float 0x3FA2C8FC60000000), !dbg !15
%9 = call float @llvm.fma.f32(float %7, float %4, float %6), !dbg !16
%10 = call float @llvm.fma.f32(float %8, float %5, float %9), !dbg !17
%11 = fptosi float %1 to i32, !dbg !18
%12 = add i32 %11, 127, !dbg !19
%13 = shl i32 %12, 23, !dbg !20
%14 = bitcast i32 %13 to float, !dbg !21
%15 = fmul float %10, %14, !dbg !22
%16 = icmp sle i32 %11, 127, !dbg !23
%17 = icmp sge i32 %11, -127, !dbg !24
%18 = and i1 %16, %17, !dbg !25
%19 = select i1 %18, float %15, float 0x7FF0000000000000, !dbg !26
call void @printF32(float %19), !dbg !27
call void @printNewline(), !dbg !28
ret void, !dbg !29
}
*** IR Dump After Function Integration/Inlining (inline) ***define void
@_mlir_main(i8** %0) {
call void @printF32(float poison), !dbg !3
call void @printNewline(), !dbg !9
ret void
}
So it seems that during inlining, the computations in main is for some reason
replaced with a poison value. So the '0' is really just poison, not '0'
necessarily, it seems.
Does anyone know what could be the problem here?
Can you post the entire .ll
file and the opt
command to reproduce?
(the .ll
file will include the SystemZ target triple / data layout, which I won't get if I run cmake --build . --target check-mlir
on my system).
(In reply to Mehdi Amini from comment #6)
Can you post the entire
.ll
file and theopt
command to reproduce? (the.ll
file will include the SystemZ target triple / data layout, which I won't get if I runcmake --build . --target check-mlir
on my system).
Not sure how to dump the .ll file with mlir-cpu-runner...?
The output from the first command above (mlir-opt) was:
builtin.module attributes {llvm.data_layout = ""} { llvm.func @printNewline() llvm.func @printF32(f32) llvm.func @main() { %0 = llvm.mlir.constant(0.693147182 : f32) : f32 %1 = llvm.mlir.constant(1.000000e+00 : f32) : f32 %2 = llvm.mlir.constant(0.499705136 : f32) : f32 %3 = llvm.mlir.constant(0.168738902 : f32) : f32 %4 = llvm.mlir.constant(0.0366896503 : f32) : f32 %5 = llvm.mlir.constant(1.314350e-02 : f32) : f32 %6 = llvm.mlir.constant(23 : i32) : i32 %7 = llvm.mlir.constant(0x7F800000 : f32) : f32 %8 = llvm.mlir.constant(127 : i32) : i32 %9 = llvm.mlir.constant(-127 : i32) : i32 %10 = "llvm.intr.floor"(%7) : (f32) -> f32 %11 = llvm.fmul %10, %0 : f32 %12 = llvm.fsub %7, %11 : f32 %13 = llvm.fmul %12, %12 : f32 %14 = llvm.fmul %13, %13 : f32 %15 = "llvm.intr.fma"(%1, %12, %1) : (f32, f32, f32) -> f32 %16 = "llvm.intr.fma"(%3, %12, %2) : (f32, f32, f32) -> f32 %17 = "llvm.intr.fma"(%5, %12, %4) : (f32, f32, f32) -> f32 %18 = "llvm.intr.fma"(%16, %13, %15) : (f32, f32, f32) -> f32 %19 = "llvm.intr.fma"(%17, %14, %18) : (f32, f32, f32) -> f32 %20 = llvm.fptosi %10 : f32 to i32 %21 = llvm.add %20, %8 : i32 %22 = llvm.shl %21, %6 : i32 %23 = llvm.bitcast %22 : i32 to f32 %24 = llvm.fmul %19, %23 : f32 %25 = llvm.icmp "sle" %20, %8 : i32 %26 = llvm.icmp "sge" %20, %9 : i32 %27 = llvm.and %25, %26 : i1 %28 = llvm.select %27, %24, %7 : i1, f32 llvm.call @printF32(%28) : (f32) -> () llvm.call @printNewline() : () -> () llvm.return } }
Does that help you?
Attached tc_preinl.ll
(3648 bytes, text/plain): reduced test case for inliner
It seems that when I run this test case on SystemZ with opt and inliner the
result is always inlined to poison even when passing x86/powerPC target triples.
In gdb I see InstCombiner (from CloneFunction.cpp:394) simplifying the
instructions like:
%1 = call float @llvm.floor.f32(float 0x7FF0000000000000)
=> 0x7FF0000000000000 +Inf
%2 = fmul float %1, 0x3FE62E4300000000
=> 0x7FF0000000000000 +Inf
%3 = fsub float 0x7FF0000000000000, %2
=> 0x7FF8000000000000 NaN
%4 = fmul float %3, %3
=> 0x7FF8000000000000 NaN
%5 = fmul float %4, %4
=> 0x7FF8000000000000 NaN
%6 = call float @llvm.fma.f32()
=> 0x7FF8000000000000 NaN
%7 = call float @llvm.fma.f32()
=> 0x7FF8000000000000 NaN
%8 = call float @llvm.fma.f32()
=> 0x7FF8000000000000 NaN
%9 = call float @llvm.fma.f32()
=> 0x7FF8000000000000 NaN
%10 = call float @llvm.fma.f32()
=> 0x7FF8000000000000 NaN
%11 = fptosi float %1 to i32 => poison
%12 = add i32 %11, 127 => poison
%13 = shl i32 %12, 23 => poison
%14 = bitcast i32 %13 to float => float poison
%15 = fmul float %10, %14 => float poison
%16 = icmp sle i32 %11, 127 => poison
%17 = icmp sge i32 %11, -127 => poison
%18 = and i1 %16, %17 => poison
%19 = select i1 %18, float %15, float 0x7FF0000000000000 => float poison
call void @printF32(float %19)
=> printF32(float poison)
- The subtraction of +Inf with +Inf resulting in NaN surprised me just a
little...
- The fptosi from +Inf results in poison.
- %19 is a select betwen float poison and +Inf. The i1 input is poison, and the
expected result of the test case is +Inf. For some reason InstCombiner chooses
the LHS.
I am not sure if any mathematical result here is invalid or not... Could
something be target specific?
I have now built this test case also on my x86 laptop, and it seems like there
is something wrong in the first place with ExpApproximation::matchAndRewrite().
This function (same as before):
func @main() {
// CHECK: inf
%inf = constant 0x7f800000 : f32
%exp_inf = math.exp %inf : f32
vector.print %exp_inf : f32
return
}
Is expanded my by mlir-opt (ExpApproximation) to a sequence which seems
suspicious - it contains:
...
%7 = llvm.mlir.constant(0x7F800000 : f32) : f32 // +Inf
...
%10 = "llvm.intr.floor"(%7) : (f32) -> f32 // +Inf
...
%20 = llvm.fptosi %10 : f32 to i32 // poison!
%21 = llvm.add %20, %8 : i32
%22 = llvm.shl %21, %6 : i32
%23 = llvm.bitcast %22 : i32 to f32
%24 = llvm.fmul %19, %23 : f32
%25 = llvm.icmp "sle" %20, %8 : i32
%26 = llvm.icmp "sge" %20, %9 : i32
%27 = llvm.and %25, %26 : i1
%28 = llvm.select %27, %24, %7 : i1, f32
llvm.call @printF32(%28) : (f32) -> ()
So also on x86 I see:
*** IR Dump After Function Integration/Inlining (inline) ***define void
@_mlir_main(i8** %0) {
call void @printF32(float poison), !dbg !3
call void @printNewline(), !dbg !9
ret void
}
# *** IR Dump After X86 Speculative Execution Side Effect Suppression (x86-
seses) ***:
# Machine code for function _mlir_main: NoPHIs, TracksLiveness, NoVRegs,
TiedOpsRewritten
Function Live Ins: $rdi
bb.0 (%ir-block.1):
liveins: $rdi
frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
CFI_INSTRUCTION def_cfa_offset 16
renamable $rax = MOV64ri @printF32, debug-location !27; <stdin>:34:5
renamable $xmm0 = IMPLICIT_DEF
CALL64r killed renamable $rax, <regmask ...
renamable $rax = MOV64ri @printNewline, debug-location !28; <stdin>:35:5
CALL64r killed renamable $rax, <regmask ...
$rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
CFI_INSTRUCTION def_cfa_offset 8
RETQ
So at this point it looks like this by chance prints inf on x86, while it
prints 0 on SystemZ (and without inlining on SystemZ it by chance printed
inf...).
The ExpApproximation of a constant infinity seems to involve an fptosi of +Inf,
which results in poison, so it looks like there is a bug here, or?
ping!
The SystemZ test has been marked XFAIL, but we should still try to understand the underlying problem before closing this...
Sorry I wasn't getting notification on this, Can you post the ll output with/without inliner?
I think what is happening is related to some target specific +inf arithmetic which we can avoid.
(In reply to Ahmed S. Taei from comment #12)
> Sorry I wasn't getting notification on this, Can you post the ll output
> with/without inliner?
>
> I think what is happening is related to some target specific +inf arithmetic
> which we can avoid.
The test case I posted in comment #8 is the actual input to the inliner. The
function containing the fp instructions (@main) is inlined into @mlir_main()
into just a poison value. So, with the inliner, a poison value is printed,
without inlining, the function is called as it is (the "ll output without
inliner" is identical to the function as shown in the test case)
As I wrote in following comments after that, this seems to fail in the same way
on both X86 and SystemZ, but by chance it prints the right value some times
(it's printing an undef).
I still think it's the fp -> SInt conversion of infinity that stands out (see
comment #9)... If this would be target dependent as you said, could this be the
point of the problem?
I see that if I run 'opt early-cse' on the test case (tc_preinl.ll), the
@main() function is also reduced to just a poison value. So it seems that the
function is potentially optimized into a poison value, but if that optimization
never happens, the fp instructions remain and the SystemZ backend emits
instructions (including the fp->sint conversion) and prints the expected
result...
@Ahmed: Did my reply help you understand the problem or is there anything else you need from me?
Right, I can change the approximation to avoid any arithmetic on +inf/-inf. I Will send a diff you can try on SystemZ from your side.
tc_preinl.ll
(3648 bytes, text/plain)