AliveToolkit / alive2

Automatic verification of LLVM optimizations
MIT License
766 stars 97 forks source link

False negative with the sign bit of nans #1037

Closed dtcxzyw closed 3 months ago

dtcxzyw commented 4 months ago

Alive2: https://alive2.llvm.org/ce/z/g8marn


----------------------------------------
define i1 @src1() {
#0:
  %x = sqrt float -2494298614702763169545800705376256.000000
  %y = bitcast float %x to i32
  %cmp = icmp slt i32 %y, 0
  ret i1 %cmp
}
=>
define i1 @tgt1() {
#0:
  ret i1 1
}
Transformation seems to be correct!

----------------------------------------
define i1 @src2() {
#0:
  %x = sqrt float -2494298614702763169545800705376256.000000
  %y = bitcast float %x to i32
  %cmp = icmp slt i32 %y, 0
  ret i1 %cmp
}
=>
define i1 @tgt2() {
#0:
  ret i1 0
}
Transformation seems to be correct!

Summary:
  2 correct transformations
  0 incorrect transformations
  0 failed-to-prove transformations
  0 Alive2 errors

sqrt with a negative value (except for -0) returns nan. But we cannot make assumptions about the sign bit of return values.

See also https://github.com/llvm/llvm-project/issues/92217

cc @regehr @zhengyang92

dtcxzyw commented 4 months ago

Tests with non-constant input:

define i1 @src1(float nofpclass(zero nan pinf pnorm psub) %a) {
  %x = call float @llvm.sqrt.f32(float %a)
  %y = bitcast float %x to i32
  %cmp = icmp slt i32 %y, 0
  ret i1 %cmp
}

define i1 @tgt1(float %a) {
  ret i1 true
}

define i1 @src2(float nofpclass(zero nan pinf pnorm psub) %a) {
  %x = call float @llvm.sqrt.f32(float %a)
  %y = bitcast float %x to i32
  %cmp = icmp slt i32 %y, 0
  ret i1 %cmp
}

define i1 @tgt2(float %a) {
  ret i1 false
}
nunoplopes commented 3 months ago

I think Alive2 is correct per the last consensus of NaN payload. When a value is NaN, the payload has a non-deterministic sign, which allows refinement to a positive or negative number. There was a long discussion here: https://discourse.llvm.org/t/stronger-floating-point-nan-guarantees/72165 Which ended up in this PR: https://github.com/llvm/llvm-project/pull/66579/files

cc @jyknight @RalfJung

dtcxzyw commented 3 months ago

cc @arsenm

RalfJung commented 3 months ago

But we cannot make assumptions about the sign bit of return values.

Indeed. Specifically, a programmer cannot make such assumptions. The compiler is allowed to make the sign go either way. (This explains why the programmer cannot make any assumptions.) And that's what alive shows.

This is one of these cases where it is crucial to think of compilation in terms of refinement (the final program has fewer behaviors than the original program), not equivalence.

In particular, there is no guarantee (and no intent to guarantee) that the sign bit matches "whatever the hardware does". Such a guarantee would be very tricky to realize, since it would mean that const-folding of float operations has to know exactly which NaN is produced by the hardware we are currently compiling for.

dtcxzyw commented 3 months ago

Indeed. Specifically, a programmer cannot make such assumptions. The compiler is allowed to make the sign go either way. (This explains why the programmer cannot make any assumptions.) And that's what alive shows.

This is one of these cases where it is crucial to think of compilation in terms of refinement (the final program has fewer behaviors than the original program), not equivalence.

If we allow LLVM to assume that return value of llvm.sqrt is never negative (except for -0), we cannot refine a libcall sqrt into an intrinsic call llvm.sqrt. See the discussion in https://github.com/llvm/llvm-project/issues/92217#issuecomment-2111743008. @arsenm points out that we should follow the IEEE standard, which doesn't allow interpreting the signbit of a NaN.

BTW, I don't know why alive2 complains about the following refinement: https://alive2.llvm.org/ce/z/hHiosd

define i1 @src1(float nofpclass(zero norm inf sub) %x) {
  %y = bitcast float %x to i32
  %cmp = icmp slt i32 %y, 0
  ret i1 %cmp
}

define i1 @tgt1(float %x) {
  ret i1 true
}
----------------------------------------
define i1 @src1(float nofpclass(1020) %x) {
#0:
  %y = bitcast float nofpclass(1020) %x to i32
  %cmp = icmp slt i32 %y, 0
  ret i1 %cmp
}
=>
define i1 @tgt1(float %x) {
#0:
  ret i1 1
}
Transformation doesn't verify!

ERROR: Value mismatch

Example:
float nofpclass(1020) %x = #x7f800200 (SNaN)

Source:
i32 %y = #x7f800200 (2139095552)
i1 %cmp = #x0 (0)

Target:
Source value: #x0 (0)
Target value: #x1 (1)

In particular, there is no guarantee (and no intent to guarantee) that the sign bit matches "whatever the hardware does". Such a guarantee would be very tricky to realize, since it would mean that const-folding of float operations has to know exactly which NaN is produced by the hardware we are currently compiling for.

IIUC this constant folding is incorrect without nnan.

RalfJung commented 3 months ago

If we allow LLVM to assume that return value of llvm.sqrt is never negative (except for -0),

We don't allow LLVM to assume that. We allow LLVM to decide over the sign of any NaN produced by basic float operations (of which sqrt is one). That's not the same thing.

Sometimes LLVM makes that decision via const folding, sometimes it doesn't care and leaves the decision to hardware.

dtcxzyw commented 3 months ago

If we allow LLVM to assume that return value of llvm.sqrt is never negative (except for -0),

We don't allow LLVM to assume that. We allow LLVM to decide over the sign of any NaN produced by basic float operations (of which sqrt is one). That's not the same thing.

Sometimes LLVM makes that decision via const folding, sometimes it doesn't care and leaves the decision to hardware.

I agree. But the fundamental problem is how we interpret the semantic of libcall to sqrt. Specifically, we use target-specific intrinsic _mm256_sqrt_ps in https://github.com/llvm/llvm-project/issues/92217. If we allow this refinement, we cannot convert _mm256_sqrt_ps into llvm.sqrt.

RalfJung commented 3 months ago

AFAIK the vendor intrinsics for some other operations are also lowered to corresponding generic LLVM operations? Like basic per-lane arithmetic. So this is not specific to sqrt, the question is how closely the vendor intrinsics have to match the underlying hardware. Making them match 100% is not great for anyone who wants optimizations. After all, there always is the option of inline asm if you need an exact match with hardware semantics and are willing to pay the perf cost.

Vendor intrinsics will already not honor the FP environment; this is in the same broad class of questions. (And that's largely a clang question, not an alive/LLVM question, so this is the wrong issue tracker for that.)