AliveToolkit / alive2

Automatic verification of LLVM optimizations
MIT License
766 stars 97 forks source link

False positive and different refined value of `zext undef` #1025

Closed XChy closed 5 months ago

XChy commented 6 months ago

Link: https://alive2.llvm.org/ce/z/Jdnydf For the example below:

define i32 @src(i16 %a, i32 %b) {
entry:
  %a.zext = zext i16 %a to i32
  %cmp37 = icmp ugt i32 %a.zext, %b
  %.pre111 = and i32 %b, 65535
  %spec.select = select i1 %cmp37, i32 %.pre111, i32 0
  ret i32 %spec.select
}

define i32 @tgt(i16 %a, i32 %b) {
entry:
  %a.zext = zext i16 %a to i32
  %cmp37 = icmp ugt i32 %a.zext, %b
  %spec.select = select i1 %cmp37, i32 %b, i32 0
  ret i32 %spec.select
}

Alive2 reports:

ERROR: Value mismatch

Example:
i16 %a = undef
i32 %b = undef

Source:
i32 %a.zext = #x00000000 (0)    [based on undef value]
i1 %cmp37 = #x0 (0)
i32 %.pre111 = #x00000000 (0)   [based on undef value]
i32 %spec.select = #x00000000 (0)

Target:
i32 %a.zext = #x00000001 (1)
i1 %cmp37 = #x1 (1)
i32 %spec.select = #x08000000 (134217728)
Source value: #x00000000 (0)
Target value: #x08000000 (134217728)

For input %a = undef, %b = undef, it seems that a.zext in src is refined to 0, but a.zext in tgt is refined to 1.

nunoplopes commented 5 months ago

That's the beauty of undef: it can be refined to whatever you like. Plus it can have different values on each use. The transformation is clearly wrong in the presence of undef.

XChy commented 5 months ago

That's the beauty of undef: it can be refined to whatever you like. Plus it can have different values on each use. The transformation is clearly wrong in the presence of undef.

Thanks for explanation! But I don't really understand why the point "undef can be refined to anything" indicates the transformation is wrong, if we take two identical functions below:

define i1 @src(i16 %a, i32 %b) {
entry:
  %a.zext = zext i16 %a to i32      ; i32 %a.zext = #x00000000 (0)  [based on undef value]
  %cmp37 = icmp ugt i32 %a.zext, %b ; i1 %cmp37 = #x0 (0)
  ret i1 %cmp37
}

define i1 @tgt(i16 %a, i32 %b) {
entry:
  %a.zext = zext i16 %a to i32         ; i32 %a.zext = #x00000001 (1)
  %cmp37 = icmp ugt i32 %a.zext, %b    ; i1 %cmp37 = #x1 (1)
  ret i1 %cmp37
}

For input %a = undef, %b = 0, in src, zext of a is refined to 0, while in tgt, zext of a is refined to 1. Thus the result in src is false, and the result in tgt is true. This is contrary to my intuition. Could you further explain it? Thanks.

Or undef is refined in the order? For example, the first undef in src should be refined to the identical value as the first undef in tgt is refined to? (It seems that the example in title doesn't obey it)

nunoplopes commented 5 months ago

Well, you need to read up about undef & refinement.

undef can be changed to anything both source and target. there's no connection between them. An optimization is correct if the target exhibits a subset of behaviors of the source (refinement). In the oposite direction, an optimization is wrong if we can select a value for undef in target such that it produces a behavior that is not observable in source no mater what we values we select for each undef use.

Make sure you understand why the following is correct:

%x = undef
%y = %x +x %x
  =>
%y = undef