Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[GVN] Miscompile: incorrect replacement of redundant load with PHIs #41703

Closed Quuxplusone closed 4 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR42733
Status RESOLVED FIXED
Importance P normal
Reported by Daniil Suchkov (suc-daniil@yandex.ru)
Reported on 2019-07-24 00:20:20 -0700
Last modified on 2020-03-02 00:35:42 -0800
Version trunk
Hardware PC Linux
CC dantrushin@gmail.com, evgueni.brevnov@gmail.com, florian_hahn@apple.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments gvn.log (97369 bytes, text/x-log)
Blocks
Blocked by
See also PR31651
In this case GVN eliminates following load:
  %tmp25 = load atomic i64, i64 addrspace(1)* %tmp24 unordered, align 8

by replacing it with a sequence of PHIs and the problem is that the first phi
in this sequence contains undef:
  bb6:                                              ; preds = %bb19, %bb
    %tmp251 = phi i64 [ undef, %bb ], [ %tmp252, %bb19 ]

This bug is not reproducible without !tbaa metadata on this load:
  %tmp22 = load atomic i64, i64 addrspace(1)* %tmp21 unordered, align 8, !tbaa !0
even though only BasicAA is used.

From what I was able to find, it looks like this PHI contains undef because for
some reason MemoryDependenceAnalysis reports incomplete list of dependencies
for that load.

The reproducer:
========================================================
define i64 @foo(i64 addrspace(1)** %arg, i1 %arg1, i1 %arg2, i1 %arg3, i32
%arg4) {
bb:
  %tmp = load atomic i64 addrspace(1)*, i64 addrspace(1)** %arg unordered, align 8
  %tmp5 = getelementptr inbounds i64, i64 addrspace(1)* %tmp, i64 8
  store atomic i64 0, i64 addrspace(1)* %tmp5 unordered, align 8
  br label %bb6

bb6:                                              ; preds = %bb19, %bb
  %tmp7 = phi i64 [ 0, %bb ], [ %tmp22, %bb19 ]
  br i1 %arg1, label %bb19, label %bb8

bb8:                                              ; preds = %bb6
  %tmp9 = load atomic i64 addrspace(1)*, i64 addrspace(1)** %arg unordered, align 8
  br i1 %arg2, label %bb11, label %bb10

bb10:                                             ; preds = %bb8
  br label %bb15

bb11:                                             ; preds = %bb8
  br i1 %arg3, label %bb12, label %bb18

bb12:                                             ; preds = %bb11
  %tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ]
  %tmp14 = getelementptr inbounds i64, i64 addrspace(1)* %tmp13, i64 8
  store atomic i64 1, i64 addrspace(1)* %tmp14 unordered, align 8
  ret i64 0

bb15:                                             ; preds = %bb26, %bb10
  %tmp16 = phi i64 addrspace(1)* [ %tmp9, %bb10 ], [ %tmp27, %bb26 ]
  %tmp17 = phi i64 [ %tmp7, %bb10 ], [ 0, %bb26 ]
  switch i32 %arg4, label %bb19 [
    i32 0, label %bb26
    i32 1, label %bb23
  ]

bb18:                                             ; preds = %bb11
  br label %bb19

bb19:                                             ; preds = %bb18, %bb15, %bb6
  %tmp20 = phi i64 addrspace(1)* [ %tmp16, %bb15 ], [ null, %bb6 ], [ %tmp9, %bb18 ]
  %tmp21 = getelementptr inbounds i64, i64 addrspace(1)* %tmp20, i64 8
  %tmp22 = load atomic i64, i64 addrspace(1)* %tmp21 unordered, align 8, !tbaa !0
  br label %bb6

bb23:                                             ; preds = %bb15
  %tmp24 = getelementptr inbounds i64, i64 addrspace(1)* %tmp16, i64 8
  %tmp25 = load atomic i64, i64 addrspace(1)* %tmp24 unordered, align 8
  call void @baz(i64 %tmp25) #0
  ret i64 0

bb26:                                             ; preds = %bb15
  call void @bar()
  %tmp27 = load atomic i64 addrspace(1)*, i64 addrspace(1)** %arg unordered, align 8
  %tmp28 = getelementptr inbounds i64, i64 addrspace(1)* %tmp27, i64 8
  %tmp29 = load atomic i64, i64 addrspace(1)* %tmp28 unordered, align 8
  %tmp30 = getelementptr inbounds i64, i64 addrspace(1)* %tmp27, i64 40
  store atomic i64 %tmp29, i64 addrspace(1)* %tmp30 unordered, align 4
  br label %bb15
}

declare void @bar()

; Function Attrs: inaccessiblememonly readonly
declare void @baz(i64) #0

attributes #0 = { inaccessiblememonly readonly }

!0 = !{!1, !2, i64 8}
!1 = !{!"Name", !2, i64 8}
!2 = !{!"tbaa_local_fields", !3, i64 0}
!3 = !{!"tbaa-access-type"}
========================================================
$ opt -aa-pipeline=basic-aa -passes=gvn -S repro.ll -debug-only=gvn
GVN iteration: 0
GVN: load i64 addrspace(1)* %tmp has unknown dependence
GVN: load i64 addrspace(1)* %tmp9 is clobbered by   store atomic i64 0, i64
addrspace(1)* %tmp5 unordered, align 8
GVN: load i64 addrspace(1)* %tmp9 is clobbered by   store atomic i64 %tmp29,
i64 addrspace(1)* %tmp30 unordered, align 4
GVN: load i64 addrspace(1)* %tmp27 is clobbered by   call void @bar()
GVN: load i64 %tmp29 is clobbered by   call void @bar()
GVN removed:   %tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ]
GVN: load i64 %tmp22 is clobbered by   store atomic i64 0, i64 addrspace(1)*
%tmp5 unordered, align 8
GVN iteration: 1
GVN: load i64 addrspace(1)* %tmp has unknown dependence
GVN: load i64 addrspace(1)* %tmp9 is clobbered by   store atomic i64 0, i64
addrspace(1)* %tmp5 unordered, align 8
GVN: load i64 addrspace(1)* %tmp9 is clobbered by   store atomic i64 %tmp29,
i64 addrspace(1)* %tmp30 unordered, align 4
GVN REMOVING NONLOCAL LOAD:   %tmp25 = load atomic i64, i64 addrspace(1)*
%tmp24 unordered, align 8
GVN removed:   %0 = load atomic i64, i64 addrspace(1)* %tmp24 unordered, align 8
GVN: load i64 addrspace(1)* %tmp27 is clobbered by   call void @bar()
GVN: load i64 %tmp29 is clobbered by   call void @bar()
GVN: load i64 %tmp22 is clobbered by   store atomic i64 0, i64 addrspace(1)*
%tmp5 unordered, align 8
GVN iteration: 2
GVN: load i64 addrspace(1)* %tmp has unknown dependence
GVN: load i64 addrspace(1)* %tmp9 is clobbered by   store atomic i64 0, i64
addrspace(1)* %tmp5 unordered, align 8
GVN: load i64 addrspace(1)* %tmp9 is clobbered by   store atomic i64 %tmp29,
i64 addrspace(1)* %tmp30 unordered, align 4
GVN: load i64 addrspace(1)* %tmp27 is clobbered by   call void @bar()
GVN: load i64 %tmp29 is clobbered by   call void @bar()
GVN: load i64 %tmp22 is clobbered by   store atomic i64 0, i64 addrspace(1)*
%tmp5 unordered, align 8
; ModuleID = 'repro.ll'
source_filename = "repro.ll"

define i64 @foo(i64 addrspace(1)** %arg, i1 %arg1, i1 %arg2, i1 %arg3, i32
%arg4) {
bb:
  %tmp = load atomic i64 addrspace(1)*, i64 addrspace(1)** %arg unordered, align 8
  %tmp5 = getelementptr inbounds i64, i64 addrspace(1)* %tmp, i64 8
  store atomic i64 0, i64 addrspace(1)* %tmp5 unordered, align 8
  br label %bb6

bb6:                                              ; preds = %bb19, %bb
  %tmp251 = phi i64 [ undef, %bb ], [ %tmp252, %bb19 ]
  %tmp7 = phi i64 [ 0, %bb ], [ %tmp22, %bb19 ]
  br i1 %arg1, label %bb19, label %bb8

bb8:                                              ; preds = %bb6
  %tmp9 = load atomic i64 addrspace(1)*, i64 addrspace(1)** %arg unordered, align 8
  br i1 %arg2, label %bb11, label %bb10

bb10:                                             ; preds = %bb8
  br label %bb15

bb11:                                             ; preds = %bb8
  br i1 %arg3, label %bb12, label %bb18

bb12:                                             ; preds = %bb11
  %tmp14 = getelementptr inbounds i64, i64 addrspace(1)* %tmp9, i64 8
  store atomic i64 1, i64 addrspace(1)* %tmp14 unordered, align 8
  ret i64 0

bb15:                                             ; preds = %bb26, %bb10
  %tmp25 = phi i64 [ %tmp251, %bb10 ], [ %tmp29, %bb26 ]
  %tmp16 = phi i64 addrspace(1)* [ %tmp9, %bb10 ], [ %tmp27, %bb26 ]
  %tmp17 = phi i64 [ %tmp7, %bb10 ], [ 0, %bb26 ]
  switch i32 %arg4, label %bb19 [
    i32 0, label %bb26
    i32 1, label %bb23
  ]

bb18:                                             ; preds = %bb11
  br label %bb19

bb19:                                             ; preds = %bb18, %bb15, %bb6
  %tmp252 = phi i64 [ %tmp25, %bb15 ], [ %tmp251, %bb6 ], [ %tmp251, %bb18 ]
  %tmp20 = phi i64 addrspace(1)* [ %tmp16, %bb15 ], [ null, %bb6 ], [ %tmp9, %bb18 ]
  %tmp21 = getelementptr inbounds i64, i64 addrspace(1)* %tmp20, i64 8
  %tmp22 = load atomic i64, i64 addrspace(1)* %tmp21 unordered, align 8, !tbaa !0
  br label %bb6

bb23:                                             ; preds = %bb15
  %tmp24 = getelementptr inbounds i64, i64 addrspace(1)* %tmp16, i64 8
  call void @baz(i64 %tmp25) #0
  ret i64 0

bb26:                                             ; preds = %bb15
  call void @bar()
  %tmp27 = load atomic i64 addrspace(1)*, i64 addrspace(1)** %arg unordered, align 8
  %tmp28 = getelementptr inbounds i64, i64 addrspace(1)* %tmp27, i64 8
  %tmp29 = load atomic i64, i64 addrspace(1)* %tmp28 unordered, align 8
  %tmp30 = getelementptr inbounds i64, i64 addrspace(1)* %tmp27, i64 40
  store atomic i64 %tmp29, i64 addrspace(1)* %tmp30 unordered, align 4
  br label %bb15
}

declare void @bar()

; Function Attrs: inaccessiblememonly readonly
declare void @baz(i64) #0

attributes #0 = { inaccessiblememonly readonly }

!0 = !{!1, !2, i64 8}
!1 = !{!"Name", !2, i64 8}
!2 = !{!"tbaa_local_fields", !3, i64 0}
!3 = !{!"tbaa-access-type"}
========================================================

Any further simplification of the reproducer breaks it.
Quuxplusone commented 5 years ago

I think we are missing some MemDepAnalysis cache invalidation here as well. Before we find the incorrect dependencies, GVN replace uses of %tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ] with %tmp9. If you replace the uses before running GVN, it won't replace the load.

Similar to PR31651, we have to find the place in GVN where we do not invalidate the right thing.

Quuxplusone commented 4 years ago
While approach suggested in PR31651 might work here, It seems to me that this
problem is a little bit different.

My (perhaps wrong) understanding is this:

On first iteration, we start translating

%tmp24 = getelementptr inbounds i64, i64 addrspace(1)* %tmp16, i64 8
%tmp25 = load atomic i64, i64 addrspace(1)* %tmp24 unordered, align 8

in bb23;

PHI translation then walks to bb15 with address %tmp24, starts translating
%tmp16 and replace it with %tmp9 (from bb8)
The it returns back to processing GEP %tmp24.
It visits all users of its translated first operand (%tmp9), searching for
equivalent GEP. Dummy PHI node %tmp13 hides equivalent GEP %tmp14:

bb12:                                             ; preds = %bb11
  %tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ]
  %tmp14 = getelementptr inbounds i64, i64 addrspace(1)* %tmp13, i64 8

so finally PHI translation returns nullptr as final %tmp24 translation.
Then getNonLocalPointerDepFromBB() considers this as failure ands empty
non-local MemDepResult to the result list.
Finally, nonlocal deps for %tmp5 computed as:

NonLocalDeps:
 Address:   %tmp28 = getelementptr inbounds i64, i64 addrspace(1)* %tmp27, i64 8
 Entry  : BB bb26 MDR: Def          : I =   %tmp29 = load atomic i64, i64 addrspace(1)* %tmp28 unordered, align 8

 Address: NULL
 Entry  : BB bb10 MDR: Unknown      : I = NULL <----- XXX

On second iteration dummy phi

%tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ]

is eliminated and GEP %tmp14 is direct user of %tmp9 AND is cached:

{Val:   %tmp14 = getelementptr inbounds i64, i64 addrspace(1)* %tmp9, i64 8,
 RO: 1}
    FirstBlock: bb10 skip: 0
    BB: bb10 MDR: NonLocal     : I = NULL

Not that it is non-local dep, due to PHI translation not resolving across bb10.
The above process repeats, but this time we find valid cached entry for %tmp14.
But since is it non-local, getNonLocalPointerDependency() simply skips it!
And GVN gets incomplete dependence list.

It appears to me that we have interface/capability mismatch here:
GVN needs to have all dependencies to properly eliminate loads;
Function comment for getNonLocalPointerDependency() says it only returns
Def/Clobber dependencies. This does not seem correct to me.
Either PHI translation must work harder to find Def/Clobber dependencies
instead of bailing out or MemDep analysis must provide another interface for
GVN, exposing _all_ dependencies?

Is there anyone who understand memory dependence analysis good enough to answer
this questions? Looking at hung https://reviews.llvm.org/D65204, I guess no one?
:(
Quuxplusone commented 4 years ago

Attached gvn.log (97369 bytes, text/x-log): (verbose) log of memdep analysis performed on this testcase

Quuxplusone commented 4 years ago

For those who are interested there is a proposed fix https://reviews.llvm.org/D73032. Note it still marked as WIP since I need to work on adding regression tests.

Quuxplusone commented 4 years ago

The fix is ready for review https://reviews.llvm.org/D73032

Quuxplusone commented 4 years ago

Fixed by https://reviews.llvm.org/rGb0761bbc7639d0901d623e1fbf53ccf6ce066b16