Closed Quuxplusone closed 4 years ago
I think we are missing some MemDepAnalysis cache invalidation here as well. Before we find the incorrect dependencies, GVN replaces uses of %tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ]
with %tmp9. If you replace the uses before running GVN, it won't replace the load.
Similar to PR31651, we have to find the place in GVN where we do not invalidate the right thing.
While the approach suggested in PR31651 might work here, it seems to me that this
problem is a little bit different.
My (perhaps wrong) understanding is this:
On the first iteration, we start translating
%tmp24 = getelementptr inbounds i64, i64 addrspace(1)* %tmp16, i64 8
%tmp25 = load atomic i64, i64 addrspace(1)* %tmp24 unordered, align 8
in bb23;
PHI translation then walks to bb15 with address %tmp24, starts translating
%tmp16, and replaces it with %tmp9 (from bb8).
Then it returns to processing the GEP %tmp24.
It visits all users of its translated first operand (%tmp9), searching for
an equivalent GEP. The dummy PHI node %tmp13 hides the equivalent GEP %tmp14:
bb12: ; preds = %bb11
%tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ]
%tmp14 = getelementptr inbounds i64, i64 addrspace(1)* %tmp13, i64 8
so PHI translation finally returns nullptr as the translation of %tmp24.
Then getNonLocalPointerDepFromBB() treats this as a failure and adds an empty
non-local MemDepResult to the result list.
Finally, the non-local deps for %tmp5 are computed as:
NonLocalDeps:
Address: %tmp28 = getelementptr inbounds i64, i64 addrspace(1)* %tmp27, i64 8
Entry : BB bb26 MDR: Def : I = %tmp29 = load atomic i64, i64 addrspace(1)* %tmp28 unordered, align 8
Address: NULL
Entry : BB bb10 MDR: Unknown : I = NULL <----- XXX
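To make the "dummy PHI hides the GEP" step concrete, here is a minimal toy model in plain C++. It does not use LLVM classes: `Value`, `setOperand`, `findGepDirect`, and `findGepThroughPhis` are hypothetical names invented for this sketch. It only illustrates why a search over the direct users of %tmp9 misses %tmp14 while the single-input phi %tmp13 sits in between. The look-through variant is shown for contrast and is not a claim about what any proposed fix does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for an IR value (NOT llvm::Value).
struct Value {
    enum Kind { Base, Phi, Gep };
    Kind kind;
    Value *operand = nullptr;    // incoming value (Phi) or base pointer (Gep)
    std::int64_t index = 0;      // constant GEP index (Gep only)
    std::vector<Value *> users;  // values that use this one as an operand
    explicit Value(Kind k, std::int64_t idx = 0) : kind(k), index(idx) {}
};

// Record that `user` has `op` as its single operand.
inline void setOperand(Value &user, Value &op) {
    assert(user.operand == nullptr && "toy values have exactly one operand");
    user.operand = &op;
    op.users.push_back(&user);
}

// Search only the direct users of `base` for a GEP with the given index.
// A null result models "translation failed".
inline Value *findGepDirect(Value &base, std::int64_t index) {
    for (Value *u : base.users)
        if (u->kind == Value::Gep && u->index == index)
            return u;
    return nullptr;
}

// Contrast case: also look through single-input phis among the users.
// Assumes the phi chain is acyclic (true for this toy model).
inline Value *findGepThroughPhis(Value &base, std::int64_t index) {
    if (Value *g = findGepDirect(base, index))
        return g;
    for (Value *u : base.users)
        if (u->kind == Value::Phi)
            if (Value *g = findGepThroughPhis(*u, index))
                return g;
    return nullptr;
}
```

With %tmp9 -> %tmp13 (phi) -> %tmp14 (GEP, index 8) wired up via `setOperand`, `findGepDirect` on %tmp9 returns null, which corresponds to the nullptr translation described above.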
On the second iteration, the dummy phi
%tmp13 = phi i64 addrspace(1)* [ %tmp9, %bb11 ]
is eliminated, and GEP %tmp14 is a direct user of %tmp9 AND is cached:
{Val: %tmp14 = getelementptr inbounds i64, i64 addrspace(1)* %tmp9, i64 8,
RO: 1}
FirstBlock: bb10 skip: 0
BB: bb10 MDR: NonLocal : I = NULL
Note that it is a non-local dep, because PHI translation does not resolve across bb10.
The above process repeats, but this time we find a valid cached entry for %tmp14.
But since it is non-local, getNonLocalPointerDependency() simply skips it!
And GVN gets an incomplete dependence list.
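The effect of silently skipping the cached non-local entry can be sketched with a small toy model in plain C++. None of this is LLVM code: `DepKind`, `BlockDep`, `defsAndClobbersOnly`, `allDeps`, and `looksFullyAvailable` are hypothetical names, and the availability check is a deliberate simplification of what a load-eliminating client does. The point is only that a client which sees a filtered list cannot tell that a predecessor was dropped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy dependence kinds (loosely modelled on MemDepResult, not the real type).
enum class DepKind { Def, Clobber, NonLocal, Unknown };

struct BlockDep {
    std::string block;  // block name, e.g. "bb10"
    DepKind kind;
};

// Models a query that returns only Def/Clobber entries and drops the rest.
inline std::vector<BlockDep>
defsAndClobbersOnly(const std::vector<BlockDep> &cached) {
    std::vector<BlockDep> out;
    for (const BlockDep &d : cached)
        if (d.kind == DepKind::Def || d.kind == DepKind::Clobber)
            out.push_back(d);
    return out;
}

// Conservative alternative: keep every block, reporting unresolved
// non-local entries as Unknown instead of dropping them.
inline std::vector<BlockDep> allDeps(const std::vector<BlockDep> &cached) {
    std::vector<BlockDep> out;
    for (BlockDep d : cached) {
        if (d.kind == DepKind::NonLocal)
            d.kind = DepKind::Unknown;
        out.push_back(d);
    }
    return out;
}

// Simplified client check: the load looks removable only if every
// reported entry is a Def.
inline bool looksFullyAvailable(const std::vector<BlockDep> &deps) {
    if (deps.empty())
        return false;
    for (const BlockDep &d : deps)
        if (d.kind != DepKind::Def)
            return false;
    return true;
}

// Small demonstration using the two blocks from the dump above.
inline void demo() {
    std::vector<BlockDep> cached = {{"bb26", DepKind::Def},
                                    {"bb10", DepKind::NonLocal}};
    // Filtered list: bb10 vanishes, so the load wrongly looks available.
    assert(looksFullyAvailable(defsAndClobbersOnly(cached)));
    // Complete list: bb10 shows up as Unknown and blocks the elimination.
    assert(!looksFullyAvailable(allDeps(cached)));
}
```

In this sketch the filtered query reports a single Def for bb26, which is indistinguishable from a case where bb26 really is the only predecessor that matters.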
It appears to me that we have an interface/capability mismatch here:
GVN needs to have all dependencies to properly eliminate loads;
the function comment for getNonLocalPointerDependency() says it only returns
Def/Clobber dependencies. This does not seem correct to me.
Either PHI translation must work harder to find Def/Clobber dependencies
instead of bailing out, or MemDep analysis must provide another interface for
GVN, exposing _all_ dependencies?
Is there anyone who understands memory dependence analysis well enough to answer
these questions? Looking at the stalled https://reviews.llvm.org/D65204, I guess no one?
:(
Attached gvn.log
(97369 bytes, text/x-log): (verbose) log of memdep analysis performed on this testcase
For those who are interested, there is a proposed fix: https://reviews.llvm.org/D73032. Note that it is still marked as WIP, since I need to work on adding regression tests.
The fix is ready for review https://reviews.llvm.org/D73032