Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[SelectionDAGIsel] Long series of chained stores may cause severe compile-time regression in instruction selection lowering #21758

Open Quuxplusone opened 9 years ago

Quuxplusone commented 9 years ago
Bugzilla Link PR21759
Status NEW
Importance P normal
Reported by Chad Rosier (mcrosier@codeaurora.org)
Reported on 2014-12-05 09:15:50 -0800
Last modified on 2018-02-06 09:10:17 -0800
Version trunk
Hardware PC Linux
CC apazos@codeaurora.org, bob.wilson@apple.com, chandlerc@gmail.com, hfinkel@anl.gov, james@jamesmolloy.co.uk, liujiangning1@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@ndave.org, mcrosier@codeaurora.org, t.p.northover@gmail.com, zhaoshiz@codeaurora.org, zinob@codeaurora.org
Fixed by commit(s)
Attachments bad.ll (16808 bytes, application/octet-stream)
bad.instcombine.ll (15426 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 13437
Bad IR

Unfortunately, I can only reproduce this on our internal branch, but the
problem exists on mainline as well.

Given the following loop:

void foo (int a[])
{
  int *ptr = a;
  for (int i = 0; i <= 127; i++)
    *ptr++ = 0xff;
}

Our internal branch will completely unroll the loop during the second call to
the LoopUnrollPass.  The resulting IR (see attached bad.ll) causes a severe
compile-time regression when instruction selection begins recursively calling
WalkChainUsers in SelectionDAGISel.cpp.

If unrolling occurs during the first call to the LoopUnrollPass a later call to
InstCombine removes the chain by having all the gep use base + offset (see
attached bad.instcombine.ll).

I imagine the simplest solution would be to run InstCombine after the second
loop unrolling pass, but I'd like to get feedback before moving forward.
Quuxplusone commented 9 years ago

Attached bad.ll (16808 bytes, application/octet-stream): Bad IR

Quuxplusone commented 9 years ago

Attached bad.instcombine.ll (15426 bytes, application/octet-stream): Bad IR after InstCombine

Quuxplusone commented 9 years ago

Some additional context. Internally, if we see a loop that has a known iteration count, and the body of the loop could be simplified by complete unrolling, we increase the threshold to make the loop more likely to be unrolled. Unlike the community branch, we double the threshold on the second call to the LoopUnrollPass, which is exposing the issue.

Quuxplusone commented 9 years ago
One of our internal developers has proposed caching the chain results.

Does the below look like a reasonable solution?  If so, I'll upload a complete
patch and
associated test cases.

----------------------------------------------------------------------------------------

diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
b/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
index a610e57..5f43296 100755
--- a/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
@@ -1980,6 +1980,8 @@ enum ChainResult {
   CR_LeadsToInteriorNode
 };

+typedef DenseMap<const SDNode *, ChainResult> TFCacheMapT;
+
 /// WalkChainUsers - Walk down the users of the specified chained node that is
 /// part of the pattern we're matching, looking at all of the users we find.
 /// This determines whether something is an interior node, whether we have a
@@ -1992,7 +1994,8 @@ enum ChainResult {
 static ChainResult
 WalkChainUsers(const SDNode *ChainedNode,
                SmallVectorImpl<SDNode*> &ChainedNodesInPattern,
-               SmallVectorImpl<SDNode*> &InteriorChainedNodes) {
+               SmallVectorImpl<SDNode*> &InteriorChainedNodes,
+               TFCacheMapT &TFCacheMap) {
   ChainResult Result = CR_Simple;

   for (SDNode::use_iterator UI = ChainedNode->use_begin(),
@@ -2073,7 +2076,16 @@ WalkChainUsers(const SDNode *ChainedNode,
     // as a new TokenFactor.
     //
     // To distinguish these two cases, do a recursive walk down the uses.
-    switch (WalkChainUsers(User, ChainedNodesInPattern, InteriorChainedNodes))
{
+
+    ChainResult UserResult;
+    if (TFCacheMap.count(User)) {
+      UserResult = TFCacheMap[User];
+    } else {
+      UserResult = WalkChainUsers(User, ChainedNodesInPattern,
+                                  InteriorChainedNodes, TFCacheMap);
+      TFCacheMap[User] = UserResult;
+    }
+    switch (UserResult) {
     case CR_Simple:
       // If the uses of the TokenFactor are just already-selected nodes, ignore
       // it, it is "below" our pattern.
@@ -2114,9 +2126,10 @@ HandleMergeInputChains(SmallVectorImpl<SDNode*>
&ChainNodesMatched,
   // users of the chain result. This adds any TokenFactor nodes that are caught
   // in between chained nodes to the chained and interior nodes list.
   SmallVector<SDNode*, 3> InteriorChainedNodes;
+  TFCacheMapT TFCacheMap;
   for (unsigned i = 0, e = ChainNodesMatched.size(); i != e; ++i) {
     if (WalkChainUsers(ChainNodesMatched[i], ChainNodesMatched,
-                       InteriorChainedNodes) == CR_InducesCycle)
+                       InteriorChainedNodes, TFCacheMap) == CR_InducesCycle)
       return SDValue(); // Would induce a cycle.
   }
Quuxplusone commented 8 years ago

http://reviews.llvm.org/D16340