Open Quuxplusone opened 7 years ago
Bugzilla Link | PR32817 |
Status | NEW |
Importance | P enhancement |
Reported by | Eric Christopher (echristo@gmail.com) |
Reported on | 2017-04-26 12:46:00 -0700 |
Last modified on | 2017-07-05 22:07:21 -0700 |
Version | trunk |
Hardware | PC All |
CC | carrot@google.com, hfinkel@anl.gov, kit.barton@gmail.com, llvm-bugs@lists.llvm.org, nemanja.i.ibm@gmail.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
It is expression reassociation problem. But in PPC backend:
1 PPCInstrInfo::getMachineCombinerPatterns enables it with -O3 only.
2 PPCInstrInfo::isAssociativeAndCommutative enables it for a very limited number of floating point operations.
Carrot, are you working on this? If so, can you please assign it to yourself?
If not, someone from our team can take a look at it. Thanks.
Feel free to take it.
We can certainly turn on reassociation to try to address this problem. But reassociation currently only considers instructions that have the same opcode. We'd need to make it understand the relation between reg/reg and reg/imm forms - for example PPC::ADD4/PPC::ADDI.
Actually MachineCombiner may have a very hard time with something like this due
to a couple of reasons:
- We only want to run it at -O3
- Different opcodes
It is actually reassociation on the SDAG (in the DAGCombiner) that messes this
up to begin with. It is done without regard to whether it will increase path
length.
I don't really see how this reduction in ILP is a gain on any processor.
This patch fixes this particular issue and if you think it is worth posting on
Phabricator, I can certainly do so (with this test case and possibly some for
the other affected nodes):
Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 304396)
+++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy)
@@ -880,7 +880,9 @@ SDValue DAGCombiner::ReassociateOps(unsigned Opc,
return DAG.getNode(Opc, DL, VT, N0.getOperand(0), OpNode);
return SDValue();
}
- if (N0.hasOneUse()) {
+ // Reassociation will increase the path length if N0 and N1 use the same
+ // value. Don't reassociate in this case.
+ if (N0.hasOneUse() && !N0.getOperand(0).isOperandOf(N1.getNode())) {
// reassoc. (op (op x, c1), y) -> (op (op x, y), c1) iff x+c1 has one
// use
SDValue OpNode = DAG.getNode(Opc, SDLoc(N0), VT, N0.getOperand(0), N1);
@@ -900,7 +902,9 @@ SDValue DAGCombiner::ReassociateOps(unsigned Opc,
return DAG.getNode(Opc, DL, VT, N1.getOperand(0), OpNode);
return SDValue();
}
- if (N1.hasOneUse()) {
+ // Reassociation will increase the path length if N0 and N1 use the same
+ // value. Don't reassociate in this case.
+ if (N1.hasOneUse() && !N1.getOperand(0).isOperandOf(N0.getNode())) {
// reassoc. (op x, (op y, c1)) -> (op (op x, y), c1) iff x+c1 has one
// use
SDValue OpNode = DAG.getNode(Opc, SDLoc(N0), VT, N0, N1.getOperand(0));
P.S. The patch does not cause any failures on a double bootstrap build.
(In reply to Nemanja Ivanovic from comment #5)
> Actually MachineCombiner may have a very hard time with something like this
> due to a couple of reasons:
> - We only want to run it at -O3
> - Different opcodes
>
> It is actually reassociation on the SDAG (in the DAGCombiner) that messes
> this up to begin with. It is done without regard to whether it will increase
> path length.
>
> I don't really see how this reduction in ILP is a gain on any processor.
> This patch fixes this particular issue and if you think it is worth posting
> on Phabricator, I can certainly do so (with this test case and possibly some
> for the other affected nodes):
>
> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 304396)
> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy)
> @@ -880,7 +880,9 @@ SDValue DAGCombiner::ReassociateOps(unsigned Opc,
> return DAG.getNode(Opc, DL, VT, N0.getOperand(0), OpNode);
> return SDValue();
> }
> - if (N0.hasOneUse()) {
> + // Reassociation will increase the path length if N0 and N1 use the
> same
> + // value. Don't reassociate in this case.
> + if (N0.hasOneUse() && !N0.getOperand(0).isOperandOf(N1.getNode())) {
> // reassoc. (op (op x, c1), y) -> (op (op x, y), c1) iff x+c1 has
> one
> // use
> SDValue OpNode = DAG.getNode(Opc, SDLoc(N0), VT, N0.getOperand(0),
> N1);
> @@ -900,7 +902,9 @@ SDValue DAGCombiner::ReassociateOps(unsigned Opc,
> return DAG.getNode(Opc, DL, VT, N1.getOperand(0), OpNode);
> return SDValue();
> }
> - if (N1.hasOneUse()) {
> + // Reassociation will increase the path length if N0 and N1 use the
> same
> + // value. Don't reassociate in this case.
> + if (N1.hasOneUse() && !N1.getOperand(0).isOperandOf(N0.getNode())) {
> // reassoc. (op x, (op y, c1)) -> (op (op x, y), c1) iff x+c1 has
> one
> // use
> SDValue OpNode = DAG.getNode(Opc, SDLoc(N0), VT, N0,
> N1.getOperand(0));
>
> P.S. The patch does not cause any failures on a double bootstrap build.
The gcc version is faster is not because of one of (op x, (op y, c1)) and (op
(op x, y), c1) is faster than another one, the reason is it has shorter
dependence chain.
OK, I called it "path length" you call it "dependence chain". They're the same
thing aren't they? With that DAG combine, you can't do the shift and the add-
immediate in parallel. Disabling the DAG combine allows that to happen. The
code LLVM emits with that patch is:
srw 4, 4, 3
addi 3, 3, 1
add 3, 3, 4
clrldi 3, 3, 32
I see, your patch specifically test following case
((k + 1) + (x >> k))
It looks too constrained. If I have following DAG
((x >> k) + k) + 1
Or a little different expression
((m + 1) + (x >> k))
LLVM still can't handle them.
(In reply to Carrot from comment #8)
> I see, your patch specifically test following case
> ((k + 1) + (x >> k))
>
> It looks too constrained. If I have following DAG
>
> ((x >> k) + k) + 1
>
> Or a little different expression
>
> ((m + 1) + (x >> k))
>
> LLVM still can't handle them.
Fair enough, the last one still has the problem. Of course, completely
disabling this reassociation in the SDAG makes the problem go away on all of
them.
So I guess we should decide how to proceed with this. I don't think the
MachineCombiner has the infrastructure to easily undo this. We have a few
options:
- Modify what we do in the MachineCombiner to allow us to handle this
- Add a target hook to DAGCombiner to disable this on PPC
- Add a PPC DAG Combine to undo this
- Add a peephole to "shorten dependency chains"
- Something else I haven't thought of
Anyone have an opinion on which approach we want to pursue to solve this issue?