Open Quuxplusone opened 5 years ago
Bugzilla Link | PR39679 |
Status | NEW |
Importance | P enhancement |
Reported by | Guille D. Canas (guille@berkeley.edu) |
Reported on | 2018-11-15 12:43:12 -0800 |
Last modified on | 2021-10-26 08:54:36 -0700 |
Version | trunk |
Hardware | PC All |
CC | andrew.kaylor@intel.com, blitzrakete@gmail.com, dgregor@apple.com, erik.pilkington@gmail.com, fj8765ah@aa.jp.fujitsu.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk, ueno.masakazu@jp.fujitsu.com, yuanfang.chen@sony.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
If someone can point me to the file(s) that handle this pragma I could try to see if I can find the bug. I'm not familiar with the code base but I can give it a shot.
I looked in the code. Looks like an easy to correct error.
File 'lib/CodeGen/BackendUtil.cpp:420' in the clang source seems to contain the
error:
[...]
case LangOptions::FPC_Off:
// Preserve any contraction performed by the front-end. (Strict performs
// splitting of the muladd instrinsic in the backend.)
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard; <-----
break;
case LangOptions::FPC_On:
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
break;
case LangOptions::FPC_Fast:
Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
break;
[...]
It seems line 420:
'llvm:FPOpFusion::Standard' should be -> 'llvm::FPOpFusion::Strict'.
According to the llvm source (include/llvm/CodeGen/CommandFlags.inc:180),
FPOpFusion::Strict corresponds to no fp-contraction)
If someone can verify, this should be a trivial fix.
Sadly the above modification (alone) did not fix the reported bug.
I think that there are the following two problems when judging whether to
perform FMA conversion.
(1) Even if there is pragma, it is judged by the translation option.
(2) There is an error in the process of setting pragma information.
For example, in the following program, I think it is correct that FMA
conversion is not performed.
Is this right?
$cat a.c
float foo(float b, float c, float d) {
#pragma clang fp contract(off)
return b * c + d;
}
Assuming it is correct, I will attach my patch below.
------------------------------------------------------
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp
b/clang/lib/CodeGen/CGExprScalar.cpp
index 3d082de..37ffc63 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -221,7 +221,7 @@ static Value *propagateFMFlags(Value *V, const BinOpInfo
&Op) {
if (auto *I = dyn_cast<llvm::Instruction>(V)) {
llvm::FastMathFlags FMF = I->getFastMathFlags();
updateFastMathFlags(FMF, Op.FPFeatures);
- I->setFastMathFlags(FMF);
+ I->copyFastMathFlags(FMF);
}
return V;
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index b8af674..ffc7441 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -11258,6 +11258,11 @@ static bool isContractable(SDNode *N) {
return F.hasAllowContract() || F.hasAllowReassociation(); }
+static bool isContractableOnly(SDNode *N) {
+ SDNodeFlags F = N->getFlags();
+ return F.hasAllowContract();
+}
+
/// Try to perform FMA combining on a given FADD node.
SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
SDValue N0 = N->getOperand(0);
@@ -11287,6 +11292,9 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
if (!AllowFusionGlobally && !isContractable(N))
return SDValue();
+ if (!isContractableOnly(N))
+ return SDValue();
+
const SelectionDAGTargetInfo *STI = DAG.getSubtarget().getSelectionDAGInfo();
if (STI && STI->generateFMAsInMachineCombiner(OptLevel))
return SDValue();
@@ -11499,6 +11507,9 @@ SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
if (!AllowFusionGlobally && !isContractable(N))
return SDValue();
+ if (!isContractableOnly(N))
+ return SDValue();
+
const SelectionDAGTargetInfo *STI = DAG.getSubtarget().getSelectionDAGInfo();
if (STI && STI->generateFMAsInMachineCombiner(OptLevel))
return SDValue();
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 1e089ff..5f5d371 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -3466,9 +3466,7 @@ static bool isCombineInstrCandidateFP(const MachineInstr
&Inst) {
case AArch64::FSUBv2f32:
case AArch64::FSUBv2f64:
case AArch64::FSUBv4f32:
- TargetOptions Options = Inst.getParent()->getParent()->getTarget().Options;
- return (Options.UnsafeFPMath ||
- Options.AllowFPOpFusion == FPOpFusion::Fast);
+ return Inst.getFlag(MachineInstr::FmContract);
}
return false;
}
------------------------------------------------------
The result of applying the above patch is shown below.
Specification option:
clang [-O1|-O2|-O3|-Ofast] (-ffp-contract={off|on|fast}) a.c
Before correction:
+--------+---------------------------------------------+
| | -ffp-contract= |
+--------+---------------+---------+---------+---------+
| | Not specified | off | on | fast |
+--------+---------------+---------+---------+---------+
| -O1 | not fma | not fma | not fma | fma |
+--------+---------------+---------+---------+---------+
| -O2 | not fma | not fma | not fma | fma |
+--------+---------------+---------+---------+---------+
| -O3 | not fma | not fma | not fma | fma |
+--------+---------------+---------+---------+---------+
| -Ofast | fma | fma | fma | fma |
+--------+---------------+---------+---------+---------+
After applying my patch:
+--------+---------------------------------------------+
| | -ffp-contract= |
+--------+---------------+---------+---------+---------+
| | Not specified | off | on | fast |
+--------+---------------+---------+---------+---------+
| -O1 | not fma | not fma | not fma | not fma |
+--------+---------------+---------+---------+---------+
| -O2 | not fma | not fma | not fma | not fma |
+--------+---------------+---------+---------+---------+
| -O3 | not fma | not fma | not fma | not fma |
+--------+---------------+---------+---------+---------+
| -Ofast | not fma | not fma | not fma | not fma |
+--------+---------------+---------+---------+---------+