Bugzilla Link    | PR34437
Status           | NEW
Importance       | P enhancement
Reported by      | Dmitry Vyukov (dvyukov@google.com)
Reported on      | 2017-09-03 03:44:22 -0700
Last modified on | 2017-09-12 09:36:10 -0700
Version          | trunk
Hardware         | PC Linux
CC               | dvyukov@google.com, glider@google.com, llvm-bugs@lists.llvm.org
Yep. The only option you have right now is to use -O0, where the compiler
doesn't perform short-circuiting (I know this sucks).
Another option for us is to move the coverage instrumentation to the point
before short-circuiting happens.
There is already Clang-based coverage instrumentation that runs in the
frontend, i.e. before any optimization:
http://clang.llvm.org/docs/SourceBasedCodeCoverage.html

I've briefly tried using that one for libFuzzer (see
projects/compiler-rt/lib/fuzzer/FuzzerClangCounters.cpp) and the results were
worse than with SanitizerCoverage, but I've only tried one fuzzing benchmark.
Once we have proper A/B testing
(https://github.com/google/fuzzer-test-suite/tree/master/engine-comparison),
we'll do a better experiment.
(And the next experiment is to use both at the same time: clang coverage and
sanitizer coverage).
> Yep. The only option you have right now is to use -O0, where the compiler
> doesn't perform short-circuiting (I know this sucks).

You mean where it _does_ perform short-circuiting?
But foo is short-circuited, and there is even a basic block (BB#1) between
the evaluation of the two conditions. Why don't we insert the callback there?
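(For reference, a plausible shape of the reproducer -- a reconstruction, not
the original source; the constants match the IR quoted later in the thread,
57005 = 0xDEAD and 48879 = 0xBEEF:)

#include <stdint.h>

/* Hypothetical reconstruction: at -O0 the two compares live in separate
 * basic blocks, and the second compare's block -- the BB#1 discussed
 * here -- is reached only after x matches. */
int foo(uint32_t x, uint32_t y) {
  if (x == 0xDEAD && y == 0xBEEF)
    return 1;
  return 0;
}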
(In reply to Dmitry Vyukov from comment #2)
> > Yep. The only option you have right now is to use -O0, where the compiler
> > doesn't perform short-circuiting (I know this sucks).
>
> You mean where it _does_ perform short-circuiting?

Short-circuiting is a transformation a&&b => a&b.
At >= -O1, LLVM does it sometimes; at -O0, it never does.
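(A minimal sketch of that transformation at the source level; function and
variable names are illustrative:)

#include <stdbool.h>

/* Short-circuited form: two conditional branches; b is only compared
 * when a matches. */
bool short_circuited(int a, int b) { return a == 1 && b == 2; }

/* Merged form: a single branch; both compares always execute. Legal
 * only when the second operand is safe to evaluate unconditionally. */
bool merged(int a, int b) { return (a == 1) & (b == 2); }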
> But foo is short-circuited, and there is even a basic block (BB#1) between
> the evaluation of the two conditions. Why don't we insert the callback there?
Ah, that's a different question.

So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3.
Both successors *are* instrumented, so also instrumenting BB#1 won't give us
any new signal.

The strange thing is that we don't instrument this BB even with
-fsanitize-coverage=trace-pc,no-prune, where pruning is disabled. Looking.
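(For context, trace-pc instruments each covered basic block with a call to
this hook, which the tool -- e.g. libFuzzer -- is expected to define;
prototype from the SanitizerCoverage docs:)

/* Called at the start of every instrumented basic block under
 * -fsanitize-coverage=trace-pc. */
void __sanitizer_cov_trace_pc(void);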
> So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3.
> Both successors *are* instrumented, so also instrumenting BB#1 won't give us
> any new signal.

Nope, it clearly gives us new signal. If we insert a callback at BB#1, we will
be able to understand when we guessed x correctly.
This is the IR that SanitizerCoverage gets:
%cmp = icmp eq i32 %x, 57005
%cmp1 = icmp eq i32 %y, 48879
%or.cond = and i1 %cmp, %cmp1
br i1 %or.cond, label %if.then, label %if.end
So, there is no BB here, and we can't insert the BB callback (we do insert
__sanitizer_cov_trace_const_cmp4, but that's not what you are asking about).
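(That hook traces compared values rather than marking a block; its prototype,
from the SanitizerCoverage trace-cmp interface, where Arg1 is the
compile-time-constant operand:)

#include <stdint.h>

/* Called for 4-byte integer compares where one operand is a constant --
 * here 57005 and 48879. */
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);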
This happens *very* early in the optimization pipeline, at the first "Simplify
the CFG" pass. So essentially the only place where we can insert the callbacks
the way you want is Clang -- and that's precisely what clang coverage will
give you.
Later this BB gets split back into two BBs (after "X86 DAG->DAG Instruction
Selection"), but by then it's too late.
> If we insert a callback at BB#1, we will be able to understand when we
> guessed x correctly.

Yes, you are right.

Is it possible to easily disable the merging pass if coverage is enabled? Is
it possible to run the splitting pass before the coverage instrumentation?
This severely hurts coverage guidance (and almost makes everything we say
about coverage-guided fuzzing a lie...).
(In reply to Dmitry Vyukov from comment #6)
> Is it possible to easily disable the merging pass if coverage is enabled?

If I modify lib/Transforms/Scalar/SimplifyCFGPass.cpp by returning early in
iterativelySimplifyCFG, I get no short-circuiting.
I don't know off the top of my head if there is a flag to do it, or what the
consequences of this would be.

> Is it possible to run the splitting pass before the coverage
> instrumentation?

Unlikely.

> This severely hurts coverage guidance (and almost makes everything we say
> about coverage-guided fuzzing a lie...).

Only to some extent. In real code you *usually* don't get too much
short-circuiting. But sure, I agree, this hurts some cases, which is why I was
experimenting with clang coverage recently.
I've applied this change (which seems to be the minimal change that fixes the
reproducer):
Index: lib/Transforms/Utils/SimplifyCFG.cpp
===================================================================
--- lib/Transforms/Utils/SimplifyCFG.cpp	(revision 312156)
+++ lib/Transforms/Utils/SimplifyCFG.cpp	(working copy)
@@ -298,7 +298,7 @@
                           const TargetTransformInfo &TTI) {
   assert(isSafeToSpeculativelyExecute(I) &&
          "Instruction is not safe to speculatively execute!");
-  return TTI.getUserCost(I);
+  return TargetTransformInfo::TCC_Expensive;
 }
 
 /// If we have a merge point of an "if condition" as accepted above,
@@ -5744,6 +5744,8 @@
 }
 
 bool SimplifyCFGOpt::SimplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {
+  return false;
+
   BasicBlock *BB = BI->getParent();
For foo it changes the number of coverage callbacks from 3 to 4, and for bar
from 1 to 3.

For the Linux kernel it adds 6.8% more coverage callbacks (401279 -> 428496).
And I would expect most of that to be very useful signal for fuzzing: complex,
speculatable logical conditions (i.e. not if (p && p->x)).
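(An illustrative contrast, with invented names, of what "speculatable" means
here: in the first function both operands are side-effect-free and cannot
trap, so SimplifyCFG may merge them into one branch; in the second, p->x may
fault when p is null, so the short-circuit branch must stay:)

#include <stdint.h>

struct node { int x; };

/* Speculatable: may be flattened to a single branch, losing the
 * intermediate coverage block. */
int speculatable(uint32_t x, uint32_t y) {
  return x == 0xDEAD && y == 0xBEEF;
}

/* Not speculatable: the load p->x cannot be hoisted above the null
 * check, so both branches -- and their coverage signal -- survive. */
int not_speculatable(struct node *p) {
  return p && p->x;
}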
> which is why I was experimenting with clang coverage recently.

Have you considered:
1. Disabling/enabling some transformations in llvm (like the change above)?
2. Running before SimplifyCFG?

Both look like reasonable options (no need to write another instrumentation
pass at the AST level, and we still have access to powerful llvm analyses).
(In reply to Dmitry Vyukov from comment #8)
> I've applied this change (which seems to be the minimal change that fixes
> the reproducer):

If you can implement this change under a proper flag (and if you think it's
useful), we can enable that flag under -fsanitize-coverage=*.
> [patch snipped; see above]
> For foo it changes the number of coverage callbacks from 3 to 4, and for bar
> from 1 to 3.
>
> For the Linux kernel it adds 6.8% more coverage callbacks (401279 -> 428496).
> And I would expect most of that to be very useful signal for fuzzing:
> complex, speculatable logical conditions (i.e. not if (p && p->x)).
I would be surprised if short-circuiting for code like if (p && p->x) happened
frequently. (Roughly speaking, it may only happen when the operands are
side-effect-free.)
> > which is why I was experimenting with clang coverage recently.
>
> Have you considered:
> 1. Disabling/enabling some transformations in llvm (like the change above)?

I tried running with -O0 a few times to see if this helps (precisely for this
reason). On small puzzles it definitely helps, but on larger benchmarks and
longer runs I did not see any benefit.
But my measurements were never "scientific" -- I am waiting for our A/B
testing infra.

> 2. Running before SimplifyCFG?

Not tried.

> Both look like reasonable options (no need to write another instrumentation
> pass at the AST level, and we still have access to powerful llvm analyses).
> I would be surprised if short-circuiting for code like if (p && p->x)
> happened frequently. (Roughly speaking, it may only happen when the operands
> are side-effect-free.)

That's exactly what I mean. These additional cases are not if (p && p->x);
they are more like if (x && y).
> But my measurements were never "scientific" -- I am waiting for our A/B
> testing infra.

Will it be able to account for the difference in performance between -O2 and
-O0? If we get better coverage but make it 10x slower, it's unsurprising that
it will be worse in a limited-time run. A fixed-time run comparing -O2 and -O0
won't be apples to apples.
> > But my measurements were never "scientific" -- I am waiting for our A/B
> > testing infra.
>
> Will it be able to account for the difference in performance between -O2 and
> -O0?

I hope so.
There are two major metrics: time and # of iterations.
When comparing -O2 vs -O0, time is less useful, but we still have the # of
iterations.

> If we get better coverage but make it 10x slower, it's unsurprising that it
> will be worse in a limited-time run. A fixed-time run comparing -O2 and -O0
> won't be apples to apples.