Bugzilla Link    | PR34437
Status           | NEW
Importance       | P enhancement
Reported by      | Dmitry Vyukov (dvyukov@google.com)
Reported on      | 2017-09-03 03:44:22 -0700
Last modified on | 2017-09-12 09:36:10 -0700
Version          | trunk
Hardware         | PC Linux
CC               | dvyukov@google.com, glider@google.com, llvm-bugs@lists.llvm.org
Yep. The only option you have right now is to use -O0, where the compiler
doesn't perform short-circuiting (I know this sucks).
Another option for us is to move the coverage instrumentation to the point
before short-circuiting happens.
There is already Clang-based coverage instrumentation that runs in the
frontend, i.e. before any optimization:
http://clang.llvm.org/docs/SourceBasedCodeCoverage.html

I've briefly tried using that one for libFuzzer (see
projects/compiler-rt/lib/fuzzer/FuzzerClangCounters.cpp) and the results were
worse than with SanitizerCoverage, but I've only tried one fuzzing benchmark.
Once we have proper A/B testing
(https://github.com/google/fuzzer-test-suite/tree/master/engine-comparison),
we'll do a better experiment.
(And the next experiment is to use both at the same time: clang coverage and
sanitizer coverage).
> Yep. The only option you have right now is to use -O0, where the compiler
> doesn't perform short-circuiting (I know this sucks).

You mean where it _does_ perform short-circuiting?
But foo is short-circuited, and there is even a basic block (BB#1) between
the evaluation of the two conditions. Why don't we insert the callback there?
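(For reference, a plausible shape of the reproducer -- a reconstruction, not
the original source; the constants match the IR quoted later in the thread,
57005 = 0xDEAD and 48879 = 0xBEEF:)

#include <stdint.h>

/* Hypothetical reconstruction: at -O0 the two compares live in separate
 * basic blocks, and the second compare's block -- the BB#1 discussed
 * here -- is reached only after x matches. */
int foo(uint32_t x, uint32_t y) {
  if (x == 0xDEAD && y == 0xBEEF)
    return 1;
  return 0;
}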
(In reply to Dmitry Vyukov from comment #2)
> > Yep. The only option you have right now is to use -O0, where the compiler
> > doesn't perform short-circuiting (I know this sucks).
>
> You mean where it _does_ perform short-circuiting?

Short-circuiting is a transformation a&&b => a&b.
At >= -O1, LLVM does it sometimes; at -O0, it never does.
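(A minimal sketch of that transformation at the source level; function and
variable names are illustrative:)

#include <stdbool.h>

/* Short-circuited form: two conditional branches; b is only compared
 * when a matches. */
bool short_circuited(int a, int b) { return a == 1 && b == 2; }

/* Merged form: a single branch; both compares always execute. Legal
 * only when the second operand is safe to evaluate unconditionally. */
bool merged(int a, int b) { return (a == 1) & (b == 2); }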
> But foo is short-circuited, and there is even a basic block (BB#1) between
> the evaluation of the two conditions. Why don't we insert the callback there?
Ah, that's a different question.

So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3.
Both successors *are* instrumented, so also instrumenting BB#1 won't give us
any new signal.

The strange thing is that we don't instrument this BB even with
-fsanitize-coverage=trace-pc,no-prune, where pruning is disabled. Looking.
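(For context, trace-pc instruments each covered basic block with a call to
this hook, which the tool -- e.g. libFuzzer -- is expected to define;
prototype from the SanitizerCoverage docs:)

/* Called at the start of every instrumented basic block under
 * -fsanitize-coverage=trace-pc. */
void __sanitizer_cov_trace_pc(void);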
> So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3.
> Both successors *are* instrumented, so also instrumenting BB#1 won't give us
> any new signal.

Nope, it clearly gives us new signal. If we insert a callback at BB#1, we will
be able to understand when we guessed x correctly.
This is the IR that SanitizerCoverage gets:
%cmp = icmp eq i32 %x, 57005
%cmp1 = icmp eq i32 %y, 48879
%or.cond = and i1 %cmp, %cmp1
br i1 %or.cond, label %if.then, label %if.end
So, there is no BB here, and we can't insert the BB callback (we do insert
__sanitizer_cov_trace_const_cmp4, but that's not what you are asking about).
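(That hook traces compared values rather than marking a block; its prototype,
from the SanitizerCoverage trace-cmp interface, where Arg1 is the
compile-time-constant operand:)

#include <stdint.h>

/* Called for 4-byte integer compares where one operand is a constant --
 * here 57005 and 48879. */
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);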
This happens *very* early in the optimization pipeline, at the first "Simplify
the CFG" pass. So essentially the only place where we can insert the callbacks
the way you want is Clang -- and that's precisely what clang coverage will
give you.
Later this BB gets split back into two BBs (after "X86 DAG->DAG Instruction
Selection"), but by then it's too late.
> If we insert a callback at BB#1, we will be able to understand when we
> guessed x correctly.

Yes, you are right.

Is it possible to easily disable the merging pass if coverage is enabled? Is
it possible to run the splitting pass before the coverage instrumentation?
This severely hurts coverage guidance (and almost makes everything we say
about coverage-guided fuzzing a lie...).
(In reply to Dmitry Vyukov from comment #6)
> Is it possible to easily disable the merging pass if coverage is enabled?

If I modify lib/Transforms/Scalar/SimplifyCFGPass.cpp by returning early in
iterativelySimplifyCFG, I get no short-circuiting.
I don't know off the top of my head if there is a flag to do it, or what the
consequences of this would be.

> Is it possible to run the splitting pass before the coverage
> instrumentation?

Unlikely.

> This severely hurts coverage guidance (and almost makes everything we say
> about coverage-guided fuzzing a lie...).

Only to some extent. In real code you *usually* don't get too much
short-circuiting. But sure, I agree, this hurts some cases, which is why I was
experimenting with clang coverage recently.
I've applied this change (which seems to be the minimal change that fixes the
reproducer):
Index: lib/Transforms/Utils/SimplifyCFG.cpp
===================================================================
--- lib/Transforms/Utils/SimplifyCFG.cpp	(revision 312156)
+++ lib/Transforms/Utils/SimplifyCFG.cpp	(working copy)
@@ -298,7 +298,7 @@
                           const TargetTransformInfo &TTI) {
   assert(isSafeToSpeculativelyExecute(I) &&
          "Instruction is not safe to speculatively execute!");
-  return TTI.getUserCost(I);
+  return TargetTransformInfo::TCC_Expensive;
 }
 
 /// If we have a merge point of an "if condition" as accepted above,
@@ -5744,6 +5744,8 @@
 }
 
 bool SimplifyCFGOpt::SimplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {
+  return false;
+
   BasicBlock *BB = BI->getParent();
For foo it changes the number of coverage callbacks from 3 to 4, and for bar
from 1 to 3.

For the Linux kernel it adds 6.8% more coverage callbacks (401279 -> 428496).
And I would expect most of that to be very useful signal for fuzzing: complex,
speculatable logical conditions (i.e. not if (p && p->x)).
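(An illustrative contrast, with invented names, of what "speculatable" means
here: in the first function both operands are side-effect-free and cannot
trap, so SimplifyCFG may merge them into one branch; in the second, p->x may
fault when p is null, so the short-circuit branch must stay:)

#include <stdint.h>

struct node { int x; };

/* Speculatable: may be flattened to a single branch, losing the
 * intermediate coverage block. */
int speculatable(uint32_t x, uint32_t y) {
  return x == 0xDEAD && y == 0xBEEF;
}

/* Not speculatable: the load p->x cannot be hoisted above the null
 * check, so both branches -- and their coverage signal -- survive. */
int not_speculatable(struct node *p) {
  return p && p->x;
}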
> which is why I was experimenting with clang coverage recently.

Have you considered:
1. Disabling/enabling some transformations in llvm (like the change above)?
2. Running before SimplifyCFG?

Both look like reasonable options (no need to write another instrumentation
pass at the AST level, and we still have access to powerful llvm analyses).
(In reply to Dmitry Vyukov from comment #8)
> I've applied this change (which seems to be the minimal change that fixes
> the reproducer):

If you can implement this change under a proper flag (and if you think it's
useful), we can enable that flag under -fsanitize-coverage=*.
> [patch snipped; see above]
> For foo it changes the number of coverage callbacks from 3 to 4, and for bar
> from 1 to 3.
>
> For the Linux kernel it adds 6.8% more coverage callbacks (401279 -> 428496).
> And I would expect most of that to be very useful signal for fuzzing:
> complex, speculatable logical conditions (i.e. not if (p && p->x)).
I would be surprised if short-circuiting for code like if (p && p->x) happened
frequently. (Roughly speaking, it may only happen when the operands are
side-effect-free.)
> > which is why I was experimenting with clang coverage recently.
>
> Have you considered:
> 1. Disabling/enabling some transformations in llvm (like the change above)?

I tried running with -O0 a few times to see if this helps (precisely for this
reason). On small puzzles it definitely helps, but on larger benchmarks and
longer runs I did not see any benefit.
But my measurements were never "scientific" -- I am waiting for our A/B
testing infra.

> 2. Running before SimplifyCFG?

Not tried.

> Both look like reasonable options (no need to write another instrumentation
> pass at the AST level, and we still have access to powerful llvm analyses).
> I would be surprised if short-circuiting for code like if (p && p->x)
> happened frequently. (Roughly speaking, it may only happen when the operands
> are side-effect-free.)

That's exactly what I mean. These additional cases are not if (p && p->x);
they are more like if (x && y).
> But my measurements were never "scientific" -- I am waiting for our A/B
> testing infra.

Will it be able to account for the difference in performance between -O2 and
-O0? If we get better coverage but make it 10x slower, it's unsurprising that
it will be worse in a limited-time run. A fixed-time run comparing -O2 and -O0
won't be apples to apples.
> > But my measurements were never "scientific" -- I am waiting for our A/B
> > testing infra.
>
> Will it be able to account for the difference in performance between -O2 and
> -O0?

I hope so.
There are two major metrics: time and # of iterations.
When comparing -O2 vs -O0, time is less useful, but we still have the # of
iterations.

> If we get better coverage but make it 10x slower, it's unsurprising that it
> will be worse in a limited-time run. A fixed-time run comparing -O2 and -O0
> won't be apples to apples.