Dart VM code coverage reports may include false negatives

rmacnak-google commented 4 years ago

Code coverage reports from the VM are based on invocation counters on functions (Function) and call sites (ICData). If the invocation counter is non-zero, the position is a hit, otherwise a miss. When code coverage was originally added, the optimizer was much simpler. A call site would not be inlined unless it was executed, so every call site that ever executed would increment the ICData usage counter before being optimized away (probably modulo a small set or recognized static functions). Since then, the optimizer has gotten smarter and may inline calls with a zero usage count (during AOT compilation all usage counts are zero). The language has also changed to enforced static types, giving the optimizer an opportunity to inline away calls before any type feedback is collected. If the VM inlines away a call site before its first invocation, code coverage will report it as a miss. This means currently the only way to get accurate coverage data is to run with the VM with the optimizer disabled, --optimization_counter_threshold=-1.

This comes a heavy performance cost. We should add a flag that specifically disables only the optimizations that interfere with code coverage: inlining or otherwise replacing call sites with a zero usage count.

@bkonyi @MichaelRFairhurst

mraleph commented 4 years ago

I think we should consider emitting coverage collecting instructions into the graph (similar to DebugBreak) and decouple it from an already existing machinery - I think this might simplify both coverage and compiler itself.

Optimiser then could contain a very simple canonicalisation rule which removes coverage instructions which already have triggered.

rmacnak-google commented 1 year ago

@liamappelbe

rmacnak-google commented 1 year ago

The situation is probably somewhat improved after, i.e., 889689431595e16310b6af6fa380d2407abf1c07 and 2eb4ea5b34eda70f750d08ae70443d8ee3bee48a, though I'm not sure if this handles everything that might be optimized away.

stereotype441 commented 3 months ago

Note that this recently game up in https://dart-review.googlesource.com/c/sdk/+/368920. See analysis from @jensjoha in this comment: https://dart-review.googlesource.com/c/sdk/+/368920/comment/74b0ce83_a454627a/

jensjoha commented 3 months ago

https://github.com/dart-lang/sdk/commit/889689431595e16310b6af6fa380d2407abf1c07 doesn't really work either because (I think) RecordCoverageInstr goes away with optimization --- see https://dart-review.googlesource.com/c/sdk/+/369920

/cc @mraleph

jensjoha commented 3 months ago

https://dart-review.googlesource.com/c/sdk/+/370220 could make it not go away with optimization.

liamappelbe commented 2 months ago

We already insert some of these instructions behind the --branch-coverage flag. So I guess what we're proposing here is to add the instruction in way more places, to totally replace the ICData based coverage logic.

We'll still want to gate this behind a flag, because the extra instructions will significantly bloat the binary. So either way, users will have to pass a VM flag to collect coverage. The only difference between a --coverage flag and the existing --optimization_counter_threshold=-1 flag is that --coverage will have a much smaller performance impact.

jensjoha commented 2 months ago

Also worth noting is that --optimization_counter_threshold=-1 for whatever reason does not produce the right output.

mraleph commented 2 months ago

We'll still want to gate this behind a flag, because the extra instructions will significantly bloat the binary.

It's unclear if that really matters that much. I think the only binary we could potentially care about is app-jit snapshot size - so we could add a flag which we would disable during app-jit training, but keep enabled by default otherwise. JIT snapshots for tools are on their way out anyway, so it's a really niche thing.

The size of unoptimized code might be of some consideration (we should probably measure the impact), but I am not sure if it is worth worrying about it too much (and it is already rather bloated).

liamappelbe commented 2 months ago

It's unclear if that really matters that much.

I think I've missed something. Are these coverage instructions being removed in AOT builds somehow?

mraleph commented 2 months ago

It's unclear if that really matters that much.

I think I've missed something. Are these coverage instructions being removed in AOT builds somehow?

Right now they are never emitted in AOT in the first place. See SupportsCoverage:

https://github.com/dart-lang/sdk/blob/d0998d69965ddec4848ca6308990aa082e503105/runtime/vm/compiler/frontend/base_flow_graph_builder.cc#L24-L30

FWIW AOT builds don't use ICs for method dispatching and all the code is specialized and optimized - so you can't collect good coverage data with the current approach anyway. With RecordCoverage instructions we can actually support collecting coverage from AOT builds if we wanted. (and if that's indeed something that we wanted we could also do few local optimizations to significantly limit code size blow out as well)

dart-lang / sdk

Dart VM code coverage reports may include false negatives #42061