cranelift: line coverage gaps in ISLE-generated lowering

mmcloughlin commented 1 year ago

I was musing about whether it would be possible for us to generate inputs to ISLE lowering that would exercise all code paths. So it occurred to me to check what coverage the cranelift unit/integration tests already have.

The line coverage reports suggest high coverage, but there are some gaps that might be worth discussion.

Coverage Generation

I found the taiki-e/cargo-llvm-cov tool to be very easy to use for coverage report generation.

In the cranelift directory:

cargo llvm-cov --disable-default-ignore-filename-regex --ignore-filename-regex='cargo/' --html

By default the llvm-cov tool won't report coverage for the ISLE generated files. The --disable-default-ignore-filename-regex was necessary to disable this default behavior, and then --ignore-filename-regex='cargo/' ignores coverage from external dependencies.

Results

High-level coverage metrics for the ISLE generated files (columns are function/line/region):

Screen Shot 2023-11-15 at 8 00 46 PM

There's a lot of uncovered code to sift through, a lot of which is clearly not a concern (including uncovered unreachable! statements). But there seem to be some cases in here that might be worth looking into. For example, in constructor_to_amode_add:

Screen Shot 2023-11-15 at 8 02 50 PM

Questions

Are any of these coverage gaps interesting, and worth adding tests for?
Are these gaps covered by fuzzing? (related #6293)
Is 100% line coverage actually a reasonable goal here? What is the right metric? (related #1151)

cc @jameysharp @fitzgen

mmcloughlin commented 1 year ago

To answer the fuzzing question, here's a recent report:

https://storage.googleapis.com/oss-fuzz-coverage/wasmtime/reports/20231114/linux/src/wasmtime/target/debug/build/cranelift-codegen-292e7cb08674c89f/out/isle_x64.rs.html

It appears the to_amode_add case above is covered by the fuzzer. I haven't looked any deeper into the differences though.

(Fuzzing report from https://introspector.oss-fuzz.com/project-profile?project=wasmtime)

alexcrichton commented 1 year ago

Cranelift I think gets a few primary sources of coverage via our testing, but they're unfortunately not all in the same place:

Wasmtime-based fuzzing (e.g. the differential fuzzer)
Cranelift-based fuzzing (the cranelift-fuzzgen fuzzer)
Cranelift's filetest-based test suite
Wasmtime's own test test suite which includes the upstream WebAssembly spec test suite

If something isn't covered by any of those then I think it'd be good to enhance. That being said while it would be nice for at least one of those suites to cover everything it's probably not the most practical. I suspect the best place to add new tests would be the filetest-based test suite of Cranelift.

IMO coverage of certain files is more important than others. For example I think the lowering rules are a great showcase of what should ideally be all covered. Runtime bits in Wasmtime I also think should all hopefully be covered via one way or another. I haven't ever found a good way to manage things like unreachable!() which discounts from looking like you're entirely covered but you're still practically covered.

fitzgen commented 1 year ago

As I mentioned offline, I think the important part for ISLE-generated code isn't reaching 100% line coverage, but 100% rule RHS coverage. If we do the former, we will also do the latter, but the latter is what is important IMO.

I could imagine emitting some sort of probes/counters in the ISLE-generated code to track this kind of thing.

Even better would be a test case generator that takes an ISLE lowering rule and creates a clif filetest and we could run that generator on each of our backends' complete rule sets, as you were musing about, to get 100% rule coverage. Big plus one to that idea from me!

bytecodealliance / wasmtime