runtime/coverage: Eliminate dominator Coverage Units

EstebanOlmedo commented 1 year ago

The current building for coverage implementation works by rewriting the source files adding some instrumentation statements that later are used to determine the code lines that were executed during runtime, in the following code snippets we can see an example of this workflow.

package main

import "fmt"

func main() {
    a := 10
    if a >= 10 {
        a -= 10
    }
    fmt.Println(a)
}

package main

import "fmt"

func main() {x_0[0] = 3 ; x_0[1] = x_P ; x_0[2] = 0; x_0[3] = 1; // Counters and metadata
    a := 10
    if a >= 10 {x_0[5]=1;
        a -= 10
    }
    x_0[4]=1;fmt.Println(a)
}

There are 3 coverable units in the example above, but only two of them are needed to cover the whole main function in our sample program. By knowing the state of x_0[4] and x_0[5] one can infer the state of x_0[3]:

If x_0[4] has a value of 1, then x_0[3] must also have it, because it’s impossible to get to the block of code that x_0[4] covers without passing first through the code lines that x_0[3] covers.
The same logic applies to x_0[5].

Following this assumption, one can see that some of the cover statements can be inferred from the value of others. One can know if a unit is inferable by generating the dominator tree out of the flow graph of the function (there’s going to be a one to one relationship between the coverable units and the nodes of the graph), if its corresponding node isn’t a leaf then the unit is inferable. In the figure below there’s the corresponding flow graph (first image) for the sample code and its dominator tree (second image).

Flow graph

Dominator Tree

Limitations

With this proposal a new issue arise when we have a noninstrumented package which calls os.Exit() as we’re erasing some of the cover statements based in the flow of execution of the instrumented function, there might be cases when the information dumped is incomplete and there’s no way to recover it. In the following code snippets there’s an example of this behavior, after building the main package erasing inferable units and executing the program, we’ll get as result that any unit was hit, when that’s not the case.

package noninstrumented

import “os”

func DoSomething(x int) bool{
    if x == 10 {
        os.Exit(0)
    }
    return true
}

package main

import “noninstrumented”

func main() { x_0[0] = 3 ; x_0[1] = x_P ; x_0[2] = 0;
    a := 0
    if noninstrumented.DoSomething(10) { x_0[5] = 1;
         a += 20
    }
    x_0[4] = 1;a++
}

Motivation

One of the purposes of this proposal is to reduce the overhead of runtime instrumentation to detect dead code across codebases by running instrumented binaries during a certain amount of time and then gathering a report of the executed code lines.

ianlancetaylor commented 1 year ago

We use proposals for API or language changes. I don't see a reason for this to be a proposal, so taking it out of the proposal process.

ianlancetaylor commented 1 year ago

CC @thanm

thanm commented 1 year ago

@EstebanOlmedo it is an interesting idea, although potentially tricky to implement. FYI there is a research paper that gives an algorithm for doing this in a systematic way.

I am curious as to how much of the overhead from a "go build -coverage" run comes from the instrumentation code (e.g. counter updates) and how much is from the code that writes out the files (covdata and covcounters) for your use case.

golang / go

runtime/coverage: Eliminate dominator Coverage Units #62379

Limitations

Motivation