SteveGilham / altcover

Cross-platform coverage gathering and processing tool set for dotnet/.Net Framework and Mono
MIT License

questions regarding appending and saving coverage data #158

Closed · fuzzah closed this issue 2 years ago

fuzzah commented 2 years ago

Hello. Thanks for the project!

I need to collect coverage from multiple runs of the same application. I use the "instrument now, test later" mode of operation. Q1. Does coverage data get appended or overwritten? Can I control this behavior?

After every run the coverage data takes a full 10 seconds to flush. I have a giant code base (CoreCLR) and need to perform a few thousand runs, so I can't afford to spend a whole day just collecting coverage. Q2. Can I do something to improve this speed?

SteveGilham commented 2 years ago

Sorry about the delay getting back to you.

  1. If you instrument a set of files, that includes the location of the coverage file. Each time that code is executed, it creates a set of .acv coverage data files (written in parallel to avoid concurrency bottlenecks), and the collection phase then adds that new coverage data to the current state of the coverage file, one file at a time. The collection phase --outputFile=VALUE parameter (and equivalents) allows you to redirect the accumulated coverage to a second file, rather than overwriting the original.

So if you run test t1, collect, run test t2, collect, the accumulated coverage ends up in the file specified at instrumentation time. If you run test t1, collect -o file1, run test t2, collect -o file2, then file1 will have the coverage for t1, and file2 that for t2, with the original file untouched. Other cases are allowed: run test t1, collect, run test t2, collect -o file2 will update the base file with t1's coverage and put the accumulated coverage in file2; and run test t1, collect -o file1, run test t2, collect puts t1's coverage in file1, and t2's into the original coverage file specified when instrumenting (see the command sketch after this list).

Also, if you run test t1, run test t2, collect, the intermediate coverage data will accumulate and all be applied together.

  2. The default collection method just streams each visit to the intermediate data file as it happens. The --single option cuts down the size of the coverage data by recording only the first visit to each code or branch point. Alternatively, the --defer option accumulates the data in memory so that it coalesces visits, and then outputs the data during process shutdown -- this turns N instances of "point X visited once" into one "point X was visited N times", but is limited by any constraints on the OnShutdown event handler run duration.
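
Putting the two together, the whole cycle looks something like this sketch. Directory and executable names are placeholders, and the option spellings beyond --single, --defer and --outputFile (--inputDirectory, --outputDirectory, --xmlReport, --save, runner --collect, --recorderDirectory) are from memory, so check `altcover --help` against your installed version:

```sh
# Instrument once; the report location is baked in here, and --save
# tells the recorder to write .acv files for later collection.
# --single and --defer are the speed/size options discussed above.
altcover --inputDirectory=./bin --outputDirectory=./instrumented \
         --xmlReport=./coverage.xml --save --single --defer

# Run the instrumented code as many times as needed; each run
# leaves .acv files beside the recorder.
./instrumented/MyApp t1    # placeholder executable
./instrumented/MyApp t2

# Fold all pending .acv data into the report in one pass;
# -o / --outputFile redirects the result, leaving the original untouched.
altcover runner --collect --recorderDirectory=./instrumented \
         --outputFile=./coverage-accumulated.xml
```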

Adding to the coverage report file can be a bottleneck as the raw coverage data are essentially random access references, so the whole report has to be loaded into memory to apply the recorded visits. I know the XML parser is not especially speedy; I have not done any experiments at scale with the current JSON parser.

Then there are the obvious things: targeted filtering -- such as excluding third-party assemblies -- to reduce the size of the report as well as the amount of data collected; and not using options like --callContext which increase it.
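
For example, with placeholder assembly names (AltCover filters specify what to omit; again, verify the option spelling against `altcover --help`):

```sh
# Exclude third-party assemblies at instrumentation time so their
# visits are never recorded or merged; the names here are placeholders.
altcover --inputDirectory=./bin --outputDirectory=./instrumented \
         --xmlReport=./coverage.xml --save \
         --assemblyFilter=ThirdParty --assemblyFilter=Moq
# ...and simply leave off --callContext unless you need it.
```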

fuzzah commented 2 years ago

Thank you for the explanation. Currently I am indeed testing the XML parser from the System.Xml namespace :) Filtering is already applied, as the assembly I test is the only MSIL assembly I have.

I will try the --single and --defer options and come back if they don't help. Thanks again. Closing for now.