craigahobbs / unittest-parallel

Parallel unit test runner for Python with coverage support
MIT License

Retain and access unprocessed coverage data? #21

Open ferdnyc opened 3 months ago

ferdnyc commented 3 months ago

Whenever unittest-parallel is called with --coverage or the other coverage-related arguments it accepts, it will generate coverage data for each of the parallel jobs into a temporary directory, then automatically combine them, report the coverage stats, and (optionally) generate a detailed report in the format(s) requested.

But what if I just want the raw coverage data, unprocessed and un-combined?

For example, in one project I'm using unittest-parallel in combination with tox and tox-gh to execute tests under a range of Python versions and OSes in a GitHub Actions CI workflow.

To collect complete coverage, the data from all of the CI jobs in the workflow matrix needs to be combined in a separate workflow job, after all of the test runs are completed.

Coverage.py's coverage run command even has an argument (-p) that facilitates this: it changes the default name of the raw SQLite data file from .coverage to .coverage.$HOSTNAME.$PID.$RANDOM, so that the filenames won't collide when the data is aggregated.

It would be helpful if unittest-parallel provided an option that similarly disabled the implicit coverage combine and coverage report steps (or their moral equivalents via the Python API) and, instead of automatically deleting the raw coverage data files, left them available at the end of the run for further aggregation.

ferdnyc commented 3 months ago

In fact, looking at the coverage.py source code...

  1. coverage.Coverage() can be passed data_suffix=True (instead of a data_file argument) to switch on -p-style naming for the default output file, changing it from .coverage to .coverage.<unique-identifier>
  2. Coverage.combine(), when not passed a list of filenames, will by default aggregate all of the data files matching its base filename
  3. Coverage.combine() also automatically deletes all of the files it combines, unless it's passed keep=True to prevent that deletion behavior.
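The three API behaviors above can be sketched as follows (assuming coverage.py 5.5+ is installed; the measured expression is just a placeholder workload):

```python
import glob

import coverage

# (1) data_suffix=True switches on "-p"-style naming: the data file is
# written as .coverage.<hostname>.<pid>.<random> instead of .coverage.
cov = coverage.Coverage(data_suffix=True)
cov.start()
total = sum(n * n for n in range(10))  # some code to measure
cov.stop()
cov.save()  # writes .coverage.<unique-suffix> in the current directory

# (2) With no data_paths argument, combine() gathers every matching
# .coverage.* file next to the base data file.
# (3) keep=True leaves the raw per-run files in place after merging.
combined = coverage.Coverage()
combined.combine(keep=True)
combined.save()  # writes the merged .coverage file
```

Running several processes this way produces one uniquely named data file per process, and a single combine() call at the end merges them all.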

So, rather than using the temporary directory and generating its own temporary filenames within it for the coverage data, it might be simpler and more flexible if unittest-parallel just passed coverage.Coverage() the data_suffix=True argument to get unique filenames for each job in the current directory (which is coverage's standard behavior), or in another location if specified by the user. It could then call cov.combine() with no filename list, letting coverage process (and delete) all of those default-named files.

A new command-line argument (--coverage-preserve or something, maybe) could then be used to add keep=True to the combine() call, so that the raw files wouldn't be deleted and would remain available after the run.
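Put together, the proposal might look something like this hypothetical sketch (the function names are made up, not unittest-parallel's actual internals, and preserve_raw stands in for the suggested --coverage-preserve flag):

```python
import coverage


def start_worker_coverage():
    # Each parallel worker uses data_suffix=True, so it writes its own
    # .coverage.<unique-suffix> file with no risk of name collisions.
    cov = coverage.Coverage(data_suffix=True)
    cov.start()
    return cov


def stop_worker_coverage(cov):
    cov.stop()
    cov.save()  # leaves a uniquely named raw data file behind


def combine_run(preserve_raw=False):
    # Merge every .coverage.* file in the current directory. With
    # keep=True (the hypothetical --coverage-preserve behavior) the
    # raw per-worker files survive for later cross-job aggregation.
    combined = coverage.Coverage()
    combined.combine(keep=preserve_raw)
    combined.save()
    return combined
```

With preserve_raw=False this matches today's behavior (raw files consumed and deleted); with preserve_raw=True the per-worker files remain on disk for a CI workflow to collect and merge across the whole matrix.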

craigahobbs commented 3 months ago

Thanks for the suggestion. I think this will work and simplify things. I don't remember why I used the temporary directory anymore. I'll do some experimentation and, if it looks good, will push a branch with the new option so you can try it out.