AArnott / Library.Template

A template for a NuGet package with tests, stylecop, fxcop, versioning, and Azure Pipelines build ready to go.
MIT License

Merging code coverage scales very poorly for large coverage files #165

Closed: AArnott closed this issue 2 years ago

AArnott commented 2 years ago

@SteveBush reports that commit f064b9a150eb529716d393fc0f1f60b55a301eb4 made merging code coverage files much slower:

From https://github.com/AArnott/Library.Template/commit/85d970c258bafda482b003ba79f4510a491a570d#commitcomment-76344374:

The addition of normalizing the file path separators increases the build time of this step in my build from 36 seconds to 19 minutes, 30 seconds. It would be great to have a parameter to just do the {reporoot} replacement and skip the path parameter substitution.

This is obviously a problem, and it must be one of scale, because the step runs fast for me.

Not normalizing file paths means the reportgenerator tool treats entries for the same source file as distinct files, so we don't get a properly merged code coverage report. But the perf is also unacceptable.
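For context, the normalization in question amounts to something like the following per-line pass over each coverage file. This is a hedged reconstruction in Python, not the repo's actual script (which is PowerShell), and the exact pattern it uses may differ: it substitutes the repo root with a `{reporoot}` placeholder and flips backslashes to forward slashes, touching every line of every file.

```python
import re

def normalize_lines(text, repo_root, placeholder="{reporoot}"):
    """Sketch of the blunt approach: regex every line of the file.

    Replaces the repo root with a placeholder and converts all
    backslashes to forward slashes. Simple, but the cost scales with
    total file size, which is what makes large coverage files slow.
    """
    # Match the repo root (backslash form) case-insensitively.
    pattern = re.compile(re.escape(repo_root.replace("/", "\\")), re.IGNORECASE)
    out = []
    for line in text.splitlines(keepends=True):
        line = pattern.sub(placeholder, line)
        out.append(line.replace("\\", "/"))
    return "".join(out)
```

Note that this flips every backslash on every line, not just the ones inside path attributes, which is part of why option 2 below is attractive.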

AArnott commented 2 years ago

@SteveBush Is there any chance you could share a directory of coverage files that I could use to test fixes?

AArnott commented 2 years ago

Options:

  1. Normalize paths on each agent, before collecting the coverage files as artifacts. This would move the work rather than eliminate it, but with several agents the cost would at least be split across them.
  2. Parse the xml files, navigate straight to the one attribute that needs to be normalized, fix it, and re-serialize. This could eliminate the regex entirely; even if we kept it, its use would shrink from every line of every file to just a small fraction of the lines. We would pay the cost of parsing the xml, but surely that can be done faster than this regex is taking, because the merger tool itself parses the xml very quickly.

Any other ideas?