AdamaJava / adamajava

Other
14 stars 4 forks source link

qp2 kmers tweak #331

Closed holmeso closed 1 year ago

holmeso commented 1 year ago

Description

When profiling qp2 running against a BAM, it was observed that the consumer threads were spending a large amount of time in KmersSummary.parseKmers method.

This PR contains a change to the way that the kmers information is tallied. Instead of populating 144 byte arrays of length 6 (for a 150 length read), and then looping through each byte in each array to calculate the position in the array score, this new method walks through the read once, and after each byte updates an int, and uses a bit mask to get the appropriate part of the int to get the position in array score.

The resulting code produces the same output (once you have taken into account qp2's random read names)

A change to BamSummaryReport was also made to cut down on the number of calls to SAMRecord.getReadGroup as this method performs some work.

Type of change

How Has This Been Tested?

New and existing unit tests pass. Code was run against a WGS BAM file and the output was compared against a run using the current code. Results were the same (apart from non-deterministic parts of qp2 output)

Are WDL Updates Required?

No changes to the invocation of qp2 were made and so no wdl updates required.

Checklist: