broadinstitute / viral-pipelines

viral-ngs: complete pipelines

expanded parameterization of align_and_count and additional output metrics #525

Closed tomkinsc closed 6 months ago

tomkinsc commented 6 months ago

Summary

This PR adds the option to filter reads after mapping in the align_and_count task, so that mapped-read counts are comparable to the counts obtained after the filtering applied during genome assembly. It also adds new numeric outputs that are useful for general QC.
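As a rough illustration of what post-mapping filtering does to a read count, here is a minimal Python sketch that counts alignments in SAM text. It is not the code in this PR: the function name, the `min_mapq` and `keep_duplicates` arguments, and the choice to skip secondary and supplementary alignments are all assumptions made for the example. Only the flag bit values come from the SAM specification.

```python
from typing import Iterable

# Flag bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def count_mapped_reads(
    sam_lines: Iterable[str],
    min_mapq: int = 0,            # hypothetical parameter name
    keep_duplicates: bool = True,  # hypothetical parameter name
) -> int:
    """Count primary mapped alignments that pass the given filters."""
    count = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        mapq = int(fields[4])  # column 5: MAPQ
        # Assumption: only primary, mapped alignments are counted.
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if not keep_duplicates and flag & FLAG_DUPLICATE:
            continue
        if mapq < min_mapq:
            continue
        count += 1
    return count


# Made-up records for illustration only.
EXAMPLE_SAM = [
    "@SQ\tSN:ref\tLN:1000",
    "r1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",     # mapped, high MAPQ
    "r2\t1024\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",  # marked duplicate
    "r3\t0\tref\t9\t5\t4M\t*\t0\t0\tACGT\tIIII",      # low MAPQ
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",         # unmapped
    "r5\t256\tref\t20\t60\t4M\t*\t0\t0\tACGT\tIIII",  # secondary
]

unfiltered = count_mapped_reads(EXAMPLE_SAM)
high_quality = count_mapped_reads(EXAMPLE_SAM, min_mapq=20)
high_quality_dedup = count_mapped_reads(
    EXAMPLE_SAM, min_mapq=20, keep_duplicates=False
)
```

With these made-up records, the same alignments yield different counts depending on whether a mapping-quality threshold is applied and whether duplicates are kept, which is why the filter settings matter when comparing counts against assembly-time numbers.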

New input parameters

The filtering has the following parameters:

New output metrics

This PR also adds new numeric output metrics to align_and_count:

The new outputs are exposed in several of the workflows that emit a single set of outputs from align_and_count. A few other workflows also call align_and_count, but they output an aggregate report with information from multiple inputs.

Recommended usage

The following values are recommended for most use cases, to count high-quality read mappings with duplicates included.