expanded parameterization of align_and_count and additional output metrics

Summary

This PR adds optional post-mapping read filtering to the align_and_count task, so that its mapped-read counts are comparable to the counts obtained after the filtering applied during genome assembly. It also adds new numeric outputs useful for general QC.

New input parameters

The filtering has the following parameters:

filter_bam_to_proper_primary_mapped_reads: enable filtering
- default: false — no filtering is performed
do_not_require_proper_mapped_pairs_when_filtering: do not exclude reads lacking the "proper pair" bit; set this to true when filtering is enabled and the input is single-end reads
- default: false — reads are filtered to proper pairs if filtering is enabled
keep_singletons_when_filtering: singleton reads from paired-end data are kept; this does not affect single-end reads
- default: false — singleton reads are excluded during filtering
keep_duplicates_when_filtering: reads marked as duplicates are kept; a duplicate read that fails any other filtering criterion is still excluded
- default: false — duplicate reads are excluded during filtering
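
To make the interaction between these parameters concrete, here is a minimal Python sketch of the kind of per-read decision they describe. It is an illustration only, not the code in this PR: the helper name keep_read, its argument names, and the exact precedence of the checks are assumptions. In particular, it assumes "primary" means neither secondary nor supplementary, and that a singleton is a paired read whose mate is unmapped. The flag bit values themselves come from the SAM format specification.

```python
# SAM flag bits, as defined in the SAM format specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def keep_read(flag, require_proper_pair=True, keep_singletons=False,
              keep_duplicates=False):
    """Hypothetical helper: decide whether a read survives filtering.

    Assumed precedence (not confirmed by the PR): mapped and primary first,
    then duplicates, then singletons, then the proper-pair requirement.
    """
    if flag & UNMAPPED:
        return False
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False  # assumption: "primary" excludes both of these
    if (flag & DUPLICATE) and not keep_duplicates:
        return False
    # Assumption: a singleton is a paired read whose mate did not map.
    is_singleton = bool(flag & PAIRED) and bool(flag & MATE_UNMAPPED)
    if is_singleton:
        return keep_singletons
    if require_proper_pair and not (flag & PROPER_PAIR):
        return False
    return True


print(keep_read(99))                            # paired, proper pair: True
print(keep_read(0))                             # single-end, mapped: False
print(keep_read(0, require_proper_pair=False))  # single-end, mapped: True
```

Under these assumptions a mapped single-end read (flag 0) is dropped unless the proper-pair requirement is relaxed, which is the case do_not_require_proper_mapped_pairs_when_filtering is meant to cover.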

New output metrics

This PR also adds new numeric output metrics to align_and_count:

pct_total_reads_mapped: the percent of input reads mapping to any of the input reference sequences
- this is helpful for assessing the fraction of reads in a sample originating from sources corresponding to the reference sequences
pct_lesser_hits_of_mapped: of the reads mapping to reference sequences input to align_and_count, the percent mapping to hits that are not the top hit
- this is helpful for assessing cross-talk between hits
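
As a rough illustration of how the two metrics relate to per-reference counts, the following Python sketch computes them from a mapping of reference name to mapped-read count. The function name summarize_counts, its arguments, and the choice to return 0.0 when a denominator is zero are assumptions made for this example; the task itself may compute the values differently.

```python
def summarize_counts(counts_by_reference, total_input_reads):
    """Hypothetical helper: derive both percentages from per-reference counts.

    counts_by_reference: dict of reference name -> number of mapped reads
    total_input_reads:   number of reads given to the aligner
    """
    mapped = sum(counts_by_reference.values())
    top_hit = max(counts_by_reference.values(), default=0)

    # Percent of all input reads that mapped to any reference.
    pct_total = 100.0 * mapped / total_input_reads if total_input_reads else 0.0
    # Percent of mapped reads that landed on a reference other than the top hit.
    pct_lesser = 100.0 * (mapped - top_hit) / mapped if mapped else 0.0

    return {
        "pct_total_reads_mapped": pct_total,
        "pct_lesser_hits_of_mapped": pct_lesser,
    }


example = summarize_counts({"refA": 80, "refB": 15, "refC": 5}, 200)
print(example)
```

In this made-up example, 100 of 200 input reads map (50 percent), and 20 of those 100 map to references other than the top hit (20 percent).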

The new outputs are exposed in several of the workflows that have singular outputs from align_and_count. A few other workflows also call align_and_count, but they output an aggregate report with info from multiple inputs rather than singular values.

Recommended usage

The following values are recommended for most use cases, to count high-quality read mappings with duplicates included.

filter_bam_to_proper_primary_mapped_reads=true
keep_duplicates_when_filtering=true
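
For workflows run from a JSON inputs file, the recommended values could be generated with a short Python snippet like the one below. The call prefix my_workflow.align_and_count is a placeholder, not a name taken from this repository; the actual key prefix depends on the workflow being run and how it names its call to the task.

```python
import json


def recommended_inputs(call_prefix):
    """Hypothetical helper: build the recommended settings for one call.

    call_prefix is a placeholder such as "my_workflow.align_and_count";
    substitute the prefix your workflow actually uses.
    """
    return {
        f"{call_prefix}.filter_bam_to_proper_primary_mapped_reads": True,
        f"{call_prefix}.keep_duplicates_when_filtering": True,
    }


print(json.dumps(recommended_inputs("my_workflow.align_and_count"), indent=2))
```

The other two parameters are left at their defaults here; single-end inputs would additionally need do_not_require_proper_mapped_pairs_when_filtering set to true, as described above.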
