broadinstitute / viral-ngs

Viral genomics analysis pipelines

add optional QC check to filter_lastal_bam #961

Closed by tomkinsc 5 years ago

tomkinsc commented 5 years ago

This adds an optional QC check to taxon_filter.py::filter_lastal_bam() that raises a QCError() when the sample name (the bam file basename) begins with any of a set of negative control prefixes and the number of reads lastal keeps after filtering is above a threshold for the acceptable read count. The check is enabled via --errorOnReadsInNegControl. The read count threshold is set via --negativeControlReadsThreshold, and the negative control prefixes are set via --negControlPrefixes, with defaults of "neg, water, NTC". The check has also been added to the corresponding WDL file (off by default). A new errors.py file has been added to hold error classes that may be useful across files.
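For anyone skimming, here is a minimal sketch of the logic described above. It is not the code in this PR. The function names, the stand-in QCError class, the default threshold of 0, the case-sensitive prefix match, and the strict "greater than" comparison are all assumptions made for illustration. The read count is passed in rather than computed from the bam, so the sketch has no dependency on a bam reader.

```python
import os


class QCError(RuntimeError):
    """Hypothetical stand-in for the QCError class this PR places in errors.py."""


# Defaults as stated in the PR description.
DEFAULT_NEG_CONTROL_PREFIXES = ("neg", "water", "NTC")


def is_negative_control(bam_path, prefixes=DEFAULT_NEG_CONTROL_PREFIXES):
    """Return True if the bam file basename starts with any prefix.

    Assumption: the match is case-sensitive.
    """
    sample_name = os.path.basename(bam_path)
    # str.startswith accepts a tuple and matches any of its members.
    return sample_name.startswith(tuple(prefixes))


def check_negative_control(bam_path, reads_kept, threshold=0,
                           prefixes=DEFAULT_NEG_CONTROL_PREFIXES):
    """Raise QCError if a negative control retains more reads than allowed.

    Assumptions: reads_kept is the post-filter read count supplied by the
    caller, and "above a threshold" means strictly greater than.
    """
    if is_negative_control(bam_path, prefixes) and reads_kept > threshold:
        raise QCError(
            "{}: {} reads kept in a negative control (threshold {})".format(
                os.path.basename(bam_path), reads_kept, threshold
            )
        )
```

With this shape, a caller would only invoke check_negative_control() when --errorOnReadsInNegControl is set, which keeps the default behavior unchanged.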

If we end up liking this feature, the QC functionality can be abstracted into a separate class to inspect bam files and be included in various places in the codebase, along with an argparse base object that configures arguments for the QC checker.
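One possible shape for the shared argparse piece, sketched with the standard parents mechanism. The flag names are the ones from this PR; the function names, the help strings, the default threshold of 0, the nargs form for the prefixes, and the inBam positional are assumptions for illustration only.

```python
import argparse


def qc_parent_parser():
    """Build a reusable parser carrying only the QC arguments.

    add_help=False is needed so the child parser can define -h itself.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "--errorOnReadsInNegControl",
        action="store_true",
        help="Raise an error if a negative control keeps too many reads.",
    )
    parser.add_argument(
        "--negativeControlReadsThreshold",
        type=int,
        default=0,  # assumed default; the PR text does not state one
        help="Read count above which a negative control fails QC.",
    )
    parser.add_argument(
        "--negControlPrefixes",
        nargs="+",
        default=["neg", "water", "NTC"],
        help="Sample name prefixes that mark negative controls.",
    )
    return parser


def build_parser():
    """Example command parser that inherits the QC arguments."""
    parser = argparse.ArgumentParser(
        prog="filter_lastal_bam", parents=[qc_parent_parser()]
    )
    parser.add_argument("inBam", help="Input bam file (hypothetical positional).")
    return parser
```

Any other command that wants the same check could pass qc_parent_parser() in its own parents list and get identical flags and defaults.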

tomkinsc commented 5 years ago

Not sure we need to add errors.py to the travis/coveralls/rtd scripts just yet; maybe once there's more functionality there than renamed base classes?