fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

Warn or raise an exception when CollectDuplexSeqMetrics is run on a consensus BAM #992

Closed: clintval closed this 1 month ago

clintval commented 3 months ago

It is possible to run CollectDuplexSeqMetrics on a consensus BAM file, but the resulting metrics will be misleading, because metrics collection expects the input to be raw reads (not consensus sequences) annotated with the MI tag:
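To illustrate why consensus input is a problem, here is a deliberately simplified, hypothetical model of grouping reads into molecular families by their MI tag. It is not fgbio's implementation: the `Read` class, `baseMi`, and `familySizeHistogram` are illustrative names, and it assumes that duplex strands are marked with `/A` and `/B` suffixes that are stripped before grouping. On raw reads, several templates share one MI and the family sizes are meaningful; on consensus reads, each molecule has already been collapsed to a single template, so every family appears to have size 1.

```scala
// Hypothetical, simplified read model: only the fields needed for this sketch.
final case class Read(name: String, mi: String)

object FamilySizes {
  /** Strips a trailing "/A" or "/B" strand suffix so both strands of a duplex share one ID. */
  def baseMi(mi: String): String =
    if (mi.endsWith("/A") || mi.endsWith("/B")) mi.dropRight(2) else mi

  /** Counts distinct templates (read names) per molecule, then histograms those counts
    * as family size -> number of families of that size. */
  def familySizeHistogram(reads: Seq[Read]): Map[Int, Int] =
    reads
      .groupBy(r => baseMi(r.mi))
      .values
      .map(_.map(_.name).distinct.size)
      .groupBy(identity)
      .map { case (size, families) => size -> families.size }

  def main(args: Array[String]): Unit = {
    // Raw reads: molecule 7 has three templates across both strands, molecule 8 has one.
    val raw = Seq(Read("q1", "7/A"), Read("q2", "7/A"), Read("q3", "7/B"), Read("q4", "8/A"))
    // Consensus reads (assumed to carry the stripped molecule ID): one template per molecule.
    val consensus = Seq(Read("c7", "7"), Read("c8", "8"))
    println(familySizeHistogram(raw))
    println(familySizeHistogram(consensus))
  }
}
```

With the raw reads the histogram records one family of size 3 and one of size 1; with the consensus reads it records two families of size 1, which is the kind of silently wrong result the issue describes.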

https://github.com/fulcrumgenomics/fgbio/blob/d5b38ca053d7b6dcd3661c7d2b6192cfc40f57b1/src/main/scala/com/fulcrumgenomics/umi/CollectDuplexSeqMetrics.scala#L223-L225

We should either support gathering metrics over a consensus BAM, or raise an exception for this unsupported input if any of the consensus SAM tags are encountered.
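A minimal sketch of the second option, assuming records are modeled as a plain tag-name to value map rather than an htsjdk `SAMRecord`. The object and method names (`ConsensusInputCheck`, `consensusTagsOn`, `requireRawReads`) are hypothetical. The tag list is, to the best of our knowledge, the set of per-read tags written by fgbio's consensus callers (`cD`/`cM`/`cE`, plus the duplex-specific `aD`/`bD`/`aM`/`bM`/`aE`/`bE`), but it should be checked against fgbio's `ConsensusTags` before relying on it.

```scala
object ConsensusInputCheck {
  // Per-read consensus tags (assumed list; verify against fgbio's ConsensusTags).
  val ConsensusTagNames: Set[String] = Set("cD", "cM", "cE", "aD", "bD", "aM", "bM", "aE", "bE")

  /** Returns the consensus tags present on one record, modeled as tag name -> value. */
  def consensusTagsOn(tags: Map[String, Any]): Set[String] =
    tags.keySet.intersect(ConsensusTagNames)

  /** Hypothetical guard: throws on the first record that looks like a consensus read. */
  def requireRawReads(records: Iterator[Map[String, Any]]): Unit =
    records.foreach { tags =>
      val found = consensusTagsOn(tags)
      if (found.nonEmpty) {
        throw new IllegalArgumentException(
          s"Input appears to be a consensus BAM (found tags: ${found.toSeq.sorted.mkString(", ")}); " +
            "CollectDuplexSeqMetrics expects raw reads annotated with the MI tag.")
      }
    }

  def main(args: Array[String]): Unit = {
    // A raw read passes the check.
    requireRawReads(Iterator(Map("MI" -> "1/A", "RX" -> "ACGT-TTGA")))
    // A consensus read is rejected.
    try requireRawReads(Iterator(Map("MI" -> "1", "cD" -> 5)))
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

Because consensus BAMs tag every record, a real implementation could inspect only the first few records instead of the whole stream, and could log a warning instead of throwing if the "warn" option from the issue title is preferred.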