broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License

New metric: undetermined barcode count #1722

Open matthdsm opened 3 years ago

matthdsm commented 3 years ago

Feature request

Tool(s) involved

ExtractIlluminaBarcodes and IlluminaBasecallsToSam

Description

Currently, we use the barcode files written by ExtractIlluminaBarcodes to find the most prevalent undetermined barcodes. Since ExtractIlluminaBarcodes can now be skipped by using MATCH_BARCODES_INLINE in IlluminaBasecallsToSam, it would be useful if the metrics included a count of the undetermined barcodes.

The count can be retrieved using the following script: https://gist.github.com/matthdsm/41efcc38c5acb4125de773bd2aa57c5e

Thanks, M

matthdsm commented 3 years ago

To clarify, I'm thinking of a top-100 list of the undetermined barcodes found, each with its count, similar to the output from bcl2fastq.
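
For illustration only, here is a minimal sketch of the kind of tally being requested. It is not the linked gist and not Picard code. It assumes tab-separated per-read lines in which the first column is the observed barcode sequence and the second column is `Y` or `N` depending on whether the read matched an expected barcode; that layout, the function name `top_undetermined_barcodes`, and the `limit` parameter are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_undetermined_barcodes(
    lines: Iterable[str], limit: int = 100
) -> List[Tuple[str, int]]:
    """Return the `limit` most frequent unmatched barcodes with their counts.

    Assumed input layout (hypothetical, adjust to the real file format):
    column 1 = observed barcode sequence, column 2 = "Y" (matched) or "N".
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        barcode, matched = fields[0], fields[1]
        if matched == "N":
            counts[barcode] += 1
    # most_common sorts by descending count
    return counts.most_common(limit)
```

Under those assumptions, the function could be fed the concatenated lines of the per-tile barcode files, and the resulting list would correspond to the "top 100" table described above.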

matthdsm commented 3 years ago

Hi @jacarey,

Any idea on when this might make it into a release? We'd love to start skipping ExtractIlluminaBarcodes, but we do need the barcode counts.

Thanks, Matthias