fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

Documentation for custom SAM tags from CallDuplexConsensusReads #959

Open BrettLiddell opened 7 months ago

BrettLiddell commented 7 months ago

Hello,

Is there a wiki page that explains the custom SAM tags in the BAM files output by CallDuplexConsensusReads? Do they have any significance? For example, each read in my output BAM carries tags like these:

aD:i:1 bD:i:0 cD:i:1 aE:f:0 bE:f:0 cE:f:0 RG:Z:A MI:Z:1 aM:i:1 bM:i:0 cM:i:1 RX:Z:TTGGC-GTCAC ac:Z:CTGCTGCGGTGGCGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATG ad:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ae:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 aq:Z:==============================================================================================

I was going to filter out the B-type (array) tags so that I can use Gemini as a read stitcher, but I was wondering whether I should retain them for any reason.

nh13 commented 7 months ago

It's documented, though a bit hidden, on our wiki and in the code. Basically, there are per-read tags and per-base tags: the second letter of the tag is lowercase if the tag is per-base and uppercase if it is per-read.
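As a rough illustration of the naming convention described above, here is a minimal Python sketch that sorts the optional fields of a SAM record into per-read and per-base tags by the case of the second letter. The function name `classify_tags` is hypothetical, and restricting the rule to tags whose first letter is lowercase (so that standard tags such as RG, MI, and RX are set aside) is an assumption made for this example, not something stated in the thread.

```python
def classify_tags(fields):
    """Split SAM optional fields into per-read, per-base, and other tag names.

    Assumption: only tags starting with a lowercase letter (e.g. aD, ad) are
    treated as consensus tags; everything else is reported as "other".
    """
    per_read, per_base, other = [], [], []
    for field in fields.split():
        tag = field.split(":", 1)[0]
        if len(tag) != 2 or not tag[0].islower():
            other.append(tag)
        elif tag[1].isupper():
            per_read.append(tag)   # second letter upper case: one value per read
        elif tag[1].islower():
            per_base.append(tag)   # second letter lower case: one value per base
        else:
            other.append(tag)
    return per_read, per_base, other


# Shortened version of the tags quoted in the question (arrays truncated here).
example = ("aD:i:1 bD:i:0 cD:i:1 RG:Z:A MI:Z:1 RX:Z:TTGGC-GTCAC "
           "ad:B:s,1,1,1 ae:B:s,0,0,0 aq:Z:===")
per_read, per_base, other = classify_tags(example)
print("per-read:", per_read)
print("per-base:", per_base)
print("other:", other)
```

For real BAM files a library such as pysam would be the more natural way to read the tags; the plain string handling here only keeps the sketch free of dependencies.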

Among other things, these tags tell you how many reads were used to call each consensus read or base, and FilterConsensusReads uses them, per our best practices.
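To make the "how many reads were used to call each base" point concrete, the sketch below parses a B-type integer array field and summarizes it. The helper `parse_int_array` is hypothetical and the depth values are made up for illustration; they are not taken from the BAM in the question. Reading `ad` as a per-base depth array is an assumption based on the per-base convention above and should be checked against the fgbio documentation.

```python
def parse_int_array(field):
    """Parse a SAM 'B' integer array field such as 'ad:B:s,1,1,2'.

    Returns the tag name and the list of integer values.
    """
    tag, sam_type, value = field.split(":", 2)
    if sam_type != "B":
        raise ValueError(f"{tag} is not a B-type array field")
    subtype, *items = value.split(",")
    if subtype == "f":
        raise ValueError(f"{tag} holds floats, not integers")
    return tag, [int(item) for item in items]


# Illustrative values only (assumed to be per-base depths).
tag, depths = parse_int_array("ad:B:s,1,1,2,3,3,2,1")
summary = {
    "tag": tag,
    "length": len(depths),
    "min": min(depths),
    "max": max(depths),
}
print(summary)
```

A summary like this is one way to check whether per-base values look sensible before deciding to strip the array tags for a downstream tool.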

Does that help?

BrettLiddell commented 7 months ago

Yes, much appreciated!