RWilton / Arioc

Arioc: GPU-accelerated DNA short-read alignment
BSD 3-Clause "New" or "Revised" License

XA Tag #25

Closed by karlkashofer 2 years ago

karlkashofer commented 2 years ago

Other aligners (bwa in this case) use the XA tag to output alternative alignments for a read, like:

HWI-A01349_BSF_0941:1:2472:14507:8234#A274_16_S81560 1171 chr1 9997 0 51S100M = 10032 -65 CCCACCACTACCCTCCTCCAGCGCCGACGGCTGCGCCTGAGGCGTATTATACCGATAACCCTAACCCTTACCCTAACACTATCCCTTACACTACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCACCCTAACCC F,,::F,,,,::F,F,,,,,,:,,F:,,,,F,,F:,F,,,,,,,,:,,:,,,,,,:F,:,F,F,F,F,,:,,,,,FF,FFF,,:,F,:,,F,,,:,F:,,,,FF:F,,,,:F,F,FF,,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF XA:Z:chrX,+156030605,96M55S,6;chr5,-10067,55S96M,7;chr5,-10085,55S96M,7;chr5,-10025,55S96M,7;chr5,-10001,55S96M,7;chr5,-10061,55S96M,7;chr5,-10055,55S96M,7;chr5,-10103,55S96M,7;chr5,-10007,55S96M,7;chr5,-10337,55S96M,7;chr5,-10037,55S96M,7;chr5,-10049,55S96M,7;chr5,-10097,55S96M,7;chr5,-10091,55S96M,7;chr5,-11357,55S96M,7;chr5,-10043,55S96M,7;chr5,-10079,55S96M,7;chr5,-10073,55S96M,7;chr5,-10013,55S96M,7;chr5,-10019,55S96M,7;chr5,-10343,55S96M,7;chr5,-10031,55S96M,7;chrX,+156030683,59M92S,1;chr3,+198173892,59M92S,1;chr12,-10195,92S59M,1;chr3,-10005,94S52M1I4M,1;chr12_GL877875v1_alt,-195,92S59M,1;chr3_KI270784v1_alt,-61610,92S59M,1; MC:Z:79M72S MD:Z:17A8C3A4A2C3A46A10 PG:Z:MarkDuplicates RG:Z:A274_16 NM:i:7 MQ:i:0 AS:i:65 XS:i:66

I can't find any of these tags in my Arioc-mapped BAM. Are they not supported? I have several downstream tools that ask for these tags.

RWilton commented 2 years ago

Arioc does not emit an XA field, and there are no plans to support XA in the future.

In fact, XA is useful mainly for debugging the aligner itself. It's not part of the SAM specification, it has no syntax, and it contains no reliable information for several reasons, the main reason being that it's an unordered list of whatever the aligner wants to put there. As your example illustrates, each item in BWA's XA list is evidently RNAME, POS, and CIGAR (plus a mystery integer), but the list is unordered and without context such as the corresponding opposite-mate mappings (if any).
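For readers who still need to consume XA from BWA output, here is a minimal parsing sketch. It assumes the layout the BWA manual gives for this tag, `(chr,pos,CIGAR,NM;)*`, under which the trailing integer in each entry is the edit distance; other aligners are free to use a different layout. The names `AltHit` and `parse_xa` are hypothetical helpers, not part of any library, and the sketch also assumes reference names contain no `;`.

```python
from typing import List, NamedTuple


class AltHit(NamedTuple):
    """One entry of a BWA-style XA list (hypothetical helper type)."""
    rname: str
    strand: str
    pos: int
    cigar: str
    nm: int


def parse_xa(value: str) -> List[AltHit]:
    """Split an XA value into its entries, assuming BWA's chr,pos,CIGAR,NM; layout."""
    # Accept either the bare value or the full "XA:Z:..." field.
    if value.startswith("XA:Z:"):
        value = value[len("XA:Z:"):]
    hits = []
    for entry in value.split(";"):
        if not entry:
            continue  # the list ends with ';', which leaves an empty last piece
        # Split from the right so only the last three commas are treated as separators.
        rname, signed_pos, cigar, nm = entry.rsplit(",", 3)
        # The position carries the strand as a leading '+' or '-'.
        hits.append(AltHit(rname, signed_pos[0], int(signed_pos[1:]), cigar, int(nm)))
    return hits


# Three entries copied from the record quoted in the question.
xa = "XA:Z:chrX,+156030605,96M55S,6;chr5,-10067,55S96M,7;chr3,-10005,94S52M1I4M,1;"
for hit in parse_xa(xa):
    print(hit)
```

The parser preserves list order but attaches no meaning to it, and it cannot recover mate information, since the tag does not carry any.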

If you are truly interested in secondary mappings, you should ask the aligner to report them. Bowtie and Arioc support that explicitly; BWA might only support it for unmapped reads (I don't remember offhand).

I have seen online comments to the effect that one can use XA to determine whether a mapping is unique (e.g., by looking for missing or null XA tags). But you can get the same result by filtering on MAPQ, by looking for the XS field, or by filtering mappings where a secondary mapping exists.
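As a rough illustration of the MAPQ and XS alternatives, the sketch below reads one tab-separated SAM record and reports the signals mentioned above. The function names and the `min_mapq=1` threshold are assumptions made for this example; how MAPQ and XS should be interpreted depends on the aligner, and a MAPQ of 255 means the value is unavailable.

```python
def uniqueness_signals(sam_line: str) -> dict:
    """Collect per-record hints about mapping uniqueness (hypothetical helper)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    tags = {field[:2] for field in fields[11:]}
    return {
        "mapq": mapq,
        "is_secondary": bool(flag & 0x100),  # 0x100 marks a secondary alignment
        "has_xs": "XS" in tags,
    }


def looks_unique(sam_line: str, min_mapq: int = 1) -> bool:
    """Treat a primary record with MAPQ >= min_mapq as uniquely mapped (assumed rule)."""
    signals = uniqueness_signals(sam_line)
    return signals["mapq"] >= min_mapq and not signals["is_secondary"]


# A shortened synthetic record with MAPQ 0 and an XS tag, similar to the one in the question.
record = "read1\t147\tchr1\t9997\t0\t4M\t=\t10032\t-65\tACGT\tFFFF\tAS:i:65\tXS:i:66"
print(uniqueness_signals(record))
print(looks_unique(record))
```

Detecting reads that have a separate secondary record requires grouping records by read name, which this per-record sketch does not attempt.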