alexdobin / STAR

RNA-seq aligner
MIT License

Understanding variants SAM tags #2121

Closed Yenaled closed 5 months ago

Yenaled commented 5 months ago

I'm using STAR WASP to map variants in reads, and outputting the relevant SAM tags.

I'm having trouble interpreting them. For example, I get the following:

vA:B:c,1,2 vG:B:i,16570789,16570893 vW:i:1

What does vA:B:c,1,2 mean?

Other times, I'll see stuff like vA:B:c,1,2,3,1 or vA:B:c,4,1,1. What do those mean?

I am using a VCF file containing two strains of interest (placed in the last two columns of the VCF file). But which of the vA:B: values is the REF allele? Which one is the first strain's allele? Which one is the second strain's allele?

Yenaled commented 5 months ago

OK, I understand now. Look at the GT in the VCF file. If the GT is X/Y, then in the vA tag 1=X and 2=Y (the remaining values are for special cases). It seems vA is agnostic to what is REF or ALT, which is good, since we might want to map against two non-REF variants. If there are multiple strains in the VCF file, only the GTs in the first sample column are considered.
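For anyone who wants to script this, here is a rough Python sketch of the interpretation above. It is not taken from STAR itself. The function names are made up for illustration, the VCF line is a hypothetical record rather than real data, and the code assumes that the i-th vA value describes the variant at the i-th vG coordinate (the tags in my example have matching lengths, but I have not confirmed this in the STAR source).

```python
import re


def parse_variant_tags(tag_fields):
    """Pull the vA, vG and vW values out of a list of SAM optional fields."""
    parsed = {"vA": [], "vG": [], "vW": None}
    for field in tag_fields:
        name, sam_type, value = field.split(":", 2)
        if name in ("vA", "vG") and sam_type == "B":
            # B-type arrays look like "c,1,2": a subtype letter, then the values.
            _subtype, *values = value.split(",")
            parsed[name] = [int(v) for v in values]
        elif name == "vW" and sam_type == "i":
            parsed["vW"] = int(value)
    return parsed


def alleles_by_position(tag_fields):
    """Pair each vG coordinate with its vA code (assumes positional pairing)."""
    parsed = parse_variant_tags(tag_fields)
    if len(parsed["vA"]) != len(parsed["vG"]):
        raise ValueError("vA and vG have different lengths")
    return list(zip(parsed["vG"], parsed["vA"]))


def genotype_alleles(vcf_line):
    """Map vA codes 1 and 2 to bases using the GT of the first sample column.

    Follows the reading in this thread: for GT X/Y, code 1 is allele X and
    code 2 is allele Y, where 0 is REF and 1, 2, ... index into ALT.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    alleles = [cols[3]] + cols[4].split(",")   # REF followed by the ALT alleles
    gt_index = cols[8].split(":").index("GT")  # position of GT within FORMAT
    gt = cols[9].split(":")[gt_index]          # first sample column only
    first, second = (int(a) for a in re.split(r"[/|]", gt))
    return {1: alleles[first], 2: alleles[second]}


# Tags copied from the example read in this issue.
tags = "vA:B:c,1,2 vG:B:i,16570789,16570893 vW:i:1".split()
print(alleles_by_position(tags))  # [(16570789, 1), (16570893, 2)]

# Hypothetical VCF record with two sample columns; only the first is read.
vcf_line = "chr1\t16570789\t.\tA\tG,T\t.\t.\t.\tGT\t1/2\t0/1"
print(genotype_alleles(vcf_line))  # {1: 'G', 2: 'T'}
```

In this made-up record the first sample is 1/2, so neither vA code points at REF, which is the "two non-REF variants" case mentioned above. Codes other than 1 and 2 are not handled here.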

I've answered this on biostars (and let me know if there are any problems with my answer). Closing this for now.

https://www.biostars.org/p/9592236/