PacificBiosciences / trgt

Tandem repeat genotyping and visualization from PacBio HiFi data

Wrong Short Tandem Repeat size in vcf file #30

Open pailloufat-stack opened 7 months ago

pailloufat-stack commented 7 months ago

Hi,

I am working with 16 mouse samples. I am looking at STR variants in 13 of the 16, which are heterozygous; the other 3 are homozygous wild-type. What I'm interested in is the size differences of the STRs among these 13 samples. STRs that have the same size in all 13 samples do not interest me.

I ran TRGT, and I created a merged VCF file. I modified it a bit to get the information I want (with the "MS" field).

I noticed some errors. For example, for this STR (I reduced the number of samples to 3 to make it clearer), the VCF reports three deletions with 3 different sizes:

chr2 154720638 0/6;0(0-219),0(0-75) 0/1;0(0-219),0(0-63) 0/5;0(0-219),0(0-105)

When I look at the IGV track, I see one deletion, but it has the same size (146 bp) in every sample, which is not reflected in the VCF:

image

I should have: 219 - 75 bp = 146 bp for sample 1, 158 bp for sample 2 and 114 bp for sample 3.
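The subtraction can be reproduced directly from the reduced columns shown earlier. This is a minimal sketch, assuming each column has the form `GT;MS` with exactly one motif span per allele; the helper names `span_lengths` and `span_difference` are made up for illustration. Taken literally, the spans give 144, 156 and 114 bp.

```python
import re

# Columns copied from the reduced example: "GT;MS_allele1,MS_allele2".
samples = {
    "sample 1": "0/6;0(0-219),0(0-75)",
    "sample 2": "0/1;0(0-219),0(0-63)",
    "sample 3": "0/5;0(0-219),0(0-105)",
}

# One MS entry looks like motif_index(start-end).
SPAN = re.compile(r"(\d+)\((\d+)-(\d+)\)")


def span_lengths(column):
    """Return the length of the motif span reported for each allele."""
    _gt, ms = column.split(";")
    return [int(end) - int(start) for _motif, start, end in SPAN.findall(ms)]


def span_difference(column):
    """Difference between the first and the second allele's motif span."""
    first, second = span_lengths(column)
    return first - second


for name, column in samples.items():
    print(name, span_difference(column))
# sample 1 144
# sample 2 156
# sample 3 114
```

These numbers describe only the MS spans; whether they equal the deletion size seen in IGV depends on what else the allele sequence contains.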

Am I missing something? Best,

hdashnow commented 7 months ago

Do these mice carry a humanized HTT sequence or just mouse sequence? What is the TRGT definition for this locus? What are the sequences of those inserted and deleted bases? What was the full allele sequence reported by TRGT?

pailloufat-stack commented 7 months ago

They only carry mouse sequences. The TRGT definition for this locus is (CCTCTG)n. As for the inserted sequences, do you mean the 61 bp insertion?

Here is the full line from the original VCF (it is pretty unreadable; tell me if you want the file):

chr2 154720638 . CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTC CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGGCTCTGGCTGTGCCTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTT.. AC=2,1,8,1,1,1;AN=32;END=154720856;MOTIFS=CCTCTG;SF=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15;STRUC=(CCTCTG)n;TRID=MOUSE_STR_936638 GT:ALLR:AP:MS:AL:MC:SD:AM 0/0:37-230,219-219:0.9125,0.9125:0(0-219),0(0-219):219,219:40,40:125,125:.,. 0/6:189-228,127-143:0.9125,0.551471:0(0-219),0(0-75):219,133:40,13:139,111:.,. 0/0:197-230,219-219:0.9125,0.9125:0(0-219),0(0-219):219,219:40,40:125,125:.,. 0/3:194-226,131-147:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:158,92:.,. 0/0:177-240,219-219:0.9125,0.9125:0(0-219),0(0-219):219,219:40,40:117,117:.,. 0/1:206-226,125-140:0.9125,0.492188:0(0-219),0(0-63):219,125:40,11:149,82:.,. 0/3:204-228,131-143:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:145,105:.,. 0/5:212-231,79-149:0.9125,0.868421:0(0-219),0(0-105):219,105:40,19:168,82:.,. 0/3:204-231,131-146:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:128,60:.,. 
0/3:203-230,131-144:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:109,49:.,. 0/3:198-225,130-173:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:150,100:.,. 1/2:125-150,205-233:0.492188,0.898374:0(0-63),0(0-221):125,221:11,41:102,148:.,. 0/3:205-238,129-137:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:79,28:.,. 0/3:195-228,131-143:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:137,90:.,. 0/3:205-230,131-138:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:92,76:.,. 0/4:201-241,131-143:0.9125,0.514925:0(0-219),0(0-69):219,131:40,12:132,43:.,.
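Since the record is hard to read by eye, the sample columns can be split on the FORMAT keys. The sketch below does this for the three samples discussed earlier and prints MS next to AL. It assumes AL holds the allele lengths and MS the motif spans, which is worth checking against the `##FORMAT` header lines of the VCF; `parse_sample` and `summarize` are illustrative names, not part of TRGT.

```python
FORMAT = "GT:ALLR:AP:MS:AL:MC:SD:AM"

# Three sample columns copied verbatim from the record above.
SAMPLE_COLUMNS = [
    "0/6:189-228,127-143:0.9125,0.551471:0(0-219),0(0-75):219,133:40,13:139,111:.,.",
    "0/1:206-226,125-140:0.9125,0.492188:0(0-219),0(0-63):219,125:40,11:149,82:.,.",
    "0/5:212-231,79-149:0.9125,0.868421:0(0-219),0(0-105):219,105:40,19:168,82:.,.",
]


def parse_sample(format_keys, column):
    """Map each FORMAT key to the matching colon-separated value."""
    return dict(zip(format_keys.split(":"), column.split(":")))


def summarize(column):
    """Keep GT, MS and AL, plus the difference between the two AL values."""
    fields = parse_sample(FORMAT, column)
    lengths = [int(x) for x in fields["AL"].split(",")]
    return {
        "GT": fields["GT"],
        "MS": fields["MS"],
        "AL": lengths,
        "AL_diff": lengths[0] - lengths[1],
    }


for column in SAMPLE_COLUMNS:
    print(summarize(column))
```

For these three columns the AL differences are 86, 94 and 114, while the MS span differences are 144, 156 and 114, so the two fields agree only for the third sample.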

egor-dolzhenko commented 7 months ago

Thank you for reporting this. Would you be open to sharing BAM slices containing this repeat for these three samples? If yes, please feel free to share them by email.

pailloufat-stack commented 7 months ago

I just contacted you. Thanks

Actually, I noticed many discrepancies between the VCF file and the BAM files. For example, sample12 is 1/2;0(0-63),0(0-221), which should mean two "new" alleles, but in the IGV track I still see the wild-type allele and the 146 bp deletion.
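One way to see how far each sample12 allele is from the reference is to subtract the reference length from its AL value. This is only a sketch: it assumes AL is the fifth FORMAT field (as in the record above), that AL values are listed in the same order as the GT alleles, and that 219 (the AL of the 0/0 samples) is the reference length. `length_deltas` is a hypothetical helper.

```python
REF_LENGTH = 219  # AL reported for the 0/0 samples in the record above

# sample12 column copied verbatim from the record above.
SAMPLE12 = "1/2:125-150,205-233:0.492188,0.898374:0(0-63),0(0-221):125,221:11,41:102,148:.,."


def length_deltas(column, ref_length, al_index=4):
    """Pair each GT allele index with its AL minus the reference length."""
    fields = column.split(":")
    gt = [int(i) for i in fields[0].replace("|", "/").split("/")]
    al = [int(x) for x in fields[al_index].split(",")]
    return [(allele, length - ref_length) for allele, length in zip(gt, al)]


print(length_deltas(SAMPLE12, REF_LENGTH))  # [(1, -94), (2, 2)]
```

Under these assumptions, allele 2 is reported as only 2 bp longer than the reference, which may be why it looks like the wild-type allele in IGV even though it is called as a separate ALT.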