bioinformatics-centre / BayesTyper

A method for variant graph genotyping based on exact alignment of k-mers

added "*" in ALT after genotyping #8

Open songtaogui opened 5 years ago

songtaogui commented 5 years ago

Hi,

After running BayesTyper genotype, a * allele was added to the ALT field of the output VCF file.

For example, this is the input record, whose ALT field contains only the T allele:

1   119885  .   TCTCTTTTTCTCGAACACGCAGGAGAACTGTGCGTCATTATATTAAGAGGAAAAAGGTCCCAAGTGGACTAAGAAAACAAAGTGCCCGAGAAGGCAGCAAACGAGAAGAGGGGGACAAAAAGAAAAAAAGAAAGAAACCAAAATAAAAGAAAGAAAAACTAGAAACTAGAAACAAGGGGGGGGGGTGCAACCCCCACCACCCCACTTAAATTAAGGCCACAATTGTCTAATATCTTTTGCTCCTGCCATTCCCCAAAGCTTAACCTCCTCATGAATCTCTTGCAACATGCTGCGGATACTCGGGGATACCCCATCAAACACACAAGCATTCCTTTGTTTCCACAACCTCCAAGACACCAAGATAACTAGGGAATTAAACCCCTTTCTTTTACTATTTGGCACTTTCTGCTCAGCCTTCCTCCACCATTCTTGGAAAACCACATCTGTTATTTCTGGAGCCAAAGGCAGCAATCCCACCTTGTTCAAAGTCTGGGCCCAAATGTCTCTAGCAAAAACGCAAGCCACCAGAATGTGTTGTGCTGTTTCCTCTTGCTGATCACAAAGAAGACATTTGTCAGGATGGTTCAAACCCCTACGAGCCAGTCTATCTGCTGTCCAACATTTGTTAAGGGATGCAAGCCAAATAAAGAATTTGCATTTCTGAGGTGCCCATGTCCGCCAAATTCGCTCGGACGGTTCAAAATAAACGGAACCAGCGAAGAAACGATCATAAGCTGACTTAGATGAATACTGCCCATTGGCCGTTGGCAGCCATTTATGCTGATCTGAAATTCCAGGTTGCAAATGAATTCCCCGAGTGACATCCCATATATAAAAGAAGCCCATAAGCACTTCGGCCGGCAAACTACCAGTAATGTCCGACACCCATCTATTATTTAGCAAAGCCTCGTACACAGATCTGCTCTTCTGAATTTTCATGGGGATGCGACTCAGGAGAACTGGAGCAAGCTCACCCACAGATTTCCCATGTAACCATTTATCAGTCCAAAACAAAGTATTCTGGCCATCTCCCACAATAGAGCAAACAGAGACTGAGAAAAATGCTGCTGCATTTGGATGCACCTGAATGTCAAAATCTGACCATGACCGTTCAGGCTGGGTCTTTTTAAGCCACATCCAACGCATATTCAGAGACCAGCCAAGCACCTCCAAATTGTGGATTCCGAGGCCCCCCCTACTGATAGGCCTGCAAACCTTGGACCAACCGACAACACAATGGCCTCCTCTAACATCTGTCCTTCCCTTCCATAAAAACCCCCTGCGAATCTTATCAATAGCTCTAATTAC    T   .   .   ACO=B73_mantaSV

And this is the same record in the output. The ALT field has become T,*, and one sample was genotyped as 2/2:

1   119885  .   TCTCTTTTTCTCGAACACGCAGGAGAACTGTGCGTCATTATATTAAGAGGAAAAAGGTCCCAAGTGGACTAAGAAAACAAAGTGCCCGAGAAGGCAGCAAACGAGAAGAGGGGGACAAAAAGAAAAAAAGAAAGAAACCAAAATAAAAGAAAGAAAAACTAGAAACTAGAAACAAGGGGGGGGGGTGCAACCCCCACCACCCCACTTAAATTAAGGCCACAATTGTCTAATATCTTTTGCTCCTGCCATTCCCCAAAGCTTAACCTCCTCATGAATCTCTTGCAACATGCTGCGGATACTCGGGGATACCCCATCAAACACACAAGCATTCCTTTGTTTCCACAACCTCCAAGACACCAAGATAACTAGGGAATTAAACCCCTTTCTTTTACTATTTGGCACTTTCTGCTCAGCCTTCCTCCACCATTCTTGGAAAACCACATCTGTTATTTCTGGAGCCAAAGGCAGCAATCCCACCTTGTTCAAAGTCTGGGCCCAAATGTCTCTAGCAAAAACGCAAGCCACCAGAATGTGTTGTGCTGTTTCCTCTTGCTGATCACAAAGAAGACATTTGTCAGGATGGTTCAAACCCCTACGAGCCAGTCTATCTGCTGTCCAACATTTGTTAAGGGATGCAAGCCAAATAAAGAATTTGCATTTCTGAGGTGCCCATGTCCGCCAAATTCGCTCGGACGGTTCAAAATAAACGGAACCAGCGAAGAAACGATCATAAGCTGACTTAGATGAATACTGCCCATTGGCCGTTGGCAGCCATTTATGCTGATCTGAAATTCCAGGTTGCAAATGAATTCCCCGAGTGACATCCCATATATAAAAGAAGCCCATAAGCACTTCGGCCGGCAAACTACCAGTAATGTCCGACACCCATCTATTATTTAGCAAAGCCTCGTACACAGATCTGCTCTTCTGAATTTTCATGGGGATGCGACTCAGGAGAACTGGAGCAAGCTCACCCACAGATTTCCCATGTAACCATTTATCAGTCCAAAACAAAGTATTCTGGCCATCTCCCACAATAGAGCAAACAGAGACTGAGAAAAATGCTGCTGCATTTGGATGCACCTGAATGTCAAAATCTGACCATGACCGTTCAGGCTGGGTCTTTTTAAGCCACATCCAACGCATATTCAGAGACCAGCCAAGCACCTCCAAATTGTGGATTCCGAGGCCCCCCCTACTGATAGGCCTGCAAACCTTGGACCAACCGACAACACAATGGCCTCCTCTAACATCTGTCCTTCCCTTCCATAAAAACCCCCTGCGAATCTTATCAATAGCTCTAATTAC    T,* 99  PASS    AC=34,2;AF=0.944444,0.0555556;AN=36;ACP=1,1,1;VCS=1;VCR=1:119885-121192;VCGS=7;VCGR=1:96019-130431;HC=2;ACO=B73_mantaSV,.   
GT:GPP:APP:NAK:FAK:MAC:SAF  1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,6.3984,-1:0,0,0  1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,10.0334,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,6.05643,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,5.12502,-1:0,0,0 2/2:0,0,0,0,0,1:0,0,1:-1,-1,6:-1,-1,1:-1,-1,2.34679:0,0,0   ./.:0,0,0,0,0.7462,0.2538:0,0.7462,1:-1,5.48432,5.80061:-1,0,1:-1,0,11.4544:0,2,0   1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,6.11511,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,3.27049,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,3.36197,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,4.12725,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,5.13398,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,8.84972,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,3.88092,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,5.42057,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,5.15327,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,9.88321,-1:0,0,0 ./.:0.9884,0.0116,0,0,0,0:1,0.0116,0:14.462,3.36207,-1:1,0.336207,-1:5.04096,2.85776,-1:0,2,0   1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,5.0996,-1:0,0,0  1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,7.37556,-1:0,0,0 1/1:0,0,1,0,0,0:0,1,0:-1,5.2,-1:-1,1,-1:-1,4.16548,-1:0,0,0

How did this happen, and what does it mean?

Please find attached the run log and the first 1000 lines of the input and output VCF files: xab-geno.zip

Thank you.

Best wishes,

Songtao Gui

jonassibbesen commented 5 years ago

Hi Songtao,

Thank you for posting. The * allele is used for nested variation, such as variants inside an upstream deletion. In your example, this individual has a large deletion called as homozygous upstream, at position 115390. All alleles (reference and alternative) inside this deletion are therefore missing for this individual, including those of the variant at position 119885. Supplementary Figure 14 from our paper illustrates this dependency between nested variants.
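
To see which samples are affected, it can help to scan the output for genotypes that point at the * allele and to check whether the site lies within an upstream deletion. The following is a minimal sketch and not part of BayesTyper: the function names `samples_with_star_allele` and `is_inside_deletion` are hypothetical, the record in the usage lines is a shortened toy example rather than the real one, and the deletion check assumes a simple left-anchored deletion whose ALT is a prefix of its REF.

```python
def samples_with_star_allele(vcf_line):
    """Return 0-based sample indices whose GT references the '*' ALT allele.

    Assumes a tab-separated VCF data line with FORMAT and sample columns.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if "*" not in alts:
        return []
    # GT allele indices: 0 is REF, 1..n are the ALT alleles in order.
    star_index = str(alts.index("*") + 1)
    gt_pos = fields[8].split(":").index("GT")
    hits = []
    for i, sample in enumerate(fields[9:]):
        gt = sample.split(":")[gt_pos]
        # Treat phased (|) and unphased (/) genotypes the same way.
        if star_index in gt.replace("|", "/").split("/"):
            hits.append(i)
    return hits


def is_inside_deletion(del_pos, del_ref, del_alt, variant_pos):
    """True if variant_pos falls on a reference base removed by the deletion.

    Assumption: del_alt is a prefix of del_ref (left-anchored deletion),
    so the deleted bases start right after the shared prefix.
    """
    first_deleted = del_pos + len(del_alt)
    last_deleted = del_pos + len(del_ref) - 1
    return first_deleted <= variant_pos <= last_deleted


# Toy record: four samples, the second and fourth carry the '*' allele.
toy = "1\t20\t.\tACG\tA,*\t99\tPASS\t.\tGT:GPP\t1/1:0\t2/2:0\t./.:0\t0/2:0"
print(samples_with_star_allele(toy))
# Toy deletion at position 10 removing 12 bases (positions 11 to 22).
print(is_inside_deletion(10, "ACGTACGTACGTA", "A", 20))
```

A sample reported by the first function should, under this interpretation, also carry a deletion allele at an upstream record whose span covers the site, which is what the second function checks for one deletion at a time.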

Hope this answers your question.

Best,

Jonas