WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

About SVs annotation by using annovar #205

Open yshcai opened 1 year ago

yshcai commented 1 year ago

Hi, I recently performed annotation of structural variations (SVs) with ANNOVAR. I used five SV callers to identify SVs in a single sample and merged the calls into a single VCF file with SURVIVOR. Next, I plan to annotate the merged SVs. ANNOVAR requires converting the VCF to avinput format; because my VCF file is merged from different SV callers, I used the command convert2annovar.pl --format vcf4old --outfile my_sample.avinput my_sample.vcf. However, the output contains incorrect information, like this:

# original vcf
Chr2    7048855 pbsv.INV.840    A   AAAAAAATGTTTTTTTTTTTTCGTGATTAAAGGCTTTTAATTAGTCTAATTATCAATTATATACCATAATAAGTTGTTTCGATGTTGTTGTAATCGATGCAATTAGCGTTATTTTAGTGTTTTAAAAAAGTCAGAGTTTTTCAGCGTCAGAAAAACACTAAAATAACGCTAATTCCGTGGCTTAAACAACTTTTTATAATAAATAATTAACAATTAGACTTATTAACAGCCATTAATCACGAGAAAAAAAACATTTTTTTAAATTCAGTTTTTTGGATTTTTCGGCTACGGATGGAGAGCTACAGAAAATTTTACTTGGCATAATTTGTAGGAAATTTAATTTGCAATAATTTATAAAGAGCAACATTTTTCGATATCTTCCATATTTCGCGAGATATCGAGGAAAAACGAAAAATCACGATTTGCGACCGCAGCGCTACCCCGCGGTAAATTGCGGAACTAATTTTTTCAGGCAAATCGACGAATTTTTCATGGAAAATTTAGTTCCGGAATTCACCGCGGGGTGGCGCTGCGGTCGGAAAACATTATTTTTCGTTTTTCTTCGATATCTCGCGAAATAAGAGAGATATCGAAAAATGTTGCTCTTTATAAATTATTGGCAATTGAATTTCCTACAAATTATGTTCATCGAAATTTTCTGTAGCTCTTAATCTGTAGCCAGAAAATGCAAAAAACTGAATTTA    55  PASS    SUPP=3;SUPP_VEC=01110;SVLEN=1266;SVTYPE=INV;SVMETHOD=SURVIVOR1.0.7;CHR2=Chr2;END=7050121;CIPOS=0,328;CIEND=-944,0;STRANDS=++    GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN    ./.:NA:705:0,10:++:.,.:INS,INV:cuteSV.INV.16:G:GTTTTTTTTTTTTTCGTGATTAAAGGCTTTTAATTAGTCTAATTATCAATTATATACCATAATAAGTTGTTTCGATGTTGTTGTAATCGATGCAATTAGCGTTATTTTAGTGTTTTAAAAAAGTCAGAGTTTTTCAGCGTCAGAAAAACACTAAAATAACGCTAATTCCGTGGCTTAAACAACTTTTTATAATAAATAATTAACAATTAGACTTATTAACAGCCATTAATCACGAGAAAAAAAACATTTTTTTAAATTCAGTTTTTTGGATTTTTCGGCTACGGATGGAGAGCTACAGAAAATTTTACTTGGCATAATTTGTAGGAAATTTAATTTGCAATAATTTATAAAGAGCAACATTTTTCGATATCTTCCATATTTCGCGAGATATCGAGGAAAAACGAAAAATCACGATTTGCGACCGCAGCGCTACCCCCGCGGTAAATTGCGGAACTAATTTTTTCAGGCAAATCGACGAATTTTTCATGGAAAATTTAGTTCCGGAATTCACCGCGGGGTGGCGCTGCGGTCGGAAAACATTATTTTTCGTTTTTCTTCGATATCTCGCGAAATAAGAGAGATATCGAAAAATGTTGCTCTTTATAAATTATTGGCAATTGAATTTCCTACAAATTATGTTCATCGAAATTTTCTGTAGCTCTTAATCTGTAGCCAGAAAATGCAAAAAACTGAATTTAAAAAAATG:Chr2_7049027-Chr2_7049027,Chr2_7049097-Chr2_7049177   
0/1:NA:1266:82,41:++:.,.,.:INV,INS,INV:pbsv.INV.840:A:AAAAAAATGTTTTTTTTTTTTCGTGATTAAAGGCTTTTAATTAGTCTAATTATCAATTATATACCATAATAAGTTGTTTCGATGTTGTTGTAATCGATGCAATTAGCGTTATTTTAGTGTTTTAAAAAAGTCAGAGTTTTTCAGCGTCAGAAAAACACTAAAATAACGCTAATTCCGTGGCTTAAACAACTTTTTATAATAAATAATTAACAATTAGACTTATTAACAGCCATTAATCACGAGAAAAAAAACATTTTTTTAAATTCAGTTTTTTGGATTTTTCGGCTACGGATGGAGAGCTACAGAAAATTTTACTTGGCATAATTTGTAGGAAATTTAATTTGCAATAATTTATAAAGAGCAACATTTTTCGATATCTTCCATATTTCGCGAGATATCGAGGAAAAACGAAAAATCACGATTTGCGACCGCAGCGCTACCCCGCGGTAAATTGCGGAACTAATTTTTTCAGGCAAATCGACGAATTTTTCATGGAAAATTTAGTTCCGGAATTCACCGCGGGGTGGCGCTGCGGTCGGAAAACATTATTTTTCGTTTTTCTTCGATATCTCGCGAAATAAGAGAGATATCGAAAAATGTTGCTCTTTATAAATTATTGGCAATTGAATTTCCTACAAATTATGTTCATCGAAATTTTCTGTAGCTCTTAATCTGTAGCCAGAAAATGCAAAAAACTGAATTTA:Chr2_7048855-Chr2_7050121,Chr2_7049019-Chr2_7049019,Chr2_7049183-Chr2_7049831    0/1:NA:1251:74,33:++:48,55:INV,INV:Sniffles2.INV.3D4S1:NA:NA:Chr2_7048864-Chr2_7049427,Chr2_7048867-Chr2_7050118    ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN

# avinput
Chr2    7048855 7048855 -   AAAAAATGTTTTTTTTTTTTCGTGATTAAAGGCTTTTAATTAGTCTAATTATCAATTATATACCATAATAAGTTGTTTCGATGTTGTTGTAATCGATGCAATTAGCGTTATTTTAGTGTTTTAAAAAAGTCAGAGTTTTTCAGCGTCAGAAAAACACTAAAATAACGCTAATTCCGTGGCTTAAACAACTTTTTATAATAAATAATTAACAATTAGACTTATTAACAGCCATTAATCACGAGAAAAAAAACATTTTTTTAAATTCAGTTTTTTGGATTTTTCGGCTACGGATGGAGAGCTACAGAAAATTTTACTTGGCATAATTTGTAGGAAATTTAATTTGCAATAATTTATAAAGAGCAACATTTTTCGATATCTTCCATATTTCGCGAGATATCGAGGAAAAACGAAAAATCACGATTTGCGACCGCAGCGCTACCCCGCGGTAAATTGCGGAACTAATTTTTTCAGGCAAATCGACGAATTTTTCATGGAAAATTTAGTTCCGGAATTCACCGCGGGGTGGCGCTGCGGTCGGAAAACATTATTTTTCGTTTTTCTTCGATATCTCGCGAAATAAGAGAGATATCGAAAAATGTTGCTCTTTATAAATTATTGGCAATTGAATTTCCTACAAATTATGTTCATCGAAATTTTCTGTAGCTCTTAATCTGTAGCCAGAAAATGCAAAAAACTGAATTTA unknown 55

In fact, the original VCF shows an inversion between 7048855 and 7050121 on Chr2, but in the avinput file it appears as an insertion. I think convert2annovar.pl may not be suitable for SV VCF files, but I do not know how to convert them to avinput correctly. I would like to write a script to do it.
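
One possible shape for such a script is sketched below in Python. It is only a sketch under several assumptions, not a confirmed ANNOVAR workflow. It takes the interval from POS and the INFO END field instead of the REF/ALT sequences. It writes "0" as a placeholder for both the ref and alt columns, a convention that should be checked against the ANNOVAR input documentation before relying on it. It also carries SVTYPE, SVLEN and the record ID in an extra comment column. The function names parse_info, sv_record_to_avinput and convert_file are hypothetical and are not part of ANNOVAR or SURVIVOR.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def sv_record_to_avinput(line):
    """Convert one tab-separated SV VCF data line into an avinput line.

    Returns None for header lines, blank lines, and records without END.
    Assumption: "0" is used as a placeholder for the ref and alt columns.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return None
    chrom, pos, sv_id = cols[0], int(cols[1]), cols[2]
    info = parse_info(cols[7])
    if "END" not in info:
        return None
    end = int(info["END"])
    # If the second breakpoint is on another chromosome (CHR2 differs),
    # END does not describe an interval on chrom; keep only the first breakpoint.
    if info.get("CHR2", chrom) != chrom:
        end = pos
    start, end = min(pos, end), max(pos, end)
    comment = "SVTYPE={};SVLEN={};ID={}".format(
        info.get("SVTYPE", "NA"), info.get("SVLEN", "NA"), sv_id
    )
    return "\t".join([chrom, str(start), str(end), "0", "0", comment])


def convert_file(vcf_path, out_path):
    """Write one avinput line for every convertible record in vcf_path."""
    with open(vcf_path) as vcf, open(out_path, "w") as out:
        for line in vcf:
            converted = sv_record_to_avinput(line)
            if converted is not None:
                out.write(converted + "\n")
```

Calling convert_file("my_sample.vcf", "my_sample.avinput") would then turn the inversion record above into a line spanning 7048855 to 7050121 on Chr2 instead of a single-position insertion. POS is copied unchanged, so whether the start should be shifted by one base for a given SV caller is left open here.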