Open znikasz opened 5 years ago
Hello @znikasz
Ensembl VEP only outputs in ANN format if you use the --vcf_info_field ANN --terms so
output options
http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html#output
This is how we call Ensembl VEP in Cannoli
Dear @heuermh ,
thank you for an answer. I checked cannoli way of running VEP and I've tried to do it myself.
I've used a very simple VCF file, containing one change:
chr1 2309937 . C CTT 281.04 . AC=2;AF=1;AN=2 GT:AD:DP:GQ:PL 1/1:0,8:8:24:295,24,0
I've annotated it with VEP, release 96:
vep --dir /opt/vep/.vep/ -i /data/input.vcf.gz -o /data/output2.vep.vcf --format vcf --vcf --vcf_info_field ANN --terms SO --no_stats --offline
Annotated VCF had lines:
##VEP="v96" time="2019-05-14 17:42:50" cache="/opt/vep/.vep/homo_sapiens/96_GRCh38" ensembl=96.7a35428 ensembl-funcgen=96.9c3a0cd ensembl-variation=96.70d2777 ensembl-io=96.6e65b30 1000genomes="phase3" COSMIC="87" ClinVar="201901" ESP="V2-SSA137" HGMD-PUBLIC="20184" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 30" genebuild="2014-07" gnomAD="r2.1" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=ANN,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID">
(...)
chr1 2309937 . C CTT 281.04 . AC=2;AF=1;AN=2;DP=18;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=59.4;QD=28.73;SOR=1.863;ANN=TT|3_prime_UTR_variant|MODIFIER|SKI|ENSG00000157933|Transcript|ENST00000378536|protein_coding|7/7||||5807-5808|||||||1||HGNC|HGNC:10896 GT:AD:DP:GQ:PL 1/1:0,8:8:24:295,24,0
But now running:
ac.loadVcf("/tmp/output2.vep.vcf").toVariants.dataset.show
throws:
Caused by: java.lang.NumberFormatException: For input string: "5807-5808"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.bdgenomics.adam.converters.TranscriptEffectConverter$.parseFraction(TranscriptEffectConverter.scala:99)
at org.bdgenomics.adam.converters.TranscriptEffectConverter$.parseTranscriptEffect(TranscriptEffectConverter.scala:143)
as 5807-5808
is not a number.
Is there anything that I'm doing wrong?
Thank you for the detailed reply. VEP must not be using the same ANN format string as SnpEff or are otherwise different from the ANN specification. I'll take a closer look.
I have already been using Adam with snpEff annotations and it worked great. However, when I've tried to use it with a VCF file annotated using Ensembl VEP it didn't work. I was expecting, that ANN format is standardized enough, so that
transcriptEffects
will be populated.So I wanted to ask whether I do something wrong and there is a way to make Adam work with VEP or it's not yet supported?
Thank you.