Closed moonso closed 6 years ago
@PeterARBork @bjhall @dnil I will only support the 90 annotation since the code will be awful if we need to support both. I don't have a larger file with information on how the VEP entries may look like so I will assume the keys above and that the value is only a float. If you have an example VCF with more variants or any other information please inform me.
@moonso you have my support for supporting vep 90 and disregarding the earlier versions.
Fixed in #734
@moonso
I checked some larger VCFs and unfortunately spotted a problem... All of the *AF fields are strictly floats, except for "AF" (1000G).
In rare cases (always for INDELs as far as I can tell) this field will contain two values separated by an &. For example "0.0791&0.07905". The weird thing is that they appear to almost always be the same value, just with different precision (4 vs 5 decimal places). According to the documentation this should not happen, so I suppose it is a bug in VEP.
I will test if this is fixed in VEP 91, but at the moment this will most likely cause crashes when loading VCFs from VEP 90. I haven't had time to actually test, but float("0.0791&0.07905") will throw a ValueError.
Hmm ok, we should post an issue on VEP:s GitHub. I think I wrote a try catch for this but we would still loose information.
I reported this to VEP last week. It should be safe to use the first value in these cases. And it should be fixed by version 92: https://github.com/Ensembl/ensembl-vep/issues/143#issuecomment-366748690
Great job!
Btw, did anyone try this?
I just loaded a vcf with only vep annotations, but found no variants in Scout - probably because there's no Gene.refGene field in the vcf. Does that sound about right, since I didn't annotate with annovar or other tools?
Is Scout already supporting gene identifier annotation from vep or could it do so in the future?
Scout parses hgnc ids from VEP. Sounds weird. You can email me a reduced version of the file if you don't get any progress
Here's a few of the lines in the vcf:
##VEP="v90" time="2018-02-22 14:29:25" cache="/home/databases/variant_effect_predictor/.vep/homo_sapiens/90_GRCh38" db="homo_sapiens_core_90_38@ensembldb.ensembl.org" ensembl-variation=90.58bf949 ensembl-io=90.9a148ea ensembl-funcgen=90.743f32b ensembl=90.4a44397 1000genomes="phase3" COSMIC="81" ClinVar="201706" ESP="V2-SSA137" HGMD-PUBLIC="20164" assembly="GRCh38.p10" dbSNP="150" gencode="GENCODE 27" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" regbuild="16" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Consequence|gnomadAD_AF|AF|MAX_AF|gnomadAD_NFE_AF|EUR_AF|Distance|Codons|Amino_acids|Gene|SYMBOL|CLIN_SIG|Feature|Feature_type|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 57PGF
chr1 17019382 rs12045097 A G 530.77 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=2.497;ClippingRankSum=0;DB;DP=38;ExcessHet=3.0103;FS=1.323;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;POSITIVE_TRAIN_SITE;QD=13.97;ReadPosRankSum=0.113;SOR=0.368;VQSLOD=19.34;culprit=MQRankSum;CSQ=intron_variant|||0.5288||0.2097||||ENSG00000117118|SDHB||ENST00000375499|Transcript|||||protein_coding,intron_variant&non_coding_transcript_variant|||0.5288||0.2097||||ENSG00000117118|SDHB||ENST00000475049|Transcript|||||processed_transcript,intron_variant&non_coding_transcript_variant|||0.5288||0.2097||||ENSG00000117118|SDHB||ENST00000485092|Transcript|||||retained_intron,downstream_gene_variant|||0.5288||0.2097||||ENSG00000117118|SDHB||ENST00000485515|Transcript|||||processed_transcript,downstream_gene_variant|||0.5288||0.2097||||ENSG00000117118|SDHB||ENST00000491274|Transcript|||||protein_coding GT:AD:DP:GQ:PL 0/1:19,19:38:99:559,0,558
chr1 17019909 rs2746476 A T 1308.77 PASS AC=2;AF=1;AN=2;DB;DP=39;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;POSITIVE_TRAIN_SITE;QD=33.56;SOR=1.387;VQSLOD=22.35;culprit=MQ;CSQ=intron_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000375499|Transcript|||||protein_coding,intron_variant&non_coding_transcript_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000475049|Transcript|||||processed_transcript,intron_variant&non_coding_transcript_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000485092|Transcript|||||retained_intron,downstream_gene_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000485515|Transcript|||||processed_transcript,downstream_gene_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000491274|Transcript|||||protein_coding GT:AD:DP:GQ:PL 1/1:0,39:39:99:1337,117,0
chr1 17021461 rs2746475 G A 913.77 PASS AC=2;AF=1;AN=2;DB;DP=29;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;POSITIVE_TRAIN_SITE;QD=33.84;SOR=1.101;VQSLOD=23.65;culprit=MQ;CSQ=intron_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000375499|Transcript|||||protein_coding,intron_variant&non_coding_transcript_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000475049|Transcript|||||processed_transcript,intron_variant&non_coding_transcript_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000485092|Transcript|||||retained_intron,downstream_gene_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000485515|Transcript|||||processed_transcript,downstream_gene_variant|||0.996||0.9722||||ENSG00000117118|SDHB||ENST00000491274|Transcript|||||protein_coding GT:AD:DP:GQ:PL 1/1:0,27:27:81:942,81,0
chr1 17022049 rs2746474 T G 712.77 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-1.742;ClippingRankSum=0;DB;DP=36;ExcessHet=3.0103;FS=8.043;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;POSITIVE_TRAIN_SITE;QD=19.8;ReadPosRankSum=0.761;SOR=2.124;VQSLOD=19.9;culprit=MQRankSum;CSQ=intron_variant|||0.5166||0.1968||||ENSG00000117118|SDHB||ENST00000375499|Transcript|||||protein_coding,intron_variant&non_coding_transcript_variant|||0.5166||0.1968||||ENSG00000117118|SDHB||ENST00000475049|Transcript|||||processed_transcript,intron_variant&non_coding_transcript_variant|||0.5166||0.1968||||ENSG00000117118|SDHB||ENST00000485092|Transcript|||||retained_intron,downstream_gene_variant|||0.5166||0.1968||||ENSG00000117118|SDHB||ENST00000485515|Transcript|||||processed_transcript,downstream_gene_variant|||0.5166||0.1968||||ENSG00000117118|SDHB||ENST00000491274|Transcript|||||protein_coding GT:AD:DP:GQ:PL 0/1:12,24:36:99:741,0,310
chr1 17022192 rs61769185 C T 841.77 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=4.309;ClippingRankSum=0;DB;DP=43;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;POSITIVE_TRAIN_SITE;QD=19.58;ReadPosRankSum=-0.476;SOR=0.855;VQSLOD=19.39;culprit=MQRankSum;CSQ=intron_variant|||0.3033||0.0974||||ENSG00000117118|SDHB||ENST00000375499|Transcript|||||protein_coding,intron_variant&non_coding_transcript_variant|||0.3033||0.0974||||ENSG00000117118|SDHB||ENST00000475049|Transcript|||||processed_transcript,intron_variant&non_coding_transcript_variant|||0.3033||0.0974||||ENSG00000117118|SDHB||ENST00000485092|Transcript|||||retained_intron,downstream_gene_variant|||0.3033||0.0974||||ENSG00000117118|SDHB||ENST00000485515|Transcript|||||processed_transcript,downstream_gene_variant|||0.3033||0.0974||||ENSG00000117118|SDHB||ENST00000491274|Transcript|||||protein_coding GT:AD:DP:GQ:PL 0/1:15,28:43:99:870,0,319
chr1 17023303 rs978528 C T 1873.77 PASS AC=2;AF=1;AN=2;DB;DP=54;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;POSITIVE_TRAIN_SITE;QD=34.7;SOR=0.693;VQSLOD=21.78;culprit=MQ;CSQ=intron_variant|||0.9059||0.8867||||ENSG00000117118|SDHB||ENST00000375499|Transcript|||||protein_coding,downstream_gene_variant|||0.9059||0.8867||||ENSG00000117118|SDHB||ENST00000463045|Transcript|||||protein_coding,upstream_gene_variant|||0.9059||0.8867||||ENSG00000117118|SDHB||ENST00000475049|Transcript|||||processed_transcript,downstream_gene_variant|||0.9059||0.8867||||ENSG00000117118|SDHB||ENST00000475506|Transcript|||||retained_intron,upstream_gene_variant|||0.9059||0.8867||||ENSG00000117118|SDHB||ENST00000485092|Transcript|||||retained_intron,intron_variant&non_coding_transcript_variant|||0.9059||0.8867||||ENSG00000117118|SDHB||ENST00000485515|Transcript|||||processed_transcript,downstream_gene_variant|||0.9059||0.8867||||ENSG00000117118|SDHB||ENST00000491274|Transcript|||||protein_coding GT:AD:DP:GQ:PL 1/1:0,54:54:99:1902,162,0
Here's the panel:
##panel_id=pb_panel
##institute=bonkolab
##version=1.0
##date=2010-01-02
##display_name=PB Test panel
#hgnc_id hgnc_symbol disease_associated_transcripts reduced_penetrance genetic_disease_models mosaicism database_entry_version original_hgnc
7481 MT-TF MT-TF
7495 MT-TQ MT-TQ
37102 DDX11L1
14825 OR4F5
10681 SDHB
18674 DDX41
I've been playing with panels also, but in this case I thought the SDHB variants should appear in scout?
Your VCF is missing a lot of important VEP fields. Consider using --everything
@moonso I will test this on our setup next week! I'm on paternity leave and travelling at the moment...
Alright it works when using --everything.
It would be great with some specifications or an example of how you annotate the vcf's, to make them ready for Scout.
👍 sure @PeterARBork I'll make a new issue for that.
VEP 87
GMAF - 1000G all popluations combined xxx_MAF - 1000G (or NHLBI-ESP) individual populations ExAC_MAF - ExAC all popluations ExAC_Adj_MAF- ExAC all popluations (adjusted) ExAC_xxx_MAF - ExAC individual populations http://dec2016.archive.ensembl.org/info/docs/tools/vep/vep_formats.html
VEP 90
AF - 1000G all populations combined xxx_AF - 1000G (or NHLBI-ESP) individual populations gnomAD_AF - gnomAD exomes, all populations combined gnomAD_xxx_AF - gnomAD exomes, individual populations MAX_AF - Max of all populations (1000G, gnomAD exomes, ESP) https://www.ensembl.org/info/docs/tools/vep/vep_formats.html