Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0
451 stars 151 forks source link

REVEL scores empty when using VEP 109 #1369

Closed rklocke closed 1 year ago

rklocke commented 1 year ago

Describe the issue

Hi, we are currently using VEP version 105 and are trying to update to version 109 (VEP Docker v109.3 + 109 resources). However, when running VEP 109, the annotated VCF is produced but the REVEL field is empty, despite CADD and SpliceAI scores being annotated as expected and no errors are produced.

Additional information

System

Full VEP command line

Full command being run:

docker run -v /home/dnanexus:/data be9eb9a28830 vep
-i /data/X221099_markdup_recalibrated_Haplotyper_temp.vcf 
-o /data/X221099__markdup_recalibrated_Haplotyper_temp_annotated.vcf.gz
--vcf --cache --refseq --exclude_predicted --symbol --hgvs --hgvsg --check_existing
--variant_class --numbers --format vcf --offline --exclude_null_alleles --assembly GRCh37
--custom /opt/vep/.vep/clinvar_20230107.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN,CLNSIGCONF
--custom /opt/vep/.vep/gnomad.genomes.r2.0.1.sites.noVEP_normalised_decomposed_PASS.vcf.gz,gnomADg,vcf,exact,0,AC,AN,AF,Hom
--custom /opt/vep/.vep/gnomad.exomes.r2.0.2.sites.noVEP_normalised_decomposed_PASS.vcf.gz,gnomADe,vcf,exact,0,AC,AN,AF,Hom
--custom /opt/vep/.vep/TWE_POPAF_N500_chr1-22_220413.vcf.gz,TWE,vcf,exact,0,AF,AC_Hom,AC_Het,AN
--custom /opt/vep/.vep/HGMD_Pro_2021.4_hg19.vcf.gz,HGMD,vcf,exact,0,PHEN,RANKSCORE,CLASS
--plugin SpliceAI,snv=/opt/vep/.vep/spliceai_scores.masked.snv.hg19.vcf.gz,indel=/opt/vep/.vep/spliceai_scores.masked.indel.hg19.vcf.gz 
--plugin REVEL,/opt/vep/.vep/revel_b37.tsv.gz
--plugin CADD,/opt/vep/.vep/cadd_whole_genome_SNVs_GRCh37.tar.gz,/opt/vep/.vep/gnomad.genomes.r2.1.1.indel.tsv.gz,/opt/vep/.vep/InDels_GRCh37.tsv.gz
--fields Allele,SYMBOL,HGNC_ID,VARIANT_CLASS,Consequence,IMPACT,EXON,INTRON,Feature,HGVSc,HGVSp,HGVS_OFFSET,Existing_variation,STRAND,ClinVar,ClinVar_CLNSIG,ClinVar_CLNSIGCONF,ClinVar_CLNDN,gnomADg_AC,gnomADg_AN,gnomADg_AF,gnomADe_AC,gnomADe_AN,gnomADe_AF,gnomADe_Hom,TWE_AF,TWE_AC_Hom,TWE_AC_Het,TWE_AN,HGMD,HGMD_PHEN,HGMD_CLASS,HGMD_RANKSCORE,SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL,SpliceAI_pred_DP_AG,SpliceAI_pred_DP_AL,SpliceAI_pred_DP_DG,SpliceAI_pred_DP_DL,REVEL,CADD_PHRED
--buffer_size 500 --fork 16 --no_stats --compress_output bgzip --shift_3prime 1

Example record in output VCF:

1   66058513    rs1137101   A   G   2421.77 .   AC=1;AF=0.5;AN=2;BaseQRankSum=-0.771;ClippingRankSum=0;DB;DP=191;ExcessHet=3.0103;FS=1.812;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=12.75;ReadPosRankSum=-2.09;SOR=0.841;CSQ=G|LEPR||SNV|missense_variant|MODERATE|6/20||NM_002303.6|NM_002303.6:c.668A>G|NP_002294.2:p.Gln223Arg||rs1137101|1|8521|Benign||LEPTIN_RECEPTOR_POLYMORPHISM&Obesity_due_to_leptin_receptor_gene_deficiency&not_specified&Monogenic_Non-Syndromic_Obesity&not_provided|16451|30816|0.533846|123990|245296|0.505471|33092|0.437|206|231|1000|CM010905|"Higher_body_mass_index%2C_association_with"|DP|0.1|0.00|0.00|0.00|0.00|18|2|35|12||18.44    GT:AD:DP:GQ:PL  0/1:91,99:190:99:2450,0,2252

We have also tried to annotate with only REVEL and again, there are no errors but the REVEL scores are empty:

docker run -v /home/dnanexus:/data be9eb9a28830 vep
-i /data/X221099.vcf
-o /data/X221099-annotated.vcf.gz 
--vcf --cache --refseq --numbers --format vcf --offline --exclude_null_alleles --assembly "GRCh37"
--plugin REVEL,/opt/vep/.vep/revel_b37.tsv.gz  --no_stats

and using the /data folder structure for REVEL:

docker run -v /home/dnanexus:/data be9eb9a28830 vep
-i /data/X221099.vcf -o /data/X221099-annotated.vcf.gz
--vcf --cache --refseq --numbers --format vcf --offline --exclude_null_alleles --assembly "GRCh37" 
--plugin REVEL,/data/revel_b37.tsv.gz  --no_stats

Full error message

Including the warnings, if available

Any help would be appreciated!

Thanks, Becky

dglemos commented 1 year ago

Hi @rklocke, I cannot reproduce the issue. Could you please check if the file revel_b37.tsv.gz has the location 1:66058513?

rklocke commented 1 year ago

Hi @dglemos,

The file revel_b37.tsv.gz, has this location, the second entry is the REVEL score for the relevant variant:

docker run -v /home/dnanexus:/data be9eb9a28830 zgrep 66058513 revel_b37.tsv.gz
1       66058513        65592830        A       C       Q       P       0.153   ENST00000344610;ENST00000349533;ENST00000371060;ENST00000371059;ENST00000371058
1       66058513        65592830        A       G       Q       R       0.034   ENST00000344610;ENST00000349533;ENST00000371060;ENST00000371059;ENST00000371058
1       66058513        65592830        A       T       Q       L       0.125   ENST00000344610;ENST00000349533;ENST00000371060;ENST00000371059;ENST00000371058

Here are the files I have mounted (the latest CADD.pm and REVEL.pm files are in /Plugins):

$ pwd
/home/dnanexus
$ ls
InDels_GRCh37.tsv.gz                  InDels_GRCh37.tsv.gz.tbi
Plugins                  plugin_config.txt
gnomad.genomes.r2.1.1.indel.tsv.gz         revel_b37.tsv.gz
gnomad.genomes.r2.1.1.indel.tsv.gz.tbi     revel_b37.tsv.gz.tbi
X221099_subset.vcf                        homo_sapiens_refseq
cadd_whole_genome_SNVs_GRCh37.tar.gz      homo_sapiens_refseq_vep_109_GRCh37.tar.gz  
cadd_whole_genome_SNVs_GRCh37.tar.gz.tbi  hs37d5.fasta-index.tar.gz                  vep_v109.3.tar.gz

I have attached the input VCF used (4 missense variants) and the output VCF. CADD_PHRED is annotated as expected but REVEL is not. My command:

$ docker run -v /home/dnanexus:/data be9eb9a28830 vep \
-i /data/X221099_subset.vcf -o /data/X221099_annotated.vcf \
--vcf --cache --refseq --numbers --format vcf --offline --exclude_null_alleles --assembly GRCh37 \
--plugin REVEL,/data/revel_b37.tsv.gz \
--plugin CADD,/data/cadd_whole_genome_SNVs_GRCh37.tar.gz,/data/gnomad.genomes.r2.1.1.indel.tsv.gz,/data/InDels_GRCh37.tsv.gz \
--fields Allele,SYMBOL,Consequence,IMPACT,EXON,INTRON,HGVSc,HGVSp,REVEL,CADD_PHRED
>> Unknown option: no_update
Ignoring unsupported option 'no_update' found via ENV variable or INI file
Unknown option: no_htslib
Ignoring unsupported option 'no_htslib' found via ENV variable or INI file
Unknown option: pluginsdir
Ignoring unsupported option 'pluginsdir' found via ENV variable or INI file
Unknown option: no_plugins

Thanks in advance for your help! Becky

X221099_subset.vcf.gz X221099_annotated.vcf.gz

rklocke commented 1 year ago

Apologies, the annotated VCF from the exact command I had given is attached (the previous VCF was generated from a command with --force_overwrite and not including --offline). It seems strange that the only --plugin information that remains in the ##vep-command-line description in the annotated VCF is:

--plugin [PATH]/InDels_GRCh37.tsv.gz

and that no errors are given about not being able to find the relevant files. Is there a different way I should specify plugins in v109?

Thanks again for any help :) X221099_annotated_2.vcf.gz

dglemos commented 1 year ago

Thank you for sending all these details, now I can reproduce the issue. The REVEL plugin is trying to match the transcript ids from the file revel_b37.tsv.gz with the transcripts from the cache. The problem is revel_b37.tsv.gz only has Ensembl ids while the cache you are using has RefSeq ids:

revel_b37.tsv.gz: ENST00000344610;ENST00000349533;ENST00000371060;ENST00000371059;ENST00000371058
cache: NM_002303.6

We will update the plugin to optionally not match by transcript id. In the meantime, you could remove the last column from the revel file.

rklocke commented 1 year ago

Hi Diana, thank you for looking into this and for explaining the cause. Just checking if there is an estimate of when the plugin might be updated / the next VEP release? Thanks!

dglemos commented 1 year ago

We plan to update the plugin in the next release which is scheduled to happen in the summer.

rklocke commented 1 year ago

Hi, I've modified our revel_b37.tsv.gz file to remove the last column like below:

#chr    hg19_pos    grch38_pos  ref alt aaref   aaalt   REVEL
1   35142   35142   G   A   T   M   0.027
1   35142   35142   G   C   T   R   0.035
1   35142   35142   G   T   T   K   0.043

And run vep v109.3 Docker and REVEL is now annotated mostly as expected, however I notice one of our variants now has the wrong REVEL score:

2   73717567    rs2056486   G   T   3457.77 .   AC=1;AF=0.5;AN=2;BaseQRankSum=-4.604;ClippingRankSum=0;DB;DP=261;ExcessHet=3.0103;FS=1.492;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=13.35;ReadPosRankSum=1.443;SOR=0.6;CSQ=T|ALMS1||SNV|missense_variant|MODERATE|10/23||NM_015120.4|NM_015120.4:c.8484G>T|NP_055935.4:p.Arg2828Ser||rs2056486|1|383761|Benign||Alstrom_syndrome&not_specified&Cardiovascular_phenotype|11646|30918|0.376674|64709|245736|0.263327|11776|0.252|58|194|1000|||||0.00|0.00|0.00|0.00|-25|22|-26|0|0.043|8.368   GT:AD:DP:GQ:PL  0/1:120,139:259:99:3486,0,3208

It seems to be annotating the score from a different nucleotide change at this position (G>C instead of G>T) within the revel_b37_no_transcripts.tsv.gz file:

2   73717567    73490440    G   C   R   S   0.043
2   73717567    73490440    G   T   R   S   0.039

How I modified the revel_b37.tsv.gz file in case it's useful:

$ zcat revel_b37.tsv.gz | cut -d$'\t' -f9 --complement > revel_b37_no_transcripts.tsv
$ bgzip revel_b37_no_transcripts.tsv
$ tabix -f -s 1 -b 2 -e 2 revel_b37_no_transcripts.tsv.gz

Is this an issue with how the REVEL.pm file matches variants in v109 or due to me modifying the REVEL file itself? Thanks again for your help.

dglemos commented 1 year ago

Thank you for reporting this issue! I don't see any problem with the way you modified the file, the problem is in the plugin. REVEL matches by peptide change which is the same in your example both G>C and G>T have peptide R>S. We are going to include a fix for this issue too.

dglemos commented 1 year ago

We have fixed the plugin to:

Best wishes, Diana