Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

CADD plugin gives the same score for different alternative alleles? #747

Closed Xi-Cao closed 2 weeks ago

Xi-Cao commented 2 weeks ago

Hi there,

Thank you for your helpful plugin on gene annotation!

I recently annotated my fine-mapping variants using VEP and the CADD plugin, with a list of rsIDs as the input file. Here’s the code I used for VEP:

vep -i ~/vep/test/vep_snplist -o vep_mesusie_snp_out.txt \
 --assembly GRCh37 --cache --cache_version 113 --dir ~/vep --everything --tab --fork 4 --force_overwrite --no_stats \
 --fasta ~/vep/homo_sapiens_merged/113_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz \
 --plugin CADD,snv=/data1/resource/vep_105_data/homo_sapiens/Plugins_data.hg19/whole_genome_SNVs.tsv.gz,indels=/data1/resource/vep_105_data/homo_sapiens/Plugins_data.hg19/gnomad.genomes-exomes.r4.0.indel.tsv.gz 

To review the results, I submitted the SNP list directly on the CADD website and received a score file. I combined the two score results in R and found that 357 variants had different CADD scores between the plugin and the website. Upon searching these variants on grch37.ensembl.org, I noticed that the discrepancies in CADD scores were due to alternative alleles. In my VEP result file, variants received only one CADD score despite having different alternative alleles. For example, for rs55894538 (CADD: A:2.406, T:1.800):

vep_signal[vep_signal$Uploaded_variation=="rs55894538",c(1:4,7,21)]
     Uploaded_variation    Location Allele            Gene                                  Consequence CADD_PHRED
8876         rs55894538 10:90102403      A ENSG00000184719                               intron_variant      2.406
8877         rs55894538 10:90102403      T ENSG00000184719                               intron_variant      2.406
8878         rs55894538 10:90102403      A ENSG00000184719                               intron_variant      2.406
8879         rs55894538 10:90102403      T ENSG00000184719                               intron_variant      2.406
8880         rs55894538 10:90102403      A ENSG00000184719                               intron_variant      2.406
8881         rs55894538 10:90102403      T ENSG00000184719                               intron_variant      2.406
8882         rs55894538 10:90102403      A ENSG00000184719 intron_variant,non_coding_transcript_variant      2.406
8883         rs55894538 10:90102403      T ENSG00000184719 intron_variant,non_coding_transcript_variant      2.406
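A rough way to list every affected variant straight from the `--tab` output, without going through R, is sketched below. It assumes the VEP header line starts with `#Uploaded_variation` and that the `Allele` and `CADD_PHRED` columns are present, as in the output above; the function name `flag_shared_cadd` is hypothetical and not part of VEP.

```shell
#!/bin/sh
# Flag variants in VEP --tab output that have several alternative alleles
# but only one distinct score across all of them.
# Usage: flag_shared_cadd vep_output.txt [score_column]
# (hypothetical helper; score_column defaults to CADD_PHRED)
flag_shared_cadd() {
  awk -F'\t' -v score="${2:-CADD_PHRED}" '
    /^##/ { next }                           # skip VEP metadata lines
    /^#/  {                                  # header line: locate columns by name
      sub(/^#/, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      id = $(col["Uploaded_variation"])
      al = $(col["Allele"])
      sc = $(col[score])
      if (sc == "-") next                    # no score annotated on this row
      if (!((id, al) in seen_al)) { seen_al[id, al] = 1; n_al[id]++ }
      if (!((id, sc) in seen_sc)) { seen_sc[id, sc] = 1; n_sc[id]++ }
    }
    END {
      # more than one allele, but a single score shared by all of them
      for (id in n_al)
        if (n_al[id] > 1 && n_sc[id] == 1) print id
    }
  ' "$1" | sort
}
```

For example, `flag_shared_cadd vep_mesusie_snp_out.txt` would print one rsID per line. Two alleles can legitimately share a rounded PHRED score, so passing `CADD_RAW` as the second argument is probably the stricter check.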

What command should I use to obtain the correct CADD score for the different alternative alleles through VEP plugin?

Best, xicao

nakib103 commented 2 weeks ago

Hello @Xi-Cao,

Thanks for reporting this issue. This bug was introduced in the Ensembl VEP plugins in the e110 release, when we added support for structural variants to the CADD plugin. I am looking into it and working on a fix.

For now, you can use the e109 version of the CADD plugin if you are not interested in structural variants. Alternatively, use this PR to update the CADD plugin, which should solve the issue.

Best regards, Nakib

Xi-Cao commented 2 weeks ago

Hi, thanks for your reply!

I would like to know if there are any additional steps required to update the CADD.pm file used in VEP. I tried modifying the CADD.pm file directly and downloading the e110 release to replace the present CADD.pm in my ~/.vep/Plugins/ directory, but the results didn't change in either case.

  Uploaded_variation   Location Allele Gene CADD_PHRED
1         rs56155961 1:21827094      A    -      0.008
2         rs56155961 1:21827094      G    -      0.008
3         rs56155961 1:21827094      T    -      0.008

Best, xicao

nakib103 commented 2 weeks ago

Hi @Xi-Cao,

Yes, if you download the file directly and replace the existing one with it, it should work.

The thing is, you should use the 109 release and not 110, as the 110 release has the problem you mentioned. Alternatively, we have already added the bugfix to the latest version of the VEP plugins, 113, so you can use that instead. Download the file and try again.

Xi-Cao commented 2 weeks ago

Hi, thanks for your reply!

Apologies for the mistake in my previous message. The CADD.pm release I used was actually 109, but the results remain unchanged. Additionally, I downloaded your file this time and replaced the CADD.pm in the ~/.vep/Plugins directory. However, it seems that the variants still received the same scores for different alternative alleles. I tried several times but the issue persists. Are there any additional steps or commands required to reload it?

Here is a list of ten variants and the output of a test run: vep_snplist_10.txt, output5.txt. In the first three lines, rs56155961 got CADD_PHRED 0.008 three times.

Here is my test code:

vep \
  --assembly GRCh37 \
  --cache \
  --cache_version 113 \
  --dir ~/vep \
  --everything \
  --fasta ~/vep/homo_sapiens_merged/113_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz \
  --force_overwrite \
  --input_file ~/vep/test/vep_snplist_10 \
  --output_file output5.txt \
  --tab \
  --plugin NearestGene,max_range=100000 \
  --plugin CADD,snv=/data1/resource/vep_105_data/homo_sapiens/Plugins_data.hg19/CADD/whole_genome_SNVs.tsv.gz,indels=/data1/resource/vep_105_data/homo_sapiens/Plugins_data.hg19/CADD/gnomad.genomes-exomes.r4.0.indel.tsv.gz

Best, xicao

nakib103 commented 2 weeks ago

Hi @Xi-Cao,

Because you are passing --dir ~/vep, VEP will look for the CADD plugin file under ~/vep/Plugins, not ~/.vep/Plugins. The --dir parameter also sets the base plugin directory, see here.
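A minimal sketch for checking which CADD.pm a given `--dir` value points at is shown below. It assumes, as described above, that plugins are read from `<dir>/Plugins`, and that `<dir>` falls back to `~/.vep` when `--dir` is not given; the helper names `plugin_path` and `check_plugin` are illustrative and not part of VEP.

```shell
#!/bin/sh
# Print the plugin file VEP is expected to load for a given --dir value.
# Assumption: plugins live in <dir>/Plugins, and <dir> defaults to ~/.vep.
plugin_path() {
  base="${1:-$HOME/.vep}"      # value passed to --dir, if any
  name="${2:-CADD}"            # plugin name without the .pm suffix
  printf '%s/Plugins/%s.pm\n' "${base%/}" "$name"
}

# Report whether the expected file exists and when it was last modified.
check_plugin() {
  path=$(plugin_path "$@")
  if [ -f "$path" ]; then
    ls -l "$path"
  else
    echo "missing: $path" >&2
    return 1
  fi
}
```

With the command above, `check_plugin ~/vep CADD` would show whether the updated file is actually in the directory VEP reads from. Another option may be to leave the file in ~/.vep/Plugins and point VEP at it with `--dir_plugins ~/.vep/Plugins`.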

Can you try again after updating the correct plugin file? I have tested the variant rs56155961 with the fix, and it gives the correct result for me:

rs56155961  1:21827094  A   -   -   -   intergenic_variant  -   -   -   -   -   rs56155961  MODIFIER    -   -   SNV -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   0.008   -1.262725
rs56155961  1:21827094  G   -   -   -   intergenic_variant  -   -   -   -   -   rs56155961  MODIFIER    -   -   SNV -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   0.008   -1.284279
rs56155961  1:21827094  T   -   -   -   intergenic_variant  -   -   -   -   -   rs56155961  MODIFIER    -   -   SNV -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   0.1877  0.1929  0.281   0.2837  0.0964  0.1094  -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   0.2837  EAS -   -   -   -   -   -   -   -   -   0.015   -1.101046

Xi-Cao commented 2 weeks ago

Hi~ I corrected the plugin file and it works well now. Please accept my heartfelt thanks for your patient help!

Best, xicao

nakib103 commented 2 weeks ago

Glad to be able to help! I will close this issue. If you face any further problems, feel free to open a new one.