Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

CADD can not get any scores #698

Closed: zyxNo1 closed this issue 7 months ago

zyxNo1 commented 7 months ago

Hello, I am using VEP for variant annotation and have encountered an issue with the CADD plugin. I tried several different VCF files but never got any CADD scores. Here is the command I ran:

vep -i ~/WGS.bwa.dedup-NC_T_1_vs_NC_N_1-MuTect2.vcf --plugin CADD,~/resource/ensembl-vep/cadd/whole_genome_SNVs.tsv.gz,~/resource/ensembl-vep/cadd/gnomad.genomes.r4.0.indel.tsv.gz --cache --force_overwrite --fork 10

The output headers indeed include fields related to CADD:

## Extra column keys:
## IMPACT : Subjective impact classification of consequence type
## DISTANCE : Shortest distance from variant to transcript
## STRAND : Strand of the feature (1/-1)
## FLAGS : Transcript quality flags
## CADD_PHRED : PHRED-like scaled CADD score. CADD is only available here for non-commercial use. See CADD website for more information.
## CADD_RAW : Raw CADD score. CADD is only available here for non-commercial use. See CADD website for more information.

However, in the annotated body of the output, no CADD scores are displayed, as seen in this example line:

#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
chr1_14699_C/G  chr1:14699      G       ENSG00000223972 ENST00000450305 Transcript      downstream_gene_variant -       -       -       -       -       -       IMPACT=MODIFIER;DISTANCE=1029;STRAND=1

Additionally, the run produces numerous repeated warnings about the uninitialized values $s, $alt, and $file.

The version of VEP being used is 105.

Thank you in advance for your help.
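A quick way to confirm the symptom described above is to count how many data rows of the VEP output actually carry a CADD_PHRED entry in the Extra column. This is a minimal sketch, assuming the default tab-delimited output shown in the example line, where annotations appear as key=value pairs; the function name count_cadd_rows is hypothetical.

```shell
# Count annotated rows that contain a CADD_PHRED key=value pair.
# Header lines (starting with '#') are skipped, so the "## CADD_PHRED : ..."
# description in the header is not counted.
count_cadd_rows() {
    awk '!/^#/ && /CADD_PHRED=/ { n++ } END { print n + 0 }' "$1"
}
```

Calling count_cadd_rows variant_effect_output.txt and getting 0 while the header still lists CADD_PHRED would match the behaviour reported here.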

olaaustine commented 7 months ago

Hi @zyxNo1, Thank you very much for opening this query. To debug this issue, can you share more examples of your input variants, and tell us which version of the VEP plugins and which version of the CADD files you are using? I can also see that your command uses --cache; can you tell us which cache version you are using? Thank you, Ola.

zyxNo1 commented 7 months ago

Thanks for your kind reply. The cache version matches VEP and is also 105. The CADD file is the latest one, which corresponds to version 111. I have attached part of the input and output in case they help.

input.vcf.gz

variant_effect_output.txt

olaaustine commented 7 months ago

Hi @zyxNo1, Thank you for your response. I have not been able to reproduce the issue with that input file; I was able to get CADD scores. To understand what's happening, can you confirm that tabix is installed and on your path, and that the tabix index files for ~/resource/ensembl-vep/cadd/whole_genome_SNVs.tsv.gz and ~/resource/ensembl-vep/cadd/gnomad.genomes.r4.0.indel.tsv.gz are in the same directory as those files? If they are not, you can download them from the same place you downloaded the score files.

Secondly, can you run the command above like this:

vep -i ~/WGS.bwa.dedup-NC_T_1_vs_NC_N_1-MuTect2.vcf --plugin CADD,~/resource/ensembl-vep/cadd/whole_genome_SNVs.tsv.gz,~/resource/ensembl-vep/cadd/gnomad.genomes.r4.0.indel.tsv.gz --cache --force_overwrite --fork 10 --dir_cache <directory where your cache is> --offline

Let us know if this works. Thank you, Ola.
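The tabix and index-file checks requested above can be sketched as two small shell functions. This is only an illustration: the function names check_tabix and check_cadd_indexes are hypothetical, and the sketch assumes each index sits next to its score file with a .tbi suffix.

```shell
# Report whether tabix is available on PATH.
check_tabix() {
    if command -v tabix >/dev/null 2>&1; then
        echo "tabix found: $(command -v tabix)"
    else
        echo "tabix not found on PATH" >&2
        return 1
    fi
}

# For every score file given as an argument, check that both the file
# and its .tbi index exist. Returns non-zero if anything is missing.
check_cadd_indexes() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing score file: $f" >&2
            status=1
        elif [ ! -f "$f.tbi" ]; then
            echo "missing tabix index: $f.tbi" >&2
            status=1
        else
            echo "ok: $f"
        fi
    done
    return "$status"
}
```

A typical call would be check_tabix followed by check_cadd_indexes with the two CADD files from the command above as arguments.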

zyxNo1 commented 7 months ago

I have just confirmed that tabix is on the system path (version 1.11) and that the four required files are located in the same directory:

gnomad.genomes.r4.0.indel.tsv.gz
gnomad.genomes.r4.0.indel.tsv.gz.tbi
whole_genome_SNVs.tsv.gz
whole_genome_SNVs.tsv.gz.tbi

Then I attempted to run VEP with the following command:

vep -i ~/input.vcf --plugin CADD,~/resource/ensembl-vep/cadd/whole_genome_SNVs.tsv.gz,~/resource/ensembl-vep/cadd/gnomad.genomes.r4.0.indel.tsv.gz --cache --force_overwrite --fork 10 --dir_cache ~/.vep/ --offline

But still got no scores.

Moreover, I can confirm that VEP’s basic annotation functionality works as expected, as do “custom” annotations such as ClinVar information. With the CADD plugin, the run generates a notably large warning file of uninitialized values: input.vcf is 2.5 M, but the warning file is about 45 M. I don't know whether this information is of any help.
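A warning file of that size is easier to read once it is reduced to a tally per variable. The sketch below assumes the messages follow Perl's standard wording ("Use of uninitialized value $name in ..."); the function name summarize_uninit_warnings is hypothetical, and the layout of the warning file itself is not shown in this thread.

```shell
# Tally which variables appear in "uninitialized value" warnings,
# most frequent first. Warnings that name no variable are ignored.
summarize_uninit_warnings() {
    grep -o 'uninitialized value \$[A-Za-z_][A-Za-z0-9_]*' "$1" \
        | sort | uniq -c | sort -rn
}
```

If the tally is dominated by a handful of variables such as $s, $alt, and $file, the same code path is failing for every variant, which points at setup rather than at individual input lines.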

olaaustine commented 7 months ago

Hi @zyxNo1, Can you please also specify --dir_plugins and the --assembly flag? Can you let us know if this resolves the problem? Thank you, Ola
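Before rerunning, it may help to confirm that the directory passed to --dir_plugins actually contains the CADD plugin module. The sketch below assumes the plugin file is named CADD.pm, as in the VEP_plugins repository; the function name check_plugin_dir is hypothetical.

```shell
# Check that a candidate plugin directory contains CADD.pm.
check_plugin_dir() {
    if [ -f "$1/CADD.pm" ]; then
        echo "CADD.pm found in $1"
    else
        echo "CADD.pm not found in $1" >&2
        return 1
    fi
}
```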

zyxNo1 commented 7 months ago

YES!!

After specifying the correct path with --dir_plugins, the CADD scores are annotated without any warnings.

Many thanks for your time and consideration. @olaaustine
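Putting the thread together, the working invocation adds --dir_plugins (and, as suggested, --assembly) to the earlier offline command. The wrapper below is a sketch, not the exact command that was run: the function name run_vep_cadd and its argument order are hypothetical, and the plugin directory and assembly values must come from your own installation.

```shell
# Hypothetical wrapper around the command used in this thread.
# Arguments: input VCF, plugin directory, cache directory, assembly,
#            CADD SNV score file, CADD indel score file.
run_vep_cadd() {
    input=$1
    plugins_dir=$2
    cache_dir=$3
    assembly=$4
    snv=$5
    indel=$6
    vep -i "$input" \
        --plugin "CADD,$snv,$indel" \
        --cache --offline --force_overwrite --fork 10 \
        --dir_cache "$cache_dir" \
        --dir_plugins "$plugins_dir" \
        --assembly "$assembly"
}
```

For example: run_vep_cadd "$HOME/input.vcf" /path/to/VEP_plugins "$HOME/.vep" GRCh38 "$HOME/resource/ensembl-vep/cadd/whole_genome_SNVs.tsv.gz" "$HOME/resource/ensembl-vep/cadd/gnomad.genomes.r4.0.indel.tsv.gz". Here /path/to/VEP_plugins is a placeholder and GRCh38 is an assumption, since the thread does not state the assembly. Writing $HOME instead of ~ is a precaution: the shell does not expand a tilde that follows a comma inside the --plugin argument, although the thread does not say whether that played any role here.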