Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

ERROR when running VEP with clinvar #1551

Closed ysbioinfo closed 11 months ago

ysbioinfo commented 11 months ago

Describe the issue

Hi, I am trying to add ClinVar information to my VEP annotation. I downloaded the ClinVar VCF file from https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar_20231104.vcf.gz and did a test run, but it threw an error. Could you help me out?

Additional information

I installed VEP using conda. It works fine when I do not add the --custom option.

Full VEP command line

vep --id "1  230710048 230710048 A/G 1" --species homo_sapiens -o test.txt --cache --dir_cache /mnt/efs/NGS/yangshi/resource/ensembl_vep_cache/ --offline --assembly GRCh38 --custom file=/mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz,short_name=ClinVar,format=vcf,type=exact,coords=0,fields=CLNSIG%CLNREVSTAT%CLNDN

Full error message

Possible precedence issue with control flow operator at /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.

-------------------- EXCEPTION --------------------
MSG: ERROR: New type "type=exact" is not valid

STACK Bio::EnsEMBL::VEP::AnnotationSource::File::type /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/share/ensembl-vep-105.0-0/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:233
STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/share/ensembl-vep-105.0-0/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:146
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/share/ensembl-vep-105.0-0/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/share/ensembl-vep-105.0-0/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/share/ensembl-vep-105.0-0/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:170
STACK Bio::EnsEMBL::VEP::Runner::init /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/share/ensembl-vep-105.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:128
STACK Bio::EnsEMBL::VEP::Runner::run /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/share/ensembl-vep-105.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:199
STACK toplevel /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/bin/vep:232
Date (localtime) = Thu Nov 9 13:52:40 2023
Ensembl API version = 105

dglemos commented 11 months ago

Hi @ysbioinfo, I'm sorry you are having problems running VEP with custom annotation. The latest custom annotation documentation applies only to the latest release, 110; keys such as short_name= and format= are not supported in release 105. We will improve the documentation for the next release to make it clear that older versions do not support the new format. In the meantime, you can follow the documentation for release 105 here: http://dec2021.archive.ensembl.org/info/docs/tools/vep/script/vep_custom.html#custom_options
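To make the difference between the two syntaxes concrete, here is a minimal sketch that rewrites a key=value --custom value into the positional form. The function name to_positional_custom is hypothetical and not part of VEP. The key names and the positional order (file, short name, format, type, coords, fields) are taken only from the two commands quoted in this thread; any other keys are rejected rather than guessed at, and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper (not part of VEP): turn a key=value --custom value
# into the comma-separated positional form used with release 105 in this thread.
to_positional_custom() (
  # The body runs in a subshell, so IFS and 'set -f' do not leak to the caller.
  set -f        # disable filename globbing while splitting on commas
  IFS=','
  file="" short_name="" format="" type="" coords="" fields=""
  for pair in $1; do
    key=${pair%%=*}      # text before the first '='
    value=${pair#*=}     # text after the first '='
    case "$key" in
      file)       file=$value ;;
      short_name) short_name=$value ;;
      format)     format=$value ;;
      type)       type=$value ;;
      coords)     coords=$value ;;
      # The key=value form separates fields with '%'; the positional form uses ','.
      fields)     fields=$(printf '%s' "$value" | tr '%' ',') ;;
      *)          echo "unknown key: $key" >&2; exit 1 ;;
    esac
  done
  out="$file,$short_name,$format,$type,$coords"
  if [ -n "$fields" ]; then
    out="$out,$fields"
  fi
  printf '%s\n' "$out"
)

# Usage (placeholder path):
#   to_positional_custom 'file=/data/clinvar.vcf.gz,short_name=ClinVar,format=vcf,type=exact,coords=0,fields=CLNSIG%CLNREVSTAT%CLNDN'
```

Note that the positional form carries no file= prefix: the path is simply the first comma-separated item.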

ysbioinfo commented 11 months ago

Hi @dglemos, thanks for your prompt reply! I followed your guidance and the error disappeared, but now there is a new error:

[E::hts_open_format] Failed to open file file=/mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz
Couldn't find index for file file=/mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz at /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Bio/DB/HTS/Tabix.pm line 53.

Do you have any suggestions on this?

dglemos commented 11 months ago

Custom annotation files must be sorted in chromosome and position order, compressed using bgzip and finally indexed using tabix. You should index the ClinVar VCF file with the following command: tabix -p vcf clinvar_20231104_GRCh38.vcf.gz
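The sorting requirement can be checked before indexing. The sketch below is a hypothetical helper, vcf_is_sorted (not part of VEP, tabix or bgzip), that reads an uncompressed VCF on standard input and fails if positions go backwards within a chromosome or if a chromosome appears in more than one block. It assumes tab-separated VCF records with the chromosome in column 1 and the position in column 2, and it does not check that the file was compressed with bgzip rather than plain gzip.

```shell
# Hypothetical helper: succeed only if the VCF on stdin is grouped by
# chromosome and sorted by position within each chromosome.
vcf_is_sorted() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      chrom = $1
      pos = $2 + 0
      if (chrom != last_chrom) {
        if (chrom in seen) {
          print "chromosome " chrom " appears in more than one block (line " NR ")"
          bad = 1
          exit
        }
        seen[chrom] = 1
        last_chrom = chrom
      } else if (pos < last_pos) {
        print "position goes backwards at line " NR
        bad = 1
        exit
      }
      last_pos = pos
    }
    END { exit bad }                    # status 0 if sorted, 1 otherwise
  '
}

# Usage: bgzip output is gzip-compatible, so gzip -dc can decompress it.
#   gzip -dc clinvar_20231104_GRCh38.vcf.gz | vcf_is_sorted && echo "sorted"
```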

ysbioinfo commented 11 months ago

Hi @dglemos, I actually already had an index for that VCF. To make sure all files are in the right format, I re-indexed the VCF file using your command tabix -p vcf clinvar_20231104_GRCh38.vcf.gz, and I also confirmed that the VCF file is sorted. However, the same error still occurs. I found a similar issue, #979, but I am already using the absolute path for the ClinVar VCF, so I guess that solution will not work for me. Could you give me more advice? Thanks

dglemos commented 11 months ago

Is the .tbi file in the same directory as the ClinVar file? Can you also post the latest VEP command you ran?

ysbioinfo commented 11 months ago

Yes, they are in the same folder:

(vcf2maf) yang.shi@ip-10-11-10-38:/mnt/efs/NGS/yangshi/resource/clinvar_vcf$ ls
clinvar_20231104_GRCh37.vcf.gz      clinvar_20231104_GRCh38.vcf.gz      clinvar_20231104_papu_GRCh37.vcf.gz      clinvar_20231104_papu_GRCh38.vcf.gz
clinvar_20231104_GRCh37.vcf.gz.tbi  clinvar_20231104_GRCh38.vcf.gz.tbi  clinvar_20231104_papu_GRCh37.vcf.gz.tbi  clinvar_20231104_papu_GRCh38.vcf.gz.tbi

And below is my latest command:

(vcf2maf) yang.shi@ip-10-11-10-38:/mnt/efs/NGS/yangshi/resource/clinvar_vcf$ vep --id "1  230710048 230710048 A/G 1" --species homo_sapiens -o test.txt --cache --dir_cache /mnt/efs/NGS/yangshi/resource/ensembl_vep_cache/ --offline --assembly GRCh38 --custom file=/mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN
Possible precedence issue with control flow operator at /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
[E::hts_open_format] Failed to open file file=/mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz
Couldn't find index for file file=/mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz at /mnt/efs/NGS/yangshi/software/anaconda3/envs/vcf2maf/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Bio/DB/HTS/Tabix.pm line 53.

dglemos commented 11 months ago

The problem is in the custom command: --custom file=/mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN

It should be: --custom /mnt/efs/NGS/yangshi/resource/clinvar_vcf/clinvar_20231104_GRCh38.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN
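The error messages earlier in the thread already hint at this: "Failed to open file file=/mnt/..." shows the file= prefix being treated as part of the file name, so no index could be found next to it. A small pre-flight check along these lines can catch that before VEP runs. This is only a sketch: check_custom_path is a hypothetical helper, not a VEP feature, and it assumes the positional syntax (first item is the path) with a .tbi index stored beside the file, as in the directory listing above.

```shell
# Hypothetical pre-flight check for the first item of a positional --custom value.
check_custom_path() (
  path=$1
  case "$path" in
    file=*)
      echo "path starts with 'file=' (key=value syntax, read literally in the positional form)"
      exit 1
      ;;
  esac
  if [ ! -f "$path" ]; then
    echo "missing file: $path"
    exit 1
  fi
  if [ ! -f "$path.tbi" ]; then
    echo "missing index: $path.tbi"
    exit 1
  fi
  echo "ok"
)

# Usage: pass only the path, i.e. the text before the first comma.
#   custom='/path/to/clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN'
#   check_custom_path "${custom%%,*}"
```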

ysbioinfo commented 11 months ago

That solved my problem perfectly. Thanks, @dglemos!