Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Only one of two custom annotation VCFs is used when both share the same annotation source name #1651

Open Aisha-D opened 5 months ago

Aisha-D commented 5 months ago

Describe the issue

If two custom annotations are provided with the same name, VEP annotates the VCF using the first custom annotation and silently ignores the second annotation source. The expectation was that an error would be raised, requiring the user to differentiate the two annotation sources.

Example of command and VCF header output:

docker run -v /home/dnanexus:/opt/vep/.vep ensemblorg/ensembl-vep:release_103.1 ./vep 
-i /opt/vep/.vep/128970875-24085Q0017-24NGSHO13-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split.vcf 
-o /opt/vep/.vep/128970875-24085Q0017-24NGSHO13-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split_filevep.vcf 
--vcf --cache --refseq --exclude_predicted --symbol --hgvs --af_gnomad --check_existing --variant_class --numbers --offline 
--custom /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN 
--custom /opt/vep/.vep/haemonc_1706_samples.vcf.gz,Prev,vcf,exact,0,AC,NS 
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID 
--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID 
--plugin CADD,/opt/vep/.vep/./whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./gnomad.genomes.r3.0.indel.tsv.gz --fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature
##VEP="v103" time="2024-04-03 17:49:22" cache="/opt/vep/.vep/homo_sapiens_refseq/103_GRCh38" ensembl=103.4c8d44a ensembl-variation=103.06320c4 ensembl-io=103.353f93a ensembl-funcgen=103.b53bef4 1000genomes="phase3" COSMIC="92" ClinVar="202008" ESP="V2-SSA137" HGMD-PUBLIC="20194" assembly="GRCh38.p13" dbSNP="154" gencode="GENCODE 37" genebuild="2014-07" gnomAD="r2.1" polyphen="2.2.2" refseq="2020-09-29 12:45:25 - GCF_000001405.39_GRCh38.p13_genomic.gff" regbuild="1.0" sift="sift5.2.2"
##CADD_PHRED=PHRED-like scaled CADD score
##CADD_RAW=Raw CADD score
##INFO=<ID=ClinVar,Number=.,Type=String,Description="/opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz (exact)">
##INFO=<ID=ClinVar_CLNSIG,Number=.,Type=String,Description="CLNSIG field from /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz">
##INFO=<ID=ClinVar_CLNREVSTAT,Number=.,Type=String,Description="CLNREVSTAT field from /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz">
##INFO=<ID=ClinVar_CLNDN,Number=.,Type=String,Description="CLNDN field from /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz">
##INFO=<ID=Prev,Number=.,Type=String,Description="/opt/vep/.vep/haemonc_1706_samples.vcf.gz (exact)">
##INFO=<ID=Prev_AC,Number=.,Type=String,Description="AC field from /opt/vep/.vep/haemonc_1706_samples.vcf.gz">
##INFO=<ID=Prev_NS,Number=.,Type=String,Description="NS field from /opt/vep/.vep/haemonc_1706_samples.vcf.gz">
##INFO=<ID=COSMIC,Number=.,Type=String,Description="/opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz (exact)">
##INFO=<ID=COSMIC_ID,Number=.,Type=String,Description="ID field from /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz">
##INFO=<ID=SYMBOL,Number=.,Type=String,Description="The SYMBOL field from INFO/CSQ">
##INFO=<ID=VARIANT_CLASS,Number=.,Type=String,Description="The VARIANT_CLASS field from INFO/CSQ">
##INFO=<ID=Consequence,Number=.,Type=String,Description="The Consequence field from INFO/CSQ">
##INFO=<ID=EXON,Number=.,Type=String,Description="The EXON field from INFO/CSQ">
##INFO=<ID=HGVSc,Number=.,Type=String,Description="The HGVSc field from INFO/CSQ">
##INFO=<ID=HGVSp,Number=.,Type=String,Description="The HGVSp field from INFO/CSQ">
##INFO=<ID=gnomAD_AF,Number=.,Type=Float,Description="The gnomAD_AF field from INFO/CSQ">
##INFO=<ID=CADD_PHRED,Number=.,Type=String,Description="The CADD_PHRED field from INFO/CSQ">
##INFO=<ID=Existing_variation,Number=.,Type=String,Description="The Existing_variation field from INFO/CSQ">
##INFO=<ID=Feature,Number=.,Type=String,Description="The Feature field from INFO/CSQ">
##bcftools_split-vepVersion=1.12+htslib-1.12
##bcftools_split-vepCommand=split-vep -d -c - -a CSQ 128970875-24085Q0017-24NGSHO13-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_allgenesvep.vcf; Date=Wed Apr  3 17:54:57 2024
##bcftools_annotateVersion=1.12+htslib-1.12
##bcftools_annotateCommand=annotate -x INFO/CSQ -o tmp.vcf; Date=Wed Apr  3 17:54:57 2024
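
One way to spot the dropped source is to compare the files passed to --custom with the files named in the ##INFO descriptions of the output header. The following is a minimal sketch, assuming the description layout shown in the header above ("<path> (exact)" and "<FIELD> field from <path>"); the function names are hypothetical and not part of VEP.

```python
import re

# Assumed layout of VEP-written INFO lines for custom annotations:
#   Description="/path/file.vcf.gz (exact)"
#   Description="ID field from /path/file.vcf.gz"
INFO_RE = re.compile(
    r'^##INFO=<ID=([^,]+),.*Description="(?:\S+ field from )?(\S+?)(?: \(\w+\))?">$'
)


def files_in_header(header_lines):
    """Map each source file named in the header to the INFO IDs that cite it."""
    sources = {}
    for line in header_lines:
        match = INFO_RE.match(line.strip())
        if match:
            info_id, path = match.groups()
            sources.setdefault(path, []).append(info_id)
    return sources


def missing_sources(requested_files, header_lines):
    """Return the requested custom files that no INFO line refers to."""
    seen = files_in_header(header_lines)
    return [path for path in requested_files if path not in seen]
```

Run against the header above with both COSMIC files as requested_files, this should report only the non-coding file as missing, because every COSMIC line in the header points at the coding file.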


Aisha-D commented 5 months ago

Example of custom annotation order swapped:

docker run -v /home/dnanexus:/opt/vep/.vep ensemblorg/ensembl-vep:release_103.1 ./vep -i /opt/vep/.vep/128858722-24079Q0066-24NGSHO12-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split.vcf -o /opt/vep/.vep/128858722-24079Q0066-24NGSHO12-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split_filevep.vcf --vcf --cache --refseq --exclude_predicted --symbol --hgvs --af_gnomad --check_existing --variant_class --numbers --offline 
--custom /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN 
--custom /opt/vep/.vep/novaseq_205samples_211007.vcf.gz,Prev,vcf,exact,0,AC,NS 
--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID 
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID --plugin CADD,/opt/vep/.vep/./in/vep_refs/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./in/vep_refs/gnomad.genomes.r3.0.indel.tsv.gz --fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature
##INFO=<ID=COSMIC,Number=.,Type=String,Description="/opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz (exact)">
##INFO=<ID=COSMIC_ID,Number=.,Type=String,Description="ID field from /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz">

dglemos commented 5 months ago

Hi @Aisha-D, I can reproduce the issue - thanks for reporting it. We are looking for a solution.

Best wishes, Diana

dglemos commented 5 months ago

The issue is in your command. In the custom annotation options you use the same short name (COSMIC) for both COSMIC annotations:

--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID 
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID --plugin CADD,/opt/vep/.vep/./in/vep_refs/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./in/vep_refs/gnomad.genomes.r3.0.indel.tsv.gz --fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature

This name has to be unique. If you change the names to COSMICNonCoding and COSMICCoding, the output correctly returns the information from both files.

--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMICNonCoding,vcf,exact,0,ID 
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMICCoding,vcf,exact,0,ID --plugin CADD,/opt/vep/.vep/./in/vep_refs/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./in/vep_refs/gnomad.genomes.r3.0.indel.tsv.gz --fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature

Let me know if you have more questions.

Best wishes, Diana

Aisha-D commented 5 months ago

Hi Diana, thanks for looking into this. We resolved the issue, but we were hoping that VEP would raise an error when the same name is used twice, rather than overriding the data.

dglemos commented 5 months ago

That makes sense. We will update VEP in the future to check if there are any duplicated names.

Best wishes, Diana
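
Until such a check exists in VEP itself, the command can be screened before it is run. The following is a minimal sketch of a pre-flight check, assuming the positional --custom format used in this thread (file,short_name,format,type,coords,fields); the function name is hypothetical and the script is not part of VEP.

```python
import shlex
from collections import defaultdict


def duplicate_custom_names(command):
    """Return {short_name: [files]} for short names used by more than one --custom.

    Assumes the positional --custom format from this thread, where the
    short name is the second comma-separated value.
    """
    tokens = shlex.split(command)  # newlines count as whitespace here
    files_by_name = defaultdict(list)
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "--custom":
            parts = value.split(",")
            if len(parts) >= 2:
                files_by_name[parts[1]].append(parts[0])
    return {name: files for name, files in files_by_name.items() if len(files) > 1}


def check_or_fail(command):
    """Stop with an error message instead of letting one source be dropped silently."""
    duplicates = duplicate_custom_names(command)
    if duplicates:
        details = "; ".join(
            f"{name}: {', '.join(files)}" for name, files in duplicates.items()
        )
        raise SystemExit(f"Duplicate --custom short names: {details}")
```

Calling check_or_fail on the original command would stop with a message naming COSMIC and both COSMIC files, while the command with COSMICNonCoding and COSMICCoding passes.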