Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Differing ClinVar clinical significance annotation according to transcript/feature #1680

Open growland2 opened 1 month ago

growland2 commented 1 month ago

Describe the issue

When running VEP using a local implementation via Docker, we see that variants are annotated with different ClinVar clinical significance values depending on the corresponding transcript or feature. Examples of such variants are shown below:

Example variant 1:

CHROM    POS    ID    REF    ALT
1   976097  666960  G   GGGGCC

When annotated via VEP locally using VEP version=107, VEP Cache=RefSeq and ClinVar GRCh37 version=20240317 (as a custom annotation source), this variant was not annotated with a ClinVar_CLNSIG for the given transcripts (see excerpt from VEP output below):

VEP v107 and RefSeq cache output:

1   976097  666960  G   GGGGCC  49.28   .   AC=2;AF=1;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.34;QD=16.43;SOR=1.179;CSQ=CGGGC|AGRN||insertion|frameshift_variant|HIGH|4/39||NM_001305275.2|NM_001305275.2:c.574_578dup|NP_001292204.1:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN||insertion|frameshift_variant|HIGH|3/36||NM_001364727.2|NM_001364727.2:c.259_263dup|NP_001351656.1:p.Ser89GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN||insertion|frameshift_variant|HIGH|4/36||NM_198576.4|NM_198576.4:c.574_578dup|NP_940978.2:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03   GT:AD:DP:GQ:PL  1/1:0,3:3:9:77,9,0

Command:

docker run -v /home/dnanexus:/data -w /data 199b8c2aa90b vep -i /data/path_likely_path_final_temp.vcf -o /data/path_likely_path_final_temp_annotated.vcf.gz --dir /data --vcf --cache --refseq --exclude_predicted --symbol --hgvs --hgvsg --check_existing --variant_class --numbers --format vcf --offline --exclude_null_alleles --assembly GRCh37 --custom /data/clinvar_20240317_GRCh37.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN,CLNSIGCONF --custom /data/gnomad.genomes.r2.1.1.sites.all.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADg,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax --custom /data/gnomad.exomes.r2.1.1.sites.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADe,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax,non_cancer_AC,non_cancer_AN,non_cancer_AF,non_cancer_nhomalt,non_cancer_AC_popmax,non_cancer_AN_popmax,non_cancer_AF_popmax,non_cancer_nhomalt_popmax,non_cancer_popmax --custom /data/TWE_POPAF_N500_chr1-22_220413.vcf.gz,TWE,vcf,exact,0,AF,AC_Hom,AC_Het,AN --custom /data/HGMD_Pro_2023.4_hg19.vcf.gz,HGMD,vcf,exact,0,PHEN,RANKSCORE,CLASS --plugin SpliceAI,snv=/data/spliceai_scores.masked.snv.hg19.vcf.gz,indel=/data/spliceai_scores.masked.indel.hg19.vcf.gz --plugin REVEL,/data/revel_b37.tsv.gz --plugin CADD,/data/cadd_whole_genome_SNVs_GRCh37.tar.gz,/data/gnomad.genomes.r2.1.1.indel.tsv.gz,/data/InDels_GRCh37.tsv.gz --fields 
Allele,SYMBOL,HGNC_ID,VARIANT_CLASS,Consequence,IMPACT,EXON,INTRON,Feature,HGVSc,HGVSp,HGVS_OFFSET,Existing_variation,STRAND,ClinVar,ClinVar_CLNSIG,ClinVar_CLNSIGCONF,ClinVar_CLNDN,gnomADg_AC,gnomADg_AN,gnomADg_AF,gnomADg_nhomalt,gnomADg_popmax,gnomADg_AC_popmax,gnomADg_AN_popmax,gnomADg_AF_popmax,gnomADg_nhomalt_popmax,gnomADe_AC,gnomADe_AN,gnomADe_AF,gnomADe_nhomalt,gnomADe_popmax,gnomADe_AC_popmax,gnomADe_AN_popmax,gnomADe_AF_popmax,gnomADe_nhomalt_popmax,gnomADe_non_cancer_AC,gnomADe_non_cancer_AN,gnomADe_non_cancer_AF,gnomADe_non_cancer_nhomalt,gnomADe_non_cancer_AC_popmax,gnomADe_non_cancer_AN_popmax,gnomADe_non_cancer_AF_popmax,gnomADe_non_cancer_nhomalt_popmax,gnomADe_non_cancer_popmax,TWE_AF,TWE_AC_Hom,TWE_AC_Het,TWE_AN,HGMD,HGMD_PHEN,HGMD_CLASS,HGMD_RANKSCORE,SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL,SpliceAI_pred_DP_AG,SpliceAI_pred_DP_AL,SpliceAI_pred_DP_DG,SpliceAI_pred_DP_DL,REVEL,CADD_PHRED --buffer_size 500 --fork 16 --no_stats --compress_output bgzip --shift_3prime 1

We then re-ran locally using VEP version=v112, VEP Cache=RefSeq and ClinVar GRCh37 version=20240317, to ensure the prior output was not specific to v107. This returned the same results as above.

Next, we ran VEP v112 with the merged cache, which contains both RefSeq and Ensembl transcripts. Here the expected ClinVar_CLNSIG annotation "Pathogenic/Likely_pathogenic" was reported for only some of the Ensembl transcripts (ENST00000469403, ENST00000477585 and ENST00000479707) and not for the remaining transcripts (ENST00000379370, NM_001305275.2, NM_001364727.2 and NM_198576.4) (see output from VEP below).

VEP v112 and merged cache output:

1   976097  666960  G   GGGGCC  62.74   .   AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=31.37;SOR=2.303;CSQ=CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||ENST00000379370|ENST00000379370.2:c.574_578dup|ENSP00000368678.2:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,GGGCC|AGRN|329|insertion|non_coding_transcript_exon_variant|MODIFIER|2/3||ENST00000469403|ENST00000469403.1:n.521_525dup||14|rs1570190059|1|666960|Pathogenic/Likely_pathogenic||Congenital_myasthenic_syndrome_8&Congenital_myasthenic_syndrome|||||||||||||||||||||||||||||||||||||||||||||17.03,GGGCC|AGRN|329|insertion|downstream_gene_variant|MODIFIER|||ENST00000477585||||rs1570190059|1|666960|Pathogenic/Likely_pathogenic||Congenital_myasthenic_syndrome_8&Congenital_myasthenic_syndrome|||||||||||||||||||||||||||||||||||||||||||||17.03,GGGCC|AGRN|329|insertion|upstream_gene_variant|MODIFIER|||ENST00000479707||||rs1570190059|1|666960|Pathogenic/Likely_pathogenic||Congenital_myasthenic_syndrome_8&Congenital_myasthenic_syndrome|||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/39||NM_001305275.2|NM_001305275.2:c.574_578dup|NP_001292204.1:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|3/36||NM_001364727.2|NM_001364727.2:c.259_263dup|NP_001351656.1:p.Ser89GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||NM_198576.4|NM_198576.4:c.574_578dup|NP_940978.2:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03   GT:AD:DP:GQ:PGT:PID:PL  1/1:0,2:2:6:1|1:1200192_C_G:90,6,0

Command:

docker run -v /home/dnanexus:/data -w /data 607ee83f9536 vep -i /data/vep_clnsig_temp.vcf -o /data/vep_clnsig_temp_annotated.vcf.gz --dir /data --vcf --cache --merged --exclude_predicted --symbol --hgvs --hgvsg --check_existing --variant_class --numbers --format vcf --offline --exclude_null_alleles --assembly GRCh37 --custom /data/clinvar_20240317_GRCh37.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN,CLNSIGCONF --custom /data/gnomad.genomes.r2.1.1.sites.all.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADg,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax --custom /data/gnomad.exomes.r2.1.1.sites.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADe,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax,non_cancer_AC,non_cancer_AN,non_cancer_AF,non_cancer_nhomalt,non_cancer_AC_popmax,non_cancer_AN_popmax,non_cancer_AF_popmax,non_cancer_nhomalt_popmax,non_cancer_popmax --custom /data/TWE_POPAF_N500_chr1-22_220413.vcf.gz,TWE,vcf,exact,0,AF,AC_Hom,AC_Het,AN --custom /data/HGMD_Pro_2023.4_hg19.vcf.gz,HGMD,vcf,exact,0,PHEN,RANKSCORE,CLASS --plugin SpliceAI,snv=/data/spliceai_scores.masked.snv.hg19.vcf.gz,indel=/data/spliceai_scores.masked.indel.hg19.vcf.gz --plugin REVEL,/data/revel_b37.tsv.gz --plugin CADD,/data/cadd_whole_genome_SNVs_GRCh37.tar.gz,/data/gnomad.genomes.r2.1.1.indel.tsv.gz,/data/InDels_GRCh37.tsv.gz --fields 
Allele,SYMBOL,HGNC_ID,VARIANT_CLASS,Consequence,IMPACT,EXON,INTRON,Feature,HGVSc,HGVSp,HGVS_OFFSET,Existing_variation,STRAND,ClinVar,ClinVar_CLNSIG,ClinVar_CLNSIGCONF,ClinVar_CLNDN,gnomADg_AC,gnomADg_AN,gnomADg_AF,gnomADg_nhomalt,gnomADg_popmax,gnomADg_AC_popmax,gnomADg_AN_popmax,gnomADg_AF_popmax,gnomADg_nhomalt_popmax,gnomADe_AC,gnomADe_AN,gnomADe_AF,gnomADe_nhomalt,gnomADe_popmax,gnomADe_AC_popmax,gnomADe_AN_popmax,gnomADe_AF_popmax,gnomADe_nhomalt_popmax,gnomADe_non_cancer_AC,gnomADe_non_cancer_AN,gnomADe_non_cancer_AF,gnomADe_non_cancer_nhomalt,gnomADe_non_cancer_AC_popmax,gnomADe_non_cancer_AN_popmax,gnomADe_non_cancer_AF_popmax,gnomADe_non_cancer_nhomalt_popmax,gnomADe_non_cancer_popmax,TWE_AF,TWE_AC_Hom,TWE_AC_Het,TWE_AN,HGMD,HGMD_PHEN,HGMD_CLASS,HGMD_RANKSCORE,SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL,SpliceAI_pred_DP_AG,SpliceAI_pred_DP_AL,SpliceAI_pred_DP_DG,SpliceAI_pred_DP_DL,REVEL,CADD_PHRED --buffer_size 2000 --fork 16 --no_stats --compress_output bgzip --shift_3prime 1
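
To compare the records of a CSQ string side by side, the string can be split using the order given to --fields in the command above. The following is a minimal sketch and not part of VEP: the function name `tabulate_csq` is hypothetical, and the sample entries are four records from the merged-cache output, truncated to their first 16 fields.

```python
# First 16 names from the --fields list in the command above.
FIELDS = [
    "Allele", "SYMBOL", "HGNC_ID", "VARIANT_CLASS", "Consequence", "IMPACT",
    "EXON", "INTRON", "Feature", "HGVSc", "HGVSp", "HGVS_OFFSET",
    "Existing_variation", "STRAND", "ClinVar", "ClinVar_CLNSIG",
]


def tabulate_csq(csq):
    """Return (Feature, Allele, ClinVar_CLNSIG) for each comma-separated CSQ entry."""
    rows = []
    for entry in csq.split(","):
        # Each entry is a pipe-delimited record in --fields order.
        record = dict(zip(FIELDS, entry.split("|")))
        rows.append(
            (record["Feature"], record["Allele"], record.get("ClinVar_CLNSIG", ""))
        )
    return rows


# Truncated copies of four records from the merged-cache output (assumption:
# only the first 16 fields are kept; the remaining fields are dropped).
SAMPLE = ",".join([
    "CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||ENST00000379370|ENST00000379370.2:c.574_578dup|ENSP00000368678.2:p.Ser194GlyfsTer60|14||1||",
    "GGGCC|AGRN|329|insertion|non_coding_transcript_exon_variant|MODIFIER|2/3||ENST00000469403|ENST00000469403.1:n.521_525dup||14|rs1570190059|1|666960|Pathogenic/Likely_pathogenic",
    "GGGCC|AGRN|329|insertion|downstream_gene_variant|MODIFIER|||ENST00000477585||||rs1570190059|1|666960|Pathogenic/Likely_pathogenic",
    "CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||NM_198576.4|NM_198576.4:c.574_578dup|NP_940978.2:p.Ser194GlyfsTer60|14||1||",
])

for feature, allele, clnsig in tabulate_csq(SAMPLE):
    print(f"{feature}\t{allele}\t{clnsig or '-'}")
```

In the full merged-cache output, every record that carries a ClinVar_CLNSIG value reports the allele GGGCC, while every record without one reports CGGGC.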

Example variant 2:

CHROM    POS    ID    REF    ALT
1   1371178    830327   T   TGGCGCGGAGC

As with "Example variant 1", when annotated locally using VEP version=107, VEP Cache=RefSeq and ClinVar GRCh37 version=20240317, this variant received different ClinVar_CLNSIG values depending on the transcript/feature: NM_022834.5 and NM_199121.3 received no ClinVar_CLNSIG annotation, while NR_125994.1, NR_125995.1 and NR_125996.1 were annotated with the expected "Pathogenic/Likely_pathogenic" (see excerpt from VEP output below):

VEP v107 and RefSeq cache output:

1   1371178 830327  T   TGGCGCGGAGC 62.74   .   AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=31.37;SOR=0.693;CSQ=GCGCGGAGCG|VWA1||insertion|frameshift_variant&splice_region_variant|HIGH|1/3||NM_022834.5|NM_022834.5:c.62_71dup|NP_073745.2:p.Gly25AlafsTer74|21||1|||||||||||||||||||||||||||||||||||||||||||||||||23.7,GCGCGGAGCG|VWA1||insertion|frameshift_variant&splice_region_variant|HIGH|1/3||NM_199121.3|NM_199121.3:c.62_71dup|NP_954572.2:p.Gly25AlafsTer53|21||1|||||||||||||||||||||||||||||||||||||||||||||||||23.7,GGCGCGGAGC|LINC01770||insertion|upstream_gene_variant|MODIFIER|||NR_125994.1||||rs749383814|-1|830327|Pathogenic/Likely_pathogenic||VWA1-related_condition&Neuronopathy&_distal_hereditary_motor&not_provided&Neuronopathy&_distal_hereditary_motor&_autosomal_recessive_7&Neuromuscular_disease|15|28196|0.00053199|0|afr|6|8376|0.000716332|0|||||||||||||||||||0.002|0|2|1000|CI218713|"Neuromyopathy"|DM|||||||||||23.7,GGCGCGGAGC|LINC01770||insertion|upstream_gene_variant|MODIFIER|||NR_125995.1||||rs749383814|-1|830327|Pathogenic/Likely_pathogenic||VWA1-related_condition&Neuronopathy&_distal_hereditary_motor&not_provided&Neuronopathy&_distal_hereditary_motor&_autosomal_recessive_7&Neuromuscular_disease|15|28196|0.00053199|0|afr|6|8376|0.000716332|0|||||||||||||||||||0.002|0|2|1000|CI218713|"Neuromyopathy"|DM|||||||||||23.7,GGCGCGGAGC|LINC01770||insertion|upstream_gene_variant|MODIFIER|||NR_125996.1||||rs749383814|-1|830327|Pathogenic/Likely_pathogenic||VWA1-related_condition&Neuronopathy&_distal_hereditary_motor&not_provided&Neuronopathy&_distal_hereditary_motor&_autosomal_recessive_7&Neuromuscular_disease|15|28196|0.00053199|0|afr|6|8376|0.000716332|0|||||||||||||||||||0.002|0|2|1000|CI218713|"Neuromyopathy"|DM|||||||||||23.7   GT:AD:DP:GQ:PGT:PID:PL  1/1:0,2:2:6:1|1:827267_C_T:90,6,0

What decides which transcript receives the expected ClinVar_CLNSIG value?
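
One pattern in the outputs above may help narrow this down: in both examples, the records without ClinVar fields report a different Allele string from the records with them, and that string equals the inserted sequence rotated by the reported HGVS_OFFSET. The sketch below only checks this arithmetic on values copied from this issue. The helper names are hypothetical, and it makes no claim about how VEP matches custom annotations internally.

```python
def inserted_sequence(ref, alt):
    """Return the inserted bases of a simple VCF insertion (ALT minus the REF prefix)."""
    if not alt.startswith(ref):
        raise ValueError("not a simple insertion")
    return alt[len(ref):]


def rotate_left(seq, offset):
    """Rotate seq to the left by offset positions, wrapping around its length."""
    k = offset % len(seq)
    return seq[k:] + seq[:k]


# (REF, ALT, HGVS_OFFSET, Allele reported on the records lacking ClinVar fields),
# all taken from the VEP outputs quoted in this issue.
EXAMPLES = [
    ("G", "GGGGCC", 14, "CGGGC"),
    ("T", "TGGCGCGGAGC", 21, "GCGCGGAGCG"),
]

for ref, alt, offset, unannotated_allele in EXAMPLES:
    inserted = inserted_sequence(ref, alt)
    rotated = rotate_left(inserted, offset)
    print(inserted, "->", rotated, "matches:", rotated == unannotated_allele)
```

The records that do carry ClinVar fields report the unrotated inserted sequence (GGGCC and GGCGCGGAGC). ENST00000469403 also reports HGVS_OFFSET 14 while keeping the unrotated allele, so the offset alone does not separate the two groups.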

nuno-agostinho commented 1 month ago

Hi @growland2,

Sorry to hear about this inconvenience.

While running VEP with ClinVar as a custom file (similar to your command), I do get the same results consistently for the expected variants, regardless of their Ensembl/RefSeq transcript.

I am going to try and see if any of the options you are using could be affecting the results.

Best regards, Nuno

growland2 commented 1 month ago

Thanks. FYI, running both example variants through the online GRCh37 VEP did not return a clinical significance for any transcript:

VEP online results for Example variant 1 and Example variant 2.

Kind regards,

Greg