Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

COSMIC information in cached file isn't annotated #775

Closed hsiaoyi0504 closed 3 years ago

hsiaoyi0504 commented 4 years ago

Describe the issue

I am trying to annotate a VCF with this record:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NHRI-SLY-AML-001_N      NHRI-SLY-AML-001_T
5       170837543       .      C       CCTGC   .       PASS    DP=84;MQ=53.41;TLOD=19.61;NLOD=-5.59;FractionInformativeReads=0.964     GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  0/0:-5.59:13,0:0.000:8,0:5,0:13:.:.     0/1:19.61:35,33:0.485:16,13:19,20:68:28,7,18,15:17,18,16,17

With the following command:

$VEP_PATH/vep --cache --offline \
    --cache_version 100 \
    --assembly GRCh37 \
    --port 3337 \
    --dir_plugins $VEP_PLUGIN_DIR \
    --dir_cache $VEP_CACHE_DIR \
    -i $INPUT_VCF_PATH \
    --vcf \
    -o $OUTPUT_VCF_PATH \
    --check_existing \
    --plugin LoFtool,$VEP_CACHE_DIR/Plugins/LoFtool_scores.txt \
    --plugin ExACpLI,$VEP_CACHE_DIR/Plugins/ExACpLI_values.txt \
    --plugin MPC,$VEP_PLUGIN_DATA_DIR/fordist_constraint_official_mpc_values_v2.txt.gz \
    --plugin LOVD \
    --plugin FlagLRG,$VEP_PLUGIN_DATA_DIR/list_LRGs_transcripts_xrefs.txt \
    --plugin FunMotifs,$VEP_PLUGIN_DATA_DIR/blood.funmotifs_sorted.bed.gz,fscore,dnase_seq \
    --plugin PostGAP,$VEP_PLUGIN_DATA_DIR/postgap_GRCh37.txt.gz,ALL \
    --plugin satMutMPRA,file=$VEP_PLUGIN_DATA_DIR/satMutMPRA_GRCh37_ALL.gz,cols=ALL \
    --fork 4

I get the following result. However, the matching COSMIC records do not appear to be annotated:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NHRI-SLY-AML-001_N      NHRI-SLY-AML-001_T
5       170837543       .      C       CCTGC   .       PASS    DP=84;MQ=53.41;TLOD=19.61;NLOD=-5.59;FractionInformativeReads=0.964;CSQ=CTGC|frameshift_variant|HIGH|NPM1|ENSG00000181163|Transcript|ENST00000296930|protein_coding|11/11||||1160-1161|859-860|287|L/PAX|ctc/cCTGCtc|||1||HGNC|7910||||0.523|0.96||||||,CTGC|frameshift_variant|HIGH|NPM1|ENSG00000181163|Transcript|ENST00000351986|protein_coding|10/10||||892-893|772-773|258|L/PAX|ctc/cCTGCtc|||1||HGNC|7910||||0.523|0.96||||||,CTGC|downstream_gene_variant|MODIFIER|NPM1|ENSG00000181163|Transcript|ENST00000393820|protein_coding|||||||||||3414|1||HGNC|7910||||0.523|0.96||||||,CTGC|frameshift_variant|HIGH|NPM1|ENSG00000181163|Transcript|ENST00000517671|protein_coding|12/12||||994-995|859-860|287|L/PAX|ctc/cCTGCtc|||1||HGNC|7910||||0.523|0.96||||||,CTGC|downstream_gene_variant|MODIFIER|NPM1|ENSG00000181163|Transcript|ENST00000519955|retained_intron|||||||||||4807|1||HGNC|7910||||0.523|0.96||||||,CTGC|non_coding_transcript_exon_variant|MODIFIER|NPM1|ENSG00000181163|Transcript|ENST00000524204|retained_intron|2/2||||295-296|||||||1||HGNC|7910||||0.523|0.96||||||   GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  0/0:-5.59:13,0:0.000:8,0:5,0:13:.:.     0/1:19.61:35,33:0.485:16,13:19,20:68:28,7,18,15:17,18,16,17

I checked that this variant exists in COSMIC's mutation data (v90):

> grep "COSV51564309" CosmicMutantExport_v90.tsv
NPM1    ENST00000296930.5   885 7910    2218757 2218757 2086750 haematopoietic_and_lymphoid_tissue  NS  NS  NS  haematopoietic_neoplasm acute_myeloid_leukaemia_therapy_related M4  NS  n   COSV51564309    COSM4170212 22424264    c.859_860insCTGC    p.L287Pfs*13    Insertion - Frameshift  het     37  5:170837543-170837544   +   n   -           Variant of unknown origin   24522528        blood-bone marrow   NS  78.4
NPM1_ENST00000517671    ENST00000517671.1   885 7910    2218757 2218757 2086750 haematopoietic_and_lymphoid_tissue  NS  NS  NS  haematopoietic_neoplasm acute_myeloid_leukaemia_therapy_related M4  NS  n   COSV51564309    COSM4170212 64593882    c.859_860insCTGC    p.L287Pfs*13    Insertion - Frameshift  het     37  5:170837543-170837544   +   n   -           Variant of unknown origin   24522528        blood-bone marrow   NS  78.4
NPM1_ENST00000351986    ENST00000351986.6   798 7910    2218757 2218757 2086750 haematopoietic_and_lymphoid_tissue  NS  NS  NS  haematopoietic_neoplasm acute_myeloid_leukaemia_therapy_related M4  NS  n   COSV51564309    COSM4170212 29272418    c.772_773insCTGC    p.L258Pfs*13    Insertion - Frameshift  het     37  5:170837543-170837544   +   n   -           Variant of unknown origin   24522528        blood-bone marrow   NS  78.4
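The COSMIC rows above give the locus as 5:170837543-170837544, while the VCF record gives POS 170837543 with REF C and ALT CCTGC. A minimal sketch of why these describe the same insertion, assuming the VCF uses the usual anchor-base convention (ALT begins with REF) and that the COSMIC export lists the two bases flanking an insertion, as the rows above suggest. The function name `insertion_flanks` is hypothetical and is not part of VEP or COSMIC tooling.

```python
def insertion_flanks(chrom, pos, ref, alt):
    """Describe a simple VCF insertion by its two flanking bases.

    Returns ("chrom:left-right", inserted_bases), or None when the record
    is not a simple insertion whose ALT begins with REF (an assumption of
    this sketch).
    """
    if len(alt) <= len(ref) or not alt.startswith(ref):
        return None
    # The inserted bases sit immediately after the last REF base.
    left = pos + len(ref) - 1
    inserted = alt[len(ref):]
    return f"{chrom}:{left}-{left + 1}", inserted


# The record from this issue: chromosome 5, POS 170837543, C > CCTGC
print(insertion_flanks("5", 170837543, "C", "CCTGC"))
```

For the record in this issue the inserted bases come out as CTGC, which agrees with both the CSQ allele and the `c.859_860insCTGC` notation in the COSMIC rows.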

Additional information

I am fairly sure that VEP annotates some other variants correctly. For example, with this record:

15      90631934        .       C       T       .       PASS    DP=77;MQ=51.62;TLOD=41.81;NLOD=23.01;FractionInformativeReads=1.000     GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  0/0:23.01:20,0:0.000:8,0:12,0:20:.:.    0/1:41.81:36,21:0.368:26,9:10,12:57:21,15,15,6:17,19,11,10

I get:

15      90631934        .       C       T       .       PASS    DP=77;MQ=51.62;TLOD=41.81;NLOD=23.01;FractionInformativeReads=1.000;CSQ=T|missense_variant|MODERATE|IDH2|ENSG00000182054|Transcript|ENST00000330062|protein_coding|4/11||||533|419|140|R/Q|cGg/cAg|rs121913502&CM106818&COSV57468751&COSV57469541||-1||HGNC|5383|likely_pathogenic&pathogenic|0&0&1&1|1&1&1&1|0.307|0.38|1.55173916304|IDH2&XM_005254894.1&c.29G>A||||,T|missense_variant|MODERATE|IDH2|ENSG00000182054|Transcript|ENST00000539790|protein_coding|2/9||||239|29|10|R/Q|cGg/cAg|rs121913502&CM106818&COSV57468751&COSV57469541||-1||HGNC|5383|likely_pathogenic&pathogenic|0&0&1&1|1&1&1&1|0.307|0.38||IDH2&XM_005254894.1&c.29G>A||||,T|missense_variant|MODERATE|IDH2|ENSG00000182054|Transcript|ENST00000540499|protein_coding|4/11||||424|263|88|R/Q|cGg/cAg|rs121913502&CM106818&COSV57468751&COSV57469541||-1||HGNC|5383|likely_pathogenic&pathogenic|0&0&1&1|1&1&1&1|0.307|0.38||IDH2&XM_005254894.1&c.29G>A|LRG_611t1|||,T|intron_variant|MODIFIER|IDH2|ENSG00000182054|Transcript|ENST00000559482|protein_coding||2/7||||||||rs121913502&CM106818&COSV57468751&COSV57469541||-1||HGNC|5383|likely_pathogenic&pathogenic|0&0&1&1|1&1&1&1|0.307|0.38||IDH2&XM_005254894.1&c.29G>A||||,T|3_prime_UTR_variant&NMD_transcript_variant|MODIFIER|IDH2|ENSG00000182054|Transcript|ENST00000560061|nonsense_mediated_decay|2/9||||239|||||rs121913502&CM106818&COSV57468751&COSV57469541||-1||HGNC|5383|likely_pathogenic&pathogenic|0&0&1&1|1&1&1&1|0.307|0.38||IDH2&XM_005254894.1&c.29G>A|||| GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  0/0:23.01:20,0:0.000:8,0:12,0:20:.:.    0/1:41.81:36,21:0.368:26,9:10,12:57:21,15,15,6:17,19,11,10
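To compare records like the two above without reading the CSQ string by eye, one option is to scan every CSQ field for COSMIC identifiers. This is a rough sketch, not part of VEP: the helper name `cosmic_ids_in_record` is hypothetical, it assumes a tab-delimited VCF line, and it assumes COSMIC genomic mutation IDs have the form `COSV` followed by digits, as in the output above. It searches all fields rather than a fixed column index, because the CSQ field order is defined in the VCF header, which is not shown here.

```python
import re

# Assumed identifier shape: "COSV" followed by digits, as seen in the output above.
COSV_PATTERN = re.compile(r"^COSV\d+$")


def cosmic_ids_in_record(vcf_line):
    """Return the sorted COSV identifiers found in the CSQ field of one VCF line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = columns[7]  # INFO is the eighth VCF column
    found = set()
    for entry in info.split(";"):
        if not entry.startswith("CSQ="):
            continue
        # One comma-separated block per transcript, pipe-separated fields within.
        for block in entry[len("CSQ="):].split(","):
            for field in block.split("|"):
                # Multiple values inside one field are joined with "&".
                for token in field.split("&"):
                    if COSV_PATTERN.match(token):
                        found.add(token)
    return sorted(found)


# Shortened, illustrative records (most CSQ fields omitted for brevity).
annotated = "15\t90631934\t.\tC\tT\t.\tPASS\tDP=77;CSQ=T|missense_variant|rs121913502&CM106818&COSV57468751&COSV57469541"
missing = "5\t170837543\t.\tC\tCCTGC\t.\tPASS\tDP=84;CSQ=CTGC|frameshift_variant|HIGH|NPM1"
print(cosmic_ids_in_record(annotated))
print(cosmic_ids_in_record(missing))
```

An empty list for a record whose variant is known to be in COSMIC, as with the NPM1 insertion here, is the symptom this issue reports.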

System

Full VEP command line

Explained above

Data files (if applicable)

Explained above

aparton commented 4 years ago

Hi,

Thank you for this report. We're looking into it, and we'll get back to you once we know more.

Kind Regards, Andrew

eriicdesousa commented 4 years ago

I've also seen the behaviour described here with deletions, not only insertions. It seems VEP just annotates with whatever COSMIC mutations exist at that position. I've seen it in both VEP 99 and VEP 100, the two versions I've used. I can give lots of examples if that helps.

aparton commented 4 years ago

Hi @oldguyeric,

Thank you for this report. We've been able to find multiple examples of this issue, and we're currently investigating further.

I'll let you know once we have an estimated timeline for a fix.

Kind Regards, Andrew

aparton commented 3 years ago

Hi,

This issue should now be resolved, so I'm closing it. If you have any further questions, please feel free to reopen it or create a new issue.

Kind Regards, Andrew