Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

NAN allele encoding in JSON output #1672

Open muppetjones opened 1 month ago

muppetjones commented 1 month ago

Describe the issue

Presence of a NAN allele results in invalid JSON encoding.

Given the following input:

7 55269675 55269675 C/NAN +

The following output was produced (abbreviated):

{
    "transcript_consequences": [
        {
            "variant_allele": -nan
        }
    ],
    "allele_string": "C/NAN"
}
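
To make the failure concrete, here is a minimal sketch of a strict JSON parser rejecting the record. It uses Python's standard `json` module; the `is_valid_json` helper is a name made up for this illustration, and the record is the abbreviated output above with a closing brace added.

```python
import json

# Abbreviated record from the report, completed with a closing brace.
record = '{"transcript_consequences": [{"variant_allele": -nan}], "allele_string": "C/NAN"}'


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The bare -nan token is not a JSON value, so parsing fails.
print(is_valid_json(record))

# Quoting the allele as a string makes the same record parse.
print(is_valid_json(record.replace("-nan", '"NAN"')))
```

The second call only shows that a quoted string would be accepted; it does not claim to be what VEP intends to emit.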

Additional information

System

Full VEP command line

docker run -t --rm {binding} {self.image}:{self.tag} vep \
--everything --json --hgvsg --ga4gh_vrs --offline --no_stats --xref_refseq --verbose \
--cache --dir_cache {bound_path} --fasta {path} \
--input_file {path} --output_file {path}

Full error message

n/a -- VEP does not error.

Data files (if applicable)

Full output record: {"nearest":["ENST00000454757"],"most_severe_consequence":"intron_variant","transcript_consequences":[{"ccds":"CCDS5514.1","gene_id":"ENSG00000146648","trembl":["Q75MF2_HUMAN","I3WA73_HUMAN","I3WA72_HUMAN","G9MC81_HUMAN","F1JTL6_HUMAN","E9PFD7_HUMAN","C9JYS6_HUMAN","A7VN06_HUMAN"],"source":"Ensembl","given_ref":"C","uniparc":["UPI000003E750"],"used_ref":"C","gene_symbol_source":"HGNC","consequence_terms":["intron_variant"],"gene_symbol":"EGFR","hgnc_id":3236,"protein_id":"ENSP00000275493","strand":1,"impact":"MODIFIER","intron":"26/27","biotype":"protein_coding","swissprot":["EGFR_HUMAN"],"gene_pheno":1,"transcript_id":"ENST00000275493","canonical":1,"variant_allele":-nan,"refseq_transcript_ids":["NM_005228.3"]},{"source":"Ensembl","given_ref":"C","uniparc":["UPI000011F91B"],"gene_id":"ENSG00000146648","trembl":["Q9H3D0_HUMAN","F1JTL6_HUMAN","C9JYS6_HUMAN"],"gene_symbol":"EGFR","consequence_terms":["intron_variant"],"hgnc_id":3236,"used_ref":"C","gene_symbol_source":"HGNC","impact":"MODIFIER","strand":1,"biotype":"protein_coding","intron":"17/17","protein_id":"ENSP00000410031","transcript_id":"ENST00000442591","variant_allele":-nan,"gene_pheno":1},{"gene_pheno":1,"transcript_id":"ENST00000454757","variant_allele":-nan,"protein_id":"ENSP00000395243","biotype":"protein_coding","intron":"26/27","impact":"MODIFIER","strand":1,"gene_symbol_source":"HGNC","used_ref":"C","gene_symbol":"EGFR","consequence_terms":["intron_variant"],"hgnc_id":3236,"gene_id":"ENSG00000146648","trembl":["Q75MF2_HUMAN","I3WA73_HUMAN","I3WA72_HUMAN","G9MC81_HUMAN","F1JTL6_HUMAN","E9PFD7_HUMAN","C9JYS6_HUMAN","A7VN06_HUMAN"],"source":"Ensembl","given_ref":"C","uniparc":["UPI00020655C0"]},{"strand":1,"impact":"MODIFIER","biotype":"protein_coding","intron":"25/25","protein_id":"ENSP00000415559","transcript_id":"ENST00000455089","variant_allele":-nan,"gene_pheno":1,"source":"Ensembl","uniparc":["UPI000050D030"],"given_ref":"C","gene_id":"ENSG00000146648","trembl":["Q504U8_HUMAN","I3WA73_HUMAN","I3WA72_HUMAN","G9MC81_HUMAN","F1JTL6_HUMAN","A7VN06_HUMAN"],"consequence_terms":["intron_variant"],"gene_symbol":"EGFR","hgnc_id":3236,"used_ref":"C","gene_symbol_source":"HGNC"},{"impact":"MODIFIER","strand":1,"biotype":"retained_intron","distance":1180,"gene_pheno":1,"variant_allele":-nan,"transcript_id":"ENST00000485503","gene_id":"ENSG00000146648","given_ref":"C","source":"Ensembl","used_ref":"C","gene_symbol_source":"HGNC","hgnc_id":3236,"gene_symbol":"EGFR","consequence_terms":["downstream_gene_variant"]},{"gene_id":"1956","given_ref":"C","source":"RefSeq","used_ref":"C","gene_symbol_source":"EntrezGene","hgnc_id":3236,"gene_symbol":"EGFR","consequence_terms":["intron_variant"],"protein_id":"NP_001333826.1","impact":"MODIFIER","strand":1,"biotype":"protein_coding","intron":"25/25","variant_allele":-nan,"transcript_id":"NM_001346897.2"},{"gene_id":"1956","source":"RefSeq","given_ref":"C","gene_symbol_source":"EntrezGene","used_ref":"C","consequence_terms":["intron_variant"],"gene_symbol":"EGFR","hgnc_id":3236,"protein_id":"NP_001333827.1","intron":"26/26","biotype":"protein_coding","strand":1,"impact":"MODIFIER","transcript_id":"NM_001346898.2","variant_allele":-nan},{"variant_allele":-nan,"transcript_id":"NM_001346899.2","protein_id":"NP_001333828.1","impact":"MODIFIER","strand":1,"intron":"25/26","biotype":"protein_coding","used_ref":"C","gene_symbol_source":"EntrezGene","hgnc_id":3236,"consequence_terms":["intron_variant"],"gene_symbol":"EGFR","gene_id":"1956","given_ref":"C","source":"RefSeq"},{"given_ref":"C","source":"RefSeq","gene_id":"1956","hgnc_id":3236,"consequence_terms":["intron_variant"],"gene_symbol":"EGFR","used_ref":"C","gene_symbol_source":"EntrezGene","impact":"MODIFIER","strand":1,"biotype":"protein_coding","intron":"26/27","protein_id":"NP_001333829.1","variant_allele":-nan,"transcript_id":"NM_001346900.2"},{"variant_allele":-nan,"transcript_id":"NM_001346941.2","protein_id":"NP_001333870.1","biotype":"protein_coding","intron":"20/21","impact":"MODIFIER","strand":1,"gene_symbol_source":"EntrezGene","used_ref":"C","hgnc_id":3236,"consequence_terms":["intron_variant"],"gene_symbol":"EGFR","gene_id":"1956","given_ref":"C","source":"RefSeq"},{"transcript_id":"NM_005228.5","variant_allele":-nan,"canonical":1,"protein_id":"NP_005219.2","impact":"MODIFIER","strand":1,"intron":"26/27","biotype":"protein_coding","used_ref":"C","gene_symbol_source":"EntrezGene","consequence_terms":["intron_variant"],"gene_symbol":"EGFR","hgnc_id":3236,"gene_id":"1956","source":"RefSeq","given_ref":"C"}],"id":"523693fc-239c-4f65-8448-6b764c7e6840","input":"7\t55269675\t55269675\tC/NAN\t+\t523693fc-239c-4f65-8448-6b764c7e6840","start":55269675,"variant_class":"indel","end":55269675,"seq_region_name":"7","strand":1,"assembly_name":"GRCh37","allele_string":"C/NAN"}
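
Until the encoding is fixed, a downstream consumer could patch each record before parsing. The following is a rough sketch of such a workaround, not anything VEP provides: `repair_record` and `BARE_NAN` are hypothetical names, and the replacement allele `"NAN"` is an assumption taken from the `allele_string` of this example.

```python
import json
import re

# Matches a bare nan / -nan token used as the value of "variant_allele".
# A correctly quoted value ("variant_allele":"NAN") is left untouched.
BARE_NAN = re.compile(r'("variant_allele"\s*:\s*)-?nan\b', re.IGNORECASE)


def repair_record(line: str, allele: str = "NAN") -> dict:
    """Quote bare nan tokens as `allele`, then parse (hypothetical workaround)."""
    fixed = BARE_NAN.sub(lambda m: m.group(1) + json.dumps(allele), line)
    return json.loads(fixed)


# Shortened stand-in for the full output record above.
sample = (
    '{"transcript_consequences":[{"variant_allele":-nan,"strand":1}],'
    '"allele_string":"C/NAN"}'
)
record = repair_record(sample)
print(record["transcript_consequences"][0]["variant_allele"])
```

The pattern is anchored to the `variant_allele` key so that other fields are never rewritten; for multi-allelic input the correct allele would have to be chosen per record rather than hard-coded.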

jamie-m-a commented 1 month ago

Hi @muppetjones

Apologies for the delay in responding. Could you share your input file, please? I tried the input as described in the JSON output, and with VEP 112 it seems to be giving nan as the variant allele appropriately. I would just like to confirm this using your exact input for this example.