Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Enabling shift_3prime option results in strange consequences (start_retained_variant in the middle of transcript?) #778

Closed: tskir closed this issue 4 years ago

tskir commented 4 years ago

I'm trying to normalise VEP output so that the consequences for a given variant are always the same, regardless of where the variant is placed within a repeat region. The shift_3prime option, which was introduced in Ensembl VEP 100, could solve this issue, but it appears to produce very strange results in certain cases.
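To illustrate why placement within a repeat matters, here is a minimal Python sketch of 3' shifting for a deletion on the forward strand. It is not VEP's implementation; the function names and the toy reference sequence `TOY_REF` are assumptions made up for this example, chosen only to mimic an AGG repeat like the one in this report.

```python
def shift_deletion_3prime(seq, start, end):
    """Slide a deletion of seq[start:end + 1] (0-based, inclusive) as far 3' as possible.

    The deletion can move one base to the right whenever the base just after it
    equals the first deleted base, because the resulting sequence is unchanged.
    """
    while end + 1 < len(seq) and seq[end + 1] == seq[start]:
        start += 1
        end += 1
    return start, end


def apply_deletion(seq, start, end):
    """Return seq with the bases start..end (inclusive) removed."""
    return seq[:start] + seq[end + 1:]


# Hypothetical toy reference (NOT the real RYR2 sequence): three AGG units
# flanked by non-repeat bases.
TOY_REF = "CTAGGAGGAGGTC"

# The same 3-base deletion described at each of the three repeat units.
placements = [(2, 4), (5, 7), (8, 10)]

# Every placement yields the same deleted sequence ...
results = {apply_deletion(TOY_REF, s, e) for s, e in placements}

# ... and every placement normalises to the same 3'-most coordinates.
normalised = {shift_deletion_3prime(TOY_REF, s, e) for s, e in placements}
```

Under these assumptions all three placements collapse to one coordinate pair, which is the behaviour I was hoping to get from shift_3prime for consequence prediction.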

Running with default parameters

Request:

curl \
  -d '{"variants": ["1 237674093 . TAGG T"], "transcript_id": "ENST00000366574"}' \
  -H "Content-Type: application/json" rest.ensembl.org/vep/human/region

Response:

[
    {
        "input": "1 237674093 . TAGG T",
        "strand": 1,
        "transcript_consequences": [
            {
                "gene_symbol": "RYR2",
                "impact": "HIGH",
                "strand": 1,
                "cdna_end": 8929,
                "biotype": "protein_coding",
                "hgnc_id": "HGNC:10484",
                "protein_end": 2864,
                "gene_symbol_source": "HGNC",
                "gene_id": "ENSG00000198626",
                "cds_end": 8591,
                "consequence_terms": [
                    "splice_acceptor_variant",
                    "coding_sequence_variant"
                ],
                "transcript_id": "ENST00000366574",
                "variant_allele": "-"
            }
        ],
        "allele_string": "AGG/-",
        "most_severe_consequence": "splice_acceptor_variant",
        "id": ".",
        "assembly_name": "GRCh38",
        "end": 237674096,
        "seq_region_name": "1",
        "start": 237674094,
        "colocated_variants": [
            {
                "frequencies": {
                    "-": {
                        "gnomad_eas": 0,
                        "gnomad": 4.028e-06,
                        "gnomad_oth": 0,
                        "gnomad_afr": 0,
                        "gnomad_amr": 0,
                        "gnomad_asj": 0,
                        "gnomad_nfe": 8.887e-06,
                        "gnomad_sas": 0,
                        "gnomad_fin": 0
                    }
                },
                "strand": 1,
                "allele_string": "AGGAGGAGGA/AGGAGGA",
                "phenotype_or_disease": 1,
                "start": 237674094,
                "clin_sig_allele": "GGAGGA:uncertain_significance",
                "seq_region_name": "1",
                "end": 237674103,
                "clin_sig": [
                    "uncertain_significance"
                ],
                "id": "rs794728836"
            }
        ]
    }
]

Running with shift_3prime

Request:

curl \
  -d '{"variants": ["1 237674093 . TAGG T"], "transcript_id": "ENST00000366574", "shift_3prime": "1"}' \
  -H "Content-Type: application/json" rest.ensembl.org/vep/human/region

Response:

[
    {
        "colocated_variants": [
            {
                "frequencies": {
                    "-": {
                        "gnomad_oth": 0,
                        "gnomad": 4.028e-06,
                        "gnomad_eas": 0,
                        "gnomad_amr": 0,
                        "gnomad_afr": 0,
                        "gnomad_nfe": 8.887e-06,
                        "gnomad_asj": 0,
                        "gnomad_sas": 0,
                        "gnomad_fin": 0
                    }
                },
                "id": "rs794728836",
                "clin_sig": [
                    "uncertain_significance"
                ],
                "clin_sig_allele": "GGAGGA:uncertain_significance",
                "seq_region_name": "1",
                "end": 237674103,
                "start": 237674094,
                "phenotype_or_disease": 1,
                "allele_string": "AGGAGGAGGA/AGGAGGA",
                "strand": 1
            }
        ],
        "start": 237674094,
        "end": 237674096,
        "seq_region_name": "1",
        "assembly_name": "GRCh38",
        "id": ".",
        "most_severe_consequence": "start_retained_variant",
        "allele_string": "AGG/-",
        "strand": 1,
        "transcript_consequences": [
            {
                "transcript_id": "ENST00000366574",
                "consequence_terms": [
                    "start_retained_variant"
                ],
                "cds_end": 8591,
                "gene_id": "ENSG00000198626",
                "variant_allele": "-",
                "biotype": "protein_coding",
                "gene_symbol_source": "HGNC",
                "hgnc_id": "HGNC:10484",
                "protein_end": 2864,
                "cdna_end": 8929,
                "impact": "LOW",
                "strand": 1,
                "gene_symbol": "RYR2"
            }
        ],
        "input": "1 237674093 . TAGG T"
    }
]

Difference

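With default parameters, the most severe consequence reported is "splice_acceptor_variant", which is not necessarily biologically correct but is at least sensible. With the shift_3prime option, however, it changes to "start_retained_variant" and the impact drops from HIGH to LOW, even though the reported coordinates (cds_end 8591, protein_end 2864) are unchanged and place the variant in the middle of the transcript. This raises the question of how a start_retained_variant can be reported so far from the start of the coding sequence.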
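The difference between the two responses can also be extracted programmatically. The following is a minimal Python sketch, assuming both responses have already been parsed from JSON; the helper names are hypothetical and the inline dictionaries contain only the fields from the responses above that are needed for the comparison.

```python
def consequences_by_transcript(response):
    """Map transcript_id -> (impact, sorted consequence terms) for a parsed VEP response."""
    out = {}
    for record in response:
        for tc in record.get("transcript_consequences", []):
            out[tc["transcript_id"]] = (tc["impact"], sorted(tc["consequence_terms"]))
    return out


def diff_consequences(default_resp, shifted_resp):
    """Return only the transcripts whose impact or consequence terms differ."""
    before = consequences_by_transcript(default_resp)
    after = consequences_by_transcript(shifted_resp)
    return {
        tid: (before.get(tid), after.get(tid))
        for tid in sorted(set(before) | set(after))
        if before.get(tid) != after.get(tid)
    }


# Trimmed copies of the two responses shown above.
default_resp = [{
    "transcript_consequences": [{
        "transcript_id": "ENST00000366574",
        "impact": "HIGH",
        "consequence_terms": ["splice_acceptor_variant", "coding_sequence_variant"],
    }]
}]
shifted_resp = [{
    "transcript_consequences": [{
        "transcript_id": "ENST00000366574",
        "impact": "LOW",
        "consequence_terms": ["start_retained_variant"],
    }]
}]

changes = diff_consequences(default_resp, shifted_resp)
for tid, (before, after) in changes.items():
    print(tid, before, "->", after)
```

A check like this could be run over a batch of variants to find every case where enabling shift_3prime changes the predicted consequence.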

tskir commented 4 years ago

Link to the region in Ensembl: https://www.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000198626;r=1:237674073-237674114;t=ENST00000366574;mr=1:237674094-237674096, with the deletion variant highlighted (one of three possible locations)

aparton commented 4 years ago

Hi @tskir,

Thank you for this report. I looked into it this morning and have been able to reproduce the issue and find a fix.

I'll let you know when the REST server has been updated with this change.

Kind Regards, Andrew

aparton commented 4 years ago

Hi @tskir,

This issue should now be resolved. Thank you for bringing it to our attention.

If you still see issues, or if you have any other questions, please let us know.

Kind Regards, Andrew