ga4gh / vrs-python

GA4GH Variation Representation Python Implementation
https://github.com/ga4gh/vrs
Apache License 2.0

hgvs to vrs is returning valid results when hgvs has `IncorrectReferenceAllele` #364

Open larrybabb opened 6 months ago

larrybabb commented 6 months ago

In ClinVar there's a variant NM_006087.3:c.900C>A (267781) that has no HGVS, SPDI, or location data. When I used the metakb variant normalizer service's translate_from endpoint, which uses vrs-python's translate_from, it returned a valid VRS object.

curl -X 'GET' \
  'https://normalize.cancervariants.org/variation/translate_from?variation=NM_006087.3%3Ac.900C%3EA&fmt=hgvs' \
  -H 'accept: application/json'

Response Body
{
  "query": {
    "variation": "NM_006087.3:c.900C>A",
    "fmt": "hgvs"
  },
  "warnings": [],
  "service_meta_": {
    "name": "variation-normalizer",
    "version": "0.8.1",
    "response_datetime": "2024-03-15T17:10:02.804011Z",
    "url": "https://github.com/cancervariants/variation-normalization"
  },
  "vrs_python_meta_": {
    "name": "vrs-python",
    "version": "2.0.0a2",
    "url": "https://github.com/ga4gh/vrs-python"
  },
  "variation": {
    "id": "ga4gh:VA.AO175l6scMggCBYXONYydcuvMsoZqNXi",
    "type": "Allele",
    "location": {
      "id": "ga4gh:SL.SoeOSfpr0PfwJu_akcSCZ7DMyDRodV-C",
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "refgetAccession": "SQ.k_G7nBWO-L7cKeMOjyJlibHhDn1Ts69Q"
      },
      "start": 899,
      "end": 900
    },
    "state": {
      "type": "LiteralSequenceExpression",
      "sequence": "A"
    }
  }
}

So I tried to look up this variant in the ClinGen Allele Registry and found that it failed for the following reason:

We were not able to parse, find, or, register allele using NM_006087.3:c.900C>A HGVS expression or CA Identifier.
The following information might be helpful to understand the reason.

Type of the error: IncorrectReferenceAllele

Explanation: Reference allele does not match for NM_006087.3[6985-0,6986+0), given=C, found=G.

Reference sequence:

Actual allele: G

Provided in the HGVS expression: C

Region: [6985-0,6986+0)

I did not investigate further to see whether vrs-python skips checking reference alleles, but I assume it does.

I don't think vrs-python's translate_from should accept HGVS expressions whose stated reference allele does not match the nucleotides actually found at that position in the reference sequence. Like the ClinGen Allele Registry, we should probably throw an exception.
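A minimal sketch of the kind of check being proposed, not vrs-python's actual implementation. It assumes the reference sequence is available as a plain Python string and that coordinates are interbase (0-based, end-exclusive), like the `start`/`end` in the response above. `check_reference_allele`, `ReferenceAlleleMismatchError`, and the toy sequence are hypothetical names made up for illustration; in practice the sequence would come from a sequence data source rather than a literal.

```python
class ReferenceAlleleMismatchError(ValueError):
    """Raised when the stated reference allele differs from the reference sequence."""


def check_reference_allele(sequence: str, start: int, end: int, stated_ref: str) -> None:
    """Compare the stated reference allele with sequence[start:end].

    Coordinates are interbase (0-based, end-exclusive).
    """
    actual = sequence[start:end].upper()
    if actual != stated_ref.upper():
        raise ReferenceAlleleMismatchError(
            f"Reference allele does not match for [{start},{end}): "
            f"given={stated_ref}, found={actual}"
        )


# Toy sequence (hypothetical, not NM_006087.3): 'G' sits at interbase [899, 900).
toy_sequence = "A" * 899 + "G" + "T" * 10

# The stated reference matches the sequence, so no exception is raised.
check_reference_allele(toy_sequence, 899, 900, "G")

# The stated reference is 'C' but the sequence has 'G', so the check raises.
try:
    check_reference_allele(toy_sequence, 899, 900, "C")
except ReferenceAlleleMismatchError as err:
    mismatch_message = str(err)
```

Raising a dedicated exception type (rather than returning a boolean) mirrors the Allele Registry behaviour quoted above and lets a caller decide whether to reject the expression or downgrade the problem to a warning.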

larrybabb commented 6 months ago

I think we should be checking reference alleles on all format types: spdi, gnomad, beacon, and hgvs.
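To check reference alleles across formats, each expression first has to yield a stated reference allele and a position. A rough sketch for two of the string formats, assuming SPDI is `sequence:position:deletion:insertion` with a 0-based position and gnomAD-style is `chrom-pos-ref-alt` with a 1-based position. The function and type names are hypothetical, the example expressions are illustrative only, and this is not how vrs-python parses these formats. HGVS and Beacon parsing are omitted.

```python
from typing import NamedTuple, Optional


class StatedReference(NamedTuple):
    sequence_id: str
    start: int  # interbase (0-based) start
    end: int  # interbase end (exclusive)
    ref: Optional[str]  # None when only a deletion length is given


def stated_ref_from_spdi(expr: str) -> StatedReference:
    """Extract the stated reference from an SPDI-style string."""
    seq_id, pos, deletion, _insertion = expr.split(":")
    start = int(pos)
    if deletion.isdigit():
        # The deletion is given as a length, so there is no allele to compare.
        return StatedReference(seq_id, start, start + int(deletion), None)
    return StatedReference(seq_id, start, start + len(deletion), deletion)


def stated_ref_from_gnomad(expr: str) -> StatedReference:
    """Extract the stated reference from a gnomAD-style string."""
    chrom, pos, ref, _alt = expr.split("-")
    start = int(pos) - 1  # convert the 1-based position to interbase
    return StatedReference(chrom, start, start + len(ref), ref)


# Illustrative expressions only; the coordinates are not taken from the issue.
spdi_ref = stated_ref_from_spdi("NC_000013.11:32936731:C:T")
gnomad_ref = stated_ref_from_gnomad("13-32936732-C-T")
```

Once every format is reduced to the same (sequence, start, end, ref) shape, a single comparison against the reference sequence could serve all of them.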

korikuzma commented 6 months ago

@larrybabb I think this is related to #151. Once that is added, I can update the normalizer's vrs-python endpoints (which haven't been updated in a long time) to accept kwargs.