Thanks, @miseminger! Is there a canonical way to resolve HGVS identifiers? Based on some other registries, I see, e.g., http://reg.clinicalgenome.org/allele?hgvs=NP_003997.1:p.Trp24Cys as an endpoint where an HGVS identifier is interpreted. Are there any others that should be prioritized?
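For reference, here is a minimal Python sketch of building that ClinGen lookup URL for an arbitrary HGVS string. The endpoint pattern is copied from the link above. The function name is hypothetical, and percent-encoding the identifier (':' becomes %3A, '>' becomes %3E) assumes the server decodes query strings in the usual way.

```python
from urllib.parse import urlencode

# Endpoint pattern copied from the ClinGen Allele Registry link above.
CLINGEN_ALLELE_ENDPOINT = "http://reg.clinicalgenome.org/allele"


def clingen_allele_url(hgvs: str) -> str:
    """Build a ClinGen Allele Registry lookup URL (hypothetical helper).

    urlencode percent-encodes reserved characters such as ':' and '>',
    assuming the server decodes them as it would any query string.
    """
    return f"{CLINGEN_ALLELE_ENDPOINT}?{urlencode({'hgvs': hgvs})}"


if __name__ == "__main__":
    print(clingen_allele_url("NP_003997.1:p.Trp24Cys"))
```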
Hi @bgyori, thanks for the quick reply!
I haven't been able to find a canonical way to resolve an arbitrary HGVS name, aside from manually parsing its meaning according to the guidelines on the HGVS website, although several different databases use HGVS nomenclature. Your ClinGen link pattern is likely the best one to mention.
Each HGVS name begins with an immutable reference sequence identifier, and the nomenclature allows for several reference sequence types and sources, including references from RefSeq, Ensembl, and LRG.
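To make that structure concrete, here is a rough Python sketch that splits a name at its first colon into the reference sequence identifier and the variant description, then guesses the source from the accession prefix. The prefix table is an assumption for illustration and is not exhaustive; the helper names are hypothetical.

```python
import re

# Hypothetical, non-exhaustive prefix table (an assumption for illustration):
# RefSeq accessions such as NM_/NP_/NG_/NC_, Ensembl ENSG/ENST/ENSP, and LRG_ records.
_SOURCES = [
    (re.compile(r"^(NM|NR|NP|NG|NC|NT|NW|XM|XR|XP)_\d+(\.\d+)?"), "RefSeq"),
    (re.compile(r"^ENS[GTP]\d+(\.\d+)?"), "Ensembl"),
    (re.compile(r"^LRG_\d+"), "LRG"),
]


def split_hgvs(name: str) -> tuple[str, str]:
    """Split an HGVS name into (reference, description) at the first colon."""
    reference, sep, description = name.partition(":")
    if not sep:
        raise ValueError(f"no ':' separator in {name!r}")
    return reference, description


def reference_source(reference: str) -> str:
    """Guess which database a reference sequence identifier comes from."""
    for pattern, source in _SOURCES:
        if pattern.match(reference):
            return source
    return "unknown"


if __name__ == "__main__":
    ref, desc = split_hgvs("NP_003997.1:p.Trp24Cys")
    print(ref, desc, reference_source(ref))
```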
The HGVS Python package can parse HGVS names into parts (link to example usage), though it was last updated in 2014 and might not be compatible with the latest HGVS version.
Update: found one.
The Mutalyzer3 API takes an HGVS description and returns a JSON representation of the parsed description; see the /description_to_model/{description} tab.
For example, the request URL for NM_000352.3:c.215A>G is: https://mutalyzer.nl/api/description_to_model/NM_000352.3%3Ac.215A%3EG.
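Here is a small Python sketch of building that request URL (and, optionally, fetching the JSON) with the standard library. The helper names are hypothetical, and nothing here assumes a particular structure for the response. The key detail is percent-encoding the whole description with safe="" so that ':' and '>' become %3A and %3E, matching the example URL above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint taken from the Mutalyzer3 example URL above.
MUTALYZER_MODEL_ENDPOINT = "https://mutalyzer.nl/api/description_to_model/"


def mutalyzer_model_url(description: str) -> str:
    """Build the description_to_model request URL for an HGVS description."""
    # safe="" percent-encodes every reserved character, including ':' and '>',
    # so the whole description stays a single path segment.
    return MUTALYZER_MODEL_ENDPOINT + quote(description, safe="")


def fetch_model(description: str, timeout: float = 10.0):
    """Fetch and decode the JSON returned by the endpoint (needs network access)."""
    with urlopen(mutalyzer_model_url(description), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(mutalyzer_model_url("NM_000352.3:c.215A>G"))
```

Calling fetch_model("NM_000352.3:c.215A>G") issues the same request as the example URL. Parentheses are encoded too (%28 and %29), which matters for names like NG_012337.3(NM_003002.4):r.(274g>u).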
Hi @miseminger, thanks for the suggestion. Ben fixed an issue with the automatic PR generation workflow, and we now have a PR for your entry in #1035. FYI, there was some previous discussion about adding HGVS and other related "languages" in #460.
Just a heads up: it looks like there is an issue with the Bioregistry's resolver when handling local unique identifiers containing colons, which we will have to fix before we can merge this. It's actually pretty timely, since we were just alerted to another issue with the resolver yesterday in #1034.
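The thread doesn't say what the underlying resolver bug is, so this is only an illustration of why colons are tricky, assuming a CURIE of the form prefix:identifier. Splitting on every colon breaks an HGVS identifier into pieces, whereas splitting at the first colon only keeps it intact. split_curie is a hypothetical helper, not Bioregistry's code.

```python
def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE into (prefix, local identifier) at the first colon only."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a prefix:identifier CURIE: {curie!r}")
    return prefix, identifier


if __name__ == "__main__":
    curie = "hgvs:NP_003997.1:p.Trp24Cys"
    # Naive splitting on every colon yields three pieces and loses the identifier.
    print(curie.split(":"))  # ['hgvs', 'NP_003997.1', 'p.Trp24Cys']
    # Splitting at the first colon keeps the local identifier intact.
    print(split_curie(curie))  # ('hgvs', 'NP_003997.1:p.Trp24Cys')
```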
Hi @miseminger, thanks for the submission. You can see it live at https://bioregistry.io/hgvs
Thanks, @cthoyt!
Looking at the example local unique identifiers just now, I noticed that two of them resolve. However, the examples with parentheses in the reference sequence identifier return either an "InternalServerError" with the message "Unknown reference" (three examples) or an "HgvsParsingError" (for NG_012337.3(NM_003002.4):r.(274g>u)).
I tried searching http://reg.clinicalgenome.org for "NG_012337.3:r.(274g>u)" and "NM_003002.4:r.(274g>u)" separately, but both searches came up empty. I suspect ClinGen has records for a very large number of HGVS-format mutations (>650 million as of 2018) but may still be missing a few.
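For anyone repeating the manual check above, here is a hypothetical Python helper that rewrites a nested reference like NG_012337.3(NM_003002.4) into the two flattened forms searched for here. This is a string transformation only. Dropping either reference may not give an equivalent (or even valid) HGVS description, since an r. description normally needs a transcript reference.

```python
import re

# Matches a reference of the form OUTER(INNER), e.g. NG_012337.3(NM_003002.4).
_NESTED_REF = re.compile(r"^(?P<outer>[^()]+)\((?P<inner>[^()]+)\)$")


def flatten_nested_reference(name: str) -> list[str]:
    """Rewrite OUTER(INNER):desc as [OUTER:desc, INNER:desc]; otherwise [name]."""
    reference, sep, description = name.partition(":")
    match = _NESTED_REF.match(reference)
    if not sep or match is None:
        return [name]
    return [f"{match['outer']}:{description}", f"{match['inner']}:{description}"]


if __name__ == "__main__":
    for candidate in flatten_nested_reference("NG_012337.3(NM_003002.4):r.(274g>u)"):
        print(candidate)
```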
All of this is to suggest switching the default resolver to Mutalyzer3, which can resolve all HGVS names and is listed on the HGVS Software page.
Thanks again!
Prefix: hgvs
Name: Human Genome Variation Society Nomenclature
Homepage: https://hgvs-nomenclature.org/stable/background/simple/
Source Code Repository: https://github.com/HGVSnomenclature/hgvs-nomenclature
Description:
The HGVS Nomenclature is an internationally-recognized standard for the description of DNA, RNA and protein sequence variants. It is used to convey variants in clinical reports and to share variants in publications and databases.
The HGVS Nomenclature is administered by the HGVS Variant Nomenclature Committee (HVNC) under the auspices of the Human Genome Organization (HUGO).
License: No response
Publications: doi:10.1002/humu.22981 | pubmed:26931183
Example Local Unique Identifier: NP_003997.1:p.Trp24Cys
Regular Expression Pattern for Local Unique Identifier: No response
URI Format String: No response
Wikidata Property: P3331
Contributor Name: Madeline Iseminger
Contributor GitHub: miseminger
Contributor ORCiD: 0000-0002-0548-891X
Contributor Email: miseming@sfu.ca
Contact Name: No response
Contact ORCiD: No response
Contact GitHub: No response
Contact Email: No response
Additional Comments: No response