biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Add prefix [hgvs] #1032

Closed miseminger closed 6 months ago

miseminger commented 6 months ago

Prefix

hgvs

Name

Human Genome Variation Society Nomenclature

Homepage

https://hgvs-nomenclature.org/stable/background/simple/

Source Code Repository

https://github.com/HGVSnomenclature/hgvs-nomenclature

Description

The HGVS Nomenclature is an internationally-recognized standard for the description of DNA, RNA and protein sequence variants. It is used to convey variants in clinical reports and to share variants in publications and databases.

The HGVS Nomenclature is administered by the HGVS Variant Nomenclature Committee (HVNC) under the auspices of the Human Genome Organization (HUGO).

License

No response

Publications

doi:10.1002/humu.22981 | pubmed:26931183

Example Local Unique Identifier

NP_003997.1:p.Trp24Cys

Regular Expression Pattern for Local Unique Identifier

No response

URI Format String

No response

Wikidata Property

P3331

Contributor Name

Madeline Iseminger

Contributor GitHub

miseminger

Contributor ORCiD

0000-0002-0548-891X

Contributor Email

miseming@sfu.ca

Contact Name

No response

Contact ORCiD

No response

Contact GitHub

No response

Contact Email

No response

Additional Comments

No response

bgyori commented 6 months ago

Thanks, @miseminger! Is there a canonical way to resolve HGVS identifiers? Based on some other registries, I see e.g., http://reg.clinicalgenome.org/allele?hgvs=NP_003997.1:p.Trp24Cys as an endpoint where an HGVS identifier is interpreted. Is there any other that should be prioritized?

miseminger commented 6 months ago

Hi @bgyori, thanks for the quick reply!

I haven't been able to find a canonical way to resolve HGVS names of any format, aside from manually interpreting a name using the guidelines on the HGVS website, though several different databases use HGVS nomenclature. Your ClinGen link pattern is likely the best one to mention.

Each HGVS name begins with an immutable reference sequence identifier, and the nomenclature allows a few different reference sequence types and sources, including RefSeq, Ensembl, and LRG.

The HGVS Python package can parse HGVS names into parts (link to example usage), though it was last updated in 2014 and might not be compatible with the latest HGVS version.
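As a rough illustration of the "reference sequence identifier first" structure described above, here is a minimal sketch that splits an HGVS name at its first colon and then separates the coordinate-type prefix from the change. The `split_hgvs` function and `HgvsParts` type are hypothetical names made up for this example. It uses only the Python standard library, is not the HGVS Python package, and does not validate names against the full nomenclature.

```python
from typing import NamedTuple


class HgvsParts(NamedTuple):
    """Hypothetical container for the pieces of an HGVS name."""

    reference: str   # reference sequence identifier, e.g. "NP_003997.1"
    coordinate: str  # coordinate-type prefix, e.g. "p", "c", or "r"
    change: str      # the described change, e.g. "Trp24Cys"


def split_hgvs(name: str) -> HgvsParts:
    """Split an HGVS name into reference, coordinate type, and change.

    This is a structural split only: it does not check that the reference
    exists or that the change is valid HGVS syntax.
    """
    # The reference identifier itself contains dots (version numbers),
    # so split on the first colon before looking for the dot.
    reference, colon, description = name.partition(":")
    if not colon or not reference or not description:
        raise ValueError(f"expected '<reference>:<description>', got {name!r}")
    coordinate, dot, change = description.partition(".")
    if not dot or not coordinate or not change:
        raise ValueError(f"expected '<type>.<change>', got {description!r}")
    return HgvsParts(reference, coordinate, change)


if __name__ == "__main__":
    for example in ("NP_003997.1:p.Trp24Cys", "NM_000352.3:c.215A>G"):
        print(split_hgvs(example))
```

A split like this keeps a parenthesized reference such as `NG_012337.3(NM_003002.4)` intact as a single reference string. Interpreting that reference further is left to a real HGVS parser.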

miseminger commented 6 months ago

Update: found one.

The Mutalyzer3 API takes an HGVS description and returns JSON that breaks the description down into its parts; see the /description_to_model/{description} tab.

For example, the request URL for NM_000352.3:c.215A>G is: https://mutalyzer.nl/api/description_to_model/NM_000352.3%3Ac.215A%3EG.
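The percent-encoding in that request URL can be reproduced with the standard library. The sketch below only builds the URL and does not call the service. The `mutalyzer_model_url` helper is a hypothetical name for this example, and the base URL is taken from the example above.

```python
from urllib.parse import quote

# Base of the endpoint as it appears in the example URL above.
MUTALYZER_BASE = "https://mutalyzer.nl/api/description_to_model/"


def mutalyzer_model_url(description: str) -> str:
    """Build a description_to_model request URL for an HGVS description.

    safe="" makes quote() escape every reserved character, so ':' becomes
    %3A and '>' becomes %3E, as in the example URL.
    """
    return MUTALYZER_BASE + quote(description, safe="")


if __name__ == "__main__":
    print(mutalyzer_model_url("NM_000352.3:c.215A>G"))
```

Passing `safe=""` matters here because `quote` leaves `/` unescaped by default. Escaping everything avoids any ambiguity when the description is placed in a URL path segment.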

cthoyt commented 6 months ago

Hi @miseminger, thanks for the suggestion. Ben fixed an issue with the automatic PR generation workflow and we now have a PR for your entry in #1035. FYI there was some previous discussion about adding HGVS and other related "languages" in #460.

Just a heads up: it looks like there is an issue with the Bioregistry's resolver when handling local unique identifiers containing colons which we will have to fix before we can merge this. It's actually pretty timely, since we were just alerted to another issue with the resolver yesterday in #1034
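The thread does not say what the resolver bug actually was. Purely as an illustration of why colons inside a local unique identifier need care, here is a minimal sketch of splitting a CURIE on its first colon only. The `split_curie` function is a hypothetical name and this is not the Bioregistry's resolver code.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'prefix:identifier' on the FIRST colon only.

    Anything after the first colon, including further colons, is kept as
    part of the local unique identifier.
    """
    prefix, colon, identifier = curie.partition(":")
    if not colon or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, identifier


if __name__ == "__main__":
    curie = "hgvs:NP_003997.1:p.Trp24Cys"
    print(split_curie(curie))
    # A naive split on every colon yields three pieces instead of two,
    # which is one way an identifier like this could be mishandled.
    print(curie.split(":"))
```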

cthoyt commented 6 months ago

Hi @miseminger, thanks for the submission - you can see it live at https://bioregistry.io/hgvs

miseminger commented 6 months ago

Thanks, @cthoyt!

Looking at the Example Local Unique Identifiers just now, I noticed that two of them resolve, but the examples with parentheses in the reference sequence identifier either give "InternalServerError" with the message "Unknown reference" (3 examples) or "HgvsParsingError" (for NG_012337.3(NM_003002.4):r.(274g>u)).

I tried searching http://reg.clinicalgenome.org for "NG_012337.3:r.(274g>u)" and "NM_003002.4:r.(274g>u)" separately, but both searches came up empty. I suspect ClinGen has records for many HGVS-format mutations (>650 million as of 2018) but is still missing a few.

miseminger commented 6 months ago

All this is to suggest switching the default resolver to Mutalyzer3, which can resolve all HGVS names, especially since it is listed on the HGVS Software page.

Thanks again!