biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Add prefix Orphanet #187

Closed matentzn closed 2 years ago

matentzn commented 2 years ago

Prefix

Orphanet

Name

Orphanet

Homepage

https://www.orpha.net

Description

Orphanet is a unique resource, gathering and improving knowledge on rare diseases so as to improve the diagnosis, care and treatment of patients with rare diseases.

Example Identifier

79154

Regular Expression Pattern

[A-Z0-9][0-9]+
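As a quick check of the submitted pattern against the example identifier, here is a minimal Python sketch. It assumes the pattern must match the whole local identifier (hence `re.fullmatch`). The function name `is_valid_local_id` is hypothetical and not part of any registry tooling.

```python
import re

# Pattern from this request. Applying it with fullmatch (anchored at both
# ends) is an assumption about how it is meant to be used.
PATTERN = re.compile(r"[A-Z0-9][0-9]+")


def is_valid_local_id(local_id: str) -> bool:
    """Return True if the entire local identifier matches the pattern."""
    return PATTERN.fullmatch(local_id) is not None


print(is_valid_local_id("79154"))           # True: the example identifier
print(is_valid_local_id("C023"))            # True: a letter is allowed first
print(is_valid_local_id("7"))               # False: at least two characters needed
print(is_valid_local_id("Orphanet_79154"))  # False: the prefix is not part of the ID
```

Note that this pattern also accepts letter-prefixed identifiers such as C023 and rejects single-character identifiers.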

Redundant Prefix in Regular Expression Pattern

No response

Provider Format URL

http://www.orpha.net/ORDO/Orphanet_
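The provider format URL is what turns a CURIE into a resolvable URI. A minimal sketch, assuming the local identifier is simply appended to the URL given above and that the prefix is written `Orphanet` as requested; the `expand` helper is hypothetical.

```python
# Provider format URL from this request. Appending the local identifier
# directly to it is an assumption of this sketch.
URI_PREFIX = "http://www.orpha.net/ORDO/Orphanet_"


def expand(curie: str) -> str:
    """Expand an Orphanet CURIE such as Orphanet:79154 into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix != "Orphanet":
        raise ValueError(f"not an Orphanet CURIE: {curie!r}")
    return URI_PREFIX + local_id


print(expand("Orphanet:79154"))  # http://www.orpha.net/ORDO/Orphanet_79154
```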

Contributor Name

Nico Matentzoglu

Contributor ORCiD

0000-0002-7356-1779

Additional Comments

There is already a lowercase orphanet namespace, but it uses the wrong URL prefix; we also prefer the prefix to be uppercase, by convention.

cthoyt commented 2 years ago

The Bioregistry already has two related prefixes that are not the same:

  1. http://bioregistry.io/registry/orphanet which is the main orphanet vocabulary
  2. http://bioregistry.io/registry/orphanet.ordo which is the Orphanet Rare Disease Ontology

ordo is already tagged as a synonym of orphanet.ordo.
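To make the relationship between the two prefixes and the `ordo` synonym concrete, here is a minimal sketch of synonym-based prefix normalization. The lookup table is a hypothetical, hand-written subset reflecting only what is stated in this thread, and the case-insensitive matching is an assumption; this is not the Bioregistry API.

```python
from typing import Optional

# Hypothetical, hand-written subset of prefix metadata reflecting this
# thread: two distinct prefixes, with "ordo" recorded as a synonym of
# orphanet.ordo. This is not the Bioregistry API.
CANONICAL = {
    "orphanet": "orphanet",
    "orphanet.ordo": "orphanet.ordo",
    "ordo": "orphanet.ordo",
}


def normalize_prefix(prefix: str) -> Optional[str]:
    """Map a prefix or synonym to its canonical prefix, ignoring case."""
    return CANONICAL.get(prefix.lower())


print(normalize_prefix("ORDO"))      # orphanet.ordo
print(normalize_prefix("Orphanet"))  # orphanet
print(normalize_prefix("xyz"))       # None: not in this toy table
```

Because the lookup ignores case, an uppercase spelling such as Orphanet resolves to the same canonical prefix as orphanet, so a casing convention alone would not require a separate prefix.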

It looks like this request is mixing parts of each, using the prefix from the main vocabulary but the provider format URL from the ORDO vocabulary. I'm not really sure where the confusion comes from, but it may be a systematic mistake propagated from the Biolink Model's Biocontext (see the "Metaregistry" heading in http://bioregistry.io/registry/orphanet.ordo). I manually curated the mappings from that resource and had already noticed that they were mixing and matching.

matentzn commented 2 years ago

I don't care much about the prefix here, but the URL prefix needs to be added somewhere so I can run conversions correctly!
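Why the URI prefix matters for conversions can be shown with a minimal URI-to-CURIE compression sketch using longest-prefix matching. The prefix map is assumed for illustration: the orphanet.ordo entry uses the provider URL from this request, and the mondo entry uses the standard OBO PURL. The `compress` helper is hypothetical.

```python
from typing import Optional

# Assumed URI prefix map for illustration only.
URI_PREFIXES = {
    "orphanet.ordo": "http://www.orpha.net/ORDO/Orphanet_",
    "mondo": "http://purl.obolibrary.org/obo/MONDO_",
}


def compress(uri: str) -> Optional[str]:
    """Convert a URI to a CURIE using the longest matching URI prefix."""
    best = None
    for prefix, uri_prefix in URI_PREFIXES.items():
        if uri.startswith(uri_prefix) and (best is None or len(uri_prefix) > len(best[1])):
            best = (prefix, uri_prefix)
    if best is None:
        # No registered URI prefix matches, so the conversion cannot proceed.
        return None
    prefix, uri_prefix = best
    return f"{prefix}:{uri[len(uri_prefix):]}"


print(compress("http://www.orpha.net/ORDO/Orphanet_C023"))  # orphanet.ordo:C023
print(compress("https://example.org/term/1"))               # None
```

Without an entry for a given URI prefix, every URI in that namespace falls through to `None`, which is the failure mode described in the comment above.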

cthoyt commented 2 years ago

I mentioned in that other issue (https://github.com/mapping-commons/sssom-py/issues/161) that I recently added this URI prefix in biopragmatics/bioregistry@21906c3. Is that sufficient?

matentzn commented 2 years ago

It is! That's the important bit.

dhimmel commented 1 year ago

> The Bioregistry already has two related prefixes, that are not the same

I'm confused about the difference between ORPHA and ORDO terms, so I'm coming here for help, and possibly to find a disambiguating description that we can add to Bioregistry to help users.

On ORDO, there's a 2014 publication, only on ResearchGate, titled "ORDO: An Ontology Connecting Rare Disease, Epidemiology and Genetic Data". My understanding of this paper is that the authors took Orphanet and converted it to an OWL ontology. I don't find any discussion of whether the identifiers carry over or new identifiers are assigned. The https://github.com/Orphanet/Orpha2Ordo repo mentioned in the paper is gone.

The official page for ORDO appears to be at https://www.orphadata.com/ordo/. Does anyone have more insight into how Orphanet (ORPHA) terms differ from ORDO terms? Does every ORDO term have a corresponding Orphanet term, but with a different identifier?

I arrived here when we noticed that EFO has some cross-references to orphanet and some to orphanet.ordo after Bioregistry normalization.

matentzn commented 1 year ago

It is my understanding that the identifier space for the two is one and the same. I can't say for certain that every ID in Orphanet is also in ORDO, but for the last five years I have been working under the assumption that the semantic space is one and the same. I will check with the Monarch folks and see if I can find out more.

cthoyt commented 1 year ago

Please take a look at the terms prefixed with a C; these appear to resolve in ORDO but not in Orphanet (https://www.ebi.ac.uk/ols/ontologies/ordo/terms?iri=http://www.orpha.net/ORDO/Orphanet_C023).

cthoyt commented 1 year ago

There are definitely two semantic spaces here, with distinct patterns. Which resources can resolve one or both is hard to understand. This feels reminiscent of the omim vs. omim.ps discussion.
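The "distinct patterns" point can be illustrated by checking which prefix a local identifier could belong to. This is a sketch under assumed patterns, not the registered ones: plain Orphanet identifiers are taken to be digits only, while ORDO identifiers may also carry a leading C (as in C023). The helper name is hypothetical.

```python
import re
from typing import List

# Assumed patterns for illustration, not the registered ones.
PATTERNS = {
    "orphanet": re.compile(r"\d+"),
    "orphanet.ordo": re.compile(r"C?\d+"),
}


def matching_prefixes(local_id: str) -> List[str]:
    """List the prefixes whose pattern fully matches the local identifier."""
    return [prefix for prefix, pattern in PATTERNS.items() if pattern.fullmatch(local_id)]


print(matching_prefixes("79154"))  # ['orphanet', 'orphanet.ordo']
print(matching_prefixes("C023"))   # ['orphanet.ordo']
```

Under these assumptions, a numeric identifier is ambiguous between the two prefixes, while a C-prefixed one can only be ORDO.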

dhimmel commented 1 year ago

So orphanet.ordo:C023 ("age of onset") exists only in ORDO because it's a property that can be applied to diseases in Orphanet, but is not a disease itself?

Identifiers do appear to be shared between both. For example, both of the following will get you to "Rare dyslipidemia":

Is the identifier set for orphanet.ordo a strict superset of the identifier set for orphanet? @Orphanet / Marc Hanauer, any guidance you could provide here would be much appreciated.
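Once both identifier lists are in hand, the strict-superset question reduces to a set comparison. A minimal sketch; the two toy sets below are hypothetical (three IDs that appear in the EFO table further down, plus C023 from this thread), not real Orphanet or ORDO releases.

```python
def describe_relation(orphanet_ids: set, ordo_ids: set) -> str:
    """Describe how the ORDO identifier set relates to the Orphanet one."""
    if ordo_ids == orphanet_ids:
        return "identical"
    if ordo_ids > orphanet_ids:  # proper superset
        return "ordo is a strict superset"
    if ordo_ids < orphanet_ids:  # proper subset
        return "ordo is a strict subset"
    only_orphanet = len(orphanet_ids - ordo_ids)
    only_ordo = len(ordo_ids - orphanet_ids)
    return f"overlapping: {only_orphanet} only in orphanet, {only_ordo} only in ordo"


# Hypothetical toy sets for illustration only.
orphanet_ids = {"107", "167", "869"}
ordo_ids = {"107", "167", "869", "C023"}
print(describe_relation(orphanet_ids, ordo_ids))  # ordo is a strict superset
```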

With respect to bioregistry, we use it to normalize prefixes to enable more comprehensive mapping between resources. I see why having two separate Orphanet namespaces is technically correct here, but it would be good to have a recommendation to improve data linking. For example, perhaps all orphanet IDs should be converted to orphanet.ordo.
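The suggestion of converting all orphanet IDs to orphanet.ordo amounts to a prefix rewrite. A minimal sketch, assuming (as discussed above) that both prefixes share the same local identifiers; the `to_ordo` helper is hypothetical.

```python
def to_ordo(curie: str) -> str:
    """Rewrite an orphanet CURIE to orphanet.ordo; leave other CURIEs unchanged."""
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix == "orphanet":
        # Assumes the local identifier carries over unchanged.
        return f"orphanet.ordo:{local_id}"
    return curie


print(to_ordo("orphanet:107"))        # orphanet.ordo:107
print(to_ordo("orphanet.ordo:C023"))  # orphanet.ordo:C023 (already ORDO)
print(to_ordo("mondo:0004976"))       # mondo:0004976 (other prefixes untouched)
```

Going in this direction never produces an invalid CURIE under the shared-identifier assumption, whereas the reverse direction would need special handling for ORDO-only identifiers such as C023.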

dhimmel commented 1 year ago

EFO case study

Looking at EFO OTAR Slim v3.57.0 (a version of EFO with a disease focus), we observe the following counts of cross-references (xrefs, after Bioregistry normalization):
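Counts like these can be tallied by grouping xref CURIEs on their prefix. A minimal sketch using `collections.Counter`; the input list is a hypothetical toy sample, not the actual EFO xrefs, and splitting on the first colon is an assumption about how the CURIEs are written.

```python
from collections import Counter
from typing import Iterable


def count_xref_prefixes(xref_curies: Iterable[str]) -> Counter:
    """Count cross-references by prefix (the part before the first colon)."""
    return Counter(curie.split(":", 1)[0] for curie in xref_curies)


# Hypothetical toy sample for illustration only.
xrefs = ["orphanet:107", "orphanet.ordo:107", "mondo:0004976", "orphanet.ordo:C023"]
for prefix, count in count_xref_prefixes(xrefs).most_common():
    print(prefix, count)
```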

All 55 orphanet.ordo xrefs appear to be valid orphanet IDs (see the table below). So for our use case, I'm tempted to convert them all to the orphanet prefix. Note that some EFO terms imported from Orphanet, like Orphanet:107, actually xref themselves via orphanet.ordo:107.
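The conversion proposed here, plus detection of the self-xrefs just mentioned, can be sketched as follows. This assumes plain Orphanet identifiers are all digits, so C-prefixed ORDO identifiers are left alone, and that prefix comparison should ignore case (EFO writes `Orphanet`). Both helper names are hypothetical.

```python
import re

NUMERIC = re.compile(r"\d+")  # assumption: plain Orphanet IDs are all digits


def normalize_xref(curie: str) -> str:
    """Rewrite orphanet.ordo CURIEs with numeric IDs to the orphanet prefix."""
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix == "orphanet.ordo" and NUMERIC.fullmatch(local_id):
        return f"orphanet:{local_id}"
    return curie


def _lower_prefix(curie: str) -> str:
    """Lowercase only the prefix, keeping the local identifier as-is."""
    prefix, sep, local_id = curie.partition(":")
    return f"{prefix.lower()}{sep}{local_id}"


def is_self_xref(term_curie: str, xref_curie: str) -> bool:
    """True if, after normalization, a term cross-references itself."""
    return _lower_prefix(normalize_xref(xref_curie)) == _lower_prefix(term_curie)


print(normalize_xref("orphanet.ordo:107"))                  # orphanet:107
print(normalize_xref("orphanet.ordo:C023"))                 # orphanet.ordo:C023
print(is_self_xref("Orphanet:107", "orphanet.ordo:107"))    # True
print(is_self_xref("EFO:0000538", "orphanet.ordo:217569"))  # False
```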

Another option would be to convert all orphanet prefixes to orphanet.ordo. However, it looks like EFO uses Orphanet and not ORDO as the prefix for the terms that they include from ORDO/Orphanet.

ORDO table

efo_otar_slim_id | efo_label | xref_curie
-- | -- | --
EFO:0000538 | hypertrophic cardiomyopathy | orphanet.ordo:217569
EFO:0007208 | Churg-Strauss syndrome | orphanet.ordo:183
EFO:0008506 | hyperparathyroidism | orphanet.ordo:99879
EFO:0008519 | primary hyperparathyroidism | orphanet.ordo:99878
EFO:0008559 | American trypanosomiasis | orphanet.ordo:3386
EFO:0008597 | anti-p200 pemphigoid | orphanet.ordo:454710
EFO:0008601 | pemphigus foliaceus | orphanet.ordo:79481
EFO:0008602 | paraneoplastic pemphigus | orphanet.ordo:63455
EFO:0008603 | pemphigus erythematosus | orphanet.ordo:79480
EFO:0009012 | Polyarteritis Nodosa | orphanet.ordo:767
EFO:0009201 | myopic macular degeneration | orphanet.ordo:178493
EFO:1000680 | mucous membrane pemphigoid | orphanet.ordo:46486
EFO:1001267 | Aortic Coarctation | orphanet.ordo:1457
EFO:1001293 | collagenous colitis | orphanet.ordo:36205
EFO:1001294 | lymphocytic colitis | orphanet.ordo:65279
EFO:1001295 | microscopic colitis | orphanet.ordo:58220
EFO:1001341 | Heavy Chain Disease | orphanet.ordo:86864
EFO:1001354 | Kleine-Levin Syndrome | orphanet.ordo:33543
EFO:1001376 | Necrobiotic Xanthogranuloma | orphanet.ordo:158011
EFO:1001383 | Opsoclonus-Myoclonus Syndrome | orphanet.ordo:1183
EFO:1001444 | Tularemia | orphanet.ordo:3392
EFO:1001445 | Tungiasis | orphanet.ordo:879
EFO:1001452 | Yellow Nail Syndrome | orphanet.ordo:662
EFO:1001467 | Hypereosinophilic syndrome | orphanet.ordo:168956
EFO:1001473 | Non-familial restrictive cardiomyopathy | orphanet.ordo:217720
EFO:1001477 | Systemic capillary leak syndrome | orphanet.ordo:188
EFO:1001485 | acromegaly | orphanet.ordo:963
EFO:1001795 | fusariosis | orphanet.ordo:228119
EFO:1001806 | macrophage activation syndrome | orphanet.ordo:158061
EFO:1001808 | manganese poisoning | orphanet.ordo:306682
EFO:1001809 | Marchiafava-Bignami Disease | orphanet.ordo:221074
EFO:1001814 | nephrogenic fibrosing dermopathy | orphanet.ordo:137617
EFO:1001822 | Paroxysmal Hemicrania | orphanet.ordo:157835
EFO:1001838 | renal nutcracker syndrome | orphanet.ordo:71273
EFO:1001842 | Serotonin Syndrome | orphanet.ordo:43116
EFO:1001849 | staphylococcal skin infections | orphanet.ordo:36236
EFO:1001856 | Susac Syndrome | orphanet.ordo:838
EFO:1001857 | Takayasu arteritis | orphanet.ordo:3287
EFO:1001882 | cutaneous nodular amyloidosis | orphanet.ordo:137810
EFO:1001897 | Morvan syndrome | orphanet.ordo:83467
EFO:1001982 | Antisynthetase syndrome | orphanet.ordo:81
EFO:1001983 | Autosomal recessive Charcot Marie Tooth diseas... | orphanet.ordo:466775
EFO:1001985 | congenital fibrosis of the extraocular muscles | orphanet.ordo:45358
EFO:1001987 | dropped head syndrome | orphanet.ordo:447881
EFO:1001989 | Monomelic amyotrophy | orphanet.ordo:65684
EFO:1001992 | Scapuloperoneal spinal muscular atrophy | orphanet.ordo:431255
EFO:1001999 | systemic juvenile idiopathic arthritis | orphanet.ordo:85414
EFO:1002000 | Takotsubo cardiomyopathy | orphanet.ordo:66529
MONDO:0004976 | amyotrophic lateral sclerosis | orphanet.ordo:803
MONDO:0015612 | Dent disease | orphanet.ordo:1652
Orphanet:107 | BOR syndrome | orphanet.ordo:107
Orphanet:167 | Chédiak-Higashi syndrome | orphanet.ordo:167
Orphanet:869 | Triple A syndrome | orphanet.ordo:869
Orphanet:931 | Acheiropodia | orphanet.ordo:931
Orphanet:98702 | Connective tissue disease with eye involvement | orphanet.ordo:98702