Closed saramsey closed 4 years ago
For any terms that are "semantic web native", I would definitely use the URI given by the creators. For any OBO library ontology, a term URI starts with http://purl.obolibrary.org/obo/. To me, it creates confusion to change the URIs of these terms. From your examples, it would be incorrect for Oncotree or OMIM terms to have OBO PURLs, since these aren't part of the OBO library. But there are several OBO namespaces you listed under identifiers.org which should use OBO PURLs.
Similarly for foaf, owl, dc, and skos; these are published under specific prefixes.
The Biolink model has a file specifying prefix expansions: https://github.com/biolink/biolink-model/blob/master/context.jsonld
It's incomplete and needs some attention, but I think it is a good place to start.
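To make the idea of a prefix expansion concrete, here is a minimal Python sketch of how a CURIE could be expanded with such a map. The `PREFIX_MAP` dict and the `expand_curie` helper are hypothetical names for illustration only; the three entries are base URIs mentioned in this thread, not a copy of context.jsonld.

```python
# Hypothetical prefix map; the base URIs are ones quoted in this thread.
PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "hgnc": "https://identifiers.org/hgnc:",
}


def expand_curie(curie: str, prefix_map: dict) -> str:
    """Expand 'PREFIX:local_id' into a full URI using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown CURIE prefix: {prefix!r}")
    # Expansion is plain concatenation of base URI and local identifier.
    return prefix_map[prefix] + local_id


print(expand_curie("hgnc:9967", PREFIX_MAP))
```

Because expansion is just string concatenation, the choice of base URI (including its trailing `_`, `#`, or `:`) fully determines the resulting term URI.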
Thanks for your reply! I appreciate your suggestions, as I am a newbie with semantic web stuff. Regarding OncoTree and OMIM, I should note that I found this in mondo.owl:
<owl:equivalentClass rdf:resource="http://purl.obolibrary.org/obo/ONCOTREE_GINET"/>
Similarly, I found this in mondo.owl:
<owl:equivalentClass rdf:resource="http://purl.obolibrary.org/obo/OMIM_424500"/>
I am not assigning these identifiers OBO PURLs; they seem to come to me that way when I import MONDO.
Regarding OncoTree and OMIM, I should note that I found this in mondo.owl:
Thanks for pointing that out! This is a bug :-)
https://github.com/monarch-initiative/mondo-ingest/issues/199
Here is a line from mondo.owl:
<owl:Class rdf:about="http://purl.obolibrary.org/obo/KEGG_05215">
containing a persistent URL, http://purl.obolibrary.org/obo/KEGG_05215, that does not seem to work (takes me to the dreaded http://ontologies.berkeleybop.org/ page). That, plus the fact that this URL doesn't make explicit that the KEGG term is a KEGG pathway (and not, say, a KEGG disease or other semantic type), is why I have opted to use identifiers.org for KEGG pathways; in this case I feel that their CURIE prefix, kegg.pathway, is more informative.
Thank you for suggesting the context.jsonld file; very helpful and I will study it. Right off the bat, there are a few items that I do not quite understand. For example on line 27,
"ExO": "http://example.org/UNKNOWN/ExO/",
I would have thought that the best purl for an ExO term would be something like
http://purl.obolibrary.org/obo/ExO_0000004
which does in fact work (resolves to the expected page on Ontobee).
In context.jsonld, on line 34,
"HGNC": "http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=",
I cannot get that base URL to work (probably a PEBKAC issue). I wonder if it would be preferable to use:
"hgnc": "https://identifiers.org/hgnc:"
which does work; at least, this URL
https://identifiers.org/hgnc:9967
resolves to what I think is the current page:
https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/9967
I agree about ExO, I'm not sure why it's in there; it's some sort of placeholder. I think everything else in the file including "UNKNOWN" are also placeholders. I'm not sure what to prefer for HGNC; for many resources identifiers.org may be the best choice if they don't publish any preferred URI form. Hopefully some other folks can weigh in on some of the non-OBO, bio database stuff.
For any OBO library ontology, a term URI starts with http://purl.obolibrary.org/obo/.
Thanks for the suggestion. I would like to look into following this advice. But I confess I am at a bit of a loss as to how to know if a particular ontology is an "OBO library" ontology or not. Is there a definitive list? I know that OBO Foundry has a list http://www.obofoundry.org/ but I am not sure if that is what you mean by "OBO library" ontology. On the other hand, I guess I could look for any "OBO namespace" URI appearing in one of the many OWL ontologies that I am loading, but that seems fraught since many of the URIs don't resolve. Perhaps I could look at the YAML files in https://github.com/OBOFoundry/purl.obolibrary.org/tree/master/config, but I am not sure if that is the right thing to do either, since there are ontologies in there (e.g., NCIt) that I do not load from OBO but rather from the UMLS distribution. Please pardon my ignorance!
@saramsey the official OBO registry source is here: https://github.com/OBOFoundry/OBOFoundry.github.io/tree/master/ontology
(the other repo is redirect configs, but it's not exactly the same)
Those are compiled into these files which can be used in software: https://github.com/OBOFoundry/OBOFoundry.github.io/tree/master/registry
NCIT is there because there is an "NCIT OBO edition" that has all the same terms but is structured more in line with OBO conventions. There are also related files that provide some deeper integration with Uberon and other OBOs.
Thank you, @balhoff! OK, I have already switched to using purl.obolibrary.org (instead of identifiers.org) for the CL and UBERON ontologies and I'm in the process of switching the others.
OK, here is the list of CURIE prefix to URL mappings that I am now using (it is a YAML file). If I have failed to identify any OBO ontologies that should be mapped to obolibrary PURLs, I'd be grateful for a heads-up.
use_for_bidirectional_mapping:
  - AIR: https://identifiers.org/umls/AIR/
  - bao: "https://identifiers.org/bao:"
  - BFO: http://purl.obolibrary.org/obo/BFO_
  - BSPO: http://purl.obolibrary.org/obo/BSPO_
  - BTO: http://purl.obolibrary.org/obo/BTO_
  - biolink: https://w3id.org/biolink/
  - CARO: http://purl.obolibrary.org/obo/CARO_
  - CGNC: "http://birdgenenames.org/cgnc/GeneReport?id="
  - CHEBI: http://purl.obolibrary.org/obo/CHEBI_
  - CL: http://purl.obolibrary.org/obo/CL_
  - clinicaltrials: "https://identifiers.org/clinicaltrials:"
  - CLO: http://purl.obolibrary.org/obo/CLO_
  - CP: http://purl.obolibrary.org/obo/CP_
  - dbpedia: http://dbpedia.org/resource/
  - dc: http://purl.org/dc/elements/1.1/
  - DDANAT: http://purl.obolibrary.org/obo/DDANAT_
  - DOID: http://purl.obolibrary.org/obo/DOID_
  - ecogene: "https://identifiers.org/ecogene:"
  - ECTO: http://purl.obolibrary.org/obo/ECTO_
  - efo: "https://identifiers.org/efo:"
  - EnsemblGenomes: http://www.ensemblgenomes.org/id/
  - ENVO: http://purl.obolibrary.org/obo/envo#
  - EO: http://purl.obolibrary.org/obo/EO_
  - ExO: http://purl.obolibrary.org/obo/ExO_
  - FAO: http://purl.obolibrary.org/obo/FAO_
  - FBbt: http://purl.obolibrary.org/obo/FBbt_
  - FBgn: http://flybase.org/reports/FBgn
  - FBdv: http://purl.obolibrary.org/obo/FBdv_
  - FMA: http://purl.obolibrary.org/obo/FMA_
  - foaf: http://xmlns.com/foaf/0.1/
  - FOODON: http://purl.obolibrary.org/obo/FOODON_
  - GARD: http://purl.obolibrary.org/obo/GARD_
  - GO: http://purl.obolibrary.org/obo/GO_
  - hgnc: "https://identifiers.org/hgnc:"
  - HP: http://purl.obolibrary.org/obo/HP_
  - iao: http://purl.obolibrary.org/obo/IAO_
  - icd: "https://identifiers.org/icd:"
  - ICD9: http://purl.obolibrary.org/obo/ICD9_
  - identifiers_org_registry: "https://identifiers.org/registry/"
  - IDO: http://purl.obolibrary.org/obo/IDO_
  - kegg.disease: "https://identifiers.org/kegg.disease:H"
  - kegg.pathway: "https://identifiers.org/kegg.pathway:hsa"
  - MA: http://purl.obolibrary.org/obo/MA_
  - meddra: "https://identifiers.org/meddra:"
  - medgen: "https://identifiers.org/medgen:"
  - mesh: "https://identifiers.org/mesh:"
  - MF: http://purl.obolibrary.org/obo/MF_
  - MFOEM: http://purl.obolibrary.org/obo/MFOEM_
  - MFOMD: http://purl.obolibrary.org/obo/MFOMD_
  - MGI: "https://identifiers.org/MGI:"
  - MOD: http://purl.obolibrary.org/obo/MOD_
  - MONDO: http://purl.obolibrary.org/obo/MONDO_
  - MP: http://purl.obolibrary.org/obo/MP_
  - MPATH: http://purl.obolibrary.org/obo/MPATH_
  - NBO: http://purl.obolibrary.org/obo/NBO_
  - ncbigene: "https://identifiers.org/ncbigene:"
  - ncit: "https://identifiers.org/ncit:"
  - NPO: http://purl.obolibrary.org/obo/NPO_
  - OBA: http://purl.obolibrary.org/obo/OBA_
  - OBAN: http://purl.org/oban/
  - OBI: http://purl.obolibrary.org/obo/OBI_
  - OBO: http://purl.obolibrary.org/obo/
  - OBOREL: "http://purl.org/obo/owl/OBO_REL#"
  - OGMS: http://purl.obolibrary.org/obo/OGMS_
  - OIO: http://www.geneontology.org/formats/oboInOwl#
  - OMIM: http://purl.obolibrary.org/obo/OMIM_
  - OMIMDiseaseCluster: http://purl.obolibrary.org/obo/DC_
  - OMIMPS: http://purl.obolibrary.org/obo/OMIMPS_
  - OMIT: http://purl.obolibrary.org/obo/OMIT_
  - OMOP: https://athena.ohdsi.org/search-terms/terms/
  - OncoTree: http://purl.obolibrary.org/obo/ONCOTREE_
  - OPL: http://purl.obolibrary.org/obo/OPL_
  - orphanet: "https://identifiers.org/orphanet:"
  - owl: http://www.w3.org/2002/07/owl#
  - PATO: http://purl.obolibrary.org/obo/PATO_
  - PO: http://purl.obolibrary.org/obo/PO_
  - pombase: "https://identifiers.org/pombase:"
  - PR: http://purl.obolibrary.org/obo/PR_
  - rdf: https://www.w3.org/TR/2004/REC-owl-guide-20040210/#
  - rdfs: http://www.w3.org/2000/01/rdf-schema#
  - REPODB: http://apps.chiragjpgroup.org/repoDB#
  - RGD: "https://identifiers.org/rgd:"
  - RO: http://purl.obolibrary.org/obo/RO_
  - RTX: http://rtx.ai/identifiers#
  - RTXKG1: http://arax.rtx.ai/
  - sgd: "https://identifiers.org/sgd:"
  - SIO: http://semanticscience.org/resource/SIO_
  - skos: http://www.w3.org/2004/02/skos/core#
  - snomedct: "https://identifiers.org/snomedct:"
  - SO: http://purl.obolibrary.org/obo/SO_
  - NCBITaxon: http://purl.obolibrary.org/obo/NCBITaxon_
  - TO: http://purl.obolibrary.org/obo/TO_
  - TUI: https://identifiers.org/umls/STY/
  - UBERON: http://purl.obolibrary.org/obo/UBERON_
  - umls: "https://identifiers.org/umls:"
  - UMLS: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/
  - UO: http://purl.obolibrary.org/obo/UO_
  - wb: "https://identifiers.org/wb:"
  - wikidata: "https://identifiers.org/wikidata:"
  - ZEA: http://purl.obolibrary.org/obo/ZEA_
  - ZFA: http://purl.obolibrary.org/obo/ZFA_
  - zfin: "https://identifiers.org/zfin:"
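Since this section is meant for bidirectional mapping, here is a rough Python sketch of the reverse direction (URI to CURIE). It is an illustration only, not the KG2 code: `contract_uri` is a hypothetical helper, and the small `PREFIX_MAP` dict holds a few entries copied from the list above. The one design point it shows is that longer base URIs have to be tried first, because the catch-all `OBO` base is a prefix of every other OBO PURL base.

```python
# A few entries copied from the mapping list; the dict itself is illustrative.
PREFIX_MAP = {
    "OBO": "http://purl.obolibrary.org/obo/",
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "hgnc": "https://identifiers.org/hgnc:",
}


def contract_uri(uri, prefix_map):
    """Return the CURIE for a URI, or None if no base URI matches."""
    # Sort by base-URI length, longest first, so the most specific base wins.
    by_length = sorted(prefix_map.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, base in by_length:
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return None


print(contract_uri("http://purl.obolibrary.org/obo/GO_0008150", PREFIX_MAP))
print(contract_uri("http://purl.obolibrary.org/obo/ExO_0000004", PREFIX_MAP))
```

With this ordering, a GO term contracts to a `GO:` CURIE, while a term whose namespace is not listed separately (ExO in this reduced map) falls back to the generic `OBO:` prefix.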
@saramsey looking good! I have a few comments:
- ENVO: http://purl.obolibrary.org/obo/envo# should be http://purl.obolibrary.org/obo/ENVO_ (but I wouldn't be surprised if there are a few resources inside ENVO that start with http://purl.obolibrary.org/obo/envo#, usually those are ontology subset tags)
- I don't think GARD is an OBO ontology
- MeSH provides an RDF version and uses mesh: http://id.nlm.nih.gov/mesh/
- rdf should be http://www.w3.org/1999/02/22-rdf-syntax-ns#
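Slips like the ENVO one could probably be caught mechanically. The following is a small Python sketch, under the assumption that an OBO term base normally looks like `http://purl.obolibrary.org/obo/` followed by an ID space and an underscore; `check_obo_base` is a hypothetical helper, and it only checks the shape of the base URI, not whether the ID space is actually registered with OBO.

```python
import re

OBO_BASE = "http://purl.obolibrary.org/obo/"


def check_obo_base(prefix, base_uri):
    """Return a warning string if an OBO PURL base has an unexpected shape, else None."""
    if not base_uri.startswith(OBO_BASE):
        return None  # not an OBO PURL, nothing to check
    tail = base_uri[len(OBO_BASE):]
    if tail == "":
        return None  # the bare OBO namespace itself
    # Assumed convention: an alphanumeric ID space followed by one underscore.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*_", tail):
        return f"{prefix}: unexpected OBO base {base_uri!r}"
    return None


print(check_obo_base("ENVO", "http://purl.obolibrary.org/obo/envo#"))
print(check_obo_base("GO", "http://purl.obolibrary.org/obo/GO_"))
```

A check like this would flag the `envo#` base but would say nothing about GARD or rdf, which need a lookup against the relevant registry rather than a pattern match.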
Should your list and the biolink-model list be coordinated going forward? Just wondering if there are cases where you need to diverge, or if the biolink one shouldn't just be updated. There are clearly errors to be fixed in the biolink context.
Everything @balhoff says is correct, but it's not necessary for you to go to N different registries.
See https://biolink.github.io/biolink-model/#identifiers
We provide a jsonld context that you can use: https://biolink.github.io/biolink-model/context.jsonld
I see there are a few issues with some of them; @kshefchek is working on this.
@cmungall does that documentation mean that we should not make edits to https://github.com/biolink/biolink-model/blob/master/context.jsonld, but rather this has to be done at prefixcommons?
Yes, the context.jsonld is entirely derived. Ultimately the upstream registries are the authorities. But we can prioritize one authority over another if there is a clash.
And we can override and plug gaps directly:
We should probably have the jsonld context display in a more human friendly form in the derived documentation
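For readers who want a feel for "prioritize one authority over another" plus local overrides, here is a minimal Python sketch using `collections.ChainMap`. It is not how the Biolink context is actually generated; the three dicts are hypothetical stand-ins for an override table and two upstream registries, filled with base URIs quoted in this thread.

```python
from collections import ChainMap

# Hypothetical layers, highest priority first.
overrides = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
obo_registry = {"GO": "http://purl.obolibrary.org/obo/GO_"}
other_registry = {
    "rdf": "https://www.w3.org/TR/2004/REC-owl-guide-20040210/#",  # clashing entry
    "hgnc": "https://identifiers.org/hgnc:",
}

# ChainMap looks keys up in the first mapping that contains them,
# so earlier layers win any clash and later layers only fill gaps.
merged = dict(ChainMap(overrides, obo_registry, other_registry))

for prefix in sorted(merged):
    print(prefix, merged[prefix])
```

Here the override supplies the `rdf` namespace, while `GO` and `hgnc` come from whichever layer defines them.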
Noting a need to be aware of the concepts of "external-base-uri" and "internal-base-uri", where KGs, ontologies and reasoning all benefit greatly from the nice, regular, consistent forms provided by third-party resolvers, which I am collectively referring to as "internal-base-uri".
In sometimes stark contrast are the irregular, messy native "external-base-uri" forms, which
- our data sources actually produce and maintain
- the wider population (nontologists) expect to see
- mashed-up/aggregated data from the wild uses and will continue to use
The way forward I see (for publicly interfacing aspects) is to maintain both internal and external mappings for a common set of CURIE prefixes and convert from and to as required.
Where "required" is typically to internal from external, to make life easier, and from internal to external for publicly publishing results without alienating our sources.
- N.B. Harold states he has been sued for changing identifier URLs.
These are the mappings in common with the dipper curie_map.yaml that catch my eye as different. However, both dipper's input and output are 100% public, and as such it may be a different use case than a reasoner. But converging on common CURIE prefixes is important in any case.
IAO http://purl.obolibrary.org/obo/IAO_
MGI http://www.informatics.jax.org/accession/MGI:
MUGEN http://bioit.fleming.gr/mugen/Controller?workflow=ViewModel&expand_all=true&name_begins=model.block&eid=
OMIM http://omim.org/entry/
OMIMPS http://www.omim.org/phenotypicSeries/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
RGD http://rgd.mcw.edu/rgdweb/report/gene/main.html?id=
skos https://www.w3.org/TR/skos-reference/#
SNOMED http://purl.obolibrary.org/obo/SNOMED_
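The convert-as-required idea might look roughly like the Python sketch below. Everything named here is hypothetical: `INTERNAL`, `EXTERNAL` and `rebase` are illustrative names, and treating the OBO-style OMIM base from the KG2 list as "internal" and the omim.org base from the dipper list as "external" is an assumption made for the example.

```python
# Two maps keyed by the same CURIE prefix (assumed roles, see lead-in).
INTERNAL = {"OMIM": "http://purl.obolibrary.org/obo/OMIM_"}
EXTERNAL = {"OMIM": "http://omim.org/entry/"}


def rebase(uri, source, target):
    """Swap the base URI of `uri` from the source map to the target map."""
    for prefix, base in source.items():
        if uri.startswith(base) and prefix in target:
            return target[prefix] + uri[len(base):]
    return uri  # leave unrecognized URIs untouched


internal = rebase("http://omim.org/entry/424500", EXTERNAL, INTERNAL)
print(internal)
print(rebase(internal, INTERNAL, EXTERNAL))
```

Keeping both maps keyed by a shared prefix is what makes the round trip possible: ingest converts external to internal, and publishing converts back.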
@saramsey looking good! I have a few comments:
- ENVO: http://purl.obolibrary.org/obo/envo# should be http://purl.obolibrary.org/obo/ENVO_ (but I wouldn't be surprised if there are a few resources inside ENVO that start with http://purl.obolibrary.org/obo/envo#, usually those are ontology subset tags)
- I don't think GARD is an OBO ontology
- MeSH provides an RDF version and uses mesh: http://id.nlm.nih.gov/mesh/
- rdf should be http://www.w3.org/1999/02/22-rdf-syntax-ns#
Thank you so much! I am going to fix these issues. If I find a suitable purl registry for GARD concepts, I will post it here.
- Have you come across OBOREL terms? Just curious, because this was the predecessor of RO and I'm wondering if these IDs are still being used.
An example, from efo.owl:
<owl:onProperty rdf:resource="http://purl.org/obo/owl/OBO_REL#role_of"/>
and from hp.owl:
<dc:source>http://www.obofoundry.org/ro/#OBO_REL:preceded_by</dc:source>
Should your list and the biolink-model list be coordinated going forward? Just wondering if there are cases where you need to diverge, or if the biolink one shouldn't just be updated. There are clearly errors to be fixed in the biolink context.
Yes, coordination makes sense. I'm not aware of any places where we have to diverge; just some changes we would like to propose that would trigger updates to the biolink context.jsonld. If any true use-cases for divergence arise, I will definitely post here.
FWIW, here is the latest version of the CURIE<->URL mappings that we are using for ARAX KG2 (the first section, use_for_bidirectional_mapping, is the one to look at; the other sections are for cleaning up incorrect or messed-up URLs or CURIE prefixes that somehow arose from upstream sources or during our ingestion processes):
https://github.com/RTXteam/RTX/blob/kg2-curie-refactoring/code/kg2/curies-to-urls-map.yaml
Noting a need to be aware of the concepts of "external-base-uri" and "internal-base-uri", where KGs, ontologies and reasoning all benefit greatly from the nice, regular, consistent forms provided by third-party resolvers, which I am collectively referring to as "internal-base-uri".
In sometimes stark contrast are the irregular, messy native "external-base-uri" forms, which
- our data sources actually produce and maintain
- the wider population (nontologists) expect to see
- mashed-up/aggregated data from the wild uses and will continue to use
The way forward I see (for publicly interfacing aspects) is to maintain both internal and external mappings for a common set of CURIE prefixes and convert from and to as required.
Where "required" is typically to internal from external, to make life easier, and from internal to external for publicly publishing results without alienating our sources.
- N.B. Harold states he has been sued for changing identifier URLs.
These are the mappings in common with the dipper curie_map.yaml that catch my eye as different. However, both dipper's input and output are 100% public, and as such it may be a different use case than a reasoner. But converging on common CURIE prefixes is important in any case.
IAO http://purl.obolibrary.org/obo/IAO_
MGI http://www.informatics.jax.org/accession/MGI:
MUGEN http://bioit.fleming.gr/mugen/Controller?workflow=ViewModel&expand_all=true&name_begins=model.block&eid=
OMIM http://omim.org/entry/
OMIMPS http://www.omim.org/phenotypicSeries/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
RGD http://rgd.mcw.edu/rgdweb/report/gene/main.html?id=
skos https://www.w3.org/TR/skos-reference/#
SNOMED http://purl.obolibrary.org/obo/SNOMED_
Thank you @TomConlin! I will specifically check these entries in the KG2 curies-to-urls-map.yaml file. I note that the SNOMED base URL above does not seem to work, in my hands:
http://purl.obolibrary.org/obo/SNOMED_106562006
but I can confirm that SNOMED CT concepts are available in purl.bioontology.org:
http://purl.bioontology.org/ontology/SNOMEDCT/106562006
FWIW, I am using the following sources to resolve URLs for identifiers with the above-referenced CURIE prefixes:
Where required is typically to internal from external to make life easier and from internal to external for publicly publishing results without alienating our sources.
N.B. Harold states he has been sued for changing identifier urls.
Wow! I was not aware of that. It's somewhat astounding that an upstream source would sue (as opposed to sending a C&D letter) an individual developer over using an 'internal URL' over an 'external URL'.
To be clear - threatened with suit. I'll not name the organization, but we assigned every organization an OID in the HL7 OID registry. So it was a C&D letter.
Very helpful. Thank you.
Closing this now as I believe it's well understood the canonical ID to URI expansion is here: https://biolink.github.io/biolink-model/#identifiers
open another issue if further clarification required!
I'm not sure where I should post this issue, so I will post it here and hope that it gets taken up in a future meeting of the Translator Data Modeling group.
For KG2 development, we are attempting to use persistent URIs wherever possible for identifying concepts. The Biolink model does not seem (?) to define which persistent URI systems are preferred. That is OK, it is not obvious that there exists one persistent URI system/registry that would work for all situations.
After a bunch of empirical testing, we have settled on a hierarchy of persistent URI remapping services which we are using, with the highest ones most preferred:
- identifiers.org (air, bao, bto, chebi, cl, clinicaltrials, doid, ecogene, efo, eo, fma, foodon, go, hgnc, hp, iao, icd (ICD9), icd10, ido, kegg.disease, kegg.pathway, ma, meddra, medgen, mesh, mgi, mod, mp, ncbigene, ncit, obi, oborel, omit, orphanet, pato, po, pombase, pr, rgd, sgd, snomedct, so, taxonomy, uberon, umls, uo, wb, wikidata, zfin)
- w3id.org (biolink)
- purl.obolibrary.org (bfo, bspo, caro, clo, cp, ddanat, ecto, envo, exo, fao, fbbt, fbdv, gard, mf, mfoem, mfomd, mondo, mpath, nbo, oba, ogms, omim, omimps, oncotree, opl, to, zea, zfa)
- purl.org (dc, oban, oborel)
- (cgnc, foaf, oio, omop, owl, sio, skos)
Any thoughts on this? Interestingly, some of the registries overlap but do not always agree on the CURIE prefix or CURIE identifier format. We have opted to use CURIE prefixes from the above sources in the above priority order, since identifiers.org seems to be (by far) the most complete and we have found it to be easy to search and its documentation fairly intuitive. Anyhow, we are wondering what other Translator teams are using for their persistent URI mapping needs.
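As a rough illustration of what "priority order" means in practice, here is a short Python sketch. `REGISTRY_PRIORITY` and `preferred_registry` are hypothetical names, and each registry is given only a handful of the prefixes listed above to keep the example small. Note that oborel appears under both identifiers.org and purl.org, so the ordering decides which one is used.

```python
# Registries in priority order, each with a small sample of its prefixes.
REGISTRY_PRIORITY = [
    ("identifiers.org", {"go", "hgnc", "oborel"}),
    ("w3id.org", {"biolink"}),
    ("purl.obolibrary.org", {"envo", "mondo"}),
    ("purl.org", {"dc", "oban", "oborel"}),
]


def preferred_registry(prefix):
    """Return the first registry (in priority order) that lists the prefix, or None."""
    wanted = prefix.lower()
    for registry, prefixes in REGISTRY_PRIORITY:
        if wanted in prefixes:
            return registry
    return None


print(preferred_registry("oborel"))
print(preferred_registry("MONDO"))
```

Prefixes that no listed registry covers return `None` here and would need their native base URL handled separately.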