A general solution to the databases as ontologies problem in bioregistry

There is frequently a need to represent entities from a database as an ontology.

See:

Limits of ontologies: How should databases be represented in OBO? presented by Chris Mungall
https://github.com/OBOFoundry/OBOFoundry.github.io/discussions/1981

There are a lot of factors to condense here, but some key points:

- idspace overloading between a database and its ontology representation can cause issues
  - e.g. ncbitaxon https://github.com/biopragmatics/bioregistry/issues/1044
  - conversely, making a new idspace can also confuse (compare PR:P12345 vs UniProtKB:P12345)
- ideally we would like the upstream database to own the ontology representation; in practice this is likely never to happen, necessitating competing-cooperating alternative translations with no agreement on axioms
- bioregistry conflates URLs for humans with semantic URIs

I propose that the bioregistry data model be extended to include inlined sub-records for ontology or KG translations of databases. These sub-records would carry additional metadata to indicate the source (third-party vs official vs quasi-official).

One case would be a third-party ontology rendering with reminted prefixed IDs:

ncbitaxon:
  url: <official NCBI URL>
  renderings:
    - provider: obo
      type: ontology
      documentation: ...
      subset: COMPLETE
      download_url: <OBO ontology PURL>
      prefixmap:
        NCBITaxon: <OBO PURL>
    - provider: umls
      ...
ncit:
  url: <official NCIT URL>
  renderings:
    - provider: obo
      type: ontology
      documentation: ...
      subset: COMPLETE
      download_url: <OBO ontology PURL>
      prefixmap:
        NCIT: <OBO PURL>

These renderings could even be first-class entries as far as the bioregistry UI is concerned, e.g. obo$NCBITaxon (but obviously this wouldn't be used as a prefix).
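To make the shape of these sub-records concrete, here is a minimal Python sketch of how they might be modelled. It is not part of bioregistry: the class names `Rendering` and `Resource`, the helper `rendering_key`, and the example.org URLs are all hypothetical placeholders standing in for the elided `<...>` values above. The field names are taken from the YAML sketches in this proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rendering:
    """One ontology or KG translation of a database (hypothetical model)."""

    provider: str  # e.g. "obo", "umls", "biopragmatics"
    type: str  # e.g. "ontology"
    subset: str  # e.g. "COMPLETE" or "OVERLAP"
    documentation: Optional[str] = None
    download_url: Optional[str] = None
    # Reminted prefixes and their URI expansions, if the rendering has any.
    prefixmap: Dict[str, str] = field(default_factory=dict)
    # Link to another existing registry entry, if the rendering is one.
    bioregistry_entry: Optional[str] = None


@dataclass
class Resource:
    """A registry entry with its inlined renderings (hypothetical model)."""

    prefix: str
    url: str
    renderings: List[Rendering] = field(default_factory=list)


def rendering_key(resource: Resource, rendering: Rendering) -> str:
    """Build a UI-only key such as "obo$NCBITaxon"; not meant as a prefix."""
    # Prefer the first reminted prefix; otherwise reuse the resource prefix.
    local = next(iter(rendering.prefixmap), resource.prefix)
    return f"{rendering.provider}${local}"


# Placeholder URLs only; the real ones are elided in the proposal.
ncbitaxon = Resource(
    prefix="ncbitaxon",
    url="https://example.org/ncbi-taxonomy",
    renderings=[
        Rendering(
            provider="obo",
            type="ontology",
            subset="COMPLETE",
            download_url="https://example.org/obo/ncbitaxon.owl",
            prefixmap={"NCBITaxon": "https://example.org/obo/NCBITaxon_"},
        )
    ],
)

print(rendering_key(ncbitaxon, ncbitaxon.renderings[0]))  # obo$NCBITaxon
```

Keeping the rendering inline, rather than as a separate top-level entry, means the `provider$Prefix` key can be derived for display without ever being registered as a prefix.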

Another would be third-party ontology renderings where the same prefixes and URL expansions are used:

rhea:
  url: <official RHEA URL>
  renderings:
    - provider: biopragmatics
      type: ontology
      documentation: currently this includes all annotations but this is under discussion https://github.com/biopragmatics/pyobo/issues/170
      subset: COMPLETE

Here there is no bespoke prefixmap, so the standard RHEA ones would be used.

Perhaps controversially:
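The fallback described here could be sketched as follows. This assumes the rendering is a plain dict shaped like the YAML above, and that a bespoke prefixmap replaces the standard one rather than being merged with it; the proposal does not settle that point. The function name `effective_prefix_map` and the example.org URLs are hypothetical.

```python
from typing import Dict


def effective_prefix_map(standard: Dict[str, str], rendering: dict) -> Dict[str, str]:
    """Return the prefix map a consumer of this rendering should use."""
    bespoke = rendering.get("prefixmap") or {}
    # Assumption: a bespoke map replaces the standard one entirely.
    return dict(bespoke) if bespoke else dict(standard)


# Placeholder expansions; the real URLs are elided in the proposal.
rhea_standard = {"rhea": "https://example.org/rhea/"}
rhea_rendering = {"provider": "biopragmatics", "type": "ontology", "subset": "COMPLETE"}

ncbitaxon_standard = {"ncbitaxon": "https://example.org/ncbi-taxonomy/"}
obo_rendering = {
    "provider": "obo",
    "type": "ontology",
    "subset": "COMPLETE",
    "prefixmap": {"NCBITaxon": "https://example.org/obo/NCBITaxon_"},
}

print(effective_prefix_map(rhea_standard, rhea_rendering))
print(effective_prefix_map(ncbitaxon_standard, obo_rendering))
```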

uniprotkb:
  url: <official uniprot URL>
  renderings:
    - provider: pr
      type: ontology
      documentation: PRO classes at "species-gene" level generally use same local ID as uniprotkb
      subset: OVERLAP
      bioregistry_entry: pr

Here this would be a link between two existing, overlapping bioregistry entries.

This scheme could also be used for KG renderings of databases in formats that are better suited than OWL (e.g. kgx, rdfstar with owlstar semantics).

Note that for entries that are "born" ontologies we would not curate this info; the entry would be considered its own rendering, i.e. a reflexive relation.
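One way to read the last two cases together is as a lookup that returns the curated renderings of an entry, follows `bioregistry_entry` links where present, and treats a born ontology as its own rendering. The sketch below is only an illustration under assumptions: the `born_ontology` flag, the function `ontology_views`, and the choice of `COMPLETE` as the subset for the reflexive case are all hypothetical and not part of the proposal or of bioregistry.

```python
from typing import List, Tuple

# Toy registry using the field names from the YAML sketches above.
# "born_ontology" is a hypothetical flag, not an existing bioregistry field.
registry = {
    "uniprotkb": {
        "renderings": [
            {"provider": "pr", "type": "ontology", "subset": "OVERLAP",
             "bioregistry_entry": "pr"},
        ],
    },
    "rhea": {
        "renderings": [
            {"provider": "biopragmatics", "type": "ontology", "subset": "COMPLETE"},
        ],
    },
    "pr": {"born_ontology": True},
}


def ontology_views(registry: dict, prefix: str) -> List[Tuple[str, str, str]]:
    """Return (provider, subset, target entry) for each rendering of `prefix`."""
    entry = registry[prefix]
    if entry.get("born_ontology"):
        # Reflexive relation: nothing is curated, the entry renders itself.
        return [(prefix, "COMPLETE", prefix)]
    views = []
    for rendering in entry.get("renderings", []):
        # A rendering either links to another entry or stays under this one.
        target = rendering.get("bioregistry_entry", prefix)
        views.append((rendering["provider"], rendering["subset"], target))
    return views


print(ontology_views(registry, "uniprotkb"))  # [('pr', 'OVERLAP', 'pr')]
print(ontology_views(registry, "pr"))         # [('pr', 'COMPLETE', 'pr')]
```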

I have not absorbed your proposal quite yet, but regarding this point:

"bioregistry conflates URLs for humans with semantic URIs"

While this is mostly true, it's not quite true conceptually:

"goche": {
    "contributor": {
      "email": "cthoyt@gmail.com",
      "github": "cthoyt",
      "name": "Charles Tapley Hoyt",
      "orcid": "0000-0003-4423-4370"
    },
    "description": "Represent chemical entities having particular CHEBI roles",
    "download_owl": "https://raw.githubusercontent.com/geneontology/go-ontology/master/src/ontology/imports/chebi_roles.owl",
    "example": "25512",
    "homepage": "https://github.com/geneontology/go-ontology",
    "name": "GO Chemicals",
    "pattern": "^\\d+$",
    "preferred_prefix": "GOCHE",
    "rdf_uri_format": "http://purl.obolibrary.org/obo/GOCHE_$1",
    "references": [
      "https://obo-communitygroup.slack.com/archives/C023P0Z304T/p1638472847049400",
      "https://github.com/geneontology/go-ontology/issues/19535"
    ],
    "repository": "https://github.com/geneontology/go-ontology",
    "synonyms": [
      "go.chebi",
      "go.chemical",
      "go.chemicals"
    ],
    "uri_format": "https://biopragmatics.github.io/providers/goche/$1"
  },

Check rdf_uri_format, which is recorded separately from the human-facing uri_format.

This does not entirely resolve the issue; it just adds an additional layer.
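As a small illustration of the two layers, the sketch below expands the example local ID from the goche record with each format string. The helper `expand` is hypothetical and is not the bioregistry API; falling back to `uri_format` when no `rdf_uri_format` is present is an assumption made for the example.

```python
# The two format strings are copied from the goche record above.
goche = {
    "uri_format": "https://biopragmatics.github.io/providers/goche/$1",
    "rdf_uri_format": "http://purl.obolibrary.org/obo/GOCHE_$1",
}


def expand(record: dict, local_id: str, semantic: bool = False) -> str:
    """Expand a local ID into a human-facing URL or a semantic URI."""
    key = "rdf_uri_format" if semantic else "uri_format"
    # Assumption: fall back to the human-facing format if no RDF one exists.
    fmt = record.get(key) or record["uri_format"]
    return fmt.replace("$1", local_id)


print(expand(goche, "25512"))
# https://biopragmatics.github.io/providers/goche/25512
print(expand(goche, "25512", semantic=True))
# http://purl.obolibrary.org/obo/GOCHE_25512
```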
