biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Deconflate the sweetrealm entry #870

Open cmungall opened 1 year ago

cmungall commented 1 year ago

Background

SWEET is a highly modular ontology suite consisting of multiple sub-ontologies, covering the earth and environmental sciences. https://github.com/ESIPFed/sweet

This could loosely be thought of as an OBO for the earth sciences.

There are reasons not to include SWEET in bioregistry

  1. it's not bio. The environment community has historically been skeptical of tooling that has "bio" in the name, even when that tooling is more general
  2. this community embraces linked data technology more than the bio community does, is fine with using URIs as identifiers, and doesn't really see a need for standard prefixes

The counterpoint to 2 is efforts to bridge, e.g. ENVO-SWEET SSSOM files:

And more recently the SWEET community registered >200 prefixes, one for each of the SWEET sub-ontologies, with prefix.cc

Problems with existing bioregistry entry

bioregistry doesn't have an entry for sweet per se, but it does have:

https://bioregistry.io/registry/sweetrealm

In SWEET, there are around 25 sub-ontologies dealing with "realms", with one ("sorea") being the top-level:

prefix namespace
sorea http://sweetontology.net/realm/
soreaa http://sweetontology.net/realmAtmo/
soreaab http://sweetontology.net/realmAstroBody/
soreaabl http://sweetontology.net/realmAtmoBoundaryLayer/
soreaah http://sweetontology.net/realmAstroHelio/
soreaas http://sweetontology.net/realmAstroStar/
soreaaw http://sweetontology.net/realmAtmoWeather/
soreabb http://sweetontology.net/realmBiolBiome/
soreac http://sweetontology.net/realmCryo/
soreacz http://sweetontology.net/realmClimateZone/
soreaer http://sweetontology.net/realmEarthReference/
soreahb http://sweetontology.net/realmHydroBody/
soreal http://sweetontology.net/realmLandform/
soreala http://sweetontology.net/realmLandAeolian/
sorealc http://sweetontology.net/realmLandCoastal/
sorealf http://sweetontology.net/realmLandFluvial/
sorealg http://sweetontology.net/realmLandGlacial/
sorealo http://sweetontology.net/realmLandOrographic/
sorealp http://sweetontology.net/realmLandProtected/
sorealt http://sweetontology.net/realmLandTectonic/
sorealv http://sweetontology.net/realmLandVolcanic/
soreao http://sweetontology.net/realmOcean/
soreaofe http://sweetontology.net/realmOceanFeature/
soreaofl http://sweetontology.net/realmOceanFloor/
sorear http://sweetontology.net/realmRegion/
soreas http://sweetontology.net/realmSoil/

The example given on the bioregistry page is sweetrealm:ANOVA, which is a broken link: ANOVA is not in realm at all, it's in sorepmst:

http://sweetontology.net/reprMathStatistics/ANOVA

(this link also seems to be broken, but I think that's an issue with their servers; this is in fact the correct IRI)
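To make the mismatch concrete, here is a minimal sketch of CURIE compression against the prefix map. It uses only a subset of the realm table above plus sorepmst, whose namespace is inferred from the ANOVA IRI. The helper names `compress` and `expand` are illustrative, not part of bioregistry or any SWEET tooling.

```python
from typing import Dict, Optional

# Subset of the realm prefix map from the table above.
# The sorepmst namespace is inferred from the ANOVA IRI (an assumption).
PREFIX_MAP: Dict[str, str] = {
    "sorea": "http://sweetontology.net/realm/",
    "soreao": "http://sweetontology.net/realmOcean/",
    "soreaofl": "http://sweetontology.net/realmOceanFloor/",
    "sorepmst": "http://sweetontology.net/reprMathStatistics/",
}


def compress(iri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Return a CURIE for `iri` using the longest matching namespace, or None."""
    matches = [(ns, prefix) for prefix, ns in prefix_map.items() if iri.startswith(ns)]
    if not matches:
        return None
    ns, prefix = max(matches, key=lambda pair: len(pair[0]))
    return f"{prefix}:{iri[len(ns):]}"


def expand(curie: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Return the IRI for `curie`, or None if its prefix is unknown."""
    prefix, _, local_id = curie.partition(":")
    ns = prefix_map.get(prefix)
    return None if ns is None else ns + local_id


ANOVA_IRI = "http://sweetontology.net/reprMathStatistics/ANOVA"

# Keep only the namespaces that belong to the "realm" family.
realm_only = {p: ns for p, ns in PREFIX_MAP.items() if "/realm" in ns}

print(compress(ANOVA_IRI, realm_only))  # no realm namespace matches
print(compress(ANOVA_IRI, PREFIX_MAP))  # matches once sorepmst is included
```

With only realm namespaces in the map, the ANOVA IRI cannot be compressed at all, which is why it cannot be a valid example for a realm-scoped prefix.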

Path forward

I think there are 3 options:

  1. retract sweetrealm as broken and out of scope (leave open the possibility of cloning bioregistry infrastructure for environmental science)
  2. deprecate and have a global sweet prefix (consistent with bioportal)
  3. deprecate and register individual prefixes for each sub-ontology (> 200 new prefixes)
    • 3a: use the preferred prefixes registered by this community in prefix.cc
    • 3b: use the Bioregistry/identifiers.org convention of dot-separated prefixes, e.g. sweet.realmOceanFloor

I am not sure any of these are great tbh. My preference is 3b.
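As a rough illustration of 3b, the dot-separated prefixes could be derived mechanically from the namespace URIs. This is only a sketch: the function name `dotted_prefix` and the choice to keep the module name's original casing (as in the sweet.realmOceanFloor example) are assumptions, not an agreed convention.

```python
from urllib.parse import urlparse

SWEET_HOST = "sweetontology.net"


def dotted_prefix(namespace: str, root: str = "sweet") -> str:
    """Derive a dot-separated prefix such as sweet.realmOceanFloor from a namespace URI."""
    parsed = urlparse(namespace)
    if parsed.netloc != SWEET_HOST:
        raise ValueError(f"not a SWEET namespace: {namespace}")
    module = parsed.path.strip("/")
    # Expect exactly one path segment naming the sub-ontology.
    if not module or "/" in module:
        raise ValueError(f"unexpected namespace path: {namespace}")
    return f"{root}.{module}"


# A few namespaces taken from the table earlier in this issue.
namespaces = [
    "http://sweetontology.net/realm/",
    "http://sweetontology.net/realmOceanFloor/",
    "http://sweetontology.net/realmLandVolcanic/",
]

for ns in namespaces:
    print(dotted_prefix(ns), ns)
```

The same function could be run over the full list of >200 namespaces to draft candidate prefixes for review before anything is registered.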

Of course we should do things in collaboration with the SWEET community: @rduerr @pbuttigieg @kaiam @brandonnodnarb @lewismc

cthoyt commented 1 year ago

Bioregistry isn't necessarily limited to bio, despite its name, so I think having this system registered is fine, especially given that references appear in some OBO Foundry ontologies.

I'm not sure how I feel about the prefixes from your list; they seem pretty impenetrable. I would also go with the 3b approach and have some subspaces. I can see no reason we can't add 200 prefixes to Bioregistry, as long as we clean them up and make them more readable.

Note: the Bioregistry web site can be run with a fully custom data set that isn't the base Bioregistry (or with one that's derived from the base Bioregistry). We're doing that for our ASKEM project.