FoodOntology / foodon

The core repository for the FOODON food ontology project. This holds the key classes of the ontology; larger files and the results of text-mining projects will be stored in other repos.
Creative Commons Attribution 4.0 International

What is `SUBSET_SIREN`? #267

cthoyt closed this issue 1 year ago

cthoyt commented 1 year ago

While looking through the compliance of MONDO to the Bioregistry, I found that it was importing entities from a namespace called SUBSET_SIREN via FOODON (see issue https://github.com/monarch-initiative/mondo/issues/4638).

For example, http://purl.obolibrary.org/obo/FOODON_03309823 has a cross-reference to SUBSET_SIREN:F9823. Can someone please explain what this namespace is? Then, we can add it to the Bioregistry so others can re-use this domain knowledge!

ddooley commented 1 year ago

So we created that cross-reference early on without an understanding that there was an expectation to formalize the namespace. The SIREN database is just a text document containing a list of food items, the 5th file down in this list of files. This is its only internet presence.

https://www.langual.org/langual_indexed_datasets.asp

So perhaps we should turn this reference into a non-URI kind of annotation?

cthoyt commented 1 year ago

So you mean https://www.langual.org/download/IndexedDatasets/FDA/SIREN%20(updated).TXT?

cthoyt commented 1 year ago

Also, how do you think this relates to https://bioregistry.io/registry/langual?

ddooley commented 1 year ago

LanguaL has its own codes, which FoodOn does have dbxrefs to. There we spell out the whole URL, so I appreciate seeing that it has its own prefix, which it seems you added! The SIREN codes are different though - they cover food products, which LanguaL doesn't itself contain. LanguaL's reference is to SIREN ID items that have been indexed against LanguaL codes.

cthoyt commented 1 year ago

Amazing, thank you so much for the clarification. This is a small, but important, step towards grand OBO Foundry semantic unification 🚀

ddooley commented 1 year ago

P.S. The SIREN codes were managed by NAL, the US National Agricultural Library, as I recall, but I don't think they are actively developed.

ddooley commented 1 year ago

So should our dbxrefs be adjusted to "SIREN:F0000" rather than "SUBSET_SIREN:F0000"?

cthoyt commented 1 year ago

I think this would be ideal - preferably siren:F0000, since this isn't an OBO ontology name.

I already did some digging and found that SIREN was published by the FDA's Bureau of Foods - you can see everything I found in the Bioregistry PR, or tomorrow on https://bioregistry.io/siren after the nightly rebuild.
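For illustration, the prefix change discussed above could be sketched as a small text rewrite. This is only a sketch: `rename_siren_prefix` is a hypothetical helper, not part of FoodOn's tooling, and it assumes the cross-references appear as plain `SUBSET_SIREN:<id>` strings in whatever file is being edited.

```python
import re

# Hypothetical helper for illustration; FoodOn's actual build tooling may differ.
OLD_PREFIX = "SUBSET_SIREN"
NEW_PREFIX = "siren"

# Match the old prefix only when it starts a CURIE (word boundary before it,
# a non-space character right after the colon).
_OLD_CURIE = re.compile(rf"\b{re.escape(OLD_PREFIX)}:(?=\S)")


def rename_siren_prefix(text: str) -> str:
    """Rewrite SUBSET_SIREN:<id> cross-references to siren:<id>."""
    return _OLD_CURIE.sub(f"{NEW_PREFIX}:", text)


if __name__ == "__main__":
    print(rename_siren_prefix('xref: "SUBSET_SIREN:F9823"'))
```

The local identifier (for example `F9823`) is left untouched; only the prefix changes.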

ddooley commented 1 year ago

One follow-up - do you know a way to have Protege actually link to the bioregistry.io/siren IRI in a database_cross_reference annotation? I presume there's no way to do that.

cthoyt commented 1 year ago

That's really interesting. I'm not sure how Protege makes links for database_cross_reference annotations. Typically, they're written as CURIEs inside strings, and I don't recall ever noticing it linkifying them, whether they were OBO CURIEs or something else. It would be interesting to add some logic in Protege to use the Bioregistry to do this, but I am not familiar enough with the code to attempt such a thing.
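As a rough idea of what such linkifying logic might look like outside Protege, here is a minimal sketch that turns a CURIE string into a Bioregistry link of the form `https://bioregistry.io/<prefix>:<id>`. The function name `curie_to_bioregistry_url` is hypothetical, and whether the resulting link leads anywhere useful for SIREN is an assumption, since the thread describes SIREN's only web presence as a single text file.

```python
BIOREGISTRY_BASE = "https://bioregistry.io"


def curie_to_bioregistry_url(curie: str) -> str:
    """Build a Bioregistry link for a CURIE such as 'siren:F9823'.

    Hypothetical helper: it only formats a string and does not check
    that the prefix is registered or that the link resolves.
    """
    prefix, sep, local_id = curie.strip().partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{BIOREGISTRY_BASE}/{prefix}:{local_id}"


if __name__ == "__main__":
    print(curie_to_bioregistry_url("siren:F9823"))
```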