OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

Modeling metaclasses #2454

Open. cthoyt opened this issue 1 year ago.

cthoyt commented 1 year ago

This is an epic issue about how to model metaclasses in ontologies.

Charlie's original question

I asked a question on the OBO Foundry Slack, but I think it would be valuable to duplicate it here:

Do we have a standard pattern for modeling metaclass relationships? For example, suppose I wanted to create an ontology that contained both gene families and actual genes (I'm not going to do this since HGNC exists, so this is just for the sake of argument). I have children of my parent "Gene" class, which consist of some gene families and other groupings that all have "is a" relationships between them. Eventually I have an actual specific gene, which has "is a" relationships to one or more gene families. How do I make it clear which terms in my ontology are grouping terms and which ones are bottom-level genes? I guess I can look at which terms have no child "is a" relations, but this can also break in certain edge cases.
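To make the edge case concrete, here is a minimal Python sketch of the "no child is a relations" heuristic. The toy ontology, its term names, and the helper leaf_terms are all hypothetical; the point is only that a grouping term with no members yet is structurally indistinguishable from an actual gene.

```python
# Hypothetical toy ontology: each child maps to its set of "is a" parents.
# Term names are illustrative only.
IS_A = {
    "gene family A": {"gene"},
    "gene family B": {"gene"},
    "gene family C": {"gene"},  # a grouping term that has no members yet
    "SHH": {"gene family A"},
    "IHH": {"gene family A", "gene family B"},
}


def leaf_terms(is_a):
    """Return terms that are never the parent in an "is a" edge (i.e. have no children)."""
    parents = {p for ps in is_a.values() for p in ps}
    all_terms = set(is_a) | parents
    return all_terms - parents


print(sorted(leaf_terms(IS_A)))  # ['IHH', 'SHH', 'gene family C']
```

Here "gene family C" is reported alongside the genes SHH and IHH, which is exactly the kind of breakage the question anticipates.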

More generally, I guess the question is how do I denote which classes are metaclasses and which ones are regular classes? And does this question even make sense, since most modeling is pretty meta anyway?

One thing I thought of was to model the relationships between gene families and genes not with "is a" but with another relationship like "part of" or "component of". Maybe this works sometimes, but I think there might be situations where this won't make sense either.
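The alternative-relation idea can be sketched the same way. In this hypothetical Python example, "is a" is reserved for the grouping hierarchy and genes point to their families through a made-up member_of relation (standing in for "part of" or "component of"). A gene that has not yet been assigned to any family has nowhere to attach except directly under "gene", so it slips back into the "is a" hierarchy and looks like a grouping term.

```python
# Hypothetical toy data (names are illustrative only).
# "is a" is reserved for the grouping hierarchy ...
IS_A = {
    "gene family A": {"gene"},
    "gene family B": {"gene"},
    "gene family C": {"gene"},  # grouping term with no members yet
    "XYZ1": {"gene"},           # gene not yet assigned to any family
}
# ... while genes point to their families through a separate relation.
MEMBER_OF = {
    "SHH": {"gene family A"},
    "IHH": {"gene family A", "gene family B"},
}


def genes_by_relation(member_of):
    """Treat every subject of a member_of edge as a gene."""
    return set(member_of)


def grouping_terms(is_a):
    """Treat every subject in the "is a" hierarchy as a grouping term."""
    return set(is_a)


print(sorted(genes_by_relation(MEMBER_OF)))  # ['IHH', 'SHH']
print("XYZ1" in grouping_terms(IS_A))  # True: the orphan gene is misclassified
```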

Another example: NCBITaxon has some well-defined "ranks" that are annotated with a "has rank" relation.

I'm also going to copy @cmungall's response here, since this probably warrants at least a blog post from him (I hope!) and potentially some documentation in the OBO Foundry best practices. Here's what he said:

There are two questions here: what property to use to link classes to metaclasses, and what vocabulary to use for the metaclasses themselves. I’ll stick to the former for now. This is all assuming you want to keep modeling as classes at all (which is not a foregone conclusion; many people find OBO unintuitive here). Here is how it is currently done:

This is a mess. Most people don't realize it's a mess because they don't see that what are currently a lot of bespoke hacks share a common shape. This becomes quite pressing for things like genes: there is clearly a distinction between gene entities as denoted by HGNC and the OBO approach, which lacks a formal way to distinguish between forms like "eukaryotic gene" and "sonic hedgehog gene upstream of a specifically modified region of DNA in the epidermal cell of my left pinky". I would advocate for biolink:category or an analogous property. It's simple yet theoretically sound, and it avoids many metamodeling pitfalls (unfortunately, W3C standards have a lot of unnecessary traps for us here, and while annoying, they can't be ignored).
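By contrast, an explicit link from each class to its metaclass does not depend on the shape of the hierarchy at all, much like NCBITaxon's "has rank" annotation. The sketch below uses plain Python dictionaries; the CATEGORY mapping stands in for biolink:category or an analogous property, and the CURIE values and term names are illustrative assumptions rather than a real dataset.

```python
from collections import defaultdict

# Hypothetical toy data. Each class keeps its ordinary "is a" parents ...
IS_A = {
    "gene family A": {"gene"},
    "gene family C": {"gene"},
    "SHH": {"gene family A"},
    "XYZ1": {"gene"},
}
# ... and also carries an explicit pointer to its metaclass, in the spirit of
# biolink:category. The CURIEs are illustrative; check them against the Biolink Model.
CATEGORY = {
    "gene family A": "biolink:GeneFamily",
    "gene family C": "biolink:GeneFamily",
    "SHH": "biolink:Gene",
    "XYZ1": "biolink:Gene",
}


def terms_by_category(category):
    """Group terms by their declared metaclass, without consulting the "is a" structure."""
    groups = defaultdict(set)
    for term, cat in category.items():
        groups[cat].add(term)
    return dict(groups)


# The orphan gene and the empty grouping term have identical "is a" parents ...
print(IS_A["XYZ1"] == IS_A["gene family C"])  # True
# ... but the asserted category still tells them apart.
print(sorted(terms_by_category(CATEGORY)["biolink:Gene"]))  # ['SHH', 'XYZ1']
```

Because the category is asserted rather than inferred from the hierarchy, adding or removing "is a" children never changes which terms count as genes.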

Other Resources

cmungall commented 1 year ago

Further thoughts in this slide deck

Note that this is an active area of research, and it's easy to get into modeling muddles here; see, for example, this paper: https://www.semantic-web-journal.net/system/files/swj3480.pdf

I contacted the authors and they are interested in our use case. One of them is speaking at this conference on metaclass modeling next week: https://jku-win-dke.github.io/MULTI2023/

They also said that in their wikidata analysis "gene" was the most problematic concept :-)