EBISPOT / hancestro

https://ebispot.github.io/hancestro/
Creative Commons Attribution 4.0 International

Investigate migrating HANCESTRO countries/continents/regions from classes to instances #47

Open daniwelter opened 5 months ago

daniwelter commented 5 months ago

As per Chris Mungall's suggestion in the OBO Foundry Slack, modelling countries etc. as classes is bad practice, as is our particular axiomatisation.

Review possible options for making countries et al. instances, including how to improve the axiomatisation.

Before release, include details about the reasoning from the Slack thread as comments in the HANCESTRO docs for reference.

daniwelter commented 5 months ago

Copy of Slack convo:

Hi Dani! Everything you need to know is in the explanation:

The only thing a West Africa can be part of is an Africa (according to your ontology). part of is reflexive, therefore every West Africa is a part of some West Africa. This means that West Africa must be a kind of Africa.

Even without the reflexivity axiom, there are logical issues here: your universal restriction (all values from) is saying that West Africa is not part of the earth, which is problematic if part-of is transitive.

Debugging these kinds of things can be quite fun if you like logic puzzles, but it’s better to avoid this sort of thing altogether by making your ontology more boring and conventional.

Some general guidance on how to make boring ontologies that won’t surprise you like this, and other Mungall soapboxing:

Never use reflexivity, see the RO guide https://oborel.github.io/obo-relations/reflexivity/

Never combine transitive properties and cardinality (universal restrictions included), see my blog https://douroucouli.wordpress.com/2021/03/24/avoid-mixing-parthood-with-cardinality-constraints/

Use the simplest possible subset of OWL, or even simpler. The profile of OWL expressible in obo format is sufficient for 99.99% of use cases in biological ontologies.

Universal restrictions are largely useless, and most people misinterpret them. Like malicious sleeper agents they will sit there inertly in your ontology until some other axiom triggers them, causing unintended entailments for you and people that rely on your ontology (for maximum fun, abandon your ontology after getting it added to lots of import chains).

Don’t model countries as classes. This is not me being an ontological fusspot. I know it’s handy because many ontology browsers don’t have good instance support. But you will just get into endless problems, and compound problems for others who use your ontology. If you really need to make a countries-as-classes version in order to work with legacy software, make this an alternative release, and manage countries as instances in your edit version.

Reuse rather than reinvent, and reuse from outside OBO if it’s not biology or biology-adjacent.
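The reflexivity argument above can be checked by brute force on a tiny universe. The following is only an illustrative sketch, not a substitute for an OWL reasoner: it enumerates every interpretation over a two-element domain, keeps those satisfying "WestAfrica SubClassOf (part_of only Africa)", and tests whether WestAfrica is forced to be a subset of Africa. The names `models` and `entails_subclass` are invented for this example.

```python
from itertools import chain, combinations, product

DOMAIN = (0, 1)  # a tiny universe of two individuals; enough for this illustration


def subsets(items):
    """Return every subset of items as a frozenset."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def models(require_reflexive):
    """Yield every interpretation (west_africa, africa, part_of) over DOMAIN
    that satisfies: WestAfrica SubClassOf (part_of only Africa)."""
    pairs = list(product(DOMAIN, repeat=2))
    for west_africa, africa, part_of in product(
            subsets(DOMAIN), subsets(DOMAIN), subsets(pairs)):
        # Optionally require part_of to be reflexive on the whole domain.
        if require_reflexive and any((x, x) not in part_of for x in DOMAIN):
            continue
        # Universal restriction: whatever a WestAfrica is part of must be an Africa.
        if all(y in africa for (x, y) in part_of if x in west_africa):
            yield west_africa, africa, part_of


def entails_subclass(require_reflexive):
    """True if WestAfrica is a subset of Africa in every enumerated model."""
    return all(wa <= af for wa, af, _ in models(require_reflexive))


with_reflexivity = entails_subclass(True)
without_reflexivity = entails_subclass(False)
print(with_reflexivity, without_reflexivity)
```

With reflexivity required, every surviving model has WestAfrica inside Africa; without it, a model with one WestAfrica individual that is part of nothing is enough to break the subclass relationship.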

Danielle Welter 3 days ago thanks for the comprehensive explanation @Chris Mungall :slightly_smiling_face: Anita identified the reflexive property as the culprit as well. It snuck into the ontology via one of our new imports; previously we only got our properties from a clean RO import, where part_of isn't reflexive. We're looking at excluding the spurious import before we release

Alex Henderson 3 days ago @Chris Mungall Given the aforementioned enforced amnesia of Slack, is it possible to put this advice somewhere else we can bookmark? Thanks

Allen Baron 3 days ago @Chris Mungall can you explain the downsides of having countries as classes instead of instances?

Chris Mungall 1 day ago You asked for it :slightly_smiling_face:

For many in OBO, the answer is simply that they are clearly not classes (this is not something where ontological/philosophical arguments can be made either way, like for genes). But putting that aside, let’s say you still want to do it because reasons.

When attempting to model something as simple as “Zambia part-of Africa” (where Zambia is a class) you are forced to make a logical commitment. You can’t just state those triples (in OWL-DL). You literally and syntactically can’t. Sorry, I don’t make the rules. Try it in Protege if you don’t believe me. You are forced to make a statement about the relationship between two sets, because that’s the interpretation of classes in OWL.

You could choose (as HANCESTRO does*) to say “every Zambia in the universe is only ever a part of an Africa” (if it’s a part of anything at all).

You could choose (as NCIT does) to say “every Zambia in the universe is part of some Africa” (they fudge by calling the relationship “conceptual part of” but this just compounds the weirdness).

You could choose (as GAZ-countries does) to say “every Zambia is an instance of an Africa”.

There are other options, you could say “every Zambia is a part of exactly one Africa”.

Or you can be like SNOMED and say “every Zambia is an African country” (and omit a relationship between Zambia and Africa).

  1. Every single one of those options is fundamentally weird and deeply wrong. The SNOMED one is the least weird, but it’s actually the most useless because the country and continent are not connected. It’s still fundamentally wrong, because neither Zambia nor Africa are sets.… But OK, you can say, I don’t mind weird, I don’t care about this set business, I just want something that puts “Zambia” visually underneath “Africa” in my ontology browser. I’m going to just choose one as a matter of convention, and pretend the OWL semantics don’t exist…
  2. OK, now you have everyone choosing a different OWL pattern to represent this (as we have now), due to different reasons lost in the mist of time, and things are fundamentally incompatible, there’s no consistent way to query, different tools interpret this differently, users are even more confused by ontologies than they were before (if that is even possible)
  3. Perhaps you can coordinate all of these ontologies to keep doing things that are fundamentally wrong, but wrong in exactly the same way, a kind of weirdness pact, so tools will work, and you can give a consistent story to the poor users, who have already given up on us long ago…
  4. But then you discover that the choice has unintended logical consequences. E.g. the transitivity-only composition problem that HANCESTRO has yet to discover that I alluded to earlier. Maybe you think you’re playing it safe by using a “some” (as NCIT does). What could go wrong?
  5. Well, let’s say you want to query in the opposite direction, and get all the parts of Africa. You can do this with hasPart, because it’s the inverse of partOf, right? Nope. In fact every Zambia is a part of some Africa does not entail every Africa hasPart some Zambia. Remember, by modeling countries and continents as classes you have stepped through a looking glass into a world where there are many instances of each country, and OWL will believe everything you say, and infer nothing more. This is a problem because, for better or worse, our ontology stack is based on OWL.
  6. The straight up subclass between two classes approach as in GAZ has obvious issues too, and once you start combining this ontology with other ontologies you end up with incoherency (in the OWL sense) quickly, with Africa being entailed as a country or Zambia entailed as a continent.

This is one of those rare cases where the ontology police are completely right, and in fact modeling something that a normal person would consider as an instance as a class is just wrong and leads you and your users to all kinds of odd places.

Perhaps the final and most practical answer is that the people who are the experts on geography don’t model countries as classes. If you look at both specialized resources like Geonames, and generalist resources like Wikidata, they model countries as instances. If we avoid reusing the work that the domain experts have done, we can at least follow their modeling patterns.
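Point 5 in the list above (that "every Zambia is part of some Africa" does not entail "every Africa hasPart some Zambia") can be illustrated with a small hand-built counter-model. This is only a sketch: the individuals `zambia_1`, `africa_1` and `africa_2` and the helper `every_has_some` are made up for the example, and the check is a plain finite-set evaluation rather than real OWL reasoning.

```python
def every_has_some(sources, targets, relation):
    """Check 'sources SubClassOf (relation some targets)' in a finite model."""
    return all(any((s, t) in relation for t in targets) for s in sources)


# Hypothetical individuals: once Africa is a class, nothing rules out two "Africas".
zambia = {"zambia_1"}
africa = {"africa_1", "africa_2"}
part_of = {("zambia_1", "africa_1")}

# hasPart is taken to be exactly the inverse of partOf.
has_part = {(whole, part) for (part, whole) in part_of}

zambia_part_of_africa = every_has_some(zambia, africa, part_of)
africa_has_part_zambia = every_has_some(africa, zambia, has_part)
print(zambia_part_of_africa, africa_has_part_zambia)
```

The first statement holds in this model while the second fails, because `africa_2` has no Zambia as a part. One model where the premise is true and the conclusion false is all it takes to show the entailment does not go through.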

Danielle Welter 21 minutes ago @Chris Mungall thanks for taking the time to provide such a comprehensive explanation. I think my ontology modelling approaches are still very much "scarred" by decade-old "avoid individuals at all cost because complexity/reasoning constraints" issues that are no longer relevant these days. For time constraint reasons, I will go ahead with our upcoming release as we have a downstream ontology waiting on us but I will aim to move to a representation that's closer to the best practice you describe above, with countries, regions and continents as instances of the relevant classes. I would appreciate any advice on how to improve the country/region/continent relationships (we use a mix of part_of and located_in) as we will still need to maintain those - we do have use cases where we want to ask for all the countries in Western Africa for example.
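As a rough sketch of the "all the countries in Western Africa" use case under an instance-based representation, the query reduces to walking plain part_of triples transitively and filtering by type. Everything below is an assumption for illustration: the names, the `TYPES` and `PART_OF` structures and the helper functions are hypothetical and do not reflect HANCESTRO identifiers or its actual mix of part_of and located_in.

```python
# Hypothetical instance data; each place is one individual with one type.
TYPES = {
    "Africa": "continent",
    "Western Africa": "region",
    "Eastern Africa": "region",
    "Ghana": "country",
    "Nigeria": "country",
    "Kenya": "country",
}

# Direct part_of assertions between individuals, stated as simple (part, whole) pairs.
PART_OF = [
    ("Ghana", "Western Africa"),
    ("Nigeria", "Western Africa"),
    ("Kenya", "Eastern Africa"),
    ("Western Africa", "Africa"),
    ("Eastern Africa", "Africa"),
]


def parts_of(whole, part_of=PART_OF):
    """Return everything that is directly or transitively part of `whole`."""
    found = set()
    frontier = [whole]
    while frontier:
        current = frontier.pop()
        for part, parent in part_of:
            if parent == current and part not in found:
                found.add(part)
                frontier.append(part)
    return found


def countries_in(whole):
    """Return the sorted names of all countries that are part of `whole`."""
    return sorted(p for p in parts_of(whole) if TYPES.get(p) == "country")


west_african_countries = countries_in("Western Africa")
african_countries = countries_in("Africa")
print(west_african_countries, african_countries)
```

Because the assertions are between individuals, the same data answers the query in both directions (parts of a region, or the regions a country belongs to) without committing to any class-level restriction.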