PlantPhenoOntology / ppo

An ontology for describing the phenology of individual plants and populations of plants, and for integrating plant phenological data across sources and scales.

Design pattern for absent X has complex OWL definitions that have unintended entailments #62

Open cmungall opened 5 years ago

cmungall commented 5 years ago

http://purl.obolibrary.org/obo/PPO_0002629

flowers absent EquivalentTo
 'plant structure presence' and 
 quality of some (
    whole plant and 
    not (has visible part some flower)) and
 'depends on structure' only (not (has visible part some flower))

Are you sure this means what you think it does?
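
One way to probe that question is to evaluate the two restrictions over a tiny closed-world model. The sketch below is only illustrative: every individual and relation assertion is hypothetical, the 'plant structure presence' conjunct is left out, and OWL reasoners work under the open-world assumption, so this shows the truth conditions of `some` and `only` rather than what a reasoner would actually infer.

```python
from typing import Callable, Dict, Set

Relation = Dict[str, Set[str]]

def some(rel: Relation, subject: str, test: Callable[[str], bool]) -> bool:
    """OWL `rel some C`: at least one filler of rel satisfies C."""
    return any(test(o) for o in rel.get(subject, set()))

def only(rel: Relation, subject: str, test: Callable[[str], bool]) -> bool:
    """OWL `rel only C`: every filler satisfies C (vacuously true with no fillers)."""
    return all(test(o) for o in rel.get(subject, set()))

# Hypothetical toy data, not taken from PPO.
whole_plants = {"plant_a", "plant_b"}
flowers = {"flower_1"}
has_visible_part: Relation = {
    "plant_a": {"leaf_1"},
    "plant_b": {"leaf_2", "flower_1"},
}
quality_of: Relation = {"trait_a": {"plant_a"}, "trait_b": {"plant_b"}}
depends_on_structure: Relation = {"trait_a": {"plant_a"}, "trait_b": {"plant_b"}}

def lacks_flower(x: str) -> bool:
    # not (has visible part some flower)
    return not some(has_visible_part, x, lambda part: part in flowers)

def whole_plant_without_flower(x: str) -> bool:
    # whole plant and not (has visible part some flower)
    return x in whole_plants and lacks_flower(x)

def flowers_absent(trait: str) -> bool:
    # First clause is existential, second clause is universal.
    return (some(quality_of, trait, whole_plant_without_flower)
            and only(depends_on_structure, trait, lacks_flower))
```

In this toy model `trait_a` satisfies the definition and `trait_b` does not. A trait with no 'depends on structure' fillers at all (for example a hypothetical `trait_c`) still passes the `only` clause on its own, which is the usual reason universal restrictions need a second look.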

ramonawalls commented 1 year ago

Finally catching up on some old issues. I reread the thread in UPheno, and I still think that the hierarchy we achieve in PPO is the correct one for our use case. Per this discussion in https://github.com/obophenotype/upheno/issues/330, we are not trying to describe a phenotype that occurs in an organism part but rather in a whole organism (e.g., leaf absent does not inhere in a leaf but in the whole plant). I think this is similar or equivalent to the "lacks all parts (PATO:0002000)" quality some users describe.

@cmungall, do you still have concerns about the way PPO is modeling absence? Is there now an agreed-upon way to model "lacks all parts" that builds the correct hierarchies? I am certainly willing to consider changing our pattern.

@nicolevasilevsky

cmungall commented 1 year ago

I’m pretty confident this approach isn’t going to scale. I also think the pattern could be simplified - what does the second clause do?

I also don’t understand the use case. If the ontology is never going to be exposed to the user then why not just do this as database queries?

In my experience ontologies always leak. Even if your application hides the structure there are many paths by which a user may end up at a page like this https://www.ebi.ac.uk/ols4/ontologies/ppo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPPO_0002622?lang=en — and this page will explode further as both broader and narrower plant structures are added to the pattern.

The Ontobee display is seemingly tamer but actually quite confusing, as it just picks a random path.

AgroPortal seems generally confused: https://agroportal.lirmm.fr/ontologies/PPO/?p=classes&conceptid=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPPO_0002622&jump_to_nav=true

I think we in the ontology community confuse our users a lot in many different ways when we try to get too clever. I spend a lot of time explaining to biologists the tacit knowledge they need to interpret our ontologies (sorry, this is not really directed at PPO, which I think is very intuitive other than the inverted structure).

Perhaps there could be some way to flag these hierarchies so ontology browsers could behave appropriately.

Apologies for being such a trad ontologist. I just want my hierarchies to look like hierarchies!

I do have a proposal for a pattern that allows some limited inversion while maintaining groupings in a trad hierarchy that I’ll post on the upheno issue when I get a chance


sbello commented 1 year ago

We last discussed this on 11/3/2022 during the phenotype editors call. We had talked about using the approach proposed by Sarah Alghamdi. The notes are in the meeting minutes: https://docs.google.com/document/d/1WrQanAMuccS-oaoAIb9yWQAd4Rvy3R3mU01v9wHbriM/edit?usp=sharing

As I recall David OS was going to think about this more. We need to circle back to this.

ramonawalls commented 1 year ago

Thanks for the thoughtful response, @cmungall. I'll address some of your points below, because it is good to document our thinking here where others (including our future selves) can read it.

> I’m pretty confident this approach isn’t going to scale.

It has scaled remarkably well for our use case. We ingest hundreds of thousands of observation records of plants and their parts into a graph database and infer whether a structure is present or absent, based on the logical axioms in the ontology.

> I also think the pattern could be simplified - what does the second clause do?
>
> I also don’t understand the use case.

We needed to make a subproperty of the RO "depends on" relation so that we could specify domains and ranges.

I've copied some text from our paper that explains how that definition is used:

"For each “presence” trait class, there is also a pair of convenience subclasses that describe the common qualitative cases of a given plant structure being either present or absent....

"The class hierarchies for all present/absent convenience classes are inferred directly from the relationships of the plant structures on which they depend, thus ensuring that as long as the relationships among the plant structures are correct, inferences based on the present/absent trait classes will also be correct."

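The inversion discussed in this thread follows directly from that inference rule. Here is a minimal sketch, assuming a toy subclass-only hierarchy of plant structures and made-up trait labels (PPO's real classes, labels, and part relations differ): "X present" inherits the direction of the structure hierarchy, while "X absent" reverses it, because a plant with no reproductive structures necessarily has no flowers.

```python
from typing import Dict, List, Optional

# Hypothetical structure hierarchy: child -> parent (subclass-of only).
STRUCTURE_PARENT: Dict[str, Optional[str]] = {
    "open flower": "flower",
    "flower": "reproductive structure",
    "reproductive structure": None,
}

def ancestors(structure: str) -> List[str]:
    """All superclasses of a structure, nearest first."""
    out: List[str] = []
    parent = STRUCTURE_PARENT.get(structure)
    while parent is not None:
        out.append(parent)
        parent = STRUCTURE_PARENT.get(parent)
    return out

def descendants(structure: str) -> List[str]:
    """All subclasses of a structure, in dictionary order."""
    return [s for s in STRUCTURE_PARENT if structure in ancestors(s)]

def present_superclasses(structure: str) -> List[str]:
    # "X present" is a subclass of "Y present" whenever X is a subclass of Y.
    return [f"{a} present" for a in ancestors(structure)]

def absent_superclasses(structure: str) -> List[str]:
    # "Y absent" is a subclass of "X absent" whenever X is a subclass of Y,
    # so the superclasses of "Y absent" come from the subclasses of Y.
    return [f"{d} absent" for d in descendants(structure)]
```

With this toy data, `present_superclasses("flower")` yields the trait for the broader structure, whereas `absent_superclasses("flower")` yields the trait for the narrower one. Every structure added to the hierarchy therefore adds a parent to some absent class, which is the growth that browsers end up displaying.
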
> If the ontology is never going to be exposed to the user then why not just do this as database queries?
>
> In my experience ontologies always leak. Even if your application hides the structure there are many paths by which a user may end up at a page like this https://www.ebi.ac.uk/ols4/ontologies/ppo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPPO_0002622?lang=en, and this page will explode further as both broader and narrower plant structures are added to the pattern.
>
> The ontobee display is seemingly more tame but actually quite confusing as it just picks a random path.
>
> Agroportal seems generally confused https://agroportal.lirmm.fr/ontologies/PPO/?p=classes&conceptid=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPPO_0002622&jump_to_nav=true

Of course the ontology terms are exposed to users, but we don't want to expose the complex multi-hierarchy to them. With the relatively small amount of data we have now, we just provide the terms as a list. Going forward, we plan to offer a simpler hierarchy for both curation and search. We can also look at making a simpler release that will show up on portals like OLS and Agroportal.

We use Elasticsearch queries instead of database queries. As we move to a larger dataset (millions of observations from iNaturalist), we will probably store those observations in a database and index them. We don't need to reason over those observations.

> I think we in the ontology community confuse our users a lot in many different ways when we try and get too clever, I spend a lot of time explaining to biologists a lot of the tacit knowledge they need to interpret our ontologies (sorry not really directed at PPO which I think other than the inverted structure is very intuitive)
>
> Perhaps there could be some way to flag these hierarchies so ontology browsers could behave appropriately.
>
> Apologies for being such a trad ontologist. I just want my hierarchies to look like hierarchies!

We also like non-inverted hierarchies, but we are super happy with how our ontology is built and don't want to change the core of it. Yes, it is clever, but not too clever. Just clever enough for what we need. We plan to make more intuitive hierarchies available to our users for tagging their data or searching.

> I do have a proposal for a pattern that allows some limited inversion while maintaining groupings in a trad hierarchy that I’ll post on the upheno issue when I get a chance

Please tag me when you post it!

I want PPO to be compatible with other OBO ontologies, but I think that our phenological traits are quite different from most biological traits in what they are describing (appearance of structures based on seasons and development, not genetic mutations), despite many similarities (nearly all characteristics are influenced by the environment). Therefore, I think it is most important that we be compatible with the PO and TO, and secondarily important to align with upheno.