'part of' relation and associated logic

I'm creating this issue to document ongoing discussions about how we connect a 'portion of a plant' to a 'whole plant' and the logical inferences implied by that connection.

Here is the most recent email on this topic, from @ramonawalls:

I've been having some serious conversations with some hard-core ontologists about how to axiomatize what we want to describe with "is or was part of", and particularly about how to describe proper parthood (which no one seemed to understand why we needed, but they were willing to concede that maybe we do). Everyone was pretty certain that specifying irreflexivity of part of would not give us any useful reasoning power, primarily because the part can simply be one molecule less than the whole. They seemed to think the best solution was to use part of as is, and possibly to add a class for the part, which is what we already did with portion of plant. We might be able to describe the class more fully as something like a part of a plant that is missing at least one shoot system from the whole plant, but that may not be necessary. This shouldn't have any bearing on what we have done so far, but it did affect what I said in the manuscript. I'm going to go through and make some changes now to reflect the recent discussions, and I'll tag Brian so you can have a look and see if you agree.

Thanks for that, @ramonawalls. I have four comments on this topic.

First, for this discussion, it is important to remember that the methodology we are using to go from data about a portion of a plant to data about a whole plant is outside the scope of implementable description logics supported by OWL and associated reasoners. So arguments about logical entailments in OWL are not necessarily the main consideration here.

Second, to the immediate question of whether we can use the existing part of in RO, let me be sure I understand the argument here. The idea is that instead of having a property that explicitly states proper parthood, we get this implicitly by asserting that a portion of a plant is a whole plant minus something, and then, when presented with a statement asserting that some portion of a plant is part of some whole plant, we conclude we have a case of proper parthood because a portion of a plant cannot be a whole plant. If we are certain that the last bit can never be true (a portion of a plant cannot be a whole plant), then I think the logic works. However, here are two reasons to prefer the model with a property that explicitly expresses proper parthood.

It is logically and computationally simpler. With a property that explicitly asserts proper parthood, all downstream axioms and rules can be based on the assertion of that property with an object of whole plant (e.g., a single existential quantification axiom). With the alternative approach (i.e., reflexive part of), all downstream axioms and rules require both an existential quantification axiom and an intersection axiom. This might sound like a trivial difference, but as you know, I spent a lot of effort optimizing reasoning times with the PPO (including custom axiom manipulations in OntoPilot) to make large-volume reasoning possible. The complexity of a logical model matters a lot, and in my experience, small differences in axiom complexity can make huge differences in computing time. I don't know how much of an impact the difference discussed here would make without testing it, but it should at least be a consideration.
It provides a nice logical definition of portion of a plant. With an explicit "proper parthood" relation, we can give a concise logical definition of portion of a plant: plant structure AND is or was part of SOME whole plant. We lose that with a reflexive part of.
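As a rough illustration of the difference between the two approaches, here is a minimal Python sketch that models plant structures as sets of components. The set-based model, the component labels, and the function names are all assumptions made for illustration; none of this is PPO, RO, or OntoPilot code. It only shows that an explicit proper-parthood check classifies a portion of a plant with one condition, while a reflexive part of needs a second condition to exclude the whole plant itself.

```python
# Toy model: a plant structure is a frozenset of component labels.
# All names here are hypothetical and only illustrate the logic.

WHOLE_PLANT = frozenset({"root system", "shoot system 1", "shoot system 2"})


def part_of(x, y):
    """Reflexive parthood: every structure is part of itself."""
    return x <= y


def proper_part_of(x, y):
    """Proper parthood: part of, but not identical to, the whole."""
    return x < y


def is_portion_explicit(x, whole):
    # One condition: the structure is a proper part of the whole plant.
    return proper_part_of(x, whole)


def is_portion_reflexive(x, whole):
    # Two conditions: part of the whole plant AND not the whole plant itself.
    return part_of(x, whole) and x != whole


branch = frozenset({"shoot system 1"})
print(is_portion_explicit(branch, WHOLE_PLANT))       # True
print(is_portion_reflexive(branch, WHOLE_PLANT))      # True
print(is_portion_explicit(WHOLE_PLANT, WHOLE_PLANT))  # False
print(part_of(WHOLE_PLANT, WHOLE_PLANT))              # True
```

Whether the extra condition has a measurable cost in an OWL reasoner is exactly the open question above; this sketch says nothing about reasoning time.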

Third, we should keep in mind that given our current data model, part of doesn't give the temporal coverage we need. An herbarium specimen is no longer part of a plant. Now, I think we could address this by making the data model more complicated. E.g., phenology data are about an herbarium specimen (or whatever) that is derived from a portion of a plant that is a part of some whole plant. Similarly, for photographs, the data are about a photograph that is an image of a portion of a plant that is a part of some whole plant. I think that is all logically sound and it eliminates the need for was in the property, but at the cost of substantially increased model complexity, which usually has computational downsides (see above).
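To make the more complicated data model concrete, here is a small Python sketch of the two chains described above, written as plain subject/relation/object triples. The instance names, the relation labels, and the `follow` helper are hypothetical and chosen only for illustration; they are not actual RO or PPO identifiers.

```python
# Hypothetical triples; relation names are illustrative, not actual RO/PPO terms.
TRIPLES = [
    ("phenology record 1", "is about", "herbarium specimen 1"),
    ("herbarium specimen 1", "derived from", "portion of plant 1"),
    ("portion of plant 1", "part of", "whole plant 1"),
    ("phenology record 2", "is about", "photograph 1"),
    ("photograph 1", "image of", "portion of plant 2"),
    ("portion of plant 2", "part of", "whole plant 2"),
]


def follow(subject, relations):
    """Follow a fixed chain of relations from subject; return the end node or None."""
    node = subject
    for rel in relations:
        matches = [o for s, p, o in TRIPLES if s == node and p == rel]
        if not matches:
            return None
        node = matches[0]
    return node


specimen_chain = ["is about", "derived from", "part of"]
photo_chain = ["is about", "image of", "part of"]

print(follow("phenology record 1", specimen_chain))  # whole plant 1
print(follow("phenology record 2", photo_chain))     # whole plant 2
```

Each kind of source needs its own multi-step chain to reach the whole plant, which is the added model complexity mentioned above.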

Fourth, I've been thinking about an extension to our logical model that might be another way to address some of these concerns. What if we added a way to record the proportion of a whole plant represented by a portion of a plant? E.g., this portion of a plant is 50% of the whole plant, but this one is only 10%. For herbarium specimens, that might not be so relevant, but for photographs it definitely could be. E.g., a single photograph of a tree documents ~50% of the whole plant. This information could ultimately be used to help clarify portion of plant / whole plant distinctions, and it could also be useful for attaching confidence scores to data generated for a whole plant from an observation of a portion of a plant. For instance, if an image of a tree does not show any flowers, we can be certain that the portion of a plant has no flowers, but with our current inferential model, we can't say anything about whether the whole plant has flowers. With proportion information, we could reasonably assert that we are 50% confident (or whatever) that the entire tree has no flowers. This last point might go into a separate issue if it is something we'd like to pursue.
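A minimal sketch of how such a confidence score might be computed, assuming (naively) that confidence in absence on the whole plant simply equals the proportion of the plant that was observed. The function name and the linear rule are assumptions for illustration only; a real scoring model would need to be worked out separately.

```python
def absence_confidence(observed_fraction):
    """Naive confidence that a structure absent from the observed portion
    is also absent from the whole plant.

    Assumption (hypothetical): confidence equals the fraction of the
    whole plant represented by the portion.
    """
    if not 0.0 <= observed_fraction <= 1.0:
        raise ValueError("observed_fraction must be between 0 and 1")
    return observed_fraction


# A photograph showing about half of a tree, with no flowers visible.
print(absence_confidence(0.5))  # 0.5
# A portion covering only a tenth of the plant gives much weaker evidence.
print(absence_confidence(0.1))  # 0.1
```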

Thanks for putting this here, @stuckyb. I will respond to your four points. Please bear in mind that I am in large part conveying the arguments of others, and until I have done some playing with this in Protege and the pipeline, I am not certain what the best solution is. Also remember that the people giving advice are less familiar with our project than we are, but have done a ton of similar work.

That was a part of their point - we can't fully implement this in OWL, so they didn't see any value in creating a proper part of relation.
Your assessment of the suggested logic is correct. Again, bear in mind it was just a suggestion - no guarantee it will work the way we want it or be easy to implement.

Regarding your concerns, the suggestion was not to use a reflexive part of, but to use the current RO part of, which is neither reflexive nor irreflexive (so when reasoning, it won't throw an error for either type of instance). While I fully appreciate the requirement to keep axiomatization as minimal as possible, the argument was that using a proper_part_of relation would not actually work. We aren't interested in all proper parts of a whole plant, rather only in parts of a plant in which a significant portion of the plant is missing. I think for most of our traits, the minimal part that would need to be missing for us to not be able to infer absence on the whole plant is a shoot system (a shoot system includes branches, flowers, and buds). Maybe it would need to be missing only a leaf, for leaf traits. We might create a relation that is called proper_part_of and define it to mean what I just described, but I was pretty convinced by Chris that a true proper part of relation is meaningless in many cases. That said, I can imagine that there might be a way to make it work.

I think I adequately addressed this temporal aspect in the manuscript, and we can use a combination of part_of and derived_from to deal with the temporal issue.

Along these lines, I'm not sure why we need to have a single relation that covers both cases, since we would normally know if something is part of or was part of. I guess it makes the ingest pipeline easier to not have to deal with two separate types of data, but I'm not sure how much.

This is interesting information to have, and could certainly help with things like assessing the true probability of absence. For example, if I can see 90% of a tree and it has no flowers, I am more confident that the whole tree has no flowers. However, I think this might be too much work at this point, until we have a solid use case for such data.

On the other hand, I think this corresponds in part to the definition discussed in point 2, which was that we define portion of plant based on what is missing from the plant, rather than just saying anything is missing.

Thanks, Ramona. Interesting points for sure. I am thoroughly convinced of at least one thing -- there are no obviously correct answers here.

One quick comment, though -- as I see it, the existing RO part of is reflexive. The definition clearly says so. Assuming the definition means what it says, then the absence of a corresponding logical axiom is a bug, not a feature, and we probably ought to treat it as such.

A point of clarification here regarding using is or was part of for connecting portion of a plant and whole plant. The reasoner will not be able to infer every trait of a whole plant from traits of a portion of a plant, for example when we are using counts on a portion of a plant. For example, if I assert an 'upper count' for a portion of a plant, we would NOT be able to infer an 'upper count' for the whole plant. However, I would reasonably expect the 'lower count' of the whole plant to be at least the 'lower count' of the 'portion of plant', but not necessarily the same number. Anyway, I just wanted to add this comment to this thread to clarify what the boundaries are here.
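One reading of this boundary can be sketched in Python as follows. The function name and the use of `None` for "nothing can be inferred" are assumptions for illustration, not part of the PPO pipeline. The idea is that anything counted on the portion is also present on the whole plant, so a lower count carries over as a lower bound, while an upper count on the portion says nothing about the parts of the plant that were not observed.

```python
from typing import Optional, Tuple


def infer_whole_plant_bounds(
    portion_lower: Optional[int], portion_upper: Optional[int]
) -> Tuple[Optional[int], Optional[int]]:
    """Return (lower, upper) count bounds for the whole plant.

    The portion's lower count is kept as a lower bound for the whole plant.
    The portion's upper count is deliberately ignored, because unobserved
    parts of the plant may carry more structures.
    """
    return (portion_lower, None)


# Portion has between 3 and 10 flowers: the whole plant has at least 3,
# and no upper bound can be inferred.
print(infer_whole_plant_bounds(3, 10))    # (3, None)
# Only an upper count on the portion: nothing is inferred for the whole plant.
print(infer_whole_plant_bounds(None, 5))  # (None, None)
```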

We discussed this at our workshop, and we don't see any need to change it at this time.

See also the decision about scoring parts of plants in issue #68.

PlantPhenoOntology / ppo

'part of' relation and associated logic #61