inferring stages from observations of traits

stuckyb commented 7 years ago

This issue is for the problem of inferring phenological stages and/or traits from count data on trait observations.

First, I quote an email from @jdeck88 on March 12:

So, I'm following up on a type of inference Ramona and I discussed last week related to presence/absence, where, given 'leaf presence' and upper count/lower count values in the measurement datum, we should infer the class 'leaves present'. The previous tests were based on simply saying 'whole plant' 'has quality' 'leaves present'; they were fine for what they are but don't incorporate the presence/absence of leaves.

Anyway, I don't see any axioms involving the measurement datum in the current PPO.

...

I still don't see a way to incorporate the measurement data (lower and upper counts) into the inferencing. What's the plan here? What I'm looking for is this: the trait 'leaf presence' with a count of zero leaves should NOT assert that the plant is in the leaf phenological stage, whereas 'leaf presence' with a count of 1 or more leaves should assert that it is.
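
The rule described above can be sketched in Python as follows. This is a minimal sketch, assuming a measurement datum carries an integer lower count and an optional upper count (None meaning unbounded); the function name and the three-way result are hypothetical, not part of PPO. Only a True result would license asserting the leaf phenological stage, which is one reason an axiom would key on the lower count rather than the upper count.

```python
from typing import Optional


def leaf_presence_from_counts(lower: int, upper: Optional[int] = None) -> Optional[bool]:
    """Hypothetical helper: interpret a 'leaf presence' count range.

    Returns True when leaves must be present, False when the range is
    exactly zero, and None when the range includes zero but is not
    limited to it (e.g. 0-5), so presence cannot be decided.
    """
    if lower < 0 or (upper is not None and upper < lower):
        raise ValueError("invalid count range")
    if lower >= 1:
        return True   # at least one leaf was counted: assert the stage
    if upper == 0:
        return False  # zero leaves: do NOT assert the stage
    return None       # range straddles zero: no inference either way


print(leaf_presence_from_counts(0, 0))   # False
print(leaf_presence_from_counts(3, 10))  # True
print(leaf_presence_from_counts(0, 5))   # None
```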

I responded on March 14:

You are correct that automatically going from count data to presence/absence qualities is not currently implemented in the ontology. I think we've held off on that because of the complexity. Remember that count data are the output of observing processes, so you'd need something like
'leaves present' is equivalent to:
'leaf presence' AND (
    'quality of' SOME (
        'whole plant' AND (
            'specified input of' SOME (
                'phenology observing process' AND (
                    'has specified output' SOME (
                        'measurement datum' AND (
                            'lower count' SOME xsd:integer[> 0]
                        )
                    )
                )
            )
        )
    )
)
It gets pretty ugly. (I also can't promise the above is correct without actually testing it.)
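
To see what the nested expression asks of the data, here is a rough Python sketch (not OWL, and not how ELK or any other reasoner works internally) that walks the same chain of relations over plain objects. Each SOME becomes an any() over one hop. The class and function names are hypothetical stand-ins for the PPO terms; the 'whole plant' and 'phenology observing process' types are implied by the Python classes rather than checked.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeasurementDatum:
    lower_count: int


@dataclass
class ObservingProcess:
    # 'has specified output'
    outputs: List[MeasurementDatum] = field(default_factory=list)


@dataclass
class WholePlant:
    # 'specified input of' (the inverse of 'has specified input')
    input_of: List[ObservingProcess] = field(default_factory=list)


@dataclass
class Quality:
    kind: str               # e.g. 'leaf presence'
    quality_of: WholePlant  # 'quality of'


def matches_leaves_present_pattern(q: Quality) -> bool:
    """Mirror the nested restriction: every SOME is an any() over one relation."""
    if q.kind != "leaf presence":
        return False
    return any(
        any(d.lower_count > 0 for d in proc.outputs)  # 'has specified output' SOME (... > 0)
        for proc in q.quality_of.input_of            # 'specified input of' SOME (...)
    )


plant = WholePlant(input_of=[ObservingProcess(outputs=[MeasurementDatum(3)])])
print(matches_leaves_present_pattern(Quality("leaf presence", plant)))  # True
```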

There is also a subtle problem here. Assuming we want to stick with the OWL EL profile for performance reasons, we are forced to use existential quantification (SOME) in the class restrictions rather than universal quantification with cardinality restrictions. This allows for cases where a single plant has multiple observations that might conflict with one another (e.g., one observer says it has leaves, but another says it doesn't). In those cases, the logical definition above will still infer 'leaves present', even though observers disagree about that. Of course, our existing logical definitions don't completely avoid this problem either, but I think the issue is less pernicious with our current structure.
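
The conflicting-observation problem can be shown with a toy example, assuming each observation of a plant is reduced to a single lower count where 0 is read as "no leaves". The helper names are hypothetical; all_positive() stands in for a universal reading that OWL EL cannot express, and conflicting() is an extra check the ontology itself does not provide.

```python
from typing import List


def some_positive(lower_counts: List[int]) -> bool:
    """Existential reading (SOME, allowed in OWL EL): one positive count suffices."""
    return any(c > 0 for c in lower_counts)


def all_positive(lower_counts: List[int]) -> bool:
    """Universal-style reading (outside OWL EL): every observation must agree."""
    return bool(lower_counts) and all(c > 0 for c in lower_counts)


def conflicting(lower_counts: List[int]) -> bool:
    """Hypothetical check: some observers counted leaves, others counted none."""
    return some_positive(lower_counts) and any(c == 0 for c in lower_counts)


counts = [4, 0]  # two observers of the same plant disagree
print(some_positive(counts), all_positive(counts), conflicting(counts))  # True False True
```

Under the SOME reading the plant is still classified as having leaves, which is exactly the case the paragraph above warns about.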

We could experiment with those kinds of equivalency axioms, though, and perhaps we should. Is that something we want to pursue?

@robgur added:

I guess I do see a value in being able to assert that a trait value measurement of leaves>0 implies that leaves are present, within the context of the observation. If we have conflicting evidence, reporting that is useful.

And @ramonawalls said:

I agree that being able to do the inferencing based on data property values would be very good to have. Seems like it might be easier with a SWRL rule, but I don't know how those play with ELK. I also don't know how ELK treats inverse object properties (as you can guess - I don't use ELK much), so I hope Brian is able to unearth some information there. I can look into the rule option.

I think that is the state of the discussion up to this point, more or less.

jdeck88 commented 7 years ago

A comment regarding the SWRL rules: I would rather avoid using them, since they add some complexity in terms of processing and I worry they may be less portable than relying solely on ELK and SPARQL. Also, we have at our disposal both the ontology creation process and the instance data creation process, so hopefully we won't need them?

stuckyb commented 7 years ago

After thinking some more about my proposed solution (above), I realized there is another problem: if we say that 'leaves present' is equivalent to a 'leaf presence' quality of some plant for which there has been an observation with count value(s) > 0, then we are also stating that every instance of 'leaves present' corresponds to some trait observation with non-zero count data. That is obviously wrong: there are (and will be) many unobserved instances of 'leaves present' in the world.

Might we be able to do better with a "subclass of" axiom? What if we were to say that this:

'leaf presence' AND (
    'quality of' SOME (
        'whole plant' AND (
            'specified input of' SOME (
                'phenology observing process' AND (
                    'has specified output' SOME (
                        'measurement datum' AND (
                            'lower count' SOME xsd:integer[> 0]
                        )
                    )
                )
            )
        )
    )
)

is a subclass of 'leaves present'? It's a bit of a mind bender since it appears to invert the expected parent/child class relationship, but we would more or less be defining a type for "leaves present on a plant that has observed leaf trait count data". Since we use a weaker "subclass of" axiom, we wouldn't get the strong entailment that all instances of 'leaves present' must have associated trait observation data.
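
The difference between the two axiom forms can be sketched as forward chaining over the types of a single quality instance. PATTERN is a hypothetical label standing in for the whole nested restriction; this is a toy illustration of the direction of entailment, not how an OWL reasoner represents it.

```python
from typing import Set

# Hypothetical label standing in for the whole nested class expression.
PATTERN = "leaf presence of a plant observed with lower count > 0"


def entailments(asserted: Set[str], axiom: str) -> Set[str]:
    """Toy closure of the types of one 'leaf presence' quality instance.

    axiom="subclass":   PATTERN SubClassOf 'leaves present'   (one direction)
    axiom="equivalent": PATTERN EquivalentTo 'leaves present' (both directions)
    """
    if axiom not in ("subclass", "equivalent"):
        raise ValueError(f"unknown axiom form: {axiom}")
    inferred = set(asserted)
    if PATTERN in inferred:
        inferred.add("leaves present")  # holds under both axiom forms
    if axiom == "equivalent" and "leaves present" in inferred:
        inferred.add(PATTERN)  # reverse direction: only under EquivalentTo
    return inferred


unobserved = {"leaves present"}  # a real but never-observed instance
print(sorted(entailments(unobserved, "subclass")))    # ['leaves present']
print(sorted(entailments(unobserved, "equivalent")))  # adds PATTERN as well
```

Under the subclass axiom the unobserved instance keeps only its asserted type; under the equivalence axiom it is also typed as having a matching observation, which is the unwanted entailment described earlier.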

What do you all think? Does that solve our problem? Does it introduce new problems that I've not yet noticed?

Note that this does not solve the logical problem I described in my earlier response. It still makes the implicit assumption that all trait data are "true" and it doesn't handle conflicting trait data.

robgur commented 7 years ago

Brian, I do think it will solve the immediate problem you raised, but I'm not sure whether it also introduces new problems; it's complex enough that I can't be sure. I think @ramonawalls would be better placed to answer that question.

I think you and @jdeck88 are probably in the right solution space for the problem of how to express inverse relationships. It's annoying to be stuck between slow and right, or fast and not so right. I think I agree with #4 as a way to simply get some forward progress, with a discussion about what to do later.

jdeck88 commented 7 years ago

I think your new subclass axiom makes sense, at least from the perspective of creating instances that could satisfy the stated criteria.
Also, I'm fine treating all trait data as "true" at this stage; this seems to be a universal issue with the internet and LOD.

stuckyb commented 7 years ago

To back up a few steps, @jdeck88, I totally agree with your comment about SWRL rules. If we can find a satisfactory way to do what we need within OWL, that would be best.

Re: the proposed solution, I think it might be a viable path forward, but I'll wait until @ramonawalls weighs in before I work on implementation. One edit, though. The solution outlined above is not sufficient, because it doesn't indicate the connection between the observation and the trait being observed. We'd actually need something like this (again, with the caveat that I've not tested it):

'leaf presence' AND (
    'quality of' SOME (
        'whole plant' AND (
            'specified input of' SOME (
                'phenology observing process' AND (
                    'has specified output' SOME (
                        'measurement datum' AND (
                            'lower count' SOME xsd:integer[> 0]
                        )
                    )
                ) AND 'is about' SOME 'leaf presence'
            )
        )
    )
)
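
The effect of adding 'is about' SOME 'leaf presence' can be sketched in Python, assuming each observing process is flattened into one record holding the trait it is about and the lower count of its output datum (the Observation type and function names are hypothetical). Both conditions are checked on the same record, mirroring how the OWL expression attaches both restrictions to the same 'phenology observing process' individual.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    is_about: str     # trait the observing process targets, e.g. 'leaf presence'
    lower_count: int  # lower count of the process's output measurement datum


def supports_leaves_present(observations: List[Observation]) -> bool:
    """Require the positive count to come from an observation about 'leaf presence'."""
    return any(o.is_about == "leaf presence" and o.lower_count > 0 for o in observations)


def supports_without_is_about(observations: List[Observation]) -> bool:
    """The earlier draft: any positive count at all would qualify."""
    return any(o.lower_count > 0 for o in observations)


obs = [Observation("flower presence", 12), Observation("leaf presence", 0)]
print(supports_without_is_about(obs), supports_leaves_present(obs))  # True False
```

Without the 'is about' restriction, a flower count would be enough to infer 'leaves present'.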

Here's something else I'll throw out. We already have convenience classes for plants that have particular traits; e.g., 'plant with leaves'. We could say that

'whole plant' AND (
    'specified input of' SOME (
        'phenology observing process' AND (
            'has specified output' SOME (
                'measurement datum' AND (
                    'lower count' SOME xsd:integer[> 0]
                )
            )
        ) AND 'is about' SOME 'leaf presence'
    )
)

is a subclass of 'plant with leaves', which would be another route to inferring the phenological stage from trait observations. The difference here is that we sidestep the 'quality of'/'has quality' inverse issue. The downside is that we can no longer infer that the instance of 'leaf presence' is also an instance of 'leaves present'. Plus, we will still encounter the inverse problem with 'specified input of'/'has specified input'.
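
A hedged sketch of the plant-level alternative over plain (subject, property, object) triples, assuming observation data are recorded from the process side with 'has specified input'. The triple layout, the string-encoded counts, and the function name are all hypothetical. The sketch only shows why the direction of the property matters for the data; it says nothing about how ELK handles inverse properties.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def plants_with_leaves(triples: List[Triple]) -> Set[str]:
    """Hypothetical plant-level classification for 'plant with leaves'.

    Data are stored process-side ('has specified input'), so we build the
    inverse index to ask the plant-side question ('specified input of').
    """
    input_of: Dict[str, Set[str]] = defaultdict(set)  # plant -> processes
    outputs: Dict[str, Set[str]] = defaultdict(set)   # process -> data
    about: Dict[str, Set[str]] = defaultdict(set)     # process -> traits
    lower: Dict[str, int] = {}                        # datum -> lower count
    for s, p, o in triples:
        if p == "has specified input":
            input_of[o].add(s)  # store the inverse: 'specified input of'
        elif p == "has specified output":
            outputs[s].add(o)
        elif p == "is about":
            about[s].add(o)
        elif p == "lower count":
            lower[s] = int(o)
    return {
        plant
        for plant, procs in input_of.items()
        if any(
            "leaf presence" in about[proc]
            and any(lower.get(d, 0) > 0 for d in outputs[proc])  # missing count: no match
            for proc in procs
        )
    }


triples = [
    ("obs1", "has specified input", "plant1"),
    ("obs1", "is about", "leaf presence"),
    ("obs1", "has specified output", "datum1"),
    ("datum1", "lower count", "5"),
    ("obs2", "has specified input", "plant2"),
    ("obs2", "is about", "leaf presence"),
    ("obs2", "has specified output", "datum2"),
    ("datum2", "lower count", "0"),
]
print(plants_with_leaves(triples))  # {'plant1'}
```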

stuckyb commented 7 years ago (Jun 6, 2017)

I think this problem is solved and we can close this issue. Anyone disagree?

robgur commented 7 years ago

nope! close away.

jdeck88 commented 7 years ago

agreed... ready to close!

stuckyb commented 7 years ago

Great -- closing.

PlantPhenoOntology / ppo

inferring stages from observations of traits #28