logical definitions and reasoning

ValWood commented 3 years ago

It might be time to think about instating logical definitions so that reasoning can be implemented to improve ontology maintenance.

@jseager7 @CuzickA Maybe we can work together to get logical definitions for the high-level terms in place. This should make subsequent ones easier.

Won't the upheno alignment provide logical definitions for most terms anyway?

jseager7 commented 3 years ago

@ValWood We do already have logical definitions instantiated where the terms match uPheno patterns. For example, see the 'Equivalent to' section of the sidebar on OLS for the term 'increased resistance to chemical' (link).

Unfortunately, this pattern mapping has followed an ad hoc approach – and generally bottom-up instead of top-down – because there are a lot of gaps in the phenotypes covered by uPheno, plus the grouping terms tend to be a poor fit for uPheno's patterns. Obviously we don't have to use uPheno to add logical definitions (we could do it ourselves; we can even create our own patterns).

The problem may just be a lack of knowledge on my part about how to logically define grouping terms with patterns, but it could also be a limitation of uPheno. I think it might become clearer when the pattern mapping spreadsheet is reviewed.

jseager7 commented 3 years ago

To elaborate, many of the single-species phenotypes are straightforward to map to patterns – often the problem here is that there's no ontology term to instantiate the variables of the patterns (basically there's no way to specify the differentia for the pattern). In cases where the pattern doesn't exist, it's usually pretty obvious what the pattern should be.

The real problems start in the pathogen-host branch, because so many of these phenotypes are qualified with spatial information (phenotypes occurring in a host), causal information (phenotypes caused by the pathogen), nested causal information (phenotypes caused by a host process caused by the pathogen), temporal information (phenotypes during pathogen penetration), and so on.

uPheno really struggles with these complex phenotypes at the moment, because currently the patterns follow a pre-compositional approach where every pattern needs to be extended with '…in location', '…during process', or similar. It would probably really help if they had the ability to compose patterns with other patterns (like annotation extensions), but they might argue in return that our phenotypes are too complex (it might be possible to have a union of equivalence axioms to logically define more complex phenotypes, but that's just me guessing).
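To make the maintainability concern concrete, here is a rough sketch of how the number of patterns grows under a pre-compositional approach. The base pattern names are the uPheno patterns listed in the table later in this thread. The qualifier suffixes and the way they are joined onto the names are assumptions for illustration only, not uPheno's actual naming scheme.

```python
from itertools import combinations

# Base pattern names taken from the uPheno patterns mentioned in this thread.
BASE_PATTERNS = [
    "abnormal biological process",
    "abnormal absence of biological process",
    "abnormally premature biological process",
    "abnormally delayed biological process",
    "abnormally increased rate of biological process",
    "abnormally increased quality of biological process",
]

# Illustrative qualifiers only, echoing the 'in location' / 'during process'
# wording above; real patterns may be named and structured differently.
QUALIFIERS = ["in location", "during process"]


def precomposed_variants(bases, qualifiers):
    """Return each base pattern extended with every non-empty qualifier combination."""
    variants = []
    for base in bases:
        for size in range(1, len(qualifiers) + 1):
            for combo in combinations(qualifiers, size):
                variants.append(" ".join([base, *combo]))
    return variants


variants = precomposed_variants(BASE_PATTERNS, QUALIFIERS)

# Patterns to maintain if every qualified variant is pre-composed.
print(len(BASE_PATTERNS) + len(variants))    # 24
# Building blocks to maintain if patterns could be composed with each other.
print(len(BASE_PATTERNS) + len(QUALIFIERS))  # 8
```

With only two qualifiers the gap is already 24 versus 8. Adding a third qualifier (for example, a causal one) would more than double the pre-composed count again.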

Some of the pathogen-host phenotypes can be handled by adding process terms to GO, so that the term can be logically defined using the 'abnormal biological process' pattern (and its variations), but I'm not sure this is going to cover all the cases.

CuzickA commented 2 years ago

@ValWood and I had a brief chat about this yesterday.

1) We thought it would be good to have a one-day meeting in Jan 2022 (AC, JS and VW) to look at uPheno patterns for the PHI-branch of terms, particularly the higher-level PHI terms. The single species terms should be similar to FYPO.
2) In some cases we need to distinguish between similar terms present in both the single species branch and the PHI branch. For the PHI-branch terms the GO term 'GO:0051701 biological process involved in interaction with host' could be used in the logical definition.
3) Val had a couple of questions for @jseager7. How many PHIPO terms have been mapped? (I thought it was 58% from memory, are these just from single species branch?) Does the reasoner get run regularly and are logic error problems reported? (I wondered whether this was done by the ODK and OBO dashboard?)

Please feel free to add any comments here if I have missed anything :-)

jseager7 commented 2 years ago

> For the PHI-branch terms the GO term 'GO:0051701 biological process involved in interaction with host' could be used in the logical definition.

That sounds sensible. Ideally, uPheno would use this term to create patterns for pathogen-host processes and so on. However, uPheno uses a pre-compositional approach for its patterns, so there could be maintainability issues with adding the context of a pathogen-host interaction as another dimension (worst case, every single pattern would need a pathogen-host interaction variant). uPheno could generalise the pattern to be something like 'biological process occurring during biological process', but the compositional problem will still be there. That's not really our problem to solve though, and I think we could always fall back on defining our own patterns (not just logical definitions) if we can't map to uPheno.

> How many PHIPO terms have been mapped? (I thought it was 58% from memory, are these just from single species branch?)

According to the grant submission, 536 terms (58%) have logical definitions instantiated. I think there are only about 14 terms in the pathogen-host interaction phenotype branch that have logical definitions, all of which are processes of some kind, and I can't guarantee that they're correct. I'll paste a table summarising these terms below.

> Does the reasoner get run regularly and are logic error problems reported? (I wondered whether this was done by the ODK and OBO dashboard?)

There are tests that the ODK runs on every full release. These consist of running some SPARQL queries and the ELK reasoner over the ontology. I don't really understand what the reasoning checks for, but the documentation for the ROBOT Reason command says this:

> ROBOT will always perform a logical validation check prior to automatic classification. Formally, this is known as testing for incoherency, i.e. the presence of either a logical inconsistency or unsatisfiable classes. If either of these hold true, the reason operation will fail and robot will exit with a non-zero code, after reporting the problematic classes.

Any reasoning violations go into the 'reports' directory at the top of the repository.
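For anyone wanting to run the same coherency check by hand, a minimal sketch follows. It assumes the `robot` command is on the PATH and relies only on the documented behaviour quoted above (a non-zero exit code when the ontology is incoherent). The file names are hypothetical placeholders, and this is not necessarily how the ODK invokes ROBOT internally.

```python
import subprocess


def build_reason_command(input_path, output_path, reasoner="ELK"):
    """Build the argument list for a `robot reason` run."""
    return [
        "robot", "reason",
        "--reasoner", reasoner,
        "--input", input_path,
        "--output", output_path,
    ]


def is_coherent(input_path, output_path):
    """Run ROBOT and report whether reasoning succeeded (exit code 0)."""
    result = subprocess.run(
        build_reason_command(input_path, output_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # ROBOT reports the problematic classes before exiting.
        print(result.stdout or result.stderr)
    return result.returncode == 0


# Hypothetical file names, for illustration only.
print(" ".join(build_reason_command("phipo-edit.owl", "phipo-reasoned.owl")))
```

Calling `is_coherent("phipo-edit.owl", "phipo-reasoned.owl")` would then return `False` and print ROBOT's report if any class is unsatisfiable or the ontology is inconsistent.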

jseager7 commented 2 years ago

Here are the terms from the pathogen-host interaction phenotype branch that are mapped to uPheno patterns. Note that the 'Pattern term ID' in the examples below is for the specific biological process that corresponds to the general biological process mentioned in the pattern name.

PHIPO term label	PHIPO term ID	uPheno pattern	Pattern term ID
abnormal mutualism	PHIPO:0000040	abnormal biological process	GO:0085030
loss of mutualism	PHIPO:0000207	abnormal absence of biological process	GO:0085030
abolished pathogen cell to cell migration within host	PHIPO:0000340	abnormal absence of biological process	GO:0106259
premature pathogen cell to cell migration within host	PHIPO:0000342	abnormally premature biological process	GO:0106259
delayed pathogen cell to cell migration within host	PHIPO:0000343	abnormally delayed biological process	GO:0106259
pathogen penetration into host absent	PHIPO:0000355	abnormal absence of biological process	GO:0044409
increased pathogen penetration into host	PHIPO:0000360	abnormally increased quality of biological process	GO:0044409
delayed pathogen penetration into host	PHIPO:0000361	abnormally delayed biological process	GO:0044409
premature pathogen penetration into host	PHIPO:0000362	abnormally premature biological process	GO:0044409
absence of pathogen growth within host	PHIPO:0000363	abnormal absence of biological process	GO:0044114
delayed timing of pathogen growth within host	PHIPO:0000366	abnormally delayed biological process	GO:0044114
increased pathogen growth within host	PHIPO:0000368	abnormally increased rate of biological process	GO:0044114
mutualism absent	PHIPO:0000948	abnormal absence of biological process	GO:0085030
abolished pathogen growth within host	PHIPO:0000952	abnormal absence of biological process	GO:0044114
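
One quick sanity check on the table above is to look for terms that share both a pattern and a GO term. The sketch below assumes that the pattern plus its GO filler fully determines the equivalence axiom. If that holds, a reasoner would be expected to infer such terms as equivalent to each other. Whether that is intended for the pairs it finds is not stated in this thread.

```python
from collections import defaultdict

# (PHIPO label, PHIPO ID, uPheno pattern, GO ID) rows copied from the table above.
ROWS = [
    ("abnormal mutualism", "PHIPO:0000040", "abnormal biological process", "GO:0085030"),
    ("loss of mutualism", "PHIPO:0000207", "abnormal absence of biological process", "GO:0085030"),
    ("abolished pathogen cell to cell migration within host", "PHIPO:0000340", "abnormal absence of biological process", "GO:0106259"),
    ("premature pathogen cell to cell migration within host", "PHIPO:0000342", "abnormally premature biological process", "GO:0106259"),
    ("delayed pathogen cell to cell migration within host", "PHIPO:0000343", "abnormally delayed biological process", "GO:0106259"),
    ("pathogen penetration into host absent", "PHIPO:0000355", "abnormal absence of biological process", "GO:0044409"),
    ("increased pathogen penetration into host", "PHIPO:0000360", "abnormally increased quality of biological process", "GO:0044409"),
    ("delayed pathogen penetration into host", "PHIPO:0000361", "abnormally delayed biological process", "GO:0044409"),
    ("premature pathogen penetration into host", "PHIPO:0000362", "abnormally premature biological process", "GO:0044409"),
    ("absence of pathogen growth within host", "PHIPO:0000363", "abnormal absence of biological process", "GO:0044114"),
    ("delayed timing of pathogen growth within host", "PHIPO:0000366", "abnormally delayed biological process", "GO:0044114"),
    ("increased pathogen growth within host", "PHIPO:0000368", "abnormally increased rate of biological process", "GO:0044114"),
    ("mutualism absent", "PHIPO:0000948", "abnormal absence of biological process", "GO:0085030"),
    ("abolished pathogen growth within host", "PHIPO:0000952", "abnormal absence of biological process", "GO:0044114"),
]


def colliding_terms(rows):
    """Return {(pattern, GO ID): [PHIPO IDs]} for definitions shared by more than one term."""
    groups = defaultdict(list)
    for _label, phipo_id, pattern, go_id in rows:
        groups[(pattern, go_id)].append(phipo_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


for (pattern, go_id), ids in colliding_terms(ROWS).items():
    print(f"{pattern} / {go_id}: {', '.join(ids)}")
```

On these 14 rows it flags two pairs: PHIPO:0000207 with PHIPO:0000948 (both 'abnormal absence of biological process' on GO:0085030), and PHIPO:0000363 with PHIPO:0000952 (the same pattern on GO:0044114).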

PHI-base / phipo

logical definitions and reasoning #352