PHI-base / phipo

Pathogen-Host Interaction Phenotype Ontology

logical definitions and reasoning #352

Open ValWood opened 3 years ago

ValWood commented 3 years ago

It might be time to think about putting logical definitions in place so that reasoning can be used to improve ontology maintenance.

@jseager7 @CuzickA Maybe we can work together to get logical definitions for the high-level terms in place. This should make subsequent ones easier.

Won't the uPheno alignment provide logical definitions for most terms anyway?

jseager7 commented 3 years ago

@ValWood We already do have logical definitions instantiated where the terms match uPheno patterns. For example, see the 'Equivalent to' section of the sidebar on OLS for the term 'increased resistance to chemical' (link).
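As a rough illustration of what "instantiating" a pattern means, here is a minimal Python sketch that fills a pattern variable with a GO term to produce an equivalence axiom. The template only approximates the general shape of an 'abnormal biological process' logical definition. The relation and quality labels are assumptions, not text copied from uPheno's pattern files. The term/filler pair comes from the PHI-branch mappings listed later in this thread. `Template.substitute` refuses to build an axiom when a variable has no filler, which mirrors the case where no ontology term exists to fill it.

```python
from string import Template

# Hypothetical template approximating the shape of an 'abnormal biological
# process' equivalence axiom (Manchester syntax). The labels used here are
# assumptions, not text copied from uPheno's pattern files.
ABNORMAL_BP = Template(
    "'has part' some (quality and ('inheres in' some $process) "
    "and ('has modifier' some abnormal))"
)


def instantiate(template: Template, **fillers: str) -> str:
    """Fill every pattern variable; raises KeyError if a variable has no filler."""
    return template.substitute(**fillers)


# PHIPO:0000040 'abnormal mutualism' paired with its filler GO:0085030.
axiom = instantiate(ABNORMAL_BP, process="GO:0085030")
print(f"PHIPO:0000040 EquivalentTo: {axiom}")
```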

Unfortunately, this pattern mapping has followed an ad hoc approach – and generally bottom-up instead of top-down – because there are a lot of gaps in the phenotypes covered by uPheno, plus the grouping terms tend to be a poor fit for uPheno's patterns. Obviously we don't have to use uPheno to add logical definitions (we could do it ourselves; we can even create our own patterns).

The problem may just be a lack of knowledge on my part about how to logically define grouping terms with patterns, but it could also be a limitation of uPheno. I think it might become clearer when the pattern mapping spreadsheet is reviewed.

jseager7 commented 3 years ago

To elaborate, many of the single-species phenotypes are straightforward to map to patterns – often the problem here is that there's no ontology term to instantiate the variables of the patterns (basically there's no way to specify the differentia for the pattern). In cases where the pattern doesn't exist, it's usually pretty obvious what the pattern should be.

The real problems start in the pathogen-host branch, because so many of these phenotypes are qualified with spatial information (phenotypes occurring in a host), causal information (phenotypes caused by the pathogen), nested causal information (phenotypes caused by a host process caused by the pathogen), temporal information (phenotypes during pathogen penetration), and so on.

uPheno really struggles with these complex phenotypes at the moment, because its patterns currently follow a pre-compositional approach, where every pattern needs to be extended with '…in location', '…during process', or similar. It would probably help a lot if patterns could be composed with other patterns (like annotation extensions), but the uPheno developers might argue in return that our phenotypes are too complex. It might be possible to logically define more complex phenotypes with a union of equivalence axioms, but that's just me guessing.
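The difference between pre-composed and compositional patterns can be sketched in a few lines of Python. The qualifier "mini-patterns" below, their relation labels, and the flat conjunction are illustrative assumptions only, not uPheno patterns or a validated modelling choice. `BASE`, `HOST_TERM` and `PROCESS_TERM` are hypothetical placeholders, and the counts passed to `pattern_counts` are arbitrary.

```python
# Hypothetical qualifier "mini-patterns". The relation labels are illustrative
# only; they are not uPheno patterns or a validated modelling choice.
QUALIFIERS = {
    "location": "('occurs in' some {location})",
    "process": "('during' some {process})",
    "cause": "('caused by' some {cause})",
}


def compose(base: str, **fillers: str) -> str:
    """Conjoin a base phenotype expression with any requested qualifiers."""
    parts = [base]
    for name, value in fillers.items():
        # Each qualifier template has a single placeholder named after its key.
        parts.append(QUALIFIERS[name].format(**{name: value}))
    return " and ".join(parts)


def pattern_counts(n_base: int, n_qualifiers: int) -> tuple[int, int]:
    """Number of patterns to maintain: pre-composed vs compositional."""
    # Pre-composed: one pattern per base pattern per subset of qualifiers
    # (the empty subset being the base pattern itself).
    precomposed = n_base * 2 ** n_qualifiers
    # Compositional: base patterns and qualifiers are maintained separately.
    compositional = n_base + n_qualifiers
    return precomposed, compositional


print(compose("BASE", location="HOST_TERM", process="PROCESS_TERM"))
print(pattern_counts(10, 3))
```

With 10 base patterns and 3 kinds of qualifier, the pre-composed approach needs 80 patterns against 13 building blocks for the compositional one, which is the maintenance problem in a nutshell. Nested causal chains are not handled by this sketch.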

Some of the pathogen-host phenotypes can be handled by adding process terms to GO, so that the term can be logically defined using the 'abnormal biological process' pattern (and its variations), but I'm not sure this is going to cover all the cases.

CuzickA commented 2 years ago

@ValWood and I had a brief chat about this yesterday.
1) We thought it would be good to have a one-day meeting in January 2022 (AC, JS and VW) to look at uPheno patterns for the PHI branch of terms, particularly the higher-level PHI terms. The single-species terms should be similar to FYPO.
2) In some cases we need to distinguish between similar terms present in both the single-species branch and the PHI branch. For the PHI-branch terms the GO term 'GO:0051701 biological process involved in interaction with host' could be used in the logical definition.
3) Val had a couple of questions for @jseager7. How many PHIPO terms have been mapped? (I thought it was 58% from memory, are these just from single species branch?) Does the reasoner get run regularly and are logic error problems reported? (I wondered whether this was done by the ODK and OBO dashboard?)

Please feel free to add any comments here if I have missed anything :-)

jseager7 commented 2 years ago

For the PHI-branch terms the GO term 'GO:0051701 biological process involved in interaction with host' could be used in the logical definition.

That sounds sensible. Ideally, uPheno would use this term to create patterns for pathogen-host processes and so on. However, uPheno uses a pre-compositional approach for its patterns, so there could be maintainability issues with adding the context of a pathogen-host interaction as another dimension (worst case, every single pattern would need a pathogen-host interaction variant). uPheno could generalise the pattern to be something like 'biological process occurring during biological process', but the compositional problem will still be there. That's not really our problem to solve though, and I think we could always fall back on defining our own patterns (not just logical definitions) if we can't map to uPheno.

How many PHIPO terms have been mapped? (I thought it was 58% from memory, are these just from single species branch?)

According to the grant submission, 536 terms (58%) have logical definitions instantiated. I think there are only about 14 terms in the pathogen-host interaction phenotype branch that have logical definitions, all of which are processes of some kind, and I can't guarantee that they're correct. I'll paste a table summarising these terms below.
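The 58% figure also gives a rough idea of the size of the ontology at the time. A quick check in Python, assuming the percentage was rounded to the nearest whole number rather than truncated (the grant text doesn't say which):

```python
import math


def implied_totals(mapped: int, percent: int) -> list[int]:
    """Every total n for which mapped / n rounds to `percent` (assumes percent >= 1)."""
    lower_pct, upper_pct = percent - 0.5, percent + 0.5
    # 100 * mapped / n < upper_pct   =>  n > 100 * mapped / upper_pct
    # 100 * mapped / n >= lower_pct  =>  n <= 100 * mapped / lower_pct
    n_min = math.floor(100 * mapped / upper_pct) + 1
    n_max = math.floor(100 * mapped / lower_pct)
    return [n for n in range(n_min, n_max + 1)
            if lower_pct <= 100 * mapped / n < upper_pct]


totals = implied_totals(536, 58)
print(f"{totals[0]} to {totals[-1]} terms in total")
print(f"PHI-branch share of mapped terms: {14 / 536:.1%}")
```

Under that assumption the ontology had between 917 and 932 terms, and the 14 PHI-branch definitions make up about 2.6% of the 536 mapped terms.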

Does the reasoner get run regularly and are logic error problems reported? (I wondered whether this was done by the ODK and OBO dashboard?)

There are tests that the ODK runs on every full release. These consist of running some SPARQL queries and the ELK reasoner over the ontology. I don't really understand what the reasoning checks for, but the documentation for the ROBOT reason command says this:

ROBOT will always perform a logical validation check prior to automatic classification. Formally, this is known as testing for incoherency, i.e. the presence of either a logical inconsistency or unsatisfiable classes. If either of these hold true, the reason operation will fail and robot will exit with a non-zero code, after reporting the problematic classes.

Any reasoning violations go into the 'reports' directory at the top of the repository.
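To make "unsatisfiable classes" a bit more concrete, here is a toy Python sketch: a class counts as unsatisfiable if its told superclasses include both members of a pair of classes declared disjoint. This only illustrates the idea. ROBOT hands the real check to an OWL reasoner such as ELK, which reasons over far more than told subclass edges, and it also checks for inconsistency, which this sketch ignores. The class names are hypothetical placeholders.

```python
# Toy model: told subClassOf edges plus pairwise disjointness, nothing else.
# The class names are hypothetical placeholders, not PHIPO terms.
PARENTS = {
    "bad_term": ["process_like", "quality_like"],
    "good_term": ["process_like"],
    "process_like": [],
    "quality_like": [],
}
DISJOINT = [("process_like", "quality_like")]


def ancestors(cls: str, parents: dict[str, list[str]]) -> set[str]:
    """All classes reachable from cls via told subClassOf edges, cls included."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def unsatisfiable(parents: dict[str, list[str]],
                  disjoint_pairs: list[tuple[str, str]]) -> set[str]:
    """Classes (keys of `parents`) that fall under both members of a disjoint pair."""
    bad = set()
    for cls in parents:
        anc = ancestors(cls, parents)
        if any(a in anc and b in anc for a, b in disjoint_pairs):
            bad.add(cls)
    return bad


bad = unsatisfiable(PARENTS, DISJOINT)
# Compute (but don't raise) a non-zero exit code when problems are found,
# echoing the ROBOT behaviour described in the quoted documentation.
exit_code = 1 if bad else 0
print(sorted(bad), exit_code)
```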

jseager7 commented 2 years ago

Here are the terms from the pathogen-host interaction phenotype branch that are mapped to uPheno patterns. Note that the 'Pattern term ID' in the table below is the specific biological process that fills the general biological process mentioned in the pattern name.

| PHIPO term label | PHIPO term ID | uPheno pattern | Pattern term ID |
|---|---|---|---|
| abnormal mutualism | PHIPO:0000040 | abnormal biological process | GO:0085030 |
| loss of mutualism | PHIPO:0000207 | abnormal absence of biological process | GO:0085030 |
| abolished pathogen cell to cell migration within host | PHIPO:0000340 | abnormal absence of biological process | GO:0106259 |
| premature pathogen cell to cell migration within host | PHIPO:0000342 | abnormally premature biological process | GO:0106259 |
| delayed pathogen cell to cell migration within host | PHIPO:0000343 | abnormally delayed biological process | GO:0106259 |
| pathogen penetration into host absent | PHIPO:0000355 | abnormal absence of biological process | GO:0044409 |
| increased pathogen penetration into host | PHIPO:0000360 | abnormally increased quality of biological process | GO:0044409 |
| delayed pathogen penetration into host | PHIPO:0000361 | abnormally delayed biological process | GO:0044409 |
| premature pathogen penetration into host | PHIPO:0000362 | abnormally premature biological process | GO:0044409 |
| absence of pathogen growth within host | PHIPO:0000363 | abnormal absence of biological process | GO:0044114 |
| delayed timing of pathogen growth within host | PHIPO:0000366 | abnormally delayed biological process | GO:0044114 |
| increased pathogen growth within host | PHIPO:0000368 | abnormally increased rate of biological process | GO:0044114 |
| mutualism absent | PHIPO:0000948 | abnormal absence of biological process | GO:0085030 |
| abolished pathogen growth within host | PHIPO:0000952 | abnormal absence of biological process | GO:0044114 |
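Since the pattern mapping spreadsheet is due to be reviewed, here is a Python sketch of two checks that could be run over rows like the ones in the table above. The first groups terms by pattern into filler tables. The second flags terms that share both a pattern and a filler, which a reasoner would infer to be equivalent. The `defined_class` column name follows the DOSDP filler-table convention, and the `biological_process` variable name is an assumption; both should be checked against the actual pattern files.

```python
import csv
import io
from collections import defaultdict

# (PHIPO term ID, uPheno pattern, filler GO ID), copied from the table above.
MAPPINGS = [
    ("PHIPO:0000040", "abnormal biological process", "GO:0085030"),
    ("PHIPO:0000207", "abnormal absence of biological process", "GO:0085030"),
    ("PHIPO:0000340", "abnormal absence of biological process", "GO:0106259"),
    ("PHIPO:0000342", "abnormally premature biological process", "GO:0106259"),
    ("PHIPO:0000343", "abnormally delayed biological process", "GO:0106259"),
    ("PHIPO:0000355", "abnormal absence of biological process", "GO:0044409"),
    ("PHIPO:0000360", "abnormally increased quality of biological process", "GO:0044409"),
    ("PHIPO:0000361", "abnormally delayed biological process", "GO:0044409"),
    ("PHIPO:0000362", "abnormally premature biological process", "GO:0044409"),
    ("PHIPO:0000363", "abnormal absence of biological process", "GO:0044114"),
    ("PHIPO:0000366", "abnormally delayed biological process", "GO:0044114"),
    ("PHIPO:0000368", "abnormally increased rate of biological process", "GO:0044114"),
    ("PHIPO:0000948", "abnormal absence of biological process", "GO:0085030"),
    ("PHIPO:0000952", "abnormal absence of biological process", "GO:0044114"),
]


def group_by_pattern(rows):
    """Map each pattern name to its (term, filler) pairs, in table order."""
    groups = defaultdict(list)
    for term, pattern, filler in rows:
        groups[pattern].append((term, filler))
    return groups


def filler_tsv(pairs, variable="biological_process"):
    """Render one pattern's rows as a DOSDP-style filler table (names assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["defined_class", variable])
    writer.writerows(pairs)
    return buf.getvalue()


def duplicate_definitions(rows):
    """(pattern, filler) keys used by more than one term: inferred equivalent."""
    seen = defaultdict(list)
    for term, pattern, filler in rows:
        seen[(pattern, filler)].append(term)
    return {key: terms for key, terms in seen.items() if len(terms) > 1}


groups = group_by_pattern(MAPPINGS)
print(filler_tsv(groups["abnormally premature biological process"]), end="")
for (pattern, filler), terms in duplicate_definitions(MAPPINGS).items():
    print(f"{pattern} + {filler}: {', '.join(terms)}")
```

If the mappings are as listed, the duplicate check flags two pairs: PHIPO:0000207 'loss of mutualism' with PHIPO:0000948 'mutualism absent', and PHIPO:0000363 'absence of pathogen growth within host' with PHIPO:0000952 'abolished pathogen growth within host'. These may be worth a second look during the review.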