PHI-base / phipo

Pathogen-Host Interaction Phenotype Ontology

DPO-PHIPO alignment review #215

Open matentzn opened 4 years ago

matentzn commented 4 years ago

@Clare72 (FlyBase) has kindly started a review of alignments between DPO and PHIPO. Here are her notes:

| DPO | PHIPO | Label DPO | Label PHIPO | Alignment | Note |
| --- | --- | --- | --- | --- | --- |
| FBcv_0000424 | PHIPO_0000019 | cell death defective | inviable cell | logical | 'inviable cell' sounds like it should map to FBcv:0000353 'cell lethal' |
| FBcv_0001324 | PHIPO_0000262 | endocytosis defective | abnormal pathogen host interaction endocytosis | logical | does not need to involve pathogens |

@jseager7 What do you think?

jseager7 commented 4 years ago

'endocytosis defective' (FBcv_0001324) should probably map to 'abnormal endocytosis' (PHIPO:0000261) from the single-species branch, instead of 'abnormal pathogen host interaction endocytosis' (PHIPO:0000262) from the pathogen-host interaction branch.

In fact, I think external ontologies should just ignore the terms under 'pathogen-host interaction phenotype' (PHIPO:0000001) when mapping, especially if the other ontology isn't concerned with multi-organism interactions. As we've discussed before, most of these terms are just straight reimplementations of what already exists in the 'single species phenotype' branch (PHIPO:0000002); there are only a few examples, such as 'mutualism absent' (PHIPO:0000948), where the phenotype really is unique to pathogen-host interactions.
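As a rough illustration of "ignore everything under PHIPO:0000001", here is a minimal Python sketch that filters candidate mappings by branch. The term IDs come from this thread, but the parent links are simplified assumptions (a real ontology is a DAG with intermediate terms and multiple parents), and `in_branch` and `drop_interaction_branch` are hypothetical helper names, not part of any existing tool.

```python
# Toy is_a hierarchy: child -> parent. The IDs appear in this thread, but the
# direct parent links are simplified assumptions made for illustration only.
IS_A = {
    "PHIPO:0000262": "PHIPO:0000001",  # abnormal pathogen host interaction endocytosis
    "PHIPO:0000261": "PHIPO:0000002",  # abnormal endocytosis
    "PHIPO:0000019": "PHIPO:0000002",  # inviable cell (branch placement assumed)
}

PHI_BRANCH_ROOT = "PHIPO:0000001"  # pathogen-host interaction phenotype


def in_branch(term, root, is_a=IS_A):
    """Return True if `term` is `root` or has `root` among its ancestors."""
    seen = set()
    while term is not None and term not in seen:
        if term == root:
            return True
        seen.add(term)
        term = is_a.get(term)  # None once we walk past a top-level term
    return False


def drop_interaction_branch(mappings):
    """Keep only (dpo_id, phipo_id) pairs whose PHIPO term is outside the branch."""
    return [(d, p) for d, p in mappings if not in_branch(p, PHI_BRANCH_ROOT)]


candidate = [
    ("FBcv:0000424", "PHIPO:0000019"),
    ("FBcv:0001324", "PHIPO:0000262"),
    ("FBcv:0001324", "PHIPO:0000261"),
]
kept = drop_interaction_branch(candidate)
```

With these toy links, only the pair pointing at 'abnormal pathogen host interaction endocytosis' is dropped; the single-species pairs are kept.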

matentzn commented 4 years ago

We can't tell automated mapping systems what branches not to map, or at least not easily. OK, for every class you do not want to map, you will have to, at the very least, remove the EQ. I think multi-species EQs should not be equivalent to single-species EQs (perhaps subsumed under them; future work). Will you remove the EQs for the cases that you feel should not map to anything?

jseager7 commented 4 years ago

We can't tell automated mapping systems what branches not to map - or at least not easily

Oh, for some reason I assumed that DPO were doing this mapping manually; I forgot that I'd added a definition on 'abnormal pathogen host interaction endocytosis'.

for every class you do not want to map, you will have to, at the very least, remove the EQ. I think multi-species EQs should not be equivalent to single-species EQs

I'm in the process of making sure these multi-species terms are excluded from our pattern mapping process. I'm hoping to have this done soon, maybe in the next week. I'll also have to go through and remove any already-instantiated multi-species terms from the definition files.

matentzn commented 4 years ago

Great! Thank you! Yes, all these mappings are from EQs.

ValWood commented 4 years ago

@Clare72 thank you for attempting this! Here are some thoughts. I'm a novice in the phenotype ontology area, but maybe there are ways to approach this process in a more scalable way.

  1. PHIPO pathogen-host interaction phenotypes

First, James is completely correct about ignoring the multi-species branch. These terms are designed only for use in a multi-species capacity where a 'pathogen' and a 'host' are included in the phenotypic observation.

PHIPO is a slightly quirky ontology. The pathogen-host interaction phenotype 'branch' is the core scope, although we have now included other branches for pragmatic purposes (more below).

However, rather than being a 'straight implementation' of the 'single-species phenotype' terms, the pathogen-host interaction phenotype terms describe (often very subtle) changes in pathogen-host interaction phenotypes. Examples include changes to host or pathogen processes observed in specific pathogen-host backgrounds:

PHIPO:0001142 | normal pathogen protein localization to host chloroplast
PHIPO:0000192 | presence of host HR induced by pathogen during biotrophic lifestyle
PHIPO:0000182 | absence of host HR induced by pathogen during biotrophic lifestyle
PHIPO:0001015 | decreased level of host callose deposition induced by pathogen

Mostly these terms describe changes in the observed phenotypes of things the pathogen is doing to the host (plant presently, but increasingly human too). The reason this level of detail is required is that the pathogen community are studying genotypic changes in interacting genes which can affect the phenotype in a host and/or pathogen strain-specific manner (e.g. mutate gene a in pathogen strain x and observe a phenotype; mutate plant gene b in strain y and the phenotype disappears or is ameliorated. However, the same change might not occur with the same mutation in pathogen strain x' or plant strain y'). There is no other ontology covering this pathogen-host phenotypic interaction space (otherwise we would not be developing PHIPO). So it is unlikely that this branch would have equivalent terms in other ontologies.

Tangent: 'abnormal pathogen host interaction endocytosis' looks as though it should not be in this branch. @jseager7 can you check if it has been used? If not, I would suggest obsoleting it. If we require it, it should be defined more clearly as something the symbiont is specifically doing to affect host endocytosis.

  2. The other PHIPO branches describe 'pathogen-only' or 'host-only' phenotypes. Terms in the pathogen-only branch are usually cell-level (but sometimes population-level, or even developmental for multicellular fungi). The cell-level phenotypes are equivalent to those in FYPO, the Fission Yeast Phenotype Ontology (although note that PHI-base is funded in scope for important non-fungal pathogens and non-plant hosts, excluding viruses).

    In the 'pathogen-only' space we would love to use a generic cell-level ontology, but no such thing exists (yet!). FYPO is the closest but is currently fission yeast specific. Importantly, however, FYPO describes and logically defines cell-level phenotypes covering pretty much every broadly conserved cell-level process/morphology/anatomy relevant to eukaryotes (over 7000 terms).

My hope is/was (and I hope this is still the long-term plan) that we can somehow create a generic cell-level ontology based on FYPO. We would love to do this, but somehow we never get fully funded. I don't know how feasible this is from the current work of @mah11 and @matentzn, but it would be phenomenally useful, and is hopefully still a long-term goal, even if it requires additional dedicated funding. If a species-agnostic cell-level ontology existed we would largely mothball the 'pathogen-only' branch of PHIPO. However, we are curating papers right now and therefore need to capture the data in a structured and scalable way that we can easily migrate later (migration requires at least an order of magnitude less resource than re-curation, especially at the level of detail where we are attaching specific genotypes to the phenotypes; how long curation takes is an under-appreciated fact).

  3. So, mapping from DPO to PHIPO might not be a good place to begin. PHIPO either covers largely orthogonal concepts or is not very mature in the area that might possibly overlap with DPO. Terms for 'pathogen-only' phenotypes mirror FYPO as closely as possible and should be using the same logical definitions. FYPO would be a much better starting point for mapping any cell-level concept. We should all aim to map to the core cell-level concepts of FYPO, IMO.

  4. We can't tell automated mapping systems what branches not to map - or at least not easily. OK, for every class you do not want to map, you will have to, at the very least, remove the EQ. I think multi-species EQs should not be equivalent to single-species EQs (perhaps subsumed under them; future work). Will you remove the EQs for the cases that you feel should not map to anything?

OK, this is the bit where I get lost. I am assuming that mapping between phenotype ontologies can be fully automated using equivalent logical definitions? If not, why is that?

Oh I meant to tag @mah11 @CuzickA @cmungall sorry (resending)

jseager7 commented 4 years ago

OK, this is the bit where I get lost. I am assuming that mapping between phenotype ontologies can be fully automated using equivalent logical definitions?

I think the mappings that were being reviewed were generated automatically by the uPheno system.

The problem is, I added logical definitions to many terms in the pathogen-host interaction branch, which were leading to them being picked up by uPheno and mapped to other ontologies, often when the single-species branch would've been more appropriate.

I'm in the process of excluding these pathogen-host interaction logical definitions, because currently uPheno's patterns have no logical way to distinguish, for example, 'abnormal pathogen host interaction endocytosis' from 'abnormal endocytosis' (because there's no concept of a pathogen-host interaction).
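To make the problem concrete, here is a minimal Python sketch of how definition-based mapping could behave. It assumes, purely for illustration, that each logical definition can be flattened to a `(pattern, filler)` pair and that two terms from different ontologies are mapped when those pairs match. The pattern name and the helper `mappings_from_definitions` are hypothetical and not taken from the uPheno codebase; GO:0006897 is the GO term for endocytosis.

```python
from collections import defaultdict

# (term_id, pattern_name, filler) triples: a hypothetical flat view of logical
# definitions. The pattern name is illustrative, not copied from uPheno.
DEFINITIONS = [
    ("FBcv:0001324", "abnormalBiologicalProcess", "GO:0006897"),   # endocytosis defective
    ("PHIPO:0000261", "abnormalBiologicalProcess", "GO:0006897"),  # abnormal endocytosis
    ("PHIPO:0000262", "abnormalBiologicalProcess", "GO:0006897"),  # abnormal pathogen host interaction endocytosis
]


def prefix(term_id):
    """Return the ontology prefix of a CURIE, e.g. 'PHIPO' for 'PHIPO:0000261'."""
    return term_id.split(":", 1)[0]


def mappings_from_definitions(definitions):
    """Pair up terms from different ontologies that share a (pattern, filler) key."""
    by_key = defaultdict(list)
    for term_id, pattern, filler in definitions:
        by_key[(pattern, filler)].append(term_id)

    pairs = []
    for terms in by_key.values():
        for i, left in enumerate(terms):
            for right in terms[i + 1:]:
                if prefix(left) != prefix(right):
                    pairs.append((left, right))
    return pairs


mappings = mappings_from_definitions(DEFINITIONS)
```

Because the pattern has no slot for "in a pathogen-host interaction", both PHIPO terms end up with the same key and both are mapped to the DPO term. Removing the definition from PHIPO:0000262 removes the unwanted pair.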

In the case where there is a corresponding GO term to describe some pathogen-host interaction process (e.g. 'entry into host'), I'm able to logically define those. But what I meant by 'reimplementation' is that many terms in the pathogen-host interaction branch seem to only exist so we can use single-species phenotypes in a pathogen-host interaction context in Canto – due to Canto's restriction about not allowing single-species phenotypes on pathogen-host interactions.

jseager7 commented 4 years ago

"abnormal pathogen host interaction phenotype" looks as though it should not be in this branch [...] If we require it, it should be defined more clearly as something the symbiont is specifically doing to affect host endocytosis.

@ValWood do you mean the term 'abnormal pathogen host interaction endocytosis'? (PHIPO:0000262) I checked Canto's export and it doesn't look like this term has been used in any existing curation sessions. We haven't used 'abnormal endocytosis' (PHIPO:0000261) either.

matentzn commented 4 years ago

Following this with great interest! Thanks @jseager7 @ValWood for discussing this. I would really like to start developing uPheno patterns for pathogen-host interactions as well!

ValWood commented 4 years ago

@ValWood do you mean the term 'abnormal pathogen host interaction endocytosis'? (PHIPO:0000262)

yes, sorry corrected

I also opened a ticket on the PHIPO tracker to look at this.

ValWood commented 4 years ago

In the case where there is a corresponding GO term to describe some pathogen-host interaction process (e.g. 'entry into host'), I'm able to logically define those.

Maybe you can use the parent in the GO branch if you can't be more specific. https://www.ebi.ac.uk/QuickGO/term/GO:0044403 symbiotic process.

Note that this is defined broadly to include parasitism and mutualism: A process carried out by gene products in an organism that enable the organism to engage in a symbiotic relationship, a more or less intimate association, with another organism. The various forms of symbiosis include parasitism, in which the association is disadvantageous or destructive to one of the organisms; mutualism, in which the association is advantageous, or often necessary to one or both and not harmful to either; and commensalism, in which one member of the association benefits while the other is not affected. However, mutualism, parasitism, and commensalism are often not discrete categories of interactions and should rather be perceived as a continuum of interaction ranging from parasitism to mutualism. In fact, the direction of a symbiotic interaction can change during the lifetime of the symbionts due to developmental changes as well as changes in the biotic/abiotic environment in which the interaction occurs. Microscopic symbionts are often referred to as endosymbionts.

jseager7 commented 4 years ago

Maybe you can use the parent in the GO branch if you can't be more specific. https://www.ebi.ac.uk/QuickGO/term/GO:0044403 symbiotic process.

That's useful to know about – it could help as a component of uPheno's eventual pathogen-host interaction patterns.

Sadly, I can't use 'symbiotic process' for more than one term in PHIPO (if I'm using patterns), because using the same pattern with the same GO term on multiple PHIPO terms will lead to reasoners treating all those PHIPO terms as logically equivalent.
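A quick way to see the issue is to check for collisions before instantiating a pattern. The sketch below is only a toy check in Python, not how a reasoner works internally: it assumes each logical definition reduces to a `(pattern, filler)` pair and flags any pair used by more than one term. The term IDs `PHIPO:EXAMPLE_A/B/C` and the pattern name are placeholders invented for the example; GO:0044403 and GO:0044409 are the GO IDs mentioned in this thread.

```python
from collections import defaultdict

# term_id -> (pattern_name, filler). Term IDs and pattern name are placeholders.
DEFINITIONS = {
    "PHIPO:EXAMPLE_A": ("abnormalBiologicalProcess", "GO:0044403"),  # symbiotic process
    "PHIPO:EXAMPLE_B": ("abnormalBiologicalProcess", "GO:0044403"),  # symbiotic process (reused)
    "PHIPO:EXAMPLE_C": ("abnormalBiologicalProcess", "GO:0044409"),  # entry into host
}


def equivalence_collisions(definitions):
    """Return {(pattern, filler): [term_ids]} for keys shared by 2+ terms.

    Terms sharing a key have identical logical definitions, so a reasoner
    would infer them to be equivalent classes.
    """
    groups = defaultdict(list)
    for term_id, key in definitions.items():
        groups[key].append(term_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


collisions = equivalence_collisions(DEFINITIONS)
```

Here the two terms that both use 'symbiotic process' are reported as a collision, while the term defined with 'entry into host' is not.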

@matentzn For the pathogen-host interaction terms, maybe it would help if we had some sort of uPheno pattern to place a process in the context of another process, for example 'abnormal biological process during/with biological process' – 'with' might be preferable since it doesn't impose any temporal constraints, it only implies some vague causal relation – that way we could compose 'symbiotic process' with other process phenotypes. I'm not sure how this would be modelled though, and maybe it would create more problems than it would solve. Maybe we've discussed this in the past?

jseager7 commented 4 years ago

Just to clarify, we might only need this composition between 'symbiotic process' and other non-symbiotic processes if we decide to keep the pathogen-host interaction versions of the single-species phenotypes. If we can come up with some other way for Canto to handle single-species phenotypes in a pathogen-host interaction context, then that should make many of those terms in the pathogen-host interaction branch redundant.

ValWood commented 4 years ago

I think the mappings that were being reviewed were generated automatically by the uPheno system.

So are mappings also generated between FYPO and other ontologies? If not, it seems odd to look at PHIPO first because it isn't a very mature ontology.

Maybe there will not be so much overlap between FYPO and other phenotype ontologies because the scopes are usually different (Micro vs Macro observations).

Looking at DPO, it seems the cell-level process terms use only GO slim terms, so they should be relatively easy to align with FYPO.

jseager7 commented 4 years ago

So are mappings also generated between FYPO and other ontologies? If not, it seems odd to look at PHIPO first because it isn't a very mature ontology.

The mappings are derived from uPheno patterns (I think), so it depends on how much of FYPO is currently mapped to uPheno patterns. Long-term, once FYPO is using uPheno patterns for most (or all) of its logical definitions, then it should make more sense for DPO to map to FYPO (although by that point, PHIPO should also map to FYPO by virtue of the pattern system anyway).

ValWood commented 4 years ago

if we decide to keep the pathogen-host interaction versions of the single-species phenotypes.

I don't understand this comment. We don't have pathogen-host interaction versions of single-species phenotypes (i.e. nothing in the pathogen-host interaction branch should be a single-species phenotype). We are only describing changes in things that the pathogen does to the host, or the host does to the pathogen, in this branch. This is the core purpose of PHIPO and these terms will always be retained even if the single-species branch is done in another way in the long term. I can clarify on the next PHIPO call.

matentzn commented 4 years ago

The uPheno framework will always generate all possible mappings. It was mere chance that we were looking at PHIPO-DPO (we are looking at dozens of combinations), because @Clare72 was way ahead of everyone doing her reviews.

FYPO is not yet following the uPheno patterns, but we (@mah11 and me) are getting close.

I will leave you two to sort out the pathogen-host model, but for now, as James says, only single-species phenotypes should map to uPheno patterns.

jseager7 commented 4 years ago

I don't understand this comment. We don't have pathogen-host interaction versions of single-species phenotypes (i.e. nothing in the pathogen-host interaction branch should be a single-species phenotype).

@ValWood Sorry, I wasn't making myself clear. I should've provided an example. See below:

'increased RNA binding' (PHIPO:0000156) A single organism (molecular function??) phenotype in which occurrence of RNA binding by a gene product is increased, when a gene or the bound RNA sequence is mutated. The affected gene product may be encoded by the mutated gene, or by a different gene.

'increased pathogen host interaction RNA binding' (PHIPO:0000176) A pathogen host interaction (molecular function??) phenotype in which occurrence of RNA binding by a gene product is increased, when a gene or the bound RNA sequence is mutated. The affected gene product may be encoded by the mutated gene, or by a different gene.

As far as I can tell, based on the text definition, these are the same phenotype; it's just one of them occurs in a pathogen-host interaction and the other doesn't. They both seem to match the pattern 'abnormally increased rate of molecular function', but they can't both be assigned that pattern, or they'd be logically equivalent (which I assume they're not).

I always thought that the reason these two terms existed was because 'increased RNA binding' (PHIPO:0000156) couldn't be used to annotate metagenotypes, so we had to add a term in the pathogen-host interaction namespace to work around this. Is that at all accurate?

ValWood commented 4 years ago

These terms and definitions need clarifying. I presume in these cases the pathogen is directly affecting the host RNA binding to something. I'm not sure, because I haven't come across altered RNA binding as far as I know. But in the cases where it is valid, I was assuming you could use a combination of the GO term + GO:0044403 symbiotic process for the logical definitions.

ValWood commented 4 years ago

I always thought that the reason these two terms existed was because 'increased RNA binding' (PHIPO:0000156) couldn't be used to annotate metagenotypes, so we had to add a term in the pathogen-host interaction namespace to work around this. Is that at all accurate?

This is true, but the implication is that the phenotypic change is occurring as a result of a change in the 'other species' (usually the pathogen), caused by the pathogen-host interaction.

For example (a hypothetical situation, because I'm not aware of any pathogen effectors altering RNA binding), a change in pathogen strain x gene y, in the interaction with a particular plant species/genotype, alters some specific RNA binding activity in the plant. If the change was due to an alteration in the plant gene, this would be a single-species phenotype.

Presumably "a pathogen host interaction phenotype" is defined somewhere as something along the lines of: "an observed change in the phenotype of one species (pathogen or host) caused by an alteration in the genotype of the interacting species (pathogen or host)". My understanding is that if this isn't the case, the observation isn't a "pathogen-host interaction".

The fact that the change is caused by a change in the interacting species is the differentia between the pathogen-host interaction terms and the non-pathogen-host interaction terms which look the same.

Also, in the early days Alayne added some terms that were not required for annotation. These terms should be obsoleted. This could also be one of those terms. It might be good to prioritise removal of any of the early additions which have not been used. We should only add things when we need them for annotations. James, could you identify terms that are not yet used in the pathogen-host interaction branch so we can check if they are required?

will be addressed by https://github.com/PHI-base/phipo/issues/218

jseager7 commented 4 years ago

That really helps clarify things, thanks. Just as a note, the current definition of 'pathogen host interaction phenotype' in PHIPO is:

Any of the set of observable disease formation characteristics arising from the interaction of a potentially pathogenic organism with a host organism.

Which might not be saying the same thing as the definition you gave; I'm not sure.

Also, with regards to this:

the implication is that the phenotypic change is occurring as a result of a change in the 'other species' (usually the pathogen)...

Do we have any way to indicate what the 'other species' is? What if it's the host in one case and the pathogen in another? Is this information important enough to be captured in the annotation, perhaps as an extension?


James could you identify terms that are not yet used in the pathogen-host interaction branch and we can check if they are required?

Yes, I can do that. I'll make a new issue once I've located the terms.
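For reference, the check itself is just a set difference between the terms in the branch and the terms that appear in annotations. A minimal Python sketch follows; the annotation records below are invented sample data (not real Canto export content), the `term_id` field name is an assumption about the export shape, and `unused_terms` is a hypothetical helper.

```python
# Terms in the pathogen-host interaction branch (id -> label), taken from this thread.
BRANCH_TERMS = {
    "PHIPO:0000262": "abnormal pathogen host interaction endocytosis",
    "PHIPO:0000176": "increased pathogen host interaction RNA binding",
    "PHIPO:0000948": "mutualism absent",
}

# Invented sample annotation records; the real export format may differ.
ANNOTATIONS = [
    {"session": "example-session-1", "term_id": "PHIPO:0000948"},
]


def unused_terms(branch_terms, annotations):
    """Return the branch terms (id -> label) that no annotation refers to."""
    used = {record["term_id"] for record in annotations}
    return {tid: label for tid, label in branch_terms.items() if tid not in used}


unused = unused_terms(BRANCH_TERMS, ANNOTATIONS)
for term_id in sorted(unused):
    print(term_id, unused[term_id], sep="\t")
```

The resulting list would be the starting point for deciding which terms to obsolete, not an automatic obsoletion list, since some unused terms may still be needed as grouping terms.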

ValWood commented 4 years ago

Which might not be saying the same thing as the definition you gave; I'm not sure.

I'll open a ticket to review this definition. I'm not sure we have thought about it very deeply. We know what we mean but we probably struggled to define it!

Do we have any way to indicate what the 'other species' is? What if it's the host in one case and the pathogen in another? Is this information important enough to be captured in the annotation, perhaps as an extension?

Well it is captured in the annotation (via the genotype which includes species/strain/genotype)

...and the direction of the interaction should be obvious once we get below the grouping terms to those we use for annotation. These would usually specify things like 'increased host membrane blebbing during pathogen invasion', 'suppression of host PTI', etc.

At this point it is clear that the pathogen is doing something to the host. For the terms we use, this should always be the case. I will make an issue.

will be addressed as part of this review https://github.com/PHI-base/phipo/issues/218

ValWood commented 4 years ago

Sorry for turning this into a long rambling thread!

There are certainly exceptions under pathogen-host interactions, where another species is clearly involved but it seems to me that they should be captured as a 'pathogen only phenotype':

pathogen entry into host phenotype

Pathogen entry into host obviously requires a host, but I would only think this was an interaction if there were host variants which altered the pathogen entry. If the pathogen was mutated so that the host cannot form a penetration structure, this would not be a pathogen-host interaction phenotype?

@CuzickA is that correct?

I opened a separate ticket to discuss these cases. https://github.com/PHI-base/phipo/issues/219

CuzickA commented 4 years ago

Interesting ticket.

I agree it would be really useful to have a list of PHIPO terms not yet used in the pathogen-host interactions branch. Initially I created a bunch of terms for both branches based on FYPO and these may need readdressing now PHIPO is developing further.

@ValWood we have discussed in the past about some 'overlap' of terms between the single-species (pathogen or host only phenotypes) and the pathogen-host interaction branch and concluded that as PHI-Canto loads the two branches separately we almost needed to duplicate some of the terms across the branches with minor modifications.

The 'pathogen entry into host phenotype' would not be a 'pathogen only (single species branch) phenotype' because it is being annotated to a metagenotype. (Def: A pathogen colonization of host phenotype which affects the penetration of the pathogen into the body, tissues, or cells of the host organism (GO:0044409).)

The 'pathogen penetration across barrier phenotype' is a 'pathogen only (single species branch) phenotype' because it is annotated to a pathogen genotype. (Def: A single species phenotype which affects the pathogen's ability to penetrate a given experimental substance/membrane in an in vitro assay.)

These PHIPO terms were developed based on previously used PHI-base data recording penetration assays in planta or in vitro.

ValWood commented 4 years ago

When we use the 'pathogen entry into host phenotype' with a metagenotype, are we always using it in the context where the alterations to the host genotype change the ability of the pathogen to enter the host? In this case it is probably OK, but the terms could be more precise. I will transfer this example to the discussion about the scope of 'pathogen-host' interaction https://github.com/PHI-base/phipo/issues/219

CuzickA commented 4 years ago

When we use the 'pathogen entry into host phenotype' with a metagenotype, are we always using it in the context where the alterations to the host genotype change the ability of the pathogen to enter the host? In this case it is probably OK, but the terms could be more precise. I will transfer this example to the discussion about the scope of 'pathogen-host' interaction

I think we are currently only using it when an alteration has been made to the pathogen genotype.