Closed ValWood closed 1 year ago
Hi Val, I agree with you that the above IEP annotations should be "response to nitrogen starvation". In the particular case of those 7 plant fungus genes, and taking into account the authors' intent, it seems that these genes are expressed in response to nitrogen starvation and also during infection of the rice plant. This seems to be a physiological condition that likely allows fungi to colonize various ecological niches including infected host plants.
Now, as to whether IEP should be used in this way, I will just paste what we have in our training material for IEP. It was taken from the wiki page (http://wiki.geneontology.org/index.php/Inferred_from_Expression_Pattern_(IEP)), which I noticed is empty now. In that paragraph, I have highlighted the text that was actually stating that this code was the right one to use here.
Inferred from Expression Pattern (IEP)
_Where the annotation is inferred from the timing or location of expression of a gene.
Use this code with caution! It can be difficult to determine whether the expression pattern really indicates that a gene plays a role in a given process.
Genes upregulated during a stress condition may be annotated to the process of stress response (for example, heat shock proteins)
Genes selectively expressed at specific developmental stages in specific organs may be annotated to xxx development
Transcript levels or timing (e.g. Northerns, microarray data)
Protein levels (e.g. Western blots)
IEP evidence code is usually used with high level GO terms in the biological process ontology.
Only the normal expression pattern should lead to an IEP annotation._
We encourage all curators to check the wiki pages when choosing the most appropriate evidence code for the experiments done in the papers they curate. If these pages are not up-to-date, curators will keep adding the same types of annotations. Could I please request that the pages for the experimental evidence codes be updated as a matter of urgency to help with quality and consistency?
Thanks,
Penelope
@pgarmiri - I made an update to the links on Friday and that is why the pages are being mis-directed. I'll fix that right now
Okay, all the links have been restored. Thanks for letting me know @pgarmiri
@vanaukenk, thanks for doing that so promptly. I was also wondering whether the content of the pages and the examples in them are up-to-date. For example, in the IC page a protein annotated with '... transcription factor activity' gets an IC annotation to nucleus. I remember that in the Cambridge mtg, it was concluded that such an annotation would qualify for an IDA annotation to nucleus. Is what I remember correct? If yes, perhaps another example should be added.
I agree that we need to revisit the use of IEP/HEP.
Historically, these evidence codes have been used to provide 'hints', I think, for what processes otherwise poorly characterized gene products might be involved in.
With our new gp2term relations where we can more explicitly capture the relation between a gene product and a GO BP term, though, it is hard to imagine which relation could be used for differential expression experiments. Even our most generic relation, 'acts upstream of or within', would not be appropriate.
In a sense, the 'response to' terms seem to sometimes be used in an opposite way in GO, i.e. a gene is somehow affected by a BP. It's not that this information isn't useful in some context, it's just that for GO it's hard to then make an accurate statement about the relation between the gene and the process.
@pgarmiri - can you point me to that decision in the Cambridge mtg minutes? Thx.
@vanaukenk, it was during the Background knowledge/author intent/‘Representing biologist’s view of biology’ part of the Brainstorming Session that it was mentioned to us. It was not actually decided there, but @rachhuntley mentioned to us that it was decided at a different mtg and that they have been doing this for a while. Maybe she remembers more details? Thanks
Personally, I think we should obsolete the IEP evidence code for process annotations :) We are eliminating them as soon as we can from PomBase (only 48 to go). There isn’t much that you can infer about a process in a micro-organism from IEP alone. I can see it being more useful for multi-cellular organisms for developmental stages etc. Maybe this is a restriction we should consider going forwards?
IEP has always been problematic. The guidelines are not great; they don't really tell you precisely what you can and cannot annotate using IEP. I thought that you could not use just a change in expression level from an HTP expression dataset/microarray to make an annotation and that it needed to be a more directed study?
Based on the abstract of this paper https://www.uniprot.org/citations/16731015 I would not make these annotations.
The authors looked for genes which change expression in response to nitrogen starvation and found 520. They then found that a subset of these were linked to amino acid metabolism, presumably by a data-mining approach, and narrowed their focus to these. They then see that 7 are expressed during infection (but so will many 1000s of others, probably). It’s a rather cherry-picked subset to start off with….
So although it is true that these genes are involved in some way in the “response to nitrogen starvation”, so, potentially, are many 1000s of other gene products. It isn’t really very useful from a “biological process” perspective, because there is no information here about the physiological role. We capture this information at PomBase, but we use a different datatype, “Expression”, and say that a particular gene is “upregulated” or “downregulated” in response to a particular nutrient starvation.
There are some open tickets about problems with “response to” but I can’t find them…will try to link them later…
v
I see the point that the involvement of genes in those BPs is somehow passive, but that would be the case for all annotations to 'response to ..' terms. It's good, I suppose, that those terms and their annotations will be under revision soon.
Just to point out that the definitions of the 'response to ..' terms, as they are now, are inviting IEP annotation.
'Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of deprivation of nitrogen.'
As said, expression data can give hints for poorly characterized proteins but when they are used for passive annotations, I see that they can be seen as problematic. Maybe a different way of capturing this type of information would be more appropriate.
In any case, I did some QC and a few annotations that were verified with RT-PCR were missing (initial screening was with microarrays). I decided that rather than re-curating this partially HTP paper, it would be better just to delete the 45 annotations to this paper.
If the IEP evidence code and 'response to ..' terms are to be revised/restricted, that will sort out this issue. But in general IEP still provides experimental information that, even though it is not very precise, might be useful to users compared to nothing at all.
As for the first case, @ValWood, the original experimental annotation would be responsible for this. This is an annotation issue that could be solved with a ticket or a dispute.
So, are we okay for Tony to proceed with this?
You are right, we should discuss both of these issues at the next meeting, time permitting: 1) the IEP evidence code, 2) 'response to' terms for direct annotation.
I notice that a lot of 'response to' terms are already blocked for direct annotation https://github.com/geneontology/go-site/issues/699#issuecomment-405971893
I would prefer that they were only available for annotation extensions. There is a long ticket about this somewhere. I still did not locate that though.....
@pgaudet I was going to add a label for "Go meeting" but there is no such label available for this tracker.
@pgarmiri @vanaukenk Is this the discussion you are talking about regarding TF activity and nucleus? https://github.com/geneontology/go-ontology/issues/12892
Sorry I don't understand why you are suggesting that response to terms are only allowed in annotation extensions. If these are included in annotation extensions then the GOC pipeline will make the annotation as a direct annotation. I just am not sure that in all cases it is possible to say what signaling pathway (for example) the protein is associated with. For example knock down of a gene may lead to a lack of response to a stimulus, I think this should be captured.
Ruth @rachhuntley
It was a proposal, there is a ticket about it, I just can't find it... will locate later.
I think we just need to be clearer about how we are using these terms to be sure we are not just recording phenotypes in GO. We are supposed to be annotating processes....
The previous discussion is here: https://github.com/geneontology/go-ontology/issues/14303 I'm just saying the application needs to be stricter, and clearer. At present they are not useful. Over one third of mouse genes are annotated with "response to terms" and these are not on their own, informative about the process. See: https://www.slideshare.net/ValerieWood/go-slimming-tips-82783137
They might be useful as a classifier, but they are not processes in the way that they are normally used; they are an orthogonal data type (usually relating to an increase or decrease in protein or RNA expression level).
The extension pipeline could easily be configured NOT to create annotations if they were made not for direct annotation, as we do already for the "cell cycle phase" branch.
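A rough sketch of what that kind of configuration could look like, purely as an illustration. It assumes a hypothetical set of term IDs flagged "not for direct annotation"; the `Extension` record, the function name and the placeholder IDs are all made up here and are not taken from the actual GOC pipeline code.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Extension(NamedTuple):
    """One annotation extension: a gene product, a relation and a GO term (hypothetical shape)."""
    gene: str
    relation: str
    term: str


def derive_direct_annotations(
    extensions: Iterable[Extension],
    not_for_direct: Set[str],
) -> List[Tuple[str, str]]:
    """Return (gene, term) pairs inferred from extensions.

    Terms listed in `not_for_direct` are skipped, so they stay available
    in extensions without ever becoming direct annotations.
    """
    return [
        (ext.gene, ext.term)
        for ext in extensions
        if ext.term not in not_for_direct
    ]


# Placeholder IDs only; a real list of blocked terms would come from the ontology.
blocked = {"GO:EXAMPLE_RESPONSE"}
extensions = [
    Extension("geneA", "happens_during", "GO:EXAMPLE_RESPONSE"),
    Extension("geneA", "part_of", "GO:EXAMPLE_PROCESS"),
]
print(derive_direct_annotations(extensions, blocked))
```

The point of the sketch is only that the filter sits at the step where extensions are turned into annotations, so curators could keep using the terms as context.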
closing; "Response to" terms are mostly not useful GO process annotations, but hey ho...
conflicting-nd-annotations-new.xlsx
Is respiratory complex assembly, not transport
2114178916 | G4MV05 | G4MV05_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process
2114179548 | G4NH30 | G4NH30_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process
2114179817 | G4N807 | G4N807_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process
2114180441 | G4MS58 | G4MS58_MAGO7 | GO:0008150 | P | PMGG | ECO:0000315 (IMP) | PMID:17850257 | GO:0009405 | pathogenesis
2114180441 | G4MS58 | G4MS58_MAGO7 | GO:0008150 | P | PMGG | ECO:0000315 (IMP) | PMID:17850257 | GO:0030437 | ascospore formation
2114182087 | G4N674 | G4N674_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process
2114185503 | G4NGA5 | G4NGA5_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process
2114187736 | G4MQP7 | G4MQP7_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process
It isn't possible to infer biosynthesis from IEP, this is probably meant to be "response to nitrogen starvation", but that isn't really informative about the process. We really need to think about allowing "response to terms" only in certain contexts. Otherwise lots of things will appear to have process annotation when really there isn't anything at all.
I'm not really sure we should use IEP in this way, full stop
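For anyone who wants to pull rows like the ones above out of a larger export, here is a minimal sketch of a check. The column names are my own guesses at what the ten pipe-separated fields mean, and the "does the term name start with 'response to'" test is a crude heuristic used only for illustration, not an agreed rule.

```python
from dataclasses import dataclass
from typing import Iterable, List

IEP_ECO = "ECO:0000270"  # the ECO ID shown next to "(IEP)" in the rows above


@dataclass(frozen=True)
class Row:
    # Field names are assumptions about the export layout.
    annotation_id: str
    accession: str
    entry_name: str
    root_term: str
    aspect: str
    assigned_by: str
    evidence: str
    reference: str
    go_id: str
    go_name: str


def parse_row(line: str) -> Row:
    """Split one pipe-delimited line into a Row, trimming whitespace."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    return Row(*fields)


def is_suspect(row: Row) -> bool:
    """True for an IEP annotation to something other than a 'response to' term."""
    name = row.go_name.lower()
    is_response = name.startswith(("response to", "cellular response to"))
    return row.evidence.startswith(IEP_ECO) and not is_response


def suspect_rows(lines: Iterable[str]) -> List[Row]:
    """Parse non-empty lines and keep only the suspect ones."""
    return [row for row in map(parse_row, (l for l in lines if l.strip())) if is_suspect(row)]


sample = [
    "2114178916 | G4MV05 | G4MV05_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process",
    "2114180441 | G4MS58 | G4MS58_MAGO7 | GO:0008150 | P | PMGG | ECO:0000315 (IMP) | PMID:17850257 | GO:0009405 | pathogenesis",
]
for row in suspect_rows(sample):
    print(row.accession, row.go_id, row.reference)
```

This only surfaces candidates for review; whether any individual annotation should be changed is still a curation decision.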