geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

annotation checks from ND removal pile #2025

Closed ValWood closed 1 year ago

ValWood commented 6 years ago

conflicting-nd-annotations-new.xlsx

Is respiratory complex assembly, not transport

2114178916 | G4MV05 | G4MV05_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process

2114179548 | G4NH30 | G4NH30_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process

2114179817 | G4N807 | G4N807_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process

2114180441 | G4MS58 | G4MS58_MAGO7 | GO:0008150 | P | PMGG | ECO:0000315 (IMP) | PMID:17850257 | GO:0009405 | pathogenesis

2114180441 | G4MS58 | G4MS58_MAGO7 | GO:0008150 | P | PMGG | ECO:0000315 (IMP) | PMID:17850257 | GO:0030437 | ascospore formation

2114182087 | G4N674 | G4N674_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process

2114185503 | G4NGA5 | G4NGA5_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process

2114187736 | G4MQP7 | G4MQP7_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process

It isn't possible to infer biosynthesis from IEP; this is probably meant to be "response to nitrogen starvation", but that isn't really informative about the process. We really need to think about allowing "response to" terms only in certain contexts. Otherwise lots of gene products will appear to have a process annotation when really there isn't anything at all.

I'm not really sure we should use IEP in this way, full stop
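A minimal sketch of how rows like the ones listed above could be screened for this pattern (IEP evidence paired with a "biosynthetic process" term). The column names are assumptions, since the rows carry no header, and `parse_row`, `evidence_code` and `needs_review` are hypothetical helpers, not part of any GO tooling.

```python
from dataclasses import dataclass


# Column names are assumptions made for this sketch; the rows in this issue
# have no header line.
@dataclass(frozen=True)
class ConflictRow:
    annotation_id: str
    accession: str
    entry_name: str
    nd_term: str       # root term carried by the ND annotation, e.g. GO:0008150
    aspect: str
    assigned_by: str
    evidence: str      # e.g. "ECO:0000270 (IEP)"
    reference: str
    go_id: str         # term of the conflicting experimental annotation
    go_label: str


def parse_row(line: str) -> ConflictRow:
    """Split one pipe-delimited row into its ten fields."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return ConflictRow(*fields)


def evidence_code(row: ConflictRow) -> str:
    """Return the short code given in parentheses, e.g. 'IEP'."""
    return row.evidence.rsplit("(", 1)[-1].rstrip(")").strip()


def needs_review(row: ConflictRow) -> bool:
    """Flag IEP-supported annotations to biosynthetic process terms."""
    return evidence_code(row) == "IEP" and "biosynthetic process" in row.go_label


rows = [
    "2114178916 | G4MV05 | G4MV05_MAGO7 | GO:0008150 | P | PMGG | ECO:0000270 (IEP) | PMID:16731015 | GO:0044271 | cellular nitrogen compound biosynthetic process",
    "2114180441 | G4MS58 | G4MS58_MAGO7 | GO:0008150 | P | PMGG | ECO:0000315 (IMP) | PMID:17850257 | GO:0009405 | pathogenesis",
]
flagged = [parse_row(r).accession for r in rows if needs_review(parse_row(r))]
print(flagged)
```

Matching on the term label is only a shortcut for the sketch; a real check would more likely test whether the term is a descendant of a biosynthetic process term in the ontology.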

pgarmiri commented 6 years ago

Hi Val, I agree with you that the above IEP annotations should be "response to nitrogen starvation". In the particular case of those 7 plant fungus genes, and taking into account the authors' intent, it seems that these genes are expressed in response to nitrogen starvation and also during infection of the rice plant. This seems to be a physiological condition that likely allows fungi to colonize various ecological niches, including infected host plants.

Now, as to whether IEP should be used in this way, I will just paste what we have in our training material for IEP. It was taken from the wiki page (http://wiki.geneontology.org/index.php/Inferred_from_Expression_Pattern_(IEP)), which I noticed is empty now. In that paragraph, I have highlighted the text stating that this code was the right one to use here.

Inferred from Expression Pattern (IEP)

_Where the annotation is inferred from the timing or location of expression of a gene.

Use this code with caution! It can be difficult to determine whether the expression pattern really indicates that a gene plays a role in a given process.

Genes upregulated during a stress condition may be annotated to the process of stress response (for example, heat shock proteins)

Genes selectively expressed at specific developmental stages in specific organs may be annotated to xxx development

Transcript levels or timing (e.g. Northerns, microarray data)

Protein levels (e.g. Western blots)

IEP evidence code is usually used with high level GO terms in the biological process ontology.

Only the normal expression pattern should lead to an IEP annotation._

We encourage all curators to check the wiki pages when choosing the most appropriate evidence code for the experiments done in the papers they curate. If these pages are not up-to-date, curators will keep adding the same types of annotations. Could I please request that the pages for the experimental evidence codes be updated as a matter of urgency, to help with quality and consistency?

Thanks,

Penelope

vanaukenk commented 6 years ago

@pgarmiri - I made an update to the links on Friday and that is why the pages are being mis-directed. I'll fix that right now

vanaukenk commented 6 years ago

Okay, all the links have been restored. Thanks for letting me know @pgarmiri

pgarmiri commented 6 years ago

@vanaukenk, thanks for doing that so promptly. I was also wondering whether the content of the pages and the examples in them are up-to-date. For example, in the IC page a protein annotated with '... transcription factor activity' gets an IC annotation to nucleus. I remember that in the Cambridge mtg, it was concluded that such an annotation would qualify for an IDA annotation to nucleus. Is what I remember correct? If yes, perhaps another example should be added.

vanaukenk commented 6 years ago

I agree that we need to revisit the use of IEP/HEP.
Historically, these evidence codes have been used to provide 'hints', I think, for what processes otherwise poorly characterized gene products might be involved in. With our new gp2term relations where we can more explicitly capture the relation between a gene product and a GO BP term, though, it is hard to imagine which relation could be used for differential expression experiments. Even our most generic relation, 'acts upstream of or within', would not be appropriate. In a sense, the 'response to terms' seem to sometimes be used in an opposite way in GO, i.e. a gene is somehow affected by a BP. It's not that this information isn't useful in some context, it's just that for GO it's hard to then make an accurate statement about the relation between the gene and the process.

vanaukenk commented 6 years ago

@pgarmiri - can you point me to that decision in the Cambridge mtg minutes? Thx.

pgarmiri commented 6 years ago

@vanaukenk, it was mentioned to us during the Background knowledge/author intent/‘Representing biologist’s view of biology’ part of the Brainstorming Session. It was not actually decided there, but @rachhuntley mentioned to us that it was decided at a different mtg and they have been doing this for a while. Maybe she remembers more details? Thanks

ValWood commented 6 years ago

Personally, I think we should obsolete the IEP evidence code for process annotations :) We are eliminating them as soon as we can from PomBase (only 48 to go). There isn’t much that you can infer about a process in a micro-organism from IEP alone. I can see it being more useful for multi-cellular organisms for developmental stages etc. Maybe this is a restriction we should consider going forwards?

IEP has always been problematic. The guidelines are not great; they don't really tell you precisely what you can and cannot annotate using IEP. I thought that you could not use just a change in expression level from an HTP expression dataset/microarray to make an annotation, and that it needed to be a more directed study?

Based on the abstract of this paper https://www.uniprot.org/citations/16731015 I would not make these annotations.

The authors looked for genes which change expression in response to nitrogen starvation and found 520. They then found that a subset of these were linked to amino acid metabolism, presumably by a data-mining approach, and narrowed their focus to these. They then see that 7 are expressed during infection (but so will many 1000s of others, probably). It’s a rather cherry-picked subset to start off with….

So although it is true that these genes are involved in some way in the “response to nitrogen starvation”, so, potentially, are many 1000s of other gene products. It isn’t really very useful from a “biological process” perspective, because there is no information here about the physiological role. We capture this information at PomBase, but we use a different datatype, “Expression”, and say that a particular gene is “upregulated” or “downregulated” in response to a particular nutrient starvation.
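To make the distinction concrete, here is a small sketch of keeping a differential-expression result in its own record type rather than turning it into a GO process annotation. This is not PomBase's actual schema; the class names, field names and the example values (`example-gene-1`, `PMID:placeholder`) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    UP = "upregulated"
    DOWN = "downregulated"


@dataclass(frozen=True)
class ExpressionObservation:
    """An expression change under a condition; says nothing about process role."""
    gene: str
    direction: Direction
    condition: str
    reference: str


@dataclass(frozen=True)
class ProcessAnnotation:
    """A statement that a gene product is involved in a GO biological process."""
    gene: str
    go_id: str
    evidence: str
    reference: str


def record_expression_change(gene: str, direction: Direction,
                             condition: str, reference: str) -> ExpressionObservation:
    """Store the result as expression data; no ProcessAnnotation is created."""
    return ExpressionObservation(gene, direction, condition, reference)


# Placeholder values, not taken from any paper.
obs = record_expression_change(
    "example-gene-1", Direction.UP, "nitrogen starvation", "PMID:placeholder"
)
print(f"{obs.gene} is {obs.direction.value} in response to {obs.condition}")
```

Keeping the two record types separate means an expression result never counts towards a gene's process annotations unless a curator makes that statement explicitly.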

There are some open tickets about problems with “response to” but I can’t find them…will try to link them later…

v

pgarmiri commented 6 years ago

I see the point that the involvement of genes in those BPs is somehow passive, but that would be the case for all annotations to 'response to ..' terms. It's good, I suppose, that those terms and their annotations will be under revision soon.

Just to point out that the definitions of the 'response to ..' terms, as they are now, are inviting IEP annotation.

'Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of deprivation of nitrogen.'

As said, expression data can give hints for poorly characterized proteins but when they are used for passive annotations, I see that they can be seen as problematic. Maybe a different way of capturing this type of information would be more appropriate.

In any case, I did some QC and a few annotations that were verified with RT-PCR were missing (initial screening was with microarrays). I decided that rather than re-curating this partially HTP paper, it would be better just to delete the 45 annotations to this paper.

If the IEP evidence code and 'response to ..' terms are to be revised/restricted, it will sort out this issue. But in general IEP still provides experimental information that, even though it is not very precise, might be useful for users compared to nothing at all.

As for the first case, @ValWood, the original experimental annotation would be responsible for this. This is an annotation issue that could be solved with a ticket or a dispute.

So, are we okay for Tony to proceed with this?

ValWood commented 6 years ago

You are right, we should discuss both of these issues at the next meeting, time permitting: 1) the IEP evidence code; 2) "response to" terms for direct annotation.

I notice that a lot of "response to" terms are already blocked for direct annotation https://github.com/geneontology/go-site/issues/699#issuecomment-405971893

I would prefer that they were only available for annotation extensions. There is a long ticket about this somewhere. I still did not locate that though.....

ValWood commented 6 years ago

@pgaudet I was going to add a label for "Go meeting" but there is no such label available for this tracker.

rachhuntley commented 6 years ago

@pgarmiri @vanaukenk Is this the discussion you are talking about regarding TF activity and nucleus? https://github.com/geneontology/go-ontology/issues/12892

RLovering commented 6 years ago

Sorry I don't understand why you are suggesting that response to terms are only allowed in annotation extensions. If these are included in annotation extensions then the GOC pipeline will make the annotation as a direct annotation. I just am not sure that in all cases it is possible to say what signaling pathway (for example) the protein is associated with. For example knock down of a gene may lead to a lack of response to a stimulus, I think this should be captured.

Ruth @rachhuntley

ValWood commented 6 years ago

It was a proposal, there is a ticket about it, I just can't find it... will locate later.

I think we just need to be clearer about how we are using these terms to be sure we are not just recording phenotypes in GO. We are supposed to be annotating processes....

ValWood commented 6 years ago

The previous discussion is here: https://github.com/geneontology/go-ontology/issues/14303 I'm just saying the application needs to be stricter, and clearer. At present they are not useful. Over one third of mouse genes are annotated with "response to terms" and these are not on their own, informative about the process. See: https://www.slideshare.net/ValerieWood/go-slimming-tips-82783137

They might be useful as a classifier, but they are not processes in the way that they are normally used; they are an orthogonal data type (usually relating to an increase or decrease in protein or RNA expression level).
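For anyone wanting to reproduce a figure like the "over one third" quoted above for their own annotation set, the calculation could be sketched as follows. The input format (gene, term label pairs) and the toy data are assumptions for illustration only, and matching on label prefixes is a simplification; a real analysis would more likely use the descendants of "response to stimulus" (GO:0050896) from the ontology.

```python
def fraction_with_response_terms(annotations):
    """Return the share of genes with at least one 'response to' annotation.

    `annotations` is an iterable of (gene_id, go_term_label) pairs.
    """
    genes = set()
    with_response = set()
    for gene, label in annotations:
        genes.add(gene)
        # Prefix matching is a shortcut used only for this sketch.
        if label.startswith(("response to ", "cellular response to ")):
            with_response.add(gene)
    return len(with_response) / len(genes) if genes else 0.0


# Toy data, not real annotations.
toy = [
    ("gene1", "response to nitrogen starvation"),
    ("gene1", "ascospore formation"),
    ("gene2", "pathogenesis"),
    ("gene3", "cellular response to heat"),
    ("gene4", "respiratory chain complex assembly"),
]
share = fraction_with_response_terms(toy)
print(f"{share:.0%} of genes carry a 'response to' annotation")
```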

The extension pipeline could easily be configured NOT to create annotations if they were made not for direct annotation, as we do already for the "cell cycle phase" branch.
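A rough sketch of what such a configuration could look like, assuming the pipeline has access to a set of term IDs flagged as not for direct annotation. The record type, the function name and the GO IDs below are hypothetical placeholders; this is not the actual GOC pipeline code.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class ExtensionAnnotation(NamedTuple):
    gene: str
    go_id: str                        # primary annotated term
    extension_terms: Tuple[str, ...]  # GO IDs used in the annotation extension


def inferred_direct_annotations(
    annotations: Iterable[ExtensionAnnotation],
    not_for_direct: Set[str],
) -> Iterator[Tuple[str, str]]:
    """Yield (gene, go_id) pairs inferred from extensions, skipping blocked terms."""
    for ann in annotations:
        for term in ann.extension_terms:
            if term in not_for_direct:
                continue  # term is flagged: do not create a direct annotation
            yield (ann.gene, term)


# Placeholder IDs, not real GO terms.
blocked = {"GO:9999991"}
anns = [ExtensionAnnotation("geneA", "GO:9999990", ("GO:9999991", "GO:9999992"))]
result = list(inferred_direct_annotations(anns, blocked))
print(result)
```

With a filter of this kind, a blocked term can still appear in an annotation extension without the pipeline promoting it to a direct annotation.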

ValWood commented 1 year ago

closing; "Response to" terms are mostly not useful GO process annotations, but hey ho...