geneontology / touchup

for keeping the PAINT GAF files up to date

Remove restriction for 'Do not manually annotate' #31

Closed pgaudet closed 7 years ago

pgaudet commented 8 years ago

Please remove the restriction that forbids annotation to terms marked 'Do not manually annotate'. (Note that we want to keep the restriction on 'Do not annotate'.)

Thanks, Pascale

krchristie commented 8 years ago

Why do we want to remove this restriction?

pgaudet commented 8 years ago

Hi Karen,

I have had at least two cases where the 'Do not manually annotate' terms would have been informative for PAINT annotations:

  1. 'Cell cycle' got a 'do not manually annotate' tag since some changes were made, but 'mitotic' and 'meiotic' cell cycle are allowed. It seems the gain in information is not worth going back to re-annotate; 'cell cycle' is about as informative.
  2. When I annotated some MAP kinases, I found they were involved in multiple stress responses, so I ended up annotating to 'response to stress' to the root, which I used to think was not informative enough to annotate. That is still true for EXP data, but for PAINT the annotation seems pertinent.

What do you think?

Pascale

krchristie commented 8 years ago

Hi Pascale,

Personally, I'm not a fan of annotating to really general terms. I don't find them that helpful. It seems to me that there are generally good reasons for the 'Do not manually annotate' tags. I think I'd just use the more specific terms, even if it meant reannotating. It doesn't seem like this has happened that often.

More importantly, I think it is not a good idea for PAINT to go against the GOC recommended usage of terms by propagating "Do not manually annotate" terms, especially not by doing it unilaterally without discussing it more generally amongst GOC members.

my 2 cents,

-Karen

pgaudet commented 8 years ago

Suzi,

This says 'Please test'. Which version of PAINT2.0 should I use? The version I have is from October 1st.

Thanks, Pascale

krchristie commented 8 years ago

I still think we should discuss whether we should be dropping this restriction in PAINT.

thanks, Karen

selewis commented 8 years ago

I agree it should be discussed further.

In the meantime, the most recent version of PAINT2.0 has this implemented (i.e. sans restriction).

tberardini commented 8 years ago

In our latest Jenkins TAIR GAF check, I've just come across a set of annotations to Arabidopsis genes from PAINT using IBA and the following terms: 'response to stimulus', 'metabolic process', and 'response to stress', all terms that are marked as 'do_not_manually_annotate'. They are all flagged as warnings. It looks like the current version of PAINT allows such annotations. Since they then show up on the individual DBs' QC reports, this issue does need addressing. Perhaps yet another category: 'do_not_manually_annotate_except_if_using_PAINT'. (Only partially kidding; that could be cross-checked against the evidence code used, and if IBA/IKR/etc, then ok.)

GO_AR:0000008 No annotations should be made to uninformative high level terms.

pgaudet commented 8 years ago

The full list of terms is here:

GO:0007610 behavior
GO:0005488 binding
GO:0042710 biofilm formation
GO:0044848 biological phase
GO:0007049 cell cycle
GO:0000075 cell cycle checkpoint
GO:0022402 cell cycle process
GO:0097285 cell-type specific apoptotic process
GO:0071214 cellular response to abiotic stimulus
GO:0071229 cellular response to acid chemical
GO:0071216 cellular response to biotic stimulus
GO:0070887 cellular response to chemical stimulus
GO:0071495 cellular response to endogenous stimulus
GO:0071496 cellular response to external stimulus
GO:0051716 cellular response to stimulus
GO:0033554 cellular response to stress
GO:0097549 chromatin organization involved in negative regulation of transcription
GO:0034401 chromatin organization involved in regulation of transcription
GO:0001539 cilium or flagellum-dependent cell motility
GO:0000910 cytokinesis
GO:0016265 death
GO:0009790 embryo development
GO:0031572 G2 DNA damage checkpoint
GO:0098589 membrane region
GO:0008152 metabolic process
GO:0098798 mitochondrial protein complex
GO:0071174 mitotic spindle checkpoint
GO:1901977 negative regulation of cell cycle checkpoint
GO:0002832 negative regulation of response to biotic stimulus
GO:0032102 negative regulation of response to external stimulus
GO:0048585 negative regulation of response to stimulus
GO:0090233 negative regulation of spindle checkpoint
GO:0051348 negative regulation of transferase activity
GO:0001071 nucleic acid binding transcription factor activity
GO:0097659 nucleic acid-templated transcription
GO:0098802 plasma membrane receptor complex
GO:0098590 plasma membrane region
GO:0002833 positive regulation of response to biotic stimulus
GO:0032103 positive regulation of response to external stimulus
GO:0048584 positive regulation of response to stimulus
GO:0090232 positive regulation of spindle checkpoint
GO:0051347 positive regulation of transferase activity
GO:1901976 regulation of cell cycle checkpoint
GO:1903504 regulation of mitotic spindle checkpoint
GO:0002831 regulation of response to biotic stimulus
GO:0032101 regulation of response to external stimulus
GO:0048583 regulation of response to stimulus
GO:0080134 regulation of response to stress
GO:0090231 regulation of spindle checkpoint
GO:0051338 regulation of transferase activity
GO:0009628 response to abiotic stimulus
GO:0001101 response to acid chemical
GO:0009607 response to biotic stimulus
GO:0042221 response to chemical
GO:0009719 response to endogenous stimulus
GO:0009605 response to external stimulus
GO:0050896 response to stimulus
GO:0006950 response to stress
GO:0071173 spindle assembly checkpoint
GO:0031577 spindle checkpoint
GO:0016271 tissue death
GO:0000988 transcription factor activity, protein binding

pgaudet commented 8 years ago

This is fixed. Tested with PTHR10006, added the annotation "GO:0000988 transcription factor activity, protein binding", which was correctly saved in the GAF.

Thanks, Pascale

tberardini commented 8 years ago

Hang on. Annotations made through PAINT to any of these terms will continue to show up on people's Jenkins GAF checking reports as ERRORS. If it's been decided that these types of annotations are legitimate, something needs to change either (1) in the ontology - mark as a different kind of subset that won't get caught by the script or (2) in the Jenkins GAF checking script so that annotations made to the 'Do not manually annotate' terms with evidence codes from PAINT (IBA, IKR, etc) are not flagged as errors.
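Option (2) could be sketched roughly as follows. This is only an illustration of the idea, not the actual Jenkins GAF checking script: the function name `should_flag`, the `PAINT_EVIDENCE_CODES` set (limited to the two codes named in this thread, IBA and IKR), and the three-term sample of the subset are all assumptions made for the example.

```python
# Hypothetical sketch of option (2); this is NOT the real Jenkins GAF check.

# Small sample of terms tagged 'Do not manually annotate', taken from the
# list posted earlier in this thread. The real subset is much longer.
DO_NOT_MANUALLY_ANNOTATE = {
    "GO:0050896",  # response to stimulus
    "GO:0008152",  # metabolic process
    "GO:0006950",  # response to stress
}

# Evidence codes named in this thread as coming from PAINT. The thread says
# "IBA, IKR, etc", so this set is assumed to be incomplete.
PAINT_EVIDENCE_CODES = {"IBA", "IKR"}


def should_flag(go_id: str, evidence_code: str) -> bool:
    """Return True if the annotation should be reported under GO_AR:0000008."""
    if go_id not in DO_NOT_MANUALLY_ANNOTATE:
        return False
    # Exempt annotations whose evidence code marks them as PAINT-derived.
    return evidence_code not in PAINT_EVIDENCE_CODES


print(should_flag("GO:0006950", "IBA"))  # False: PAINT evidence code, exempt
print(should_flag("GO:0006950", "IMP"))  # True: manual evidence code, flagged
```

Whether such an exemption is desirable at all is exactly what the rest of this thread disputes; the sketch only shows that the cross-check itself would be a small change.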

krchristie commented 8 years ago

Pascale, we STILL need to have a discussion about WHETHER PAINT should be allowed to make these annotations. I do not agree that PAINT should be allowed to make annotations to the "do not annotate" terms. We have not discussed this fully within the PAINT group, nor have we brought it up to the larger GOC.

ValWood commented 8 years ago

Re

  1. 'Cell cycle' got a 'do not manually annotate' tag since some changes were made, but 'mitotic' and 'meiotic' cell cycle are allowed. It seems the gain in information is not worth going back to re-annotate; 'cell cycle' is about as informative.

I wouldn't remove this restriction. If this is all that is demonstrated it isn't so informative. At PomBase we don't even allow "mitotic cell cycle" or "meiotic cell cycle", because if this is all you have, then it could be almost anything. Defects in transcription, ribosome biogenesis, splicing, translation, transport display 'mitotic cell cycle defects'. Many genes when mutated cause defects in cell cycle progression in a mutant, but these would not occur in a Wt cell because the "real" regulation is via protein modification.

see http://www.ncbi.nlm.nih.gov/pubmed/23697806

If we annotated all of these genes to 'mitotic cell cycle' genes at PomBase we would now have over 1000 cell cycle annotations. However researchers studying cell cycle would not expect most of these to be annotated to a cell cycle process in GO, even though they result in a cell cycle phenotype because the cause is way, way upstream, and they have no role in regulation in a normal cell.

If you really believe that a GP is involved in cell cycle, you should be able to pin it down to either a) a cell cycle process (e.g. DNA replication, spindle organization, chromosome segregation) OR b) real regulation of a cell cycle transition (G1/S, G2/M, or metaphase/anaphase). If you can't do this, it's likely to be a false positive, so transferring them isn't so useful.

My 2p Val

cmungall commented 8 years ago

PAINT must implement the same checks as the rest of the pipeline, and should warn about these cases. We can't have ad-hoc exceptions.

If anyone wants to contest the placement of terms in subsets on a by-term basis, open a ticket on the ontology tracker (but it sounds like we want to keep cell cycle in the D-N-M-A subset).

If we decide that the phylogenetic annotations constitute a special case in general, then we should have an additional weaker set:

But this seems like complicating things unnecessarily. I favor keeping the simple two-level system

tberardini commented 8 years ago

Any idea when we might be able to resolve this issue of PAINT creating annotations to 'gocheck_do_not_manually_annotate' terms? I (and other DBs) am still getting errors like this:

GO_AR:0000008 No annotations should be made to uninformative high level terms Warning count: 71

79893 GO_AR:0000008 Warning Do not annotate to: GO:0006950 'response to stress' The term is considered to high level, as marked via the subset tag: gocheck_do_not_manually_annotate TAIR locus:2051294 AT2G05330 GO:0006950 TAIR:Communication:501741973 IBA PANTHER:PTN001974461 P AT2G05330 AT2G05330|F5G3.23|F5G3_23 protein taxon:3702 20160524 GOC TAIR:locus:2051294

Thanks for any update.
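For anyone triaging these warnings, the annotation embedded in the report line above can be split back into its GAF columns. The sketch below assumes the record was originally tab-separated with 17 columns (GAF 2.x layout) and that the empty qualifier and annotation extension columns were collapsed when the line was pasted; the helper name `parse_gaf_line` and the column keys are made up for this example.

```python
# Hypothetical helper for reading one GAF 2.x record; names are illustrative.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def parse_gaf_line(line: str) -> dict:
    """Split a tab-separated GAF record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Pad short records so optional trailing columns are present as "".
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    return dict(zip(GAF_COLUMNS, fields))


# The record from the warning above, with tabs and the two empty columns
# restored (an assumption about the original file layout).
record = "\t".join([
    "TAIR", "locus:2051294", "AT2G05330", "", "GO:0006950",
    "TAIR:Communication:501741973", "IBA", "PANTHER:PTN001974461", "P",
    "AT2G05330", "AT2G05330|F5G3.23|F5G3_23", "protein",
    "taxon:3702", "20160524", "GOC", "", "TAIR:locus:2051294",
])

parsed = parse_gaf_line(record)
print(parsed["go_id"], parsed["evidence_code"], parsed["with_from"])
# GO:0006950 IBA PANTHER:PTN001974461
```

Read this way, the record shows an IBA annotation to 'response to stress' with a PANTHER node in the with/from column, i.e. a PAINT-derived annotation to a 'gocheck_do_not_manually_annotate' term.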