geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Fix gorule-0000008.md #699

Closed pgaudet closed 6 years ago

pgaudet commented 6 years ago

Currently both subsets (gocheck_do_not_annotate and gocheck_do_not_manually_annotate) are treated the same way, but it should be:

@ValWood @vanaukenk Is this correct?

ValWood commented 6 years ago

Historically InterPro asked to be able to make annotations to "do not annotate" terms which is why we have the exception.

Sometimes, I think, they may have a protein family where they can't be specific. I can't think of an example right now. But your interpretation is correct.

How many "gocheck_do_not_manually_annotate" do we have?

pgaudet commented 6 years ago

How many "gocheck_do_not_manually_annotate" do we have?

89 terms

'G2 DNA damage checkpoint' 'SUMO-specific protease activity' 'SWI/SNF superfamily-type complex' 'biofilm formation' 'biological phase' 'cell cycle checkpoint' 'cell cycle' 'cellular response to abiotic stimulus' 'cellular response to acid chemical' 'cellular response to biotic stimulus' 'cellular response to chemical stimulus' 'cellular response to endogenous stimulus' 'cellular response to external stimulus' 'cellular response to fluoxetine' 'cellular response to haloperidol' 'cellular response to stimulus' 'cellular response to stress' 'chimeric colonial development' 'chimeric non-reproductive fruiting body development' 'chimeric sorocarp development' 'chromatin organization involved in negative regulation of transcription' 'chromatin organization involved in regulation of transcription' 'cilium or flagellum-dependent cell motility' 'embryo development' 'membrane region' 'metabolic process' 'mitochondrial protein complex' 'mitotic spindle checkpoint' 'molecular transducer activity' 'negative regulation of cell cycle checkpoint' 'negative regulation of response to biotic stimulus' 'negative regulation of response to external stimulus' 'negative regulation of response to stimulus' 'negative regulation of spindle checkpoint' 'negative regulation of transferase activity' 'nucleic acid-templated transcription' 'obsolete cell-type specific apoptotic process' 'plasma membrane receptor complex' 'plasma membrane region' 'positive regulation of response to biotic stimulus' 'positive regulation of response to external stimulus' 'positive regulation of response to stimulus' 'positive regulation of spindle checkpoint' 'positive regulation of transferase activity' 'postsynaptic signal transduction' 'presynaptic signal transduction' 'regulation of cell cycle checkpoint' 'regulation of mitotic spindle checkpoint' 'regulation of response to biotic stimulus' 'regulation of response to external stimulus' 'regulation of response to stimulus' 'regulation of response to stress' 
'regulation of spindle checkpoint' 'regulation of transferase activity' 'response to 4'-epidoxorubicin' 'response to 5-fluorouracil' 'response to abiotic stimulus' 'response to acid chemical' 'response to anticonvulsant' 'response to antidepressant' 'response to biotic stimulus' 'response to bronchodilator' 'response to chemical' 'response to cyclophosphamide' 'response to diuretic' 'response to docetaxel trihydrate' 'response to doxorubicin' 'response to endogenous stimulus' 'response to etoposide' 'response to external stimulus' 'response to fluoxetine' 'response to gemcitabine' 'response to haloperidol' 'response to iloperidone' 'response to lapatinib' 'response to methylphenidate' 'response to simvastatin' 'response to statin' 'response to stimulus' 'response to stress' 'response to temozolomide' 'response to ximelagatran' 'spindle assembly checkpoint' 'spindle checkpoint' 'spindle pole body separation' 'transcription factor activity, protein binding' behavior binding cytokinesis
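A count like the one above can be reproduced from the ontology file itself. The following is a minimal sketch, assuming the ontology is available in OBO flat-file format, where each `[Term]` stanza carries `id:`, `name:`, and zero or more `subset:` tags. The function name `terms_by_subset` and the three-term sample are illustrative only, not part of any GO tooling.

```python
from collections import defaultdict

# Tiny illustrative sample; a real run would read the full ontology file instead.
SAMPLE_OBO = """\
[Term]
id: GO:0008152
name: metabolic process
subset: gocheck_do_not_manually_annotate

[Term]
id: GO:0044848
name: biological phase
subset: gocheck_do_not_manually_annotate

[Term]
id: GO:0003677
name: DNA binding
"""


def terms_by_subset(obo_text):
    """Map each subset name to the (id, name) pairs of the terms tagged with it."""
    subsets = defaultdict(list)
    # Stanzas are assumed to be separated by blank lines.
    for stanza in obo_text.split("\n\n"):
        lines = [ln.strip() for ln in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the header and non-term stanzas
        term_id = name = None
        tags = []
        for ln in lines[1:]:
            key, _, value = ln.partition(": ")
            if key == "id":
                term_id = value
            elif key == "name":
                name = value
            elif key == "subset":
                # Drop any trailing "! comment" after the subset name.
                tags.append(value.split("!")[0].strip())
        for tag in tags:
            subsets[tag].append((term_id, name))
    return subsets


result = terms_by_subset(SAMPLE_OBO)
manual_only = result["gocheck_do_not_manually_annotate"]
print(len(manual_only), "terms in gocheck_do_not_manually_annotate")
```

Run against the full ontology, the same function would give the counts for both `gocheck_do_not_annotate` and `gocheck_do_not_manually_annotate` in one pass.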

ValWood commented 6 years ago

I think most of these could be made "do not annotate"

Probably 'SWI/SNF superfamily-type complex'

but for most of the others I don't see any reason why InterPro etc. should not be subject to the same, more stringent guidelines. Especially, for instance, 'biological phase' GO:0044848 (this entire branch is only available for use in annotation extensions).

v

ValWood commented 6 years ago

Is it possible to find out whether there are any InterPro mappings directly to these terms, and if so, see whether they can be made more specific?

pgaudet commented 6 years ago

Maybe someone from InterPro can tell us? Where is the IPR2GO mapping file? Could we just compare?
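The comparison suggested here could be sketched as follows. This is a rough illustration, assuming the mapping file uses the conventional external2go line layout (`InterPro:IPRnnnnnn description > GO:term name ; GO:nnnnnnn`, with `!` comment lines). The InterPro accessions in the sample are made up, and `mappings_to_restricted` is a hypothetical helper, not an existing tool.

```python
import re
from collections import defaultdict

# The GO identifier is expected at the end of each mapping line.
GO_ID_AT_END = re.compile(r"GO:\d{7}\s*$")

# Illustrative lines only; the InterPro accessions below are invented.
SAMPLE_MAPPING = """\
!sample mapping lines for illustration
InterPro:IPR999991 Hypothetical family A > GO:metabolic process ; GO:0008152
InterPro:IPR999992 Hypothetical family B > GO:DNA binding ; GO:0003677
InterPro:IPR999993 Hypothetical family C > GO:binding ; GO:0005488
"""

# A few identifiers from the restricted list, used here as the comparison set.
RESTRICTED = {"GO:0008152", "GO:0005488", "GO:0006950"}


def mappings_to_restricted(mapping_text, restricted_ids):
    """Return {GO id: [InterPro entries]} for mappings that hit a restricted term."""
    hits = defaultdict(list)
    for line in mapping_text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # skip blanks and comments
        match = GO_ID_AT_END.search(line)
        if match is None:
            continue  # line does not follow the assumed layout
        go_id = match.group().strip()
        if go_id in restricted_ids:
            entry = line.split(None, 1)[0]  # first whitespace-delimited token
            hits[go_id].append(entry)
    return dict(hits)


found = mappings_to_restricted(SAMPLE_MAPPING, RESTRICTED)
for go_id, entries in sorted(found.items()):
    print(go_id, entries)
```

The output lists, per restricted term, which entries map to it directly, which is the starting point for deciding whether each mapping can be made more specific.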

On Wed, 18 Jul 2018 at 6:54 PM, Val Wood notifications@github.com wrote:

Is it possible to find out if there are any InterPro mapping direct to these terms, and if so see if they can be made more specific?


ValWood commented 6 years ago

@asangrador @almitchell can you let us know if you have any direct InterPro mappings to these terms? We would like to see if we can restrict them for all annotation, or only for manual annotation.

thanks Val
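The two options mentioned here, restricting a term for all annotation or only for manual annotation, can be sketched as a small check. This is an illustration under stated assumptions, not the implemented rule: it treats `IEA` as the only non-manual evidence code, and the subset contents and the function name `violates_rule` are placeholders.

```python
# Assumption: only IEA counts as electronic (non-manual) evidence.
ELECTRONIC_EVIDENCE = {"IEA"}


def violates_rule(go_id, evidence_code, do_not_annotate, do_not_manually_annotate):
    """Return True if an annotation to go_id with this evidence should be flagged."""
    if go_id in do_not_annotate:
        # Restricted for all annotation, whatever the evidence.
        return True
    if go_id in do_not_manually_annotate:
        # Restricted only for manual annotation; electronic evidence passes.
        return evidence_code not in ELECTRONIC_EVIDENCE
    return False


# Placeholder subset contents, chosen only to exercise both branches.
DO_NOT_ANNOTATE = {"GO:0044848"}
DO_NOT_MANUALLY_ANNOTATE = {"GO:0070603"}

checks = [
    ("GO:0070603", "IEA"),  # manual-only restriction, electronic evidence
    ("GO:0070603", "IDA"),  # manual-only restriction, experimental evidence
    ("GO:0044848", "IEA"),  # restricted for all annotation
]
for go_id, code in checks:
    flagged = violates_rule(go_id, code, DO_NOT_ANNOTATE, DO_NOT_MANUALLY_ANNOTATE)
    print(go_id, code, "flagged" if flagged else "ok")
```

Treating the two subsets the same way would amount to dropping the evidence test in the second branch.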

asangrador commented 6 years ago

Hi, We are currently using 18 of these terms in InterPro; a few have a lot of annotations. They are:

- GO:0008152 metabolic process
- GO:0007049 cell cycle
- GO:0006950 response to stress
- GO:0000910 cytokinesis
- GO:0042710 biofilm formation
- GO:0033554 cellular response to stress
- GO:0009607 response to biotic stimulus
- GO:0009605 response to external stimulus
- GO:0001539 cilium or flagellum-dependent cell motility
- GO:0070603 SWI/SNF superfamily-type complex
- GO:0000075 cell cycle checkpoint
- GO:0005488 binding
- GO:0009790 embryo development
- GO:0031577 spindle checkpoint
- GO:0051716 cellular response to stimulus
- GO:0016929 SUMO-specific protease activity
- GO:0050896 response to stimulus
- GO:0071229 cellular response to acid chemical

We could try to find more specific terms, though in some cases it might not be possible, or delete some of them (e.g. metabolic process or binding). I will have a look at our annotations.

ValWood commented 6 years ago

GO:0008152 metabolic process maybe we could relax this and allow it for IEA, but it isn't very useful. You will usually only have it for high-level EC classifications where you can't say anything else. I would say that for these it is better just to have the MF annotation (oxidoreductase, transferase etc.), and not bother with a biological process (because we don't really know anything about the physiological role). To illustrate, 3230/4659 pombe proteins with a process annotation are annotated to metabolic process.

GO:0007049 cell cycle We could relax this. But most will be "mitotic cell cycle" or even more specific

GO:0006950 response to stress I don't think this is a useful term for process. I would like it to be disallowed completely.

GO:0000910 cytokinesis

99% of these will be "mitotic cytokinesis". I have 602 papers about cytokinesis, and only 1 about meiotic cytokinesis. Meiotic cytokinesis is very different, at least in fungi, where the spindle pole body nucleates the forespore membrane. I don't really even know how it happens...

GO:0042710 biofilm formation I don't really understand this term. It's more of a phenotype....

GO:0033554 cellular response to stress See above

GO:0009607 response to biotic stimulus See above

GO:0009605 response to external stimulus See above

GO:0001539 cilium or flagellum-dependent cell motility Should be easy to specify...

GO:0070603 SWI/SNF superfamily-type complex We can probably relax this, but it should be possible to specify

GO:0000075 cell cycle checkpoint most studies are mitotic checkpoints, there are not so many checkpoint genes in total, so it isn't many domains and it should be easy to specify

GO:0005488 binding should be possible to say at least protein, nucleic acid, lipid or something?

GO:0009790 embryo development should be possible to specify

GO:0031577 spindle checkpoint will normally be mitotic (although mitotic and meiotic will be very similar)

GO:0051716 cellular response to stimulus see above

GO:0016929 SUMO-specific protease activity we can probably relax this one, but most SUMO proteases will perform both activities. There is a GitHub ticket about this...

GO:0050896 response to stimulus See above

GO:0071229 cellular response to acid chemical see above

asangrador commented 6 years ago

Hi Val, Most of these annotations have now been changed to more specific terms or deleted, with one exception: SWI/SNF superfamily-type complex. In one case, it is not possible to be more specific, because the entry with this annotation includes proteins that are components of the brahma complex from Drosophila and of the SWI/SNF complex from mammals, both child terms of SWI/SNF superfamily-type complex. As you mentioned, giving terms to protein families is always a bit more complicated.

I also have some doubts about a couple of terms that you consider phenotypes: cellular response to stress/response to stress. What BP annotation could we give, for example, to a plant protein family that includes proteins involved in resistance to several abiotic and biotic stress conditions [PMID: 28207043]? For example, some members, but not all, are involved in cellular response to sulfur starvation, some are known to be involved in salt stress, etc. Surely in certain cases saying that they are involved in response to stress is more informative than nothing?

The same for 'biofilm formation'. If a protein is involved in regulation of biofilm formation [PMID:23378512] , is there any appropriate annotation in GO?

ValWood commented 6 years ago

Great that you have remapped so many.

SWI/SNF superfamily-type complex. In one case, it is not possible to be more specific, because the entry with this annotation include proteins that are a component of the brahma complex from drosophila, and the SWI/SNF complex from mammals, both children terms of SWI/SNF superfamily-type complex.

OK that makes sense, we can keep this one as "not for EXP"

"Response to" stress is tricky, and I agree there is a core environmental stress response pathway. I think we need to be more precise what is covered by response to stress in GO to prevent the explosion of "response to " terms which are usually for compounds which activate one of the core pathways.

For the one you mention, if this is a direct regulator of superoxide dismutase, the process would perhaps be "removal of superoxide radicals" (GO:0019430), which is part of "cellular detoxification".

My issue with "response to x terms" is largely two-fold

1. The general terms could mean anything the way they are currently worded and used (any process can be in response to stress: transcription, translation, transport; and probably every single gene product changes expression in response to some stress). So far >2000 fission yeast GPs have a recorded response to "some chemical". We need to think a little more deeply about what we are trying to capture and tighten up the GO in this area.

2. At present we accumulate many different annotations describing the same process because the same pathways respond to multiple stimuli (this is why I suggest that these are used as extensions):
   - MAPK signalling involved_in "response to x stress"
   - regulation of transcription involved_in "response to x stress"
   - detoxification involved_in "response to x stress"

But to do this we also need to ensure that we represent each individual pathway clearly, and we don't do this at present.
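The extension-based modelling suggested in point 2 can be illustrated with a small data-structure sketch. Everything here is hypothetical: the `Annotation` class, the gene product names, and the choice of stress labels are for illustration only and do not reflect any GO annotation format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    gene_product: str                            # hypothetical identifier
    term: str                                    # the pathway being annotated
    extension: Optional[Tuple[str, str]] = None  # (relation, target term)


# The stimulus goes in the extension; the direct term stays the pathway itself.
ANNOTATIONS = [
    Annotation("gp1", "MAPK signalling", ("involved_in", "response to salt stress")),
    Annotation("gp1", "MAPK signalling", ("involved_in", "response to oxidative stress")),
    Annotation("gp2", "regulation of transcription", ("involved_in", "response to salt stress")),
    Annotation("gp3", "detoxification", ("involved_in", "response to oxidative stress")),
]


def direct_terms(annotations):
    """The set of terms used for direct annotation."""
    return {a.term for a in annotations}


def pathways_by_stimulus(annotations):
    """Group pathway terms by the stimulus named in an involved_in extension."""
    grouped = defaultdict(set)
    for a in annotations:
        if a.extension and a.extension[0] == "involved_in":
            grouped[a.extension[1]].add(a.term)
    return dict(grouped)


print(sorted(direct_terms(ANNOTATIONS)))
for stimulus, pathways in sorted(pathways_by_stimulus(ANNOTATIONS).items()):
    print(stimulus, sorted(pathways))
```

In this arrangement the direct annotations name only the pathways, while the "response to x" information remains recoverable by grouping on the extension.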

So we should leave the stress terms that you still require for direct annotation (because any clean-up in this area will be longer term). This effort to block terms which are uninformative, or which should always be more specific, should run concurrently with ontology improvement, but I agree we can only block terms for direct annotation if there are suitable terms to use instead!

asangrador commented 6 years ago

Ok, we'll keep SWI/SNF superfamily-type complex and avoid the others.

dougli1sqrd commented 6 years ago

https://github.com/biolink/ontobio/pull/252

pgaudet commented 6 years ago

Is this implemented then ?

dougli1sqrd commented 6 years ago

It is