Closed pgaudet closed 6 years ago
Historically InterPro asked to be able to make annotations to "do not annotate" terms which is why we have the exception.
Sometimes, I think they may have a protein family where they can't be specific. I can't think of an example right now. But your interpretation is correct.
How many "gocheck_do_not_manually_annotate" do we have?
89 terms
'G2 DNA damage checkpoint'
I think most of these could be made "do not annotate"
Probably 'SWI/SNF superfamily-type complex'
but for most others I don't see any reason why InterPro etc. should not be subject to the same, more stringent guidelines. Especially, for instance, 'biological phase' GO:0044848 (this entire branch is only available for use in annotation extensions).
Is it possible to find out if there are any InterPro mapping direct to these terms, and if so see if they can be made more specific?
Maybe someone from interpro can tell us? Where is the IPR2GO mapping file? Just compare?
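The "just compare" idea could be sketched roughly as follows. This is only an illustration: the line shape of the mapping file, the `direct_mappings` helper, and the `IPR9…` sample entries are assumptions made for the example, not taken from the actual InterPro2GO file.

```python
import re

# Assumed line shape of an interpro2go-style mapping file (not confirmed in this thread):
#   InterPro:IPR000001 Entry name > GO:term name ; GO:0000000
# Lines starting with "!" are treated as comments.
MAPPING_LINE = re.compile(r"^InterPro:(IPR\d+)\s.*;\s*(GO:\d{7})\s*$")


def direct_mappings(mapping_lines, restricted_ids):
    """Return {GO id: sorted InterPro entries mapped directly to that id}."""
    hits = {}
    for line in mapping_lines:
        if line.startswith("!"):
            continue  # skip comment/header lines
        match = MAPPING_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like a mapping
        entry, go_id = match.groups()
        if go_id in restricted_ids:
            hits.setdefault(go_id, set()).add(entry)
    return {go_id: sorted(entries) for go_id, entries in hits.items()}


# Hypothetical sample data, for illustration only.
sample_lines = [
    "!comment line",
    "InterPro:IPR900001 Example family A > GO:metabolic process ; GO:0008152",
    "InterPro:IPR900002 Example family B > GO:binding ; GO:0005488",
    "InterPro:IPR900003 Example family C > GO:DNA binding ; GO:0003677",
    "InterPro:IPR900004 Example family D > GO:metabolic process ; GO:0008152",
]
restricted = {"GO:0008152", "GO:0005488", "GO:0006950"}

result = direct_mappings(sample_lines, restricted)
```

In practice `restricted` would be filled with the 89 subset terms and `sample_lines` replaced by the lines of the real mapping file; the per-term entry lists then show which mappings need a more specific term.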
On Wed, 18 Jul 2018 at 6:54 PM, Val Wood notifications@github.com wrote:
Is it possible to find out if there are any InterPro mapping direct to these terms, and if so see if they can be made more specific?
@asangrador @almitchell can you let us know if you have any direct InterPro mappings to these terms. We would like to see if we can restrict for all annotation, or only manual.
thanks Val
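For reference, the "all annotation, or only manual" distinction could be expressed as a check like the one below. This is a sketch of one reading of the two subsets, not the implemented rule: it assumes that IEA is the only electronic evidence code and that every other code counts as manual, and `annotation_allowed` is a hypothetical helper name.

```python
# Assumption: "IEA" is the only evidence code treated as electronic here;
# every other code is treated as manual. The thread does not spell this out.
ELECTRONIC_CODES = {"IEA"}


def annotation_allowed(term_subsets, evidence_code):
    """Return True if a direct annotation to the term passes the subset check."""
    if "gocheck_do_not_annotate" in term_subsets:
        return False  # blocked for every evidence code
    if "gocheck_do_not_manually_annotate" in term_subsets:
        return evidence_code in ELECTRONIC_CODES  # electronic annotation only
    return True  # term is in neither subset


# Usage: a term in the "manual only" subset accepts IEA but not IDA.
manual_only = {"gocheck_do_not_manually_annotate"}
iea_ok = annotation_allowed(manual_only, "IEA")
ida_ok = annotation_allowed(manual_only, "IDA")
```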
Hi, We are currently using 18 of these terms in InterPro; a few have a lot of annotations. They are:
GO:0008152 metabolic process
GO:0007049 cell cycle
GO:0006950 response to stress
GO:0000910 cytokinesis
GO:0042710 biofilm formation
GO:0033554 cellular response to stress
GO:0009607 response to biotic stimulus
GO:0009605 response to external stimulus
GO:0001539 cilium or flagellum-dependent cell motility
GO:0070603 SWI/SNF superfamily-type complex
GO:0000075 cell cycle checkpoint
GO:0005488 binding
GO:0009790 embryo development
GO:0031577 spindle checkpoint
GO:0051716 cellular response to stimulus
GO:0016929 SUMO-specific protease activity
GO:0050896 response to stimulus
GO:0071229 cellular response to acid chemical
We could try to find more specific terms, though in some cases it might not be possible, or delete some of them (e.g. metabolic process or binding). I will have a look at our annotations.
GO:0008152 metabolic process maybe we could relax this and allow for IEA, but it isn't very useful. You will usually only have it for high-level EC classifications where you can't say anything else. I would say that for these it is better just to have the MF annotation (oxidoreductase, transferase etc.), and not bother with a biological process (because we don't really know anything about the physiological role). To illustrate, 3230/4659 (about 69%) pombe proteins with a process annotation are annotated to metabolic process.
GO:0007049 cell cycle We could relax this. But most will be "mitotic cell cycle" or even more specific
GO:0006950 response to stress I don't think this is a useful term for process. I would like it to be disallowed completely
GO:0000910 cytokinesis 99% of these will be "mitotic cytokinesis". I have 602 papers about cytokinesis, and only 1 about meiotic cytokinesis. Meiotic cytokinesis is very different, at least in fungi, where the spindle pole body nucleates the forespore membrane. I don't really even know how it happens...
GO:0042710 biofilm formation I don't really understand this term. It's more of a phenotype....
GO:0033554 cellular response to stress See above
GO:0009607 response to biotic stimulus See above
GO:0009605 response to external stimulus See above
GO:0001539 cilium or flagellum-dependent cell motility Should be easy to specify...
GO:0070603 SWI/SNF superfamily-type complex We can probably relax this, but it should be possible to specify
GO:0000075 cell cycle checkpoint most studies are mitotic checkpoints, there are not so many checkpoint genes in total, so it isn't many domains and it should be easy to specify
GO:0005488 binding should be possible to say at least protein, nucleic acid, lipid or something?
GO:0009790 embryo development should be possible to specify
GO:0031577 spindle checkpoint will normally be mitotic (although mitotic and meiotic will be very similar)
GO:0051716 cellular response to stimulus see above
GO:0016929 SUMO-specific protease activity we can probably relax this one, but most SUMO proteases will perform both activities. There is a GitHub ticket about this...
GO:0050896 response to stimulus See above
GO:0071229 cellular response to acid chemical see above
Hi Val, Most of these annotations have now been changed to more specific terms or deleted, with one exception: SWI/SNF superfamily-type complex. In one case, it is not possible to be more specific, because the entry with this annotation includes proteins that are components of the brahma complex from Drosophila and the SWI/SNF complex from mammals, both child terms of SWI/SNF superfamily-type complex. As you mentioned, giving terms to protein families is always a bit more complicated.
I also have some doubts about a couple of terms that you consider phenotypes: cellular response to stress/response to stress. What BP annotation could we give, for example, to a plant protein family that includes proteins involved in resistance to several abiotic and biotic stress conditions [PMID: 28207043]? For example, some members but not all are involved in cellular response to sulfur starvation, some are known to be involved in salt stress, etc. Surely in certain cases saying that they are involved in response to stress is more informative than nothing?
The same goes for 'biofilm formation'. If a protein is involved in regulation of biofilm formation [PMID:23378512], is there any appropriate annotation in GO?
Great that you have remapped so many.
SWI/SNF superfamily-type complex. In one case, it is not possible to be more specific, because the entry with this annotation includes proteins that are components of the brahma complex from Drosophila and the SWI/SNF complex from mammals, both child terms of SWI/SNF superfamily-type complex.
OK that makes sense, we can keep this one as "not for EXP"
"Response to" stress is tricky, and I agree there is a core environmental stress response pathway. I think we need to be more precise about what is covered by response to stress in GO, to prevent the explosion of "response to" terms, which are usually for compounds that activate one of the core pathways.
For the one you mention, if this is a direct regulator of superoxide dismutase, the process would perhaps be GO:0019430 removal of superoxide radicals, which is part of "cellular detoxification"
My issue with "response to x terms" is largely two-fold
1. The general terms could mean anything the way they are currently worded and used (any process can be in response to stress: transcription, translation, transport, and probably every single gene product changes expression in response to some stress). So far >2000 fission yeast GPs have a recorded response to "some chemical". We need to think a little more deeply about what we are trying to capture and tighten up the GO in this area.
But to do this we also need to ensure that we represent each individual pathway clearly, and we don't do this at present.
So we should leave the stress terms that you still require for direct annotation (because any clean-up in this area will be longer term). This effort to block terms which are uninformative, or should always be more specific, should be concurrent with ontology improvement, but I agree we can only block terms for direct annotation if there are suitable terms to use instead!
Ok, we'll keep SWI/SNF superfamily-type complex and avoid the others.
Is this implemented then?
It is
Currently both subsets (gocheck_do_not_annotate and gocheck_do_not_manually_annotate) are treated the same way; but it should be:
@ValWood @vanaukenk Is this correct?