Closed ValWood closed 6 years ago
It's true that I did this quite a bit. I don't know whether the other curators did @marcfeuermann @krchristie
The reasoning being that it's useful to know that the protein is NOT secreted, even if the actual localization is not quite clear (or varies too much among paralogs to be sure).
We can discuss removing them if people find it's not useful.
Thanks, Pascale
cox4 though?
I can see for a GP with no other annotation...although I don't think it is very meaningful. If you can establish with certainty that a gene product is intracellular you should be able to say something else ?
Like what? In particular, the mitochondrial proteins are a pain to localize; in plants many are found in either (or both) the mitochondrion and the chloroplast. I know I have also used 'membrane-bound organelle' for those.
Where do you see that for cox4 ? It's not here: http://amigo.geneontology.org/amigo/gene_product/PomBase:SPAC1296.02
because we filter it.... but is it a useful annotation more generally?
Cox4 could be respiratory chain (GO:0070469) (true for everything?)
What I mean is I dont see it in the PAINT annotations - see http://www.pantree.org/node/annotationNode.jsp?id=PTN000012880
Oh sorry yes this wasn't made by PAINT. It was made by the function process inference pipeline (I'm not sure which term to term relationship but it is because of some relation hardcoded in the ontology).
I think this term should be made so that it is not possible to make the annotation via an automated pipeline. This would also prevent Ensembl annotations etc, but could be applied judiciously if required?
In general I try not to use "intracellular" when doing PAINT annotation because I'm not convinced that it really adds any relevant information. I can understand Pascale's argument about discriminating from secreted proteins, but I consider it to be at the same level as "protein binding" for molecular function or "growth" for biological process. Regards, Marc.
But it avoids a 'ND'
True but in this case I wonder if ND is better.
For instance, internally at PomBase we block many "high level" process terms for annotation, because the annotation is so minimal that we prefer to say the process is unknown.
This way, we can provide our users with well-curated lists of "unknowns" https://www.pombase.org/status/priority-unstudied-genes
This is one of our most popular lists.
We submit these to GO with BP ND and woe betide anyone who tries to stick a non-informative high-level process term, or some random annotation from a phenotype onto them via a pipeline.
So this to me is a bit similar. I know we annotate to inferred things with GO, but there should be some level of information required.
I agree. It is sometimes better to put "unknown" instead of minimal/vague/non-relevant/confusing information.
I have never annotated to 'intracellular', either manually or when doing PAINT. It has never seemed particularly useful to me.
OK; for PAINT there is no problem for me to remove.
For others: There are >1000 manual annotations + >200 manual ISS, distributed as follows:
Group | Number of annotations |
---|---|
AspGD | 519 |
UniProt | 208 |
GOC | 141 |
MGI | 115 |
TIGR | 92 |
LIFEdb | 70 |
GeneDB | 43 |
dictyBase | 33 |
SGD | 27 |
BHF-UCL | 24 |
ARUK-UCL | 23 |
AgBase | 18 |
UniProtKB | 18 |
GR | 15 |
TAIR | 15 |
CGD | 14 |
WB | 14 |
CAFA | 10 |
PINC | 9 |
RGD | 9 |
ParkinsonsUK-UCL | 8 |
FlyBase | 7 |
ZFIN | 7 |
HGNC | 5 |
Although personally I don't think it is a useful term (and I have never had a situation where I have seen that something is intracellular but I have not been able to be more specific), it might be useful for other resources. Might be worth asking why people needed it? I would rather be able to obtain the list of "unknown localization" than know that something was intracellular (but surely cytoplasmic, cytosolic, or cell cortex, or something else can be said?).
We should not transfer this annotation. However, maybe we should allow direct annotation manually if people have a reason for making these?
BTW: Most of the Aspergillus ones (IDA) come from: https://www.ncbi.nlm.nih.gov/pubmed/20797444 I could not see any localization data in this paper. It's about translational changes. Of course, these gene products are intracellular when they are translated....... Preventing this is a good reason to block the term!
I suspect that most people inspecting these annotations would find a better annotation to use (if not already existing from another source!).
For example, the first one in the SGD list is RTP1.
RTP1 is clearly a nucleocytoplasmic shuttling protein. It should be annotated to "nucleus" and "cytoplasm"
> To check whether Rtp1p shuttles between the nucleus and the cytoplasm in an Xpo1p-dependent manner, we examined the localization of Rtp1-GFP in an XPO1T539C mutant strain. In this strain, leptomycin B (LMB) addition inhibits Xpo1p-mediated transport (36). The cellular distribution of Rtp1-GFP was not affected by LMB addition, even with long incubations times (Fig. 2C).
The second SGD one is GSH1. I can't see any localization data in this paper; this seems to be an "inference". I'm not checking any others, but I'm sure they should all either be more specific or be removed...
BTW, I've just found an annotation to the CC term GO:0005623: "cell" Def: The basic structural and functional unit of all organisms. Includes the plasma membrane and any external encapsulating structures such as the cell wall and cell envelope. This seems even worse than "intracellular", don't you think so ? Regards, Marc.
Would we not want to reserve use of cell for use in GO-CAMs? E.g. occurs_in some (cell and part_of some S)
Do we have instances of this ? I also find 'cell' is not very informative. I thought it had been created to distinguish from non-cellular organisms.
I just noticed there are a lot of P-C predictions to intracellular and cell (for eg
If we make a term 'too high level' for annotation, aren't its parents automatically excluded ? (as would be the case for 'cell' if 'intracellular' is too high level for annotation).
Thanks, Pascale
I would be delighted if we quit generating annotations to 'cell' and to 'intracellular' due to the P-C links, so if marking these terms as 'too high level' would block these P-C link generated annotations, I think that's a step forwards.
I would like to see them blocked too (which would not prevent their use in extensions via GO-CAM).
> If we make a term 'too high level' for annotation, aren't its parents automatically excluded
No, because sometimes the parent is OK. For example, we blocked "transport" but we would not want to block "localization". I already made a task a while ago to make a list of the parents of blocked terms which could be blocked. I'll submit a ticket soon-ish.
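To make the "list of parents of blocked terms" idea concrete, here is a rough sketch of how such a review list could be generated. The parent map is a tiny hand-written toy, not the real ontology, and `candidate_parents` is a hypothetical helper name. It only proposes ancestors for a curator to look at; it does not block anything automatically.

```python
def candidate_parents(blocked, parents):
    """Collect every ancestor of the blocked terms that is not itself blocked.

    `parents` maps a term to its direct parents (toy data, an assumption).
    The result is a review list only: each ancestor still needs a manual decision.
    """
    seen = set()
    stack = list(blocked)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen - set(blocked))


# Toy, simplified parent map for illustration only.
toy_parents = {
    "transport": ["establishment of localization"],
    "establishment of localization": ["localization"],
    "intracellular": ["cell"],
}

to_review = candidate_parents({"transport", "intracellular"}, toy_parents)
print(to_review)
```

In this toy case "localization" shows up as a candidate, and a curator would then decide to leave it unblocked, which is exactly why the exclusion should not propagate upward on its own.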
> which would not prevent their use in extensions via GO-Cam
That seems complicated to implement the rules.
It isn't really any different from what we do with biological phase terms right now. We can use them in extensions, but we can't annotate to them directly: https://www.ebi.ac.uk/QuickGO/term/GO:0044848
At UCL we have looked at the majority of our annotations and are in the process of removing our 2 cell and 60 or so intracellular annotations, although I haven't yet tried to look at our use of intracellular in the AE field.
Hi @RLovering as far as I know there isn't any issue with "intracellular" in the AE field. Although, at present I think the inference pipeline would unfold them to instantiate the annotation. Personally, I don't think we really want to do that? @cmungall
My preference would be to allow these terms in extensions (for GO-CAM etc) but not allow for direct annotation (which would prevent direct annotations from being created by any pipeline).
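As a rough illustration of that rule (blocked as the directly annotated term, still allowed inside an extension), something like the following check might be enough. This is only a sketch: `BLOCKED_FOR_DIRECT` and `check_annotation` are hypothetical names, the extension string format is assumed to be `relation(GO:nnnnnnn)`, and the direct term ID in the usage example is an arbitrary placeholder.

```python
import re

# Hypothetical block list: the two CC terms discussed in this ticket.
BLOCKED_FOR_DIRECT = {"GO:0005622", "GO:0005623"}  # intracellular, cell


def check_annotation(go_id, extension=""):
    """Return (ok, extension_terms).

    Only the directly annotated term is tested against the block list;
    GO IDs found in the extension are reported but never cause a rejection.
    """
    extension_terms = re.findall(r"GO:\d{7}", extension)
    ok = go_id not in BLOCKED_FOR_DIRECT
    return ok, extension_terms


direct = check_annotation("GO:0005622")
in_extension = check_annotation("GO:0004129", "occurs_in(GO:0005622)")
print(direct, in_extension)
```

The point of the design is that the pipeline never has to look inside extensions to apply the block, so the rule stays as simple as the existing one for phase terms.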
I would add that so far, I never came across a situation for yeast where we could not be more specific than "intracellular", I'd be interested if such an annotation situation exists more generally. I suspect blocking for direct annotation will enforce more informative annotation.
Examples I saw included cases like "nucleus and cytoplasm". Maybe "intracellular" was selected because the "location of activity" was not known. However, with new relationships for location this would not be a problem. Biological end users definitely want to know these multiple locations. For most of the recent gene characterizations relating to the pombe cytokinetic/spindle pole body/centrosome, the location was known and annotated first. Processes followed, based on biologists following the lead from GO CC data of proteins with a location of interest. If we are supporting bench biologists we really need to capture these assayed locations in the absence of functional data.
Later we have been able to add extensions to describe which locations occurred_during which phases of the cell cycle. For many, we are confident about the locations (for example medial ring during interphase, spindle pole body during mitosis, spindle midzone during mitotic metaphase etc). Although we sometimes know little about the processes and even less about the functions.
name: intracellular
+subset: gocheck_do_not_annotate
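For anyone wanting to pick up this subset tag in their own pipeline, here is a minimal sketch of reading it from OBO-format text. It is a hand-rolled parser for illustration, not an official GO tool; `terms_in_subset` is a hypothetical helper, and the two stanzas in the example are abbreviated by hand rather than copied from the ontology file.

```python
def terms_in_subset(obo_text, subset="gocheck_do_not_annotate"):
    """Return the IDs of [Term] stanzas that carry the given subset tag."""
    ids = set()
    # OBO stanzas are separated by blank lines.
    for stanza in obo_text.split("\n\n"):
        lines = [line.strip() for line in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        term_id = None
        flagged = False
        for line in lines[1:]:
            if line.startswith("id: "):
                term_id = line[len("id: "):].strip()
            elif line.startswith("subset: "):
                # Drop any trailing "! comment" before comparing.
                value = line[len("subset: "):].split("!")[0].strip()
                if value == subset:
                    flagged = True
        if flagged and term_id:
            ids.add(term_id)
    return ids


# Abbreviated, hand-written stanzas for illustration only.
example_obo = """[Term]
id: GO:0005622
name: intracellular
subset: gocheck_do_not_annotate

[Term]
id: GO:0005739
name: mitochondrion
"""

blocked_ids = terms_in_subset(example_obo)
print(blocked_ids)
```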
I don't know if manual annotations for "intracellular" are useful or not (we don't use this term).
But we should not get annotations to this term from any inference pipeline. It's just clutter. For example, these ones to a well-annotated gene product:
PomBase SPAC1296.02 cox4 GO:0005622 PMID:21873635 IBA PANTHER:PTN000012880|WB:WBGene00000371 C cytochrome c oxidase subunit IV (predicted) protein taxon:4896 20170228 GOC
PomBase SPAC1296.02 cox4 GO:0005623 PMID:21873635 IBA PANTHER:PTN000012880|WB:WBGene00000371 C cytochrome c oxidase subunit IV (predicted) protein taxon:4896 20170228 GOC
PomBase SPAC1296.02 cox4 GO:0005739 PMID:21873635 IBA PANTHER:PTN000012880|WB:WBGene00000371 C cytochrome c oxidase subunit IV (predicted) protein taxon:4896 20170228 GOC
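To show what blocking would mean for rows like these, here is a small sketch that splits annotation lines into kept and dropped according to a block list. It assumes tab-separated GAF columns with the GO ID in the fifth column; the rows are rebuilt from the three lines above with empty qualifier and synonym columns, which is my assumption about the original file. `DO_NOT_ANNOTATE` and `split_blocked` are hypothetical names.

```python
# Hypothetical block list holding the two terms discussed in this ticket.
DO_NOT_ANNOTATE = {"GO:0005622", "GO:0005623"}  # intracellular, cell


def split_blocked(gaf_lines, blocked=DO_NOT_ANNOTATE):
    """Split GAF rows into (kept, dropped) by the GO ID in column 5."""
    kept, dropped = [], []
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        columns = line.rstrip("\n").split("\t")
        go_id = columns[4]
        (dropped if go_id in blocked else kept).append(line)
    return kept, dropped


def _row(go_id):
    # Rebuilds one cox4 row; the empty strings are the qualifier and synonym columns.
    return "\t".join([
        "PomBase", "SPAC1296.02", "cox4", "", go_id, "PMID:21873635", "IBA",
        "PANTHER:PTN000012880|WB:WBGene00000371", "C",
        "cytochrome c oxidase subunit IV (predicted)", "", "protein",
        "taxon:4896", "20170228", "GOC",
    ])


rows = [_row("GO:0005622"), _row("GO:0005623"), _row("GO:0005739")]
kept, dropped = split_blocked(rows)
print(len(kept), len(dropped))
```

With this block list only the GO:0005739 (mitochondrion) row survives, which is the informative one.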