geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

should we really use "colocalizes_with" with cc complex terms #1500

Closed ValWood closed 5 years ago

ValWood commented 7 years ago

This is an issue which arose through some matrix consistency checks.

https://github.com/geneontology/go-annotation/issues/1490#issuecomment-271813323

I'm assessing intersections of cohesin complex with GO slim processes.

This results in annotation outliers because non-cohesin proteins are annotated to cohesin.

In general, these are GFP colocalization studies, and they are really only used to indicate the general location in a cell, not even to indicate binding, so making a "colocalizes_with" complex annotation seems a stretch too far.

I would use these for instance to indicate the same cellular location (chromosome, mitochondrial inner membrane, or very large molecular complexes like spindle pole body or ribosome) which is usually what the author intended.

ValWood commented 7 years ago

@hattrill commented

with regard to colocalizes_with complexes the documentation states "Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the colocalizes_with qualifier."

A couple of years ago I queried the use of "colocalizes_with" for instances where a gene product is not part of a complex. See snippets below:

Q: "While we're on the subject, could I ask about the "colocalizes_with" qualifier? The documentation says "Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the colocalizes_with qualifier". I take this to mean: if protein X is shown to associate with complex Y and this is pertinent to the biology described in the paper but it is unclear whether it is a bona fide subunit, annotate to the specific complex with the colocalizes_with qualifier."

Answer 1: "I think this is a very good usage of the qualifier. The stability of interactions is an important factor in whether GO or Intact makes a complex term (which can then be used as the subject of GO annotation). But that doesn't preclude capturing weaker relationships in annotation."

Answer 2: "The best example that comes to my mind is ribosome. We all know what the subunits of the ribosome are. But often there are proteins that bind to it for various reasons and they will all get annotated to a subunit of the ribosome with this colocalizes qualifier. Hope that helps."

So this is the rule we've applied, although not with great gusto as we have other means of capturing protein-protein interactions.

Personally, I'm not so in favour of coloc_with for complexes - it's ambiguous, and if they get stripped out by downstream tools it's a pain (but not as bad as a NOT qualifier being lost!). The last time we discussed this on a call, there was a general feeling that we should get new complex binding terms, but this hasn't trickled down to a hard rule or into the documentation. So, perhaps an official decision is needed, and we need to think about those cases where we are not completely sure whether a component is binding to or part of the complex.

ValWood commented 7 years ago

Is it possible to get a number for how many annotations there are for colocalizes_with "complex"?

@cmungall

hattrill commented 7 years ago

From AmiGO: colocalizes_with and GO:0032991 macromolecular complex and children: 4637 annotations, 1104 experimental.
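For anyone wanting to reproduce this kind of count from an annotation file rather than from AmiGO, here is a minimal sketch. It assumes tab-separated GAF 2.x rows (qualifier in column 4, GO ID in column 5, evidence code in column 7). The function name `count_colocalizes_with`, the set of evidence codes treated as experimental, and two of the three sample rows are made up for illustration; a real count would also need the full list of children of the complex term from the ontology.

```python
# Evidence codes counted as "experimental" in this sketch (an assumed subset).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def count_colocalizes_with(gaf_lines, complex_terms):
    """Return (total, experimental) counts of colocalizes_with annotations
    whose GO ID is in complex_terms. gaf_lines holds GAF 2.x rows."""
    total = 0
    experimental = 0
    for line in gaf_lines:
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue
        qualifiers = cols[3].split("|")  # qualifiers are pipe-separated
        go_id = cols[4]
        evidence = cols[6]
        if "colocalizes_with" in qualifiers and go_id in complex_terms:
            total += 1
            if evidence in EXPERIMENTAL_CODES:
                experimental += 1
    return total, experimental


# First data row mirrors an annotation quoted later in this ticket;
# the other two rows are invented placeholders.
sample = [
    "!gaf-version: 2.1",
    "\t".join(["UniProtKB", "Q54F23", "mroh1", "colocalizes_with",
               "GO:0071203", "PMID:28424231", "IDA", "", "C"]),
    "\t".join(["UniProtKB", "X00001", "hypo1", "",
               "GO:0071203", "PMID:0000000", "IDA", "", "C"]),
    "\t".join(["UniProtKB", "X00002", "hypo2", "colocalizes_with",
               "GO:0071203", "PMID:0000000", "ISS", "", "C"]),
]

print(count_colocalizes_with(sample, {"GO:0071203"}))  # (2, 1)
```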

ValWood commented 7 years ago

Spotted this, will reduce somewhat https://github.com/geneontology/go-ontology/issues/12916

ValWood commented 7 years ago

and this, but only by 4 ! https://github.com/geneontology/go-ontology/issues/12850

hattrill commented 7 years ago

Yes, just looking at children of macromolecular complex, things like chromatin are pumping up the numbers. So, limiting to protein complex:

colocalizes_with GO:0043234 protein complex and children = 2231 annotations, 601 experimental.

hattrill commented 7 years ago

There are some examples of big/non-discrete complexes (i.e. structures) that co-loc may be appropriately applied to:

filamentous actin and kinetochore are under protein complex .......as is podosome (? - this sounds like a subcellular structure to me).

ValWood commented 7 years ago

I think so. I tried filtering some of these terms, then I realised that I was also filtering complexes that were children of these structures.

hattrill commented 7 years ago

Adding this paper as an example of a protein that colocalizes with a complex in IF but is not present in IPs. IF is a bit of a funny thing - open to over-interpretation, and you can argue about what it is actually co-localizing with: the complex, the DNA-complex, or another mark on the chromatin. I am going to have a think about this paper, as it also has a co-loc with synaptonemal complex for Nipped-B. I can see how, if annotating to author intent, these annotations were made, but I'm not sure I am happy with them.


PMID:17909832 "none of the cohesin subunit precipitations brought down detectable amounts of Nipped-B. Precipitation of Nipped-B, however, brought down small but detectable amounts of SA and Rad21 but not Smc1" "We immunostained Drosophila oocyte chromosome spreads with anti-Nipped-B, and observed that it colocalizes extensively with the Smc1 and Smc3 cohesin subunits in a thread-like pattern along the arms of meiotic chromosomes".

hattrill commented 7 years ago

From PMID:17909832, I decided that it is impossible to interpret immunofluorescence colocalization with complexes (it is more a coincidence of proteins), so I removed the annotations.

ValWood commented 7 years ago

Here is an example issue from the Matrix project.

SGD annotate CDC5 polo-like kinase to "cohesin complex" with "colocalizes_with". It isn't a bona fide subunit, but it binds and phosphorylates cohesin during DNA damage.

[screenshot: annotation with the qualifier]

....then CGD pick this up and annotate numerous species without the "colocalizes" qualifier

[screenshot: annotations with no qualifier]

ValWood commented 7 years ago

Durr, I just realised that was why I opened this ticket in the first place. Anyway good to have the example...

pgaudet commented 6 years ago

At the GOC meeting Oct 2017, the following suggestions were made:

bmeldal commented 6 years ago

This ticket is unassigned, @pgaudet?

For info only, in IntAct we don't annotate IF experiments as IPI unless we have another IPI evidence for the same interactions from THE SAME paper.

pgaudet commented 6 years ago

I am assigning @vanaukenk since this is related to an action point from the GOC meeting that she was working on - @vanaukenk let me know if this is OK with you !

Pascale

bmeldal commented 6 years ago

Can also be added to user survey.

pfey03 commented 6 years ago

I got interested in our 'colocalizes_with' annotations: we have 5 EXP annotations to complexes, with 5 different PMIDs. If required, I can look at them.

bmeldal commented 6 years ago

Hi Petra,

Looks like you really don't use that qualifier much. Could you post the list of 5 so we can see where you did use it and why? That could inform us if it's really necessary to use the qualifier or maybe we have better terms now.

Birgit

pfey03 commented 6 years ago

Hi Birgit,

4 of the 5 are quite old annotations, one by Pascale when she was with us, and only one more recent, by @rjdodson. I'll look at them and can probably tell why we did it. I'll post here by Monday.

ValWood commented 6 years ago

I'll review PomBase ones too...we only have 12. I think most can go.... https://github.com/pombase/curation/issues/1940

pfey03 commented 6 years ago

@bmeldal Here is the first one. PMID:28424231 Used because confocal microscopy (Fig. 4) shows colocalization of Mroh1 and the WASH complex. And the authors describe it in the text: "The similarity in the appearance of Mroh1 on the surface of lysosomes to the known localization of the WASH complex (Park et al., 2013) led us to investigate their possible colocalization. Co-expression of Mroh1 with WASH (Fig. 4A; Movie 3) showed that they indeed strongly colocalize. This colocalization is not merely to the same vesicles, but to the same regions of these vesicles." Bob did this annotation and said he never used this qualifier much but was probably swayed by the authors' strong words.

pfey03 commented 6 years ago

Oh, and here is the annotation, which might make this easier: UniProtKB | Q54F23 | mroh1 | colocalizes_with | GO:0071203 | WASH complex | ECO:0000314 | IDA | PMID:28424231

pgaudet commented 6 years ago

As far as I remember, I did this because this was my understanding of the guidelines. I don't remember questioning whether these annotations were really informative (also 10 years later, with so much extra data available, the value of almost all the data we used to curate could probably be reassessed). But since this is being questioned now, my opinion would be to get rid of these.

Pascale

pfey03 commented 6 years ago

@pgaudet Don't worry, Pascale. The other 4 are all old, at least 9 years, so I'll go through them one by one and check them quickly, so we have examples. For the one above that Bob curated and looked at, the question is whether a new qualifier would help; I myself have not yet looked at the paper.

pfey03 commented 6 years ago

UniProtKB | Q54GL7 | mlcB | colocalizes_with | GO:0045160 | myosin I complex | ECO:0000314 | IDA | PMID:16415352

This paper from 2006, in my opinion today, clearly shows that mlcB is part of a myosin I complex: it is the light chain of MyoB, which had already been identified as a myosin I heavy chain. They nicely summarize their results at the beginning of the discussion: "The evidence that MlcB is a MyoB light chain may be summarized as follows: (i) purified MyoB examined by SDS-PAGE does not contain a “normal sized” (16–30 kDa) light chain but does contain a protein the size of MlcB; (ii) a peptide derived from MlcB was isolated with good yield from a digest of MyoB, showing that MlcB co-purifies with MyoB; (iii) sequence comparisons show that MlcB is closely related to known myosin light chains; (iv) MlcB binds to a peptide corresponding to the IQ motif in the MyoB neck region; (v) FLAG-MlcB expressed in Dictyostelium co-immunoprecipitates with the MyoB heavy chain; and (vi) a MyoB head-neck construct (MyoB-S332E-ΔTail) co-immunoprecipitates with MBP-tagged MlcB. We believe that, taken together, these results provide convincing evidence that MlcB is a light chain that binds to the neck region of MyoB."

In sum, I will remove this qualifier here; the annotation should be part_of.

bmeldal commented 6 years ago

Summary from call on 23/4/18: Attendees: @hdrabkin , Ben Good, @krchristie , Edith, @deustp01 , Sandra, @vanaukenk , @pgaudet , @hattrill

Decision: This annotation is more misleading than useful when the qualifier is stripped. If the GO term is GO:0032991 protein-containing complex or one of its children, annotations should NOT be allowed with the colocalizes_with qualifier.

There are over 1000 EXP annotations and ~500 non-EXP manual annotations with colocalizes_with GO:0032991 protein-containing complex & children.

Action:

Remove or change annotations: https://github.com/geneontology/go-annotation/issues/1940

If physical binding is shown, consider using GO:0044877 protein-containing complex binding WITH Complex Portal AC (request complexes for curation via https://www.ebi.ac.uk/support/intact)
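As an illustration of how the colocalizes_with restriction discussed in this ticket could be checked automatically, here is a rough sketch, not an existing GO rule implementation. It collects GO:0032991 and everything beneath it from a child-to-parent map, then flags any annotation that pairs colocalizes_with with one of those terms. The names `descendants`, `violates_rule` and `TOY_PARENTS` are hypothetical, and the toy hierarchy is deliberately simplified (the real GO graph has intermediate terms).

```python
from collections import deque

ROOT_COMPLEX = "GO:0032991"  # protein-containing complex


def descendants(root, parents):
    """Return root plus every term below it.
    parents maps each term to the set of its direct parents."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, set()).add(child)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk down the hierarchy
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def violates_rule(qualifier_field, go_id, complex_terms):
    """True if colocalizes_with is used with a complex term."""
    return ("colocalizes_with" in qualifier_field.split("|")
            and go_id in complex_terms)


# Toy hierarchy for illustration only; links are simplified assumptions.
TOY_PARENTS = {
    "GO:0071203": {"GO:0032991"},  # WASH complex
    "GO:0045160": {"GO:0032991"},  # myosin I complex
    "GO:0005737": set(),           # cytoplasm, outside the complex branch
}

complex_terms = descendants(ROOT_COMPLEX, TOY_PARENTS)
print(violates_rule("colocalizes_with", "GO:0071203", complex_terms))  # True
print(violates_rule("colocalizes_with", "GO:0005737", complex_terms))  # False
print(violates_rule("", "GO:0045160", complex_terms))                  # False
```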

Birgit

ValWood commented 6 years ago

PomBase had 12, 10 very old (over a decade) https://github.com/pombase/curation/issues/1940

I re-annotated remaining PomBase ones (to appropriate binding or assembly terms), or removed as appropriate. https://github.com/pombase/curation/issues/1940

I really think we could ditch this qualifier completely, it's had its day..... If people agree we could do it in stages: i) prevent new use at submission (date check?) ii) remove from "complex" terms where it is particularly problematic iii) eventually remove from all cell structures (should usually be able to make better annotations?)
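A date check for stage i) might look something like the sketch below. It assumes the annotation date is in the GAF `YYYYMMDD` form; the cutoff date and the function name `is_blocked_new_use` are placeholders for illustration, not an agreed rule.

```python
from datetime import date, datetime

# Placeholder cutoff; the real date would have to be agreed by the consortium.
CUTOFF = date(2019, 1, 1)


def is_blocked_new_use(qualifier_field, annotation_date, cutoff=CUTOFF):
    """True if a row uses colocalizes_with and is dated after the cutoff.
    annotation_date is a YYYYMMDD string, as in the GAF date column."""
    annotated_on = datetime.strptime(annotation_date, "%Y%m%d").date()
    uses_qualifier = "colocalizes_with" in qualifier_field.split("|")
    return uses_qualifier and annotated_on > cutoff


print(is_blocked_new_use("colocalizes_with", "20070208"))  # False (legacy)
print(is_blocked_new_use("colocalizes_with", "20190301"))  # True (new use)
print(is_blocked_new_use("", "20190301"))                  # False (no qualifier)
```

Legacy annotations pass untouched under this approach, so groups could clean up older ones at their own pace.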

bmeldal commented 6 years ago

Re-annotating example with request for a new "X complex binding term": https://github.com/geneontology/go-ontology/issues/15692

RLovering commented 6 years ago

I would not be happy with removing all colocalizes_with qualifiers. We still haven't got an agreement on the evidence code to use for annotating plasma membrane proteins. This qualifier provides options for capturing immunofluorescence data.

ValWood commented 6 years ago

This step is only about colocalizes_with "complex".

This causes more problems because these often get treated as bona fide complex members (both in analysis and for annotation transfer).

We still need good guidelines for annotating proteins at the cell periphery....

ValWood commented 6 years ago

My suggestion would be to 'chip away' at them. Maybe we end up keeping them for plasma membrane?

At present there seem to be 3086 total with EXP evidence. We could block for certain terms next. For example, I see "colocalizes_with | nucleus" and "colocalizes_with | cytoplasm" ...this should not be necessary...it either is, or it isn't......

bmeldal commented 6 years ago

example I see "colocalizes_with nucleus and colocalizes_with | cytoplasm ...this should not be necessary...it either is, or it isn't......

Many proteins shuttle between cytoplasm and nucleus so it depends on the context you find them in and you can have both annotations.

But I'd argue that that's not a colocalisation, it's part_of the compartment as long as the expt shows which compartment it's in. We don't use colocalisations in IntAct unless there is stronger, physical (IPI) evidence in the same paper, so I'm not familiar with annotating these.

But, as Val said before, at the moment we are looking at fixing colocalizes_with protein complex, not compartment...

For CP annotations, if fluorescence indicates compartment localisation I'd use it for the part_of annotation.

ValWood commented 6 years ago

Many proteins shuttle between cytoplasm and nucleus so it depends on the context you find them in and you can have both annotations.

yes these don't need "colocalizes_with", just concurrent annotations...

In most of these situations "colocalizes_with" is a type of assay and should be captured with a specific evidence code if it's really, really necessary.

bmeldal commented 6 years ago

Question from NYU GOC mtg (Ruth):

Do tools strip qualifiers or whole lines if they contain qualifiers?

Comments from mtg: If use is not permitted with complexes can subcellular locations be used? Not always possible to assign to subcellular component --> Need more granular terms --> request terms.
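To make the difference behind this question concrete, here is a small sketch of the two behaviours a downstream tool might have. The rows and the function names `strip_qualifier` and `drop_qualified` are invented for illustration; no particular tool is known to work exactly this way.

```python
def strip_qualifier(rows):
    """Keep every row but blank the qualifier, so a qualified annotation
    becomes indistinguishable from a plain one."""
    return [(gene, "", go_id) for gene, _qualifier, go_id in rows]


def drop_qualified(rows):
    """Discard any row that carries a qualifier; only plain rows survive."""
    return [row for row in rows if not row[1]]


# (gene product, qualifier, GO ID); gene names are placeholders.
rows = [
    ("GENE_A", "colocalizes_with", "GO:0071203"),
    ("GENE_B", "", "GO:0071203"),
    ("GENE_C", "NOT", "GO:0071203"),
]

print(strip_qualifier(rows))
# [('GENE_A', '', 'GO:0071203'), ('GENE_B', '', 'GO:0071203'), ('GENE_C', '', 'GO:0071203')]
print(drop_qualified(rows))
# [('GENE_B', '', 'GO:0071203')]
```

With the first behaviour GENE_A looks like a bona fide complex member and the NOT on GENE_C is lost; with the second, the qualified annotations simply disappear.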

suzialeksander commented 5 years ago

Will this be on an upcoming annotation call? SGD has gone through most of our annotations, and we have a few examples where we would like to keep the qualifier- generally, these are ones where there is a fractionation assay to show approximate location, but no evidence of direct binding OR proteins are not considered to be part of a large complex but are transiently associated/transiently function there:

Q02773 | RPM2 | colocalizes_with | GO:0000932 | SGD | NCBITaxon:559292 | IDA | PMID:17267405 | 20070208

ValWood commented 5 years ago

For your first example, "P body", is it so bad not to have colocalizes_with? Possibly the problem is describing a P-body as a "complex" in GO; it is really a cytoplasmic focus:

Processing bodies (P-bodies) are distinct foci within the cytoplasm of the eukaryotic cell consisting of many enzymes involved in mRNA turnover.

and it would be absolutely correct to annotate to P-body without co-localizes with if it has been observed to physically interact with the P-body. It is not a defined complex as such.

For the second example, do we use colocalizes_with in this way for complexes? It isn't only prp2 which is a transient component of the spliceosome (or is not a member of the final spliceosome). Isn't this more precisely GO:0071011 precatalytic spliceosome ?

suzialeksander commented 5 years ago

A lot of the cases like the P-body example are where we're confident in the annotation enough to make it, but there's not enough evidence for a direct interaction/binding: GFPs or fractionation.

And SGD does use colocalizes_with according to the language

"Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the colocalizes_with qualifier. "

I thought I read Prp2 only attaches to the activated spliceosome, I'd have to double check. But even if this was changed to GO:0071011 precatalytic spliceosome, the paper doesn't seem to indicate Prp2 is necessarily part of this poorly defined complex, but instead pops in & out briefly.

Maybe if the restriction on "colocalizes_with" were limited to complexes with clearly defined members, like https://www.yeastgenome.org/complex/CPX-1700 / https://www.ebi.ac.uk/complexportal/complex/CPX-1700, and excluded ones like spliceosome and polymerase, SGD would be happier. I think @RLovering was also in support of keeping the qualifier?

ValWood commented 5 years ago

but for the spliceosome cycle most things are transient, it is very dynamic...

prp22 (U2-type catalytic step 2 spliceosome), prp16 (U2-type catalytic step 2 spliceosome), brr2 (U4/U6 x U5 tri-snRNP complex), snu114 (U4/U6 x U5 tri-snRNP complex) and prp5 (commitment complex) are all transient.

None of these are permanent complex members either?

The problem is that colocalizes_with has 2 meanings historically: the intended one for a "likely localization" from GFP etc., and this later derived meaning. This should never have happened... However, it's probably not a big deal for spliceosome intermediates if people want to keep it, because if ignored it's still OK, but it's pretty inconsistent in its application, it doesn't really add anything, and it still leaves us with inconsistent usage.

bmeldal commented 5 years ago

I think, in principle, we agreed that colocalizes_with [subcellular location] is ok but the issue we have is with colocalizes_with [protein-containing complex or children]. It's the latter ones we proposed to remove. If there is no physical binding evidence, what does a colocalizes_with [protein-containing complex or children] annotation add? We agreed that it would be better instead to make the subcellular location branch more granular.

P-Body is odd and I agree with Val that its definition isn't one of a complex but a focus but it's a child of protein-containing complex. Maybe its hierarchy needs fixing and then it would be fine to annotate to it with colocalizes_with.

ValWood commented 5 years ago

I thought about the spliceosome example some more.

So if prp2 isn't part of the complex it's probably not good to use colocalizes_with. To recap, the reason we want to clean these up is that it's not good to have a qualifier where the annotation isn't still true if the qualifier is ignored. A qualifier should make an annotation more specific, not change the annotation's meaning. Cleaning up the use for complexes is a first step to dealing with this issue. Hopefully, eventually, the recommended guidelines can be changed. So you are correct that this is not against the current guidelines, BUT it is a use case that was added later, after the qualifier was introduced to describe GFP screens etc., and it added a different meaning. I am still not convinced that we really need this particular qualifier at all.

Here you are trying to capture prp2 position in the spliceosome cycle

nrg https://www.nature.com/articles/nrm.2017.86.pdf

Is it necessary to annotate prp2 to GO:0071006 U2-type catalytic step 1 spliceosome if it acts on the complex? This is captured by the process.

I would do this:

RNA-dependent ATPase activity
has_direct_input SO:mRNA (or possibly intron 5' splice site)
occurs_at GO:0071006 (can be included if you really want to associate to the complex, but this should be evident from the process)
part_of generation of catalytic spliceosome for first transesterification step *

(* Is this the correct term for "U2-type catalytic step 1 spliceosome generation" ? should this synonym be added?)

For CC I would use GO:0005681 spliceosomal complex: "Any of a series of ribonucleoprotein complexes that contain snRNA(s) and small nuclear ribonucleoproteins (snRNPs), and are formed sequentially during the spliceosomal splicing of one or more substrate RNAs, and which also contain the RNA substrate(s) from the initial target RNAs of splicing, the splicing intermediate RNA(s), to the final RNA products."

and not a specific complex.

If the ATPase is involved in complex maturation, this would be in line with how we currently annotate chaperones and other proteins that join and leave a complex. We don't annotate protein complex assembly chaperones with "colocalizes_with".

ValWood commented 5 years ago

And I am removing this https://www.pombase.org/gene/SPBC19C2.01 where I annotated to GO:0071006 U2-type catalytic step 1 spliceosome by ISS without the qualifier..... ;(

bmeldal commented 5 years ago

From annotation call on 8/1/2019:

@RLovering: Are tools not stripping the whole annotation rather than only the qualifier?

@ukemi: Will check for tools that only remove the qualifier, not the annotation. Thinks all tools remove complete NOT annotations, though.

@vanaukenk: Will look at evaluating tools.

ValWood commented 5 years ago

The question really should be "Do we really, really need this qualifier?" There are only 2669 experimental colocalizes_with annotations. When I look at these, in many cases it seems that the qualifier isn't really necessary.

There are 800 experimental annotations with colocalizes_with cytoplasm. What does that even mean? That could be stripped out bringing the number down to under 2000.

Maybe stripping the clearly incorrect ones would reduce the number enough that people would not mind checking them.......

ValWood commented 5 years ago

Actually it isn't that many. I didn't filter for direct. But for nucleus there are 43 direct EXP with colocalizes_with.

The problem is that the qualifier has never been used with a single specific meaning, so if people strip these out they are often stripping out valid annotations.

ValWood commented 5 years ago

There are only 23 EXP annotations to cytoplasm with colocalizes_with. But the question remains: why would you use colocalizes_with with this term? This question applies across the board....

ukemi commented 5 years ago

I agree we have never used this term consistently. It seems to have drifted from its original intent. I'm unsure that we want to strip these annotations from all tools. As Val says, in many cases there is useful information in the colocalizes_with annotations.

hattrill commented 5 years ago

Perhaps 2019 is the year that we phase the colocalizes_with qualifier out. I don't think that it will be missed. No new ones from now on and groups can whittle their collection down as and when they have the time.

RLovering commented 5 years ago

This is a list of 132 human proteins annotated with 'colocalizes_with' and a child of GO:0032991 protein-containing complex. None of these are annotated directly to GO:0032991 protein-containing complex. Therefore this list could be used in functional analysis tools, and if GO:0032991 is enriched then the annotations have not been removed (you're welcome).

GENE PRODUCT ID

H7C1Q1 P14373 P23497 Q6ZRQ5 Q8NAP1 Q8NG08 Q8NHM5 Q8WTX7 Q96N67 Q96SN8 Q9H7E2 Q9Y4E5 A6NHX0 Q03112 Q17R98 Q86VW0 Q96L92 Q9BYB0 Q4VCS5 Q6ZVH7 Q7Z3K3 Q7Z5L2 Q8N488 Q9NRR5 O95718 P19784 P31689 Q8NHU2 Q96JM2 Q9UBP0 Q9UJW3 Q9Y6P5 O43491 P51149 Q6IQ26 Q8TBY9 Q8WZ73 Q9C0G0 Q9Y5X2 O75152 O75165 Q7Z4T9 Q8N6M0 Q96FC9 Q9NNW5 Q9UBK9 Q9UM11 A0A1B0GVQ0 L0R8F8 O00291 O95278 P13688 P51572 Q8TCN5 Q9H869 Q9UJ55 Q9Y3T6 P25686-2 P41214 P51530 P80723 Q14191 Q7Z7F0 Q8N8E2 Q96NW7 Q96Q80 Q9H6E5 Q9UKV3-1 P25686 Q14184 Q6NTE8 Q8NA57 Q9NQV6 Q9P2R3 Q9UKV3-3 O60493 O60927 P61026 Q15700 Q53HC9 Q5VY09 Q96GE9 Q99755 Q9NPA5 Q9UPQ0 Q9Y3L3 A0A087WUM0 Q13325 Q1RMZ1 Q8IY63 Q8WWG9 Q9HCM1 Q9UPP5 O75052 P20645 P57088 Q9BX66 Q9H1J1 Q9NVH1 P28289 Q96G01 Q96LJ8 Q9NP61 Q9UPX8 O75146 O94972 P30203 Q10588 Q5T6S3 Q8N6H7 Q8NBW4 Q8WXC6-2 Q9UJZ1 A0A087WTH5 P57105 Q13740 Q8IZQ1 Q96C92 Q9BXM7 Q9UKV3-2 O43189 Q12893 Q13492 Q6UW78 Q70YC4 Q96EH3 Q96HA7 Q99497 Q9GZP9 Q9NYF0 Q9NZC9 Q9NZQ3
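One possible way to use the list as a probe is sketched here. It assumes you can export, from the tool under test, a mapping of gene product ID to the GO terms the tool associates with it after its own filtering and propagation. The function name `surviving_probe_ids` and the example tool output are invented; only the three IDs are taken from the list above.

```python
def surviving_probe_ids(probe_ids, tool_annotations, target="GO:0032991"):
    """Return the probe IDs that the tool still links to the target term.
    tool_annotations maps a gene product ID to a set of GO IDs."""
    return sorted(
        probe for probe in probe_ids
        if target in tool_annotations.get(probe, set())
    )


# First three IDs from the list above.
probe_ids = ["H7C1Q1", "P14373", "P23497"]

# Invented tool output, for illustration only.
tool_annotations = {
    "H7C1Q1": {"GO:0032991", "GO:0005634"},
    "P14373": {"GO:0005634"},
}

print(surviving_probe_ids(probe_ids, tool_annotations))  # ['H7C1Q1']
```

An empty result would suggest the tool removes the colocalizes_with annotations; any surviving ID suggests the qualified annotations are being kept, with or without the qualifier.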

cmungall commented 5 years ago

Would protein-containing complex be an informative result in an enrichment analysis?

ukemi commented 5 years ago

I suspect that in reality everything should be annotated to it. If not, I'd be interested in the exception.