geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.

Proposal for guidelines for using contributes_to #1650

Closed pgaudet closed 2 years ago

pgaudet commented 7 years ago

Action item from the GOC meetings of Oct 2017 and May 2018. Aim to have a proposal for the April 2018 meeting.

bmeldal commented 6 years ago

https://github.com/geneontology/go-ontology/issues/14847: Issue about how to annotate SRP (signal recognition particle) to its function.

bmeldal commented 6 years ago

Can someone please elaborate why contributes_to became a contentious relationship? I'm trying to design the survey question.

Current work-around suggestions for regulatory complex subunits are:

[GP] RO:0002327 enables GO:0003674 Molecular Function with AE [MF of complex] RO:0002333 enabled_by [Complex Portal AC]

or

[GP] RO:0002326 contributes_to [MF of complex] RO:0002333 enabled_by [Complex Portal AC]

In any case we need to make it clear that the GP is part of the complex, not an interactor that somehow activated the complex function but isn't stably bound.

Thanks, Birgit

bmeldal commented 6 years ago

@ValWood please explain PomBase's use of contributes_to. @vanaukenk please remind us why its usage is inconsistent.

ValWood commented 6 years ago

Some examples where we have used it:

GO:0003688 origin recognition complex subunit Orc1
GO:0003688 origin recognition complex subunit Orc2
GO:0003688 origin recognition complex subunit Orc5
GO:0003688 origin recognition complex subunit Orc6
GO:0003688 origin recognition complex subunit Orp3
contributes_to GO:0003688 | DNA replication origin binding

GO:0003689 DNA replication factor C complex subunit Ctf8
GO:0003689 DNA replication factor C complex subunit Rfc1
GO:0003689 DNA replication factor C complex subunit Rfc2
GO:0003689 DNA replication factor C complex subunit Rfc3
GO:0003689 DNA replication factor C complex subunit Rfc4
contributes_to GO:0003689 | DNA clamp loader activity

GO:0003697 MCM complex subunit Mcm4/Cdc21
GO:0003697 MCM complex subunit Mcm4/Cdc21
GO:0003697 MCM complex subunit Mcm6
GO:0003697 MCM complex subunit Mcm6
GO:0003697 MCM complex subunit Mcm7
GO:0003697 MCM complex subunit Mcm7
contributes_to GO:1990518 | single-stranded DNA-dependent ATP-dependent 3'-5' DNA helicase activity

GO:0003887 DNA polymerase delta small subunit Cdc1
GO:0003887 DNA polymerase delta small subunit Cdc1
GO:0003887 DNA polymerase delta subunit Cdc27
GO:0003887 DNA polymerase delta subunit Cdc27
GO:0003887 DNA polymerase delta subunit Cdm1
contributes_to GO:0003887 | DNA-directed DNA polymerase activity

GO:0004175 20S proteasome complex subunit alpha 1
GO:0004175 20S proteasome complex subunit alpha 2, Pre8
GO:0004175 20S proteasome complex subunit alpha 3 Pre9
GO:0004175 20S proteasome complex subunit alpha 4 Pre6
GO:0004175 20S proteasome complex subunit alpha 5, Pup2
GO:0004175 20S proteasome complex subunit alpha 6 Pre5
GO:0004175 20S proteasome complex subunit alpha 7, Pre10
GO:0004175 20S proteasome complex subunit beta 1 Pre3
GO:0004175 20S proteasome complex subunit beta 2 Pup1
GO:0004175 20S proteasome complex subunit beta 3, Pup3
GO:0004175 20S proteasome complex subunit beta 4 Pre1
GO:0004175 20S proteasome complex subunit beta 5
GO:0004175 20S proteasome complex subunit beta 6 Pam1
GO:0004175 20S proteasome complex subunit beta 7, Pre4
contributes_to GO:0004298 | threonine-type endopeptidase activity

GO:0046933 F0-ATPase subunit 6 (predicted)
GO:0046933 F0-ATPase subunit 8 (predicted)
GO:0046933 F0-ATPase subunit 9 (predicted)
GO:0046933 F0-ATPase subunit D (predicted)
GO:0046933 F0-ATPase subunit E (predicted)
GO:0046933 F0-ATPase subunit F (predicted)
GO:0046933 F0-ATPase subunit G (predicted)
GO:0046933 F0-ATPase subunit J (predicted)
GO:0046933 F0-ATPase subunit K (predicted)
contributes_to GO:0046933 | proton-transporting ATP synthase activity, rotational mechanism

So basically we use "contributes_to" if the activity is a "function of the complex" rather than of independent subunits.

I'd be happy to change what we call this to make it clearer, but it doesn't fit an annotation extension because it isn't extending an annotation (what would the relation and the object in the extension be?).

ValWood commented 6 years ago

So in general people ignore qualifiers.

People generally (at least at PomBase) also use function annotation slightly differently than process annotations.

Function annotations are rarely used in analysis (enrichment etc.) as they are not usually informative. This makes sense because many functions are parts of lots of different processes. For fission yeast I haven't really seen functions used in enrichments or slims. Where I have, these terms have been used as a proxy for processes (transporter for transport, transcription factor for transcription).

Our users use function terms mainly for searching and for retrieving/reporting gene products with a specific function, i.e. all protein kinases, transcription factors, GTPases, GAFs, GEFs, specific types of transporters etc.

Because of this, and the fact that qualifiers are often ignored (we ignore them for searching), we only use "contributes_to" in cases where the annotations still make sense if the qualifier is stripped away.

We would not use "contributes_to" for example for a cyclin, with an annotation to "cyclin dependent protein kinase activity" or for other enzyme regulators which are required for the activity, instead we use "enzyme regulator activity" terms for these, so that they are not retrieved with a query for "protein kinase activity"

It could be argued that the cyclin is an intrinsic part of the kinase activity because there is no kinase activity without it (it's almost a licensing interaction), but calling it a regulator seems to align with community expectations and how they partition cyclins and kinases.
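The retrieval behaviour described above can be illustrated with a minimal sketch. It assumes a search that matches on the term only and ignores the qualifier, as described for PomBase searching. The `Annotation` record, the `search_by_term` helper and the `example_cdk` / `example_cyclin` entries are hypothetical and only serve the illustration; the Orc1/Orc2 rows come from the examples earlier in the thread.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    gene_product: str
    qualifier: str
    term: str


# Orc1/Orc2 are taken from the thread; the last two rows are hypothetical.
ANNOTATIONS = [
    Annotation("Orc1", "contributes_to", "DNA replication origin binding"),
    Annotation("Orc2", "contributes_to", "DNA replication origin binding"),
    Annotation("example_cdk", "enables", "protein kinase activity"),
    # The cyclin is annotated to a regulator term, not to the kinase term.
    Annotation("example_cyclin", "enables", "enzyme regulator activity"),
]


def search_by_term(annotations: List[Annotation], term: str) -> List[str]:
    """Return gene products annotated to `term`, ignoring the qualifier."""
    return sorted({a.gene_product for a in annotations if a.term == term})


if __name__ == "__main__":
    # The ORC subunits are still sensible hits once the qualifier is ignored.
    print(search_by_term(ANNOTATIONS, "DNA replication origin binding"))
    # The cyclin is not retrieved by a query for the kinase activity.
    print(search_by_term(ANNOTATIONS, "protein kinase activity"))
```

Under this assumption, a contributes_to annotation is only safe when the gene product is a reasonable hit for the bare term, which is the criterion stated above.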

I thought from the GO meeting that the proposal was to have a better set of relationships to use for the current different uses of contributes_to in the qualifier column? For instance, we might want a different relationship for "function of complex" to "licensing the function of a".

bmeldal commented 6 years ago

Thank you, Val, that's really helpful.

So in general people ignore qualifiers.

Why do we bother? Who IS using this sort of data?

Our users use function terms mainly for searching and for retrieving/reporting gene products with a specific function, i.e. all protein kinases, transcription factors, GTPases, GAFs, GEFs, specific types of transporters etc.

I'd imagine that's how people will use it for complexes, too, and I DO use it for retrieving all complexes with X function - esp if I have to fix an annotation :)

Because of this, and the fact that qualifiers are often ignored (we ignore them for searching), we only use "contributes_to" in cases where the annotations still make sense if the qualifier is stripped away.

Absolutely.

We would not use "contributes_to" for example for a cyclin, with an annotation to "cyclin dependent protein kinase activity" or for other enzyme regulators which are required for the activity, instead we use "enzyme regulator activity" terms for these, so that they are not retrieved with a query for "protein kinase activity". It could be argued that the cyclin is an intrinsic part of the kinase activity because there is no kinase activity without it (it's almost a licensing interaction), but calling it a regulator seems to align with community expectations and how they partition cyclins and kinases.

I thought from the GO meeting that the proposal was to have a better set of relationships to use for the current different uses of contributes_to in the qualifier column? For instance, we might want a different relationship for "function of complex" to "licensing the function of a".

That was the main discussion point, we were wondering if we can somehow infer function annotations from the complexes to these subunits without turning 'qualifier-stripped' annotations into false annotations. But if users ignore qualifiers anyway then what's the point arguing this point? We couldn't infer anything for the "licencing" proteins as we couldn't make an inference to the "enzyme regulator activity" terms programmatically, I don't think we'd be able to "hit" the right term.

The following proposal came from ticket https://github.com/geneontology/go-annotation/issues/1662 and required users to distinguish qualifiers and at least use with/from column data. I can't remember who added the AE bit to the proposal, it came out of the breakout group at the Cambridge mtg and I don't fully understand it either!

Active subunit: [GP] RO:0002327 enables [MF of GP] with AE part_of [CP ACs of complex].

I had a related conversation with @tonysawfordebi this week (as part of fixing the gpa file!) and he suggested:

Active subunit: [GP] RO:0002327 enables [MF of GP] with [CP ACs of complex].

We only need a new evidence code that is a child (or sibling?) of ECO:0000305 [IC]. How does that sound? It's almost the same as for "regular" annotation practices but we add the Complex Portal AC in the with/from column.

Regulatory subunit: [GP] RO:0002327 enables GO:0003674 Molecular Function with AE [MF of complex] RO:0002333 enabled_by [Complex Portal AC]

This was to circumvent contributes_to and add the complex AC... Whoever came up with this proposal please shout and elaborate!!!
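To make the two proposed shapes easier to compare, here is a minimal sketch that builds them as plain records. It is only a literal reading of the proposal text above, not an agreed format: the field names, the helper functions and the placeholder identifiers (`GP:example`, `CPX-example`) are hypothetical, and the extension string simply concatenates the pieces as they were written in the thread.

```python
from typing import Dict, List

ENABLES = "RO:0002327"     # 'enables', as written in the proposal
ENABLED_BY = "RO:0002333"  # 'enabled_by', as written in the proposal
ROOT_MF = "GO:0003674"     # root 'Molecular Function' term used in the proposal


def active_subunit(gp: str, mf: str, complex_acs: List[str]) -> Dict[str, object]:
    """Active subunit: [GP] enables [MF of GP] with [CP ACs of complex]."""
    return {
        "gene_product": gp,
        "relation": ENABLES,
        "term": mf,
        "with_from": list(complex_acs),  # Complex Portal ACs go in with/from
        "extension": "",
    }


def regulatory_subunit(gp: str, complex_mf: str, complex_ac: str) -> Dict[str, object]:
    """Regulatory subunit: [GP] enables root MF, with the complex MF in the AE."""
    return {
        "gene_product": gp,
        "relation": ENABLES,
        "term": ROOT_MF,
        "with_from": [],
        # Literal rendering of "[MF of complex] enabled_by [Complex Portal AC]".
        "extension": f"{complex_mf} {ENABLED_BY} {complex_ac}",
    }


if __name__ == "__main__":
    print(active_subunit("GP:example", "GO:0003887", ["CPX-example"]))
    print(regulatory_subunit("GP:example", "GO:0003887", "CPX-example"))
```

The sketch makes one difference visible: in the regulatory form the complex's function only appears inside the extension, so a consumer that ignores extensions sees nothing more specific than the root term.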

bmeldal commented 6 years ago

@RLovering do you have user stories for this?

bmeldal commented 6 years ago

Draft survey: https://docs.google.com/document/d/1P_VLM9g13kj9lu3CRAgotAI3cUmWkS1yWaVg95u2Vbk/edit?usp=sharing

ValWood commented 6 years ago

So in general people ignore qualifiers.

I should qualify this. They ignore them when searching, and when using gafs, doing enrichments or slimming. However, they do see them on gene pages where the context is clear. Most of our users consume GO data via gene pages. From here they can easily access other complex members, and see all of their functions.

I thought enables was the current qualifier for Molecular functions, when a gene product is performing the function?

bmeldal commented 6 years ago

Ok, so still useful to use qualifiers :)

I thought enables was the current qualifier for Molecular functions, when a gene product is performing the function?

Yes, AFAIK. But the idea was to link the MF annotations for the GP to the complex where it carries out the function, so using the complex AC in the with/from column for GP annotations inferred from complex annotations.

ValWood commented 6 years ago

But that would be redundant as the gene products would be annotated to the complex anyway? I don't think I'm really understanding what the proposal is so I can wait until the meeting!

bmeldal commented 6 years ago

I'll draw up a diagram for the call to make it all clearer where what comes from ;-). Only issue is that I have no access to P2GO, I can only see QuickGO and AmiGO (and the MODs of course).

bmeldal commented 6 years ago

another use case: https://github.com/geneontology/go-annotation/issues/1829 Rat PRPP Associated Proteins should be modified as (CONTRIBUTES TO)

bmeldal commented 6 years ago

From https://github.com/geneontology/go-annotation/issues/1662, following WG TC on 22/2/18:

How GO annotators use the qualifier contributes_to:

Ruth: What about homodimers? Annotate directly. Discussion highlighted the issue that we can never know if the function is carried out by the monomer or the homodimer (or even homomultimer) if the protein self-assembles in solution. AI: Birgit to add PDGF examples.

Summary: Different groups use slightly different guidelines (and it may even vary within groups) either annotating all regulatory subunits of a complex with contributes_to or only in cases where the catalytic subunit has not been identified.

Solution:

pgaudet commented 6 years ago

Hi @bmeldal I propose we define the different cases slightly differently:

a) single catalytic subunit in complex:

b) reaction catalyzed by a complex, for example RNA polymerase:

c) catalytic subunit not identified (unknown): no annotation

Or perhaps this is exactly what you wrote !

Thanks, Pascale

bmeldal commented 6 years ago

It's not that easy!

1) single catalytic subunit in complex (unlimited regulatory or accessory subunits):

2 annotation practices in place:

1a)

Example?

1b)

Nancy's revised practice for telomerase core complex. @vanaukenk added a bunch of papers for telomerase to the Google doc (Case 5)

Q: What about homodimers/multimers? Direct annotation? We might know that the monomer is inactive and requires multimerisation.

Case: SWI/SNF complexes have one enzyme subunit (BRG1 or BRM) that is possibly active on its own (experiment from a cell extract) but in vivo is part of at least a minimum core complex. For example, PMID:10078207 is a messy paper when it comes to core complex definition, but very typical of these large complexes:

INI1, BAF155, BAF170 and BRG1 form a core complex (F1). Remodelling activity is shown for "isolated" BRG1 and BRM (from cell extract, so definitely not pure) but is not additive (F2) --> other papers show only one ATPase is found in any given assembly. "Data not shown" experiments showed that the ATPase activity of BRG1 alone is lower than that of the isolated complex. Cue adding INI1, BAF155 and BAF170 to the mix and remodelling activity goes up! Exclude BRG1 and there is no activity (F3). --> BRG1 has ATPase activity: direct annotation. Associated subunits INCREASE activity but are not required for basic activity: logically they should get the contributes_to qualifier.

@RLovering I think you were contemplating a case like this and wondered how to deal with it.

2) reaction catalyzed by a complex, for example RNA polymerase:

Q: What happens when we know the catalytic subunit(s) within the complex but experimentally it has been proven that a minimum core complex is required for the activity? Do we use multiple direct annotations for catalytic subunits and contributes_to for the other obligatory subunits? e.g. Telomerase: TERT is catalytic subunit but has no activity unless bound to its RNA partner. Nancy originally annotated both subunits with contributes_to but after the last call she changed them to TERT with direct annotation to MF and RNA with qualifier. (papers in Google doc case 5)

3) catalytic subunit not identified (unknown): no annotation

Q: How does that differ from 2) ? If the experiment shows the complex has an activity then subunits should be annotated with contributes_to according to 2) above.

bmeldal commented 6 years ago

For the purpose of inferring MFs to regulatory subunits we decided to only do it for GPs annotated with biological role=enzyme regulator via the ontology using the appropriate term from the class regulator activity, see summary from call on 15/3/18: https://github.com/geneontology/go-annotation/issues/1662

No decision made as to "classic" GO annotation practices.

RLovering commented 6 years ago

Following from call on 15th March. I am assuming that when you know the DNA sequence bound by a specific subunit of a complex you will be creating an IntAct (or Complex) annotation which captures this information and presumably this data is then exported to GO using the term 'DNA binding' evidence IPI and the sequence ID in the With Field.

Although this assumes that this data is captured more than once for the subunit in order to meet the threshold required for export - is that true or are some data being exported to GO database without more than 1 experimental evidence (I realise it is more complicated than this but I didn't want to put in the full details here)?

Anyway, in cases where DNA binding is shown but genomic sequence not available I think I proposed that you could offer to send these papers to GO groups for their annotation (if these annotations were 'missing'). I just want to say that the Protein2GO tool is very easy to use and that you might prefer to just submit this data yourself, this might help with your stats for grant reporting too.

Annotation of the functional role of specific subunits using the Protein2GO tool might be an easier approach to take to ensure the Complex data is captured following GO guidelines (assuming these are agreed) rather than trying to create complicated rules to enable this data to be available in GO, or to end up without the data being incorporated into GO.

In addition, with the new CausalTab ontology being developed, if the data is in IntAct, maybe we need to consider whether the data exported to GO from IntAct needs new 'rules' to enable more 'functional data' from IntAct to be included in GO.

Finally, I just wanted to point out that you need to be careful what you mean by 'chromatin binding'. After a long discussion it was agreed that 'chromatin' includes transcription factors as well as histones. There will be some complexes that do 'bind' chromatin but I think many of your complexes will be 'part_of' chromatin rather than binding it.

Ruth

bmeldal commented 6 years ago

Thank you, Ruth, very good points raised. Answering in line to each point:

I am assuming that when you know the DNA sequence bound by a specific subunit of a complex you will be creating an IntAct (or Complex) annotation which captures this information and presumably this data is then exported to GO using the term 'DNA binding' evidence IPI and the sequence ID in the With Field.

The IntAct GAF only contains protein-protein interaction data. If you want protein-X data we'll have to have a chat :)

Anyway, in cases where DNA binding is shown but genomic sequence not available I think I proposed that you could offer to send these papers to GO groups for their annotation (if these annotations were 'missing'). I just want to say that the Protein2GO tool is very easy to use and that you might prefer to just submit this data yourself, this might help with your stats for grant reporting too.

Annotation of the functional role of specific subunits using the Protein2GO tool might be an easier approach to take to ensure the Complex data is captured following GO guidelines (assuming these are agreed) rather than trying to create complicated rules to enable this data to be available in GO, or to end up without the data being incorporated into GO.

That's true. We didn't consider using P2GO as the complex ACs won't be available until the data is released by us and then it becomes cumbersome to copy it back into P2GO. So we decided to provide our own GPAD file which is what we are fixing now. That was before I was considering inferring the annotation from the complex to the GPs - which is a more recent - and possibly crazy! - idea of mine! It may well be the way forward to simply add the missing GP annotation in P2GO. Would the EBI curators be able to provide training?

In addition, with the new CausalTab ontology being developed, if the data is in IntAct, maybe we need to consider whether the data exported to GO from IntAct needs new 'rules' to enable more 'functional data' from IntAct to be included in GO.

Well, other people in the team are part of that consortium so please raise it with them :)

Finally, I just wanted to point out that you need to be careful what you mean by 'chromatin binding'. After a long discussion it was agreed that 'chromatin' includes transcription factors as well as histones. There will be some complexes that do 'bind' chromatin but I think many of your complexes will be 'part_of' chromatin rather than binding it.

Thank you, that distinction always gets me and I do go to the definitions! You wouldn't have a list of erroneous annotations already, would you? Happy to fix from there :) I know I have to systematically go through all the "DNA binding" annotations as they are definitely incomplete, as I mentioned in our previous calls with Astrid et al. I'll do that as part of my effort to finish the epigenetic complexes, which has been neglected - again - due to fixing our ECOs for the new GO GPAD! (only about 200 complexes left to check manually :o)

bmeldal commented 6 years ago

Question from NYU GOC mtg (Ruth):

Do tools strip qualifiers or whole lines if they contain qualifiers?
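The two behaviours in this question lead to different results, which a small sketch can make concrete. The row layout `(gene product, qualifier, GO ID)` and both helper functions are assumptions for illustration and do not describe any particular tool. The Orc1 row comes from the examples earlier in the thread; the second row is hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]  # (gene product, qualifier, GO ID)

ROWS: List[Row] = [
    ("Orc1", "contributes_to", "GO:0003688"),   # from the thread
    ("example_gp", "enables", "GO:0003688"),    # hypothetical direct annotation
]


def strip_qualifier(rows: List[Row]) -> List[Tuple[str, str]]:
    """Behaviour 1: keep every row but discard its qualifier."""
    return [(gp, go_id) for gp, _qualifier, go_id in rows]


def drop_qualified_rows(rows: List[Row], plain: str = "enables") -> List[Tuple[str, str]]:
    """Behaviour 2: discard whole rows whose qualifier is not the plain one."""
    return [(gp, go_id) for gp, qualifier, go_id in rows if qualifier == plain]


if __name__ == "__main__":
    # Behaviour 1 turns the contributes_to row into an apparent direct annotation.
    print(strip_qualifier(ROWS))
    # Behaviour 2 loses the subunit entirely.
    print(drop_qualified_rows(ROWS))
```

With the first behaviour the subunit appears to perform the function on its own; with the second it disappears from the result, so the answer matters for how contributes_to annotations should be written.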

suzialeksander commented 6 years ago

SGD consistently uses contributes_to to indicate that a subunit may not have the activity of a complete complex, but the complex doesn't have the activity without the subunit. If I'm following correctly, it's in line with Val's comment

So basically we use "contributes_to" if the activity is a "function of the complex" rather than of independent subunits.

For example: Slx5 (https://www.yeastgenome.org/locus/S000002171/go) and Slx8 (https://www.yeastgenome.org/locus/S000000918) individually do not possess GO:0004842 ubiquitin-protein transferase activity (at least not Slx5 in my last search, Slx8 may under some conditions); each has 1+ SUMO binding site and the two must form a complex to have full activity.

We annotated both proteins to contributes_to GO:0004842 ubiquitin-protein transferase activity. (The complex is functionally conserved in pombe: Rfp1 and Rfp2, refs listed in PMID:18499666)

References:

PMID:18032921, "Here we show that the Slx5-Slx8 complex, but not its individual subunits, stimulates several human and yeast Ub conjugating enzymes, including Ubc1, 4, 5, and Ubc13-Mms2."

PMID:18499666 "Structure-function analysis indicates that the Slx5–Slx8 complex contains multiple SUMO-binding domains that are collectively required for in vivo function." "Among point mutation alleles that map to the RING domains of Slx5 and Slx8, four were shown to eliminate Ub ligase activity in vitro (slx5-6, slx5-8, slx8-1, and slx8-3) and those displayed lethality in the sgs1Δ background (Fig. 2A) (23, 37)."

bmeldal commented 6 years ago

Hi Suzie,

Conceptually, that makes sense as long as the downstream tools can deal with it appropriately. The discussion we had is about cases where tools strip the qualifiers and you are left with the direct annotation to the MF.

I went to the Networks and Systems Biology conference at EMBL Heidelberg in April and met a few users who do enrich across MF. They are cautious but nevertheless, people do use MF for large-scale analysis as well as just looking at the details on the gene pages.

bmeldal commented 5 years ago

From call on 31/1/19:

Issues summary:

  1. When tools strip the qualifier the MF is assigned directly to the GP. This needs to be avoided for any subunits that have no catalytic activity, eg. regulatory subunits or scaffold proteins.
  2. Varying use cases: 2.1 PomBase uses it for all components of a complex where no single GP has the activity; 2.2 other groups use it for regulatory subunits, whereas PomBase uses "X regulator activity" terms instead.
  3. Homo-multimers: do they have catalytic activity as monomers or does each protomer contribute to the MF?
  4. Catalytic cores: one GP is a defined catalytic subunit but it requires a core of regulatory subunits to function in vivo. (e.g. SWI/SNF complexes)
  5. What if 2 GPs make a composite binding site that is needed to provide MF?

Proposals (numbering not related to above!):

  1. Homo-multimers: allow direct annotation to MF as it's usually very difficult to ascertain if only the multimer has the MF or also the monomer. [Ruth] One can also annotate to the CP AC in parallel.
  2. Catalytic subunits: annotate directly
  3. Regulators: annotate to "X regulator activity" (children of GO:0098772 molecular function regulator) - request new terms if needed.
  4. Any other GPs that are part of the complex: "GO:0032947 protein-containing complex scaffold activity" or children - request new terms if needed.
  5. Composite binding arrangements to make catalytic site: these are the hard ones. Retain contributes_to on both/all GPs that contribute to (see what I'm doing here?!?!) the catalytic site to make it expressive on gene pages etc. It's not completely wrong when stripped. Alternatively, do not annotate to the GP and only to the CP AC?
  6. In all cases, an AE of type part_of "CP AC" is also encouraged. Or should it be occurs_in?
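
Proposals 1 to 5 above can be read as a lookup from a subunit's role to a suggested annotation. The sketch below is one such reading, not an agreed rule set: the `Role` names and the `proposed_annotation` helper are hypothetical, and the returned strings only paraphrase the proposal text.

```python
from enum import Enum
from typing import Tuple


class Role(Enum):
    HOMOMULTIMER = "homomultimer"      # proposal 1
    CATALYTIC = "catalytic"            # proposal 2
    REGULATOR = "regulator"            # proposal 3
    OTHER_SUBUNIT = "other_subunit"    # proposal 4
    COMPOSITE_SITE = "composite_site"  # proposal 5


def proposed_annotation(role: Role, complex_mf: str) -> Tuple[str, str]:
    """Return (qualifier, target term) suggested for a subunit with `role`."""
    if role in (Role.HOMOMULTIMER, Role.CATALYTIC):
        # Direct annotation to the molecular function.
        return ("enables", complex_mf)
    if role is Role.REGULATOR:
        return ("enables", "child of GO:0098772 molecular function regulator")
    if role is Role.OTHER_SUBUNIT:
        return ("enables", "GO:0032947 protein-containing complex scaffold activity or child")
    if role is Role.COMPOSITE_SITE:
        # The hard case: the qualifier is retained on all contributing GPs.
        return ("contributes_to", complex_mf)
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    for role in Role:
        print(role.value, proposed_annotation(role, "GO:0003887"))
```

Only the composite-site case keeps contributes_to in this reading. Proposal 6 (the annotation extension pointing at the CP AC) applies to every branch and is left out of the sketch.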

bmeldal commented 5 years ago

From call on 13/2/19 [edited after call]:

Present: Birgit, Harold, Ruth, Kimberly, Darren, Peter, Leonore, Suzie, Judy, Edith, Helen

Composite binding sites: are the most tricky cases.

Action:

Questions: What would be the time frame to remove qualifier?

Correction for above: AE should be occurs_in

bmeldal commented 5 years ago

Use cases are here: https://docs.google.com/document/d/1ZtAcjIyIQ_ycbuMHyvLA-KIJQtGenh82lxS-MKC6a_A/edit#

This is the paper that was mentioned last night: https://www.ncbi.nlm.nih.gov/pubmed/?term=11841213 https://pubs.acs.org/doi/abs/10.1021/bi011998t I've added it to the use case list (case 8).

bmeldal commented 5 years ago

25/2/19 Call Action:

vanaukenk commented 5 years ago

@bmeldal The next site-wide annotation call is on the 12th of March.

bmeldal commented 5 years ago

From call on 25th Feb 2019:

Present: Birgit, Harold, Thomas Hayman, Ruth, Kimberly, Darren, Peter, Suzie, Judy, Edith, Helen, Val

Birgit gave a quick recap and we agreed that Birgit will send a survey to the GOC by the end of next week to discuss the use (or not!) of contributes_to for GPs that only have the function as part of a complex.

Action:

bmeldal commented 5 years ago

From call on 4th March:

Present: Birgit, Harold, Kimberly, Darren, Peter, Edith, Suzie, Helen, Judy, Leonore, Ruth

@hattrill suggested that there are too many examples where the catalytic unit(s) don't have the function on their own and we really can only annotate with contributes_to.

@vanaukenk agreed to look at this with @ukemi and prepare a proposal for the Cambridge meeting.

Survey no longer required.

pgaudet commented 5 years ago

Added to the agenda.

bmeldal commented 5 years ago

@vanaukenk agreed to look at this with @ukemi and prepare a proposal for the Cambridge meeting.

Is this a separate agenda point, not complex WG feedback?

vanaukenk commented 5 years ago

@bmeldal - that used to be a separate agenda point, but is no longer.

pgaudet commented 5 years ago

PROPOSAL FOR CONTRIBUTES_TO:

related

Problem: If GP1 only “contributes_to” function F and tools strip the qualifier --> false inference of GP1 | enables | some MF instead of GP1 | contributes_to | some MF

~I don’t know if Rule 6 (below) is still needed: If the catalytic subunit is known, the GPs would either be catalytic, regulatory or adaptor subunits and if the catalytic subunit is not known, all are annotated to the MF with contributes_to. If the MF for the complex itself is unknown, we have nothing to annotate to!~

Usage of contributes_to (as of 18/4/19):

Annotation review:

ValWood commented 5 years ago

Re

New Rule 3: If a GP is part of a complex and has been identified as a crucial part of the complex without catalytic or regulatory activity, annotate to "GO:0060090 molecular adaptor activity" or children. Request new adaptor activity terms if needed. Optionally, place the CP AC in the AE with qualifier occurs_in.

I don't think we can assume that all complex subunits that are not catalytic or regulatory are adaptors.

pgaudet commented 5 years ago

Thanks Val, I edited the rule to reflect this.

bmeldal commented 5 years ago

Since there are cases where the GP is not the catalyst, a regulator or an adaptor, we definitely need rule 6 (I wasn't sure when I summarised it post GOC mtg).

I'm going to edit the post accordingly.

hattrill commented 4 years ago

Can we have the rules for contributes_to written into public-facing documentation? At the moment the website (http://geneontology.org/docs/go-annotations/) says: "Contributes to appears in a GO annotation when a function of a protein complex is facilitated, but not directly carried out by one of its subunits on its own. Annotating individual gene products according to attributes of a complex is especially useful for molecular function annotations in cases where a complex has an activity, but not all of the individual subunits do. (For example, there may be a known catalytic subunit and one or more additional subunits, or the activity may only be present when the complex is assembled.) Molecular function annotations of individual subunits working as complexes in which no individual subunit has the activity must include contributes to in the annotation. The contributes to qualifier should not be used in biological process annotations. All gene products annotated using contributes to must also be annotated to a cellular component term representing the complex that possesses the activity. Note that contributes to is not needed to annotate a catalytic subunit. Furthermore, contributes to may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not."

Which is not at all correct.

pgaudet commented 4 years ago

Just wondering - what is not correct in that text ? Maybe it could be made clearer by listing the rules above, but these seem more useful for curators than end users.

hattrill commented 4 years ago

The whole text doesn't reflect the rules as laid out, but encompasses the old use. The closing sentence is particularly misleading:

'Furthermore, contributes to may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not'

Having a summary statement in this location is ok (but it needs to be updated), but we need to have the complete set of rules somewhere public as curators can't find them.

pgaudet commented 4 years ago

Thanks for bringing that up. I have added a much simplified version of the 'rules' on the wiki, where @vanaukenk and I had already added a few examples:

http://wiki.geneontology.org/index.php/Contributes_to

Please if you have any feedback do not hesitate to send it !

I'll update the website as soon as we are relatively happy with the wiki documentation.

Thanks, Pascale

hattrill commented 4 years ago

Thanks! It looks good. I could probably dig out some examples as well.

bmeldal commented 4 years ago

Is this sentence:

Usage guidance The 'contributes_to' qualifier is used differently depending on whether a gene product is

missing the end?

pgaudet commented 4 years ago

Hi @bmeldal Thanks for reading this far in the document ;) !! Fixed.

bmeldal commented 4 years ago

I read all of it ;-)

I think it should read

The 'contributes_to' qualifier is used differently only in cases where a gene product does not perform a molecular function on its own. Being part of a complex is not sufficient to use this qualifier.

hattrill commented 2 years ago

@hattrill pinging myself to draft text for GOC site

pgaudet commented 2 years ago

I updated this a few weeks ago:

https://wiki.geneontology.org/Contributes_to

sjm41 commented 2 years ago

Looks good, thanks. But I find the "Subunits of nuclear RNA polymerases" example confusing.

The first sentence seems to say that all subunits should get the 'contributes to' annotation: "none of the individual subunits have RNA polymerase activity, yet all of these subunits contribute to DNA-dependent RNA polymerase activity." But then the next 2 sentences say that some subunits "serve other functions besides the polymerase activity" and these "would not be annotated to polymerase activity using the 'contributes to' qualifier".

It's established that a subset of 10 RNApol subunits form the 'catalytic core' of each RNApol (I, II and III) enzyme, and these are the subunits that should get "contributes_to DNA-dependent RNA polymerase activity". E.g. see Table 1 of PMID:22365827. And we've edited InterPro2GO and PAINT annotations over the past couple of years to fit that definition.

So I suggest rewording this example to: Subunits of nuclear RNA polymerases: none of the individual subunits have RNA polymerase activity, yet a subset form the catalytic core (PMID:22365827) - these subunits should be annotated with "contributes_to DNA-dependent RNA polymerase activity". Nuclear RNA polymerase complex subunits outside the catalytic core serve other functions - these subunits should therefore not be annotated to polymerase activity using the 'contributes to' qualifier. Annotation for S. pombe RNA polymerase II large subunit Rpb1:

pgaudet commented 2 years ago

Updated, thanks

pgaudet commented 2 years ago

I also updated http://geneontology.org/docs/go-annotations/

Can this ticket be closed?

sjm41 commented 2 years ago

I think so, but @hattrill should confirm.