geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

OTR: GO:0016514 SWI/SNF complex (and related) - update defs? #14106

Closed bmeldal closed 7 years ago

bmeldal commented 7 years ago

Hi,

GO:0016514 SWI/SNF complex has a def of "A SWI/SNF-type complex that contains nine or more proteins, including both conserved (core) and nonconserved components; the Swi2/Snf2 ATPase is one of the core components."

which doesn't distinguish it from its sibling GO:0035060 brahma complex "A SWI/SNF-type complex that contains the ATPase product of the Drosophila brahma gene, or an ortholog thereof."

as the brahma gene and its orthologs are also part of the Swi2/Snf2 ATPase family.

I think the def should be "A SWI/SNF-type complex that contains nine or more proteins, including both conserved (core) and nonconserved components; the SMARCA4/BAF190A/BRG1/SNF2B/SNF2L4 ATPase is one of the core components." - I don't know which are the most common names for the ATPase in this complex. In mammals the traditional name is BRG1 but the official gene name is now SMARCA4. Neither NAME exists in drome or yeast. (same system goes for the BRM/SMARCA2 gene for the brahma complex.)

Also: GO:0070604 PBAF complex: "A SWI/SNF-type complex that contains the ATPase product of the mammalian BAF180 gene." BAF180 is NOT an ATPase, should that be reworded? Or am I not reading the phrase correctly?

Birgit

bmeldal commented 7 years ago

I should have added, we need the following, common synonyms: GO:0016514 SWI/SNF complex EXACT synonym SWI/SNF-A complex and GO:0035060 brahma complex EXACT synonym SWI/SNF-B complex

krchristie commented 7 years ago

Birgit, I'm not sure if I understood correctly, are you saying that the 'SWI/SNF complex' doesn't exist in yeast? I'm not sure that is true. Searching for "swi/snf AND yeast" in PubMed brings up quite a number of papers.

This isn't my ticket, so I don't want to spend any more time on this, but please make sure to account for yeast complexes in any changes to these terms.

thanks,

-Karen

bmeldal commented 7 years ago

No, the complex exists in yeast but the proteins are named differently so I don't know how to cater for all the variant naming in the new def :(

I amended my original ticket to indicate it's the gene/protein NAME that's the issue.

krchristie commented 7 years ago

With respect to the definitions, I think there is precedent to give specific names from multiple species, so the def could be amended to something more like this:

"A SWI/SNF-type complex that contains nine or more proteins, including both conserved (core) and nonconserved components. In mammals, the SMARCA4/BAF190A/BRG1/SNF2B/SNF2L4 ATPase is one of the core components. In S. cerevisiae, ..." (if you know what it's called in yeast)

Regarding the synonyms, if the "SWI/SNF-A complex" name is NOT applicable to the yeast complex, i.e. if this synonym is only applicable to a subset of organisms that have this complex, then it would probably be better to call the synonym NARROW, rather than EXACT.
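The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name `synonym_scope` and the taxon labels are hypothetical, and real scope decisions are made by curators reading the literature, not by a set comparison.

```python
def synonym_scope(synonym_taxa, term_taxa):
    """Suggest a synonym scope from where the name is used.

    EXACT  if the synonym is used in every taxon the term covers,
    NARROW if it only applies to a subset of those taxa.
    """
    return "EXACT" if set(term_taxa) <= set(synonym_taxa) else "NARROW"


# Hypothetical taxon labels, for illustration only.
term_taxa = {"mammals", "S. cerevisiae"}
print(synonym_scope({"mammals"}, term_taxa))
print(synonym_scope({"mammals", "S. cerevisiae"}, term_taxa))
```

If "SWI/SNF-A complex" turns out to be used only for the mammalian complex, the first call models that case and suggests NARROW.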

bmeldal commented 7 years ago

Yeast and drome, and presumably other non-mammals, only have one SWI/SNF ATPase version, which appears to be a homolog of the main ATPase of either GO:0016514 SWI/SNF complex or GO:0035060 brahma complex.

SGD annotate to GO:0016514 SWI/SNF complex which already contains SNF2 as name for the catalytic unit in the def, Flybase annotate to GO:0035060 brahma complex, the name of which derives from the fly gene :)

Suggested new defs:

GO:0016514 SWI/SNF complex "A SWI/SNF-type complex that contains EIGHT TO FOURTEEN proteins, including both conserved (core) and nonconserved components; contains the ATPase product of the yeast SNF2 or mammalian SMARCA4/BAF190A/BRG1 gene, or an ortholog thereof."

GO:0035060 brahma complex "A SWI/SNF-type complex that contains EIGHT TO FOURTEEN proteins, including both conserved (core) and nonconserved components; contains the ATPase product of the Drosophila brm (brahma) or mammalian SMARCA2/BAF190B/BRM gene, or an ortholog thereof."

Component number changed as new complex discovered with only 8 components. Or leave components out altogether! (PMID:12368262 for brain-specific SWI/SNF complex)

Does that make sense?

krchristie commented 7 years ago

I like your proposed new definitions. I think these kind of details are the things that allow curators to accurately pick the correct term.

ValWood commented 7 years ago

Agreed. I think it's important that we can still use SWI/SNF for yeast. I think it was first identified in yeast and this term should still be usable for the canonical complex.

bmeldal commented 7 years ago

thanks, @krchristie absolutely, @ValWood

pgaudet commented 7 years ago

I changed the definitions as requested:

id: GO:0016514
name: SWI/SNF complex
OLD def: "A SWI/SNF-type complex that contains nine or more proteins, including both conserved (core) and nonconserved components; the Swi2/Snf2 ATPase is one of the core components." [GOC:mah, PMID:12672490]
NEW def: "A SWI/SNF-type complex that contains 8 to 14 proteins, including both conserved (core) and nonconserved components; contains the ATPase product of the yeast SNF2 or mammalian SMARCA4/BAF190A/BRG1 gene, or an ortholog thereof." [GOC:bm, PMID:12672490]
synonym: "SWI-SNF complex" EXACT [GOC:mah]
is_a: GO:0090544 ! BAF-type complex

id: GO:0035060
name: brahma complex
OLD def: "A SWI/SNF-type complex that contains the ATPase product of the Drosophila brahma gene, or an ortholog thereof." [GOC:bf, PMID:10809665, PMID:12482982]
NEW def: "A SWI/SNF-type complex that contains 8 to 14 proteins, including both conserved (core) and nonconserved components; contains the ATPase product of the Drosophila brm (brahma) or mammalian SMARCA2/BAF190B/BRM gene, or an ortholog thereof." [GOC:bm, PMID:10809665, PMID:12482982]
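With the ATPase named in each definition, the choice between the two terms becomes a simple lookup. The sketch below is only an illustration of that differentia: the table holds just the gene names that appear in the two NEW defs, the helper name `term_for_atpase` is hypothetical, and the "or an ortholog thereof" clause is not modelled.

```python
# Gene names taken from the two NEW definitions above; nothing else is assumed.
ATPASE_TO_TERM = {
    "SNF2": "GO:0016514",     # yeast
    "SMARCA4": "GO:0016514",  # mammalian
    "BAF190A": "GO:0016514",
    "BRG1": "GO:0016514",
    "brm": "GO:0035060",      # Drosophila brahma
    "SMARCA2": "GO:0035060",  # mammalian
    "BAF190B": "GO:0035060",
    "BRM": "GO:0035060",
}

TERM_NAMES = {
    "GO:0016514": "SWI/SNF complex",
    "GO:0035060": "brahma complex",
}


def term_for_atpase(gene_name):
    """Return the GO ID whose definition names this ATPase, or None if unlisted."""
    return ATPASE_TO_TERM.get(gene_name)


for gene in ("BRG1", "SMARCA2", "STH1"):
    go_id = term_for_atpase(gene)
    print(gene, "->", go_id, TERM_NAMES.get(go_id, "(not named in either def)"))
```

An ATPase that neither definition names, such as STH1 here, returns None and still needs a curator's judgement.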

pgaudet commented 7 years ago

Hi @bmeldal @krchristie @ukemi
To make sure I understand how we deal with these: it looks like GO:0016514 SWI/SNF complex and GO:0035060 brahma complex are homologous, based on PMID:26601204

image
  1. The parent term, 'GO:0090544 BAF-type complex', is defined as 'A SWI/SNF-type complex that contains a subunit from the BAF (Brahma-Associated Factor) family.'. @krchristie: Is this appropriate for yeast ? I thought brahma was not in yeast.

    • There are 28 annotations to GO:0090544: 27 mouse and 1 human. Wouldn't SWI/SNF complex be more appropriate @krchristie ?
  2. If these terms are species-specific they should probably have taxon restrictions:

    • brahma complex: flies ?
    • nBAF complex, npBAF complex: metazoa
    • PBAF complex: mammals
  3. Should the parent of all these, 'GO:0070603 SWI/SNF superfamily-type complex', be flagged 'do not annotate'? (only 3 annotations: 2 human, 1 mouse).

Thanks, Pascale

krchristie commented 7 years ago
  1. The parent term, 'GO:0090544 BAF-type complex', is defined as 'A SWI/SNF-type complex that contains a subunit from the BAF (Brahma-Associated Factor) family.'. @krchristie: Is this appropriate for yeast ? I thought brahma was not in yeast.

My understanding is that this name is used for the whole protein family, and that the Swi2/Snf2 ATPase present in the yeast complex is considered to be equivalent to Drosophila brahma, so I believe it is fine the way it is.

  1. There are 28 annotations to GO:0090544: 27 mouse and 1 human. Wouldn't SWI/SNF complex be more appropriate @krchristie ?

I wouldn't assume which BAF-type complex is appropriate without checking the papers.

  2. If these terms are species-specific they should probably have taxon restrictions:

    • brahma complex: flies ?
    • nBAF complex, npBAF complex: metazoa
    • PBAF complex: mammals

I couldn't say if this is true without doing some work. Might be worth checking with the author of the paper you included the figure from to confirm this.

  3. Should the parent of all these, 'GO:0070603 SWI/SNF superfamily-type complex', be flagged 'do not annotate'? (only 3 annotations: 2 human, 1 mouse).

I have no objection to marking this grouping term as 'do not annotate', seems like curators should be able to figure out which type of complex it is. I have checked the one mouse one and it's a bit bizarre, all of the other subunits are annotated to the slightly more specific term 'BAF-type complex', but they all could be moved to 'nBAF complex' [I can't do this until next week when one of the MGI software guys is back from vacation to fix my access to the MGI curation interface.].

ValWood commented 7 years ago

Some of these complexes are just species-specific complex names. For example, it seems that the mammalian PBAF complex is equivalent to the yeast RSC complex.

BAF-type complex is defined as "A SWI/SNF-type complex that contains a subunit from the BAF (Brahma-Associated Factor) family." BAF (Brahma-Associated Factor) is Rsc1 in yeast. Rsc1 is not a subunit of SWI/SNF, but the "canonical" SWI/SNF complex is under this BAF (Brahma-Associated Factor) term.

I think most of these complexes are universally conserved (canonical SWI/SNF and RSC have a core overlapping set of subunits, but each have distinct subunits), but there is a big mix up here between universally conserved but distinct complexes and species specific names.

srengel commented 7 years ago

i may have lost the plot here...but we need to keep 'SWI/SNF complex' for yeast. we're keeping that, aren't we?

bmeldal commented 7 years ago

@srengel don't worry, nothing is getting obsoleted, just tidying up the defs and possibly some relationships so they make sense across all taxa that use these terms :)

@ValWood Yes, naming came from whichever species was annotated first and then it got messy. I propose to align the ontology as closely as possible to evolutionary history and update the defs so they fit all relevant taxa.

@krchristie @pgaudet

There are 28 annotations to GO:0090544: 27 mouse and 1 human. Wouldn't SWI/SNF complex be more appropriate?

  • I agree with Karen, you can't extrapolate to one of the children. Annotations may pre-date the child terms and all child complexes appear to exist in mouse... (I haven't checked any annotation dates)

GO:0016586 RSC complex Def: A protein complex similar to, but more abundant than, the Swi/Snf complex. The RSC complex is generally recruited to RNA polymerase III promoters and is specifically recruited to RNA polymerase II promoters by transcriptional activators and repressors; it is also involved in non-homologous end joining. PMID:11937489 PMID:12672490 PMID:15870268

GO:0070603 SWI/SNF superfamily-type complex A protein complex that contains an ortholog of the Saccharomyces ATPase Swi2/Snf2 as one of the core components and mediates assembly of nucleosomes, changes to the spacing or structure of nucleosomes, or some combination of those activities in a manner that requires ATP. PMID:16155938

GO:0090544 BAF-type complex
-Def: A SWI/SNF-type complex that contains a subunit from the BAF (Brahma-Associated Factor) family.
+Def: A SWI/SNF-type complex that contains an ATPase from the BAF (Brahma-Associated Factor) family. Includes the ATPase products of the yeast SNF2, drosophila Brahma or mammalian SMARCA2 or SMARCA4 genes, or any orthologs thereof.

Taxon restrictions:

I also have esBAF (embryonic stem cell-specific), bBAF (brain-specific) and mBAF (muscle-specific). - Should we have specific terms for these, too? The human and mouse complexes are being released this week. GO CC so far only GO:0090544 BAF-type complex.

I hope I commented on all aspects! Birgit

ValWood commented 7 years ago

Re RSC

missing any comment about its ATPase. It's needed to make sure annotators know which family they belong to.

The parent term SWI/SNF superfamily-type complex has "contains an ortholog of the Saccharomyces ATPase Swi2/Snf2", so this is a prerequisite of all child complexes including RSC. The RSC complex def has the differentia.

These are the pombe SWI/SNF type complexes and their members:

Ino80 complex: act1, alp5, arp42, arp5, arp8, iec1, iec3, iec5, ies2, ies4, ies6, ino80, nht1, pht1, rvb1, rvb2

RSC complex: arp42, arp9, rsc1, rsc4, rsc58, rsc7, rsc9, sfh1, snf21, ssr1, ssr2, ssr3, ssr4

SWI/SNF complex: arp42, arp9, snf22, snf30, snf5, snf59, sol1, ssr1, ssr2, ssr3, ssr4, tfg3

Swr1 complex: act1, alp5, arp6, bdf1, msc1, pht1, rvb1, rvb2, swc2, swc3, swc4, swc5, swr1, vps71, yaf9

(I think these are conserved between pombe and cerevisiae pretty much 1:1, and except for Ino80 they seem to be conserved 1:1 with vertebrates)
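The overlap between the pombe RSC and SWI/SNF memberships listed above can be worked out with plain set operations. This is a small sketch using only the subunit names from those two lists; the variable names are arbitrary.

```python
# Subunit lists copied from the pombe RSC and SWI/SNF complexes listed above.
rsc = {
    "arp42", "arp9", "rsc1", "rsc4", "rsc58", "rsc7", "rsc9",
    "sfh1", "snf21", "ssr1", "ssr2", "ssr3", "ssr4",
}
swi_snf = {
    "arp42", "arp9", "snf22", "snf30", "snf5", "snf59",
    "sol1", "ssr1", "ssr2", "ssr3", "ssr4", "tfg3",
}

shared = rsc & swi_snf        # subunits present in both complexes
rsc_only = rsc - swi_snf      # subunits distinguishing RSC
swi_snf_only = swi_snf - rsc  # subunits distinguishing canonical SWI/SNF

print("shared:      ", sorted(shared))
print("RSC only:    ", sorted(rsc_only))
print("SWI/SNF only:", sorted(swi_snf_only))
```

With these lists the shared core is arp42, arp9 and ssr1 to ssr4, while the two ATPases, snf21 and snf22, fall on opposite sides.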

bmeldal commented 7 years ago

"The RSC complex def has the differentia."

It has the differentia for the function (which promoters it targets) but is missing the differentia with regard to the type of ATPase (snf21).

krchristie commented 7 years ago

@bmeldal - Do you know if bBAF is distinct from nBAF?

Yesterday, I checked the paper for the mouse gene that Pascale pointed out has an annotation directly to the "SWI/SNF superfamily-type complex" term and bBAF has the same catalytic subunit as nBAF. Also, when I put "bBAF" into PubMed, the only paper I got back was the one for the mouse annotation, so I was thinking that bBAF might be an early characterization for the same complex as nBAF. I'd certainly be hesitant to make a term for bBAF if there is just one paper on it, but I'll check later if the rest of the bBAF subunits match nBAF.

bmeldal commented 7 years ago

Thanks, @krchristie. After many weeks of teasing out all the components I constructed this table: https://docs.google.com/spreadsheets/d/1IDfasqQBK6wIGMdvU5pqEiPBqaOFArnsUMsoT1Sy4B0/edit?usp=sharing and this notes doc: https://docs.google.com/document/d/1zxeLJ4BAEF4Oynrt06dVBFiw53ET28vnuBoj0N_f_nU/edit?usp=sharing, but not before collating all info on post-its: https://twitter.com/complexportal/status/885064981603115008

I'm sure it's not entirely comprehensive but there was a point I had to stop and actually curate the complexes :-) A set of 31 human complexes and 31 mouse orthologs plus the yeast SWI/SNF complex will be released soon (end of the week, hopefully!). You can then find them all using the GO terms as search terms. The yeast RSC complex is already out.

And yes, bBAF and nBAF are different - unless you are right and nBAF succeeded the bBAF def but I didn't read anything saying "nBAF, formerly known as bBAF...". nBAF also exists in neuromuscular neurons.

pgaudet commented 7 years ago

Hello,

add to Def: "A SWI/SNF-type complex that contains the ATPase product of the yeast STH1 or fission yeast SNF21 gene, or an ortholog thereof." - or

The mammalian ortholog of the RSC complex appears to be pBAF (and the drome PBAF), according to PMID:15627498 http://www.sciencedirect.com/science/article/pii/S0167478104002349?via%3Dihub

--> Does that warrant two terms or should GO:0016586 RSC complex (yeasts only) and GO:0035060 brahma complex (drome and mammals at least) be merged given that they have homologous ATPases, akin to GO:0016514 SWI/SNF complex?

Thanks, Pascale

ValWood commented 7 years ago

I'm not sure about RSC. Maybe it's OK as it is. We have all of the yeast complexes identified and annotated, so as long as the terms stay I'm happy. It seems a bit complicated!

pgaudet commented 7 years ago

@bmeldal @krchristie if you have nothing to add I'll close this.

Thanks, Pascale

bmeldal commented 7 years ago

I wonder why GO:0016586 RSC complex is_a GO:0070603 SWI/SNF superfamily-type complex and not its child GO:0090544 BAF-type complex, @ValWood ?

@pgaudet I'm not planning on asking for GO terms for all compositional variants, just the tissue and functional variants :) So, as we don't have mBAF, bBAF or esBAF shall I request those 3 new terms?

@pgaudet you can also add CL:0000540 ! neuron to GO:0071565 (nBAF complex).

Yes, the post-its were a godsend, even though they caused much amusement among my colleagues. There was just too much info for making a classic spider diagram.

pgaudet commented 7 years ago

@pgaudet I'm not planning on asking for GO terms for all compositional variants, just the tissue and functional variants :) So, as we don't have mBAF, bBAF or esBAF shall I request those 3 new terms?

Yes, please create a new issue.

ValWood commented 7 years ago

Re

I wonder why GO:0016586 RSC complex is_a GO:0070603 SWI/SNF superfamily-type complex and not its child GO:0090544 BAF-type complex,

Definition (GO:0070603) A protein complex that contains an ortholog of the Saccharomyces ATPase Swi2/Snf2 as one of the catalytic subunit components (ATPase) and mediates assembly of nucleosomes, changes to the spacing or structure of nucleosomes, or some combination of those activities in a manner that requires ATP. PMID:16155938

BAF-type complex Definition (GO:0090544) A SWI/SNF-type complex that contains an ATPase from the BAF (Brahma-Associated Factor) family. Includes the ATPase products of the yeast SNF2, drosophila Brahma, mammalian SMARCA2 or SMARCA4 genes, or any orthologs thereof.

Rsc fits the definition of the SWI/SNF superfamily-type complex, because it contains a Snf2 ortholog. Is Brahma the same thing as Swi2/Snf2? It seems that "Brahma-associated factor" encompasses a number of unrelated proteins, so which is the ATPase subunit in this complex? Can you provide an InterPro family ID?

It seems odd to switch the ATPase differentia in the definitions between the parent and child terms. They should all be the same?

bmeldal commented 7 years ago

The defs for those two grouping terms are really confusing, esp with regards to BAF complexes. There is nothing to distinguish those two unless you look at the other children of SWI/SNF superfamily-type complex that have more specific defs. All BAFs have a SNF2 ortholog, Brahma being one...

Interpro entry: http://www.ebi.ac.uk/interpro/entry/IPR000330

I would argue RSC complex fits well into BAF-type complex.

ValWood commented 7 years ago

So the thing which differentiates RSC from SWI-SNF in budding and fission yeast is these additional subunits, which are not present in the canonical SWI-SNF complex:

rsc1 (SPBC4B4.03) | RSC complex subunit Rsc1 (RSC1&2 in S. cerevisiae)
rsc4 (SPBC1734.15) | RSC complex subunit Rsc4 (RSC4 in S. cerevisiae)
rsc58 (SPAC1F3.07c) | RSC complex subunit Rsc58 (RSC58 in S. cerevisiae)
rsc7 (SPCC1281.05) | RSC complex subunit Rsc7 (NPL6 in S. cerevisiae)
rsc9 (SPBC1703.02) | RSC complex subunit Rsc9 (RSC9 in S. cerevisiae)
sfh1 (SPCC16A11.14) | RSC complex subunit Sfh1 (SFH1 in S. cerevisiae)

This difference is not conveyed by the current def. The def could be extended to say "Also includes at least one bromodomain, ARID DNA binding domain and SNF5/SMARCB1/INI1 family member."

I don't think this is Baf-type though?

Rsc1 Bromodomain (fly polybromo human PBRM1)
Rsc4 Bromodomain (fly polybromo human PBRM1)
Rsc58 IPR013933 (fly inv and En, human EN1 and EN2)
Rsc9 IPR016024 ARID DNA-binding domain/IPR016024 Armadillo-type fold (fly osa, retn, human ARID 4A,4B,5A,5B)
Sfh1 IPR006939/IPR000679 (fly Snr1, human SMARCB1)

There are species-specific expansions in higher eukaryotes, probably representing tissue-specific variants. Do you have a complex corresponding to RSC?

ValWood commented 7 years ago

OK, all of the complexes have an InterPro entry: http://www.ebi.ac.uk/interpro/entry/IPR000330

in which case there is no difference between BAF-type complex and SWI/SNF superfamily-type complex

they should be merged?

bmeldal commented 7 years ago

@ValWood in the paper cited earlier it states that the mammalian pBAF is the ortholog of the pombe RSC complex. There's already a term for pBAF complex in GO, as a child of BAF-type complex. That is why I asked earlier whether those two should be merged...

If we merge BAF-type complex and SWI/SNF superfamily-type complex, what do we do with the current BAF-type children? Does anyone need the grouping term for BAF-types to separate them from the other SWI/SNF-type complexes? I guess there's a lot of history here and new terms were added here and there...

In mammalian complexes there are several constants:

  • ARID-type protein
  • ACTL6 member
  • actin B
  • SMARCB1/BAF47
  • SMARCC1/BAF155 and/or SMARCC2/BAF170
  • SMARCE1/BAF57
  • and at least one of SMARCD1/BAF60A, SMARCD2/BAF60B and (SMARCD3/BAF60C).

Do you propose to add any of these if they also have yeast, pombe, drome etc orthologs?

bmeldal commented 7 years ago

New terms requested in ticket #14143

ValWood commented 7 years ago

These are the SWI-SNF family complexes in yeast. The differences between SWI-SNF and RSC (apart from sometimes different sub-family members for the non-shared subunits) are that 1) RSC-type has bromodomains but canonical SWI-SNF doesn't, and 2) SWI-SNF has a YEATS domain member but RSC does not.

I don't think we are expecting many more SWI-SNF complexes in the 2 yeasts. Mainly because they are pretty well studied and the same complexes came out of pombe and cerevisiae despite being 500-1000 million years apart, and very close to the major eukaryotic split.

swi-snf

bmeldal commented 7 years ago

Yes, and the mammalian pBAF stands for poly-bromo-domain containing BAF. It has 1 extra subunit (PB1/BAF180) with lots of bromo domains. Otherwise its subunits are very similar to canonical BAF Snf2-type SWI/SNF and share the ATPase.

ValWood commented 7 years ago

OK then I agree they are the same.

Perhaps this could be "RSC-type complex", and the necessity for a "bromodomain-containing protein" would be the additional differentia for both.

"RSC-type complex" should really be the grouping term (with "pBAF-type complex" as an exact synonym), since this was first published in Cell in 1996 ...

RSC, an essential, abundant chromatin-remodeling complex. Cairns BR, Lorch Y, Li Y, Zhang M, Lacomis L, Erdjument-Bromage H, Tempst P, Du J, Laurent B, Kornberg RD. Cell. 1996 Dec 27;87(7):1249-60. PMID: 8980231

bmeldal commented 7 years ago

Then we are in agreement, @ValWood :)

Actions:

Is that everything?

ValWood commented 7 years ago

That's clearer. Sorry if that is what you were saying. I was getting confused by all of the unfamiliar names!

bmeldal commented 7 years ago

And v.v, as the pombe names have not all been carried over. Over to you, @pgaudet for editing.

ValWood commented 7 years ago

Note that it should be the S. cerevisiae names, where the complexes were first identified. The names are usually equivalent in pombe as we tried to align them. S. cerevisiae gene names are written in upper case, so S. cerevisiae SNF2 etc. If you do proteins rather than genes it's Snf2p. I can supply the names to fit the definitions. But it's only the 4 complexes above that we would use.

pgaudet commented 7 years ago

Hi @bmeldal GO:0090544 BAF-type complex has the following Subclasses:

image

Should these all be moved over to 'GO:0070603 SWI/SNF superfamily-type complex'?

Thanks, Pascale

pgaudet commented 7 years ago

Hello again, I have merged the children of GO:0090544 BAF-type complex into GO:0070603 SWI/SNF superfamily-type complex, merged GO:0070604 PBAF complex with GO:0016586 RSC complex, and added the new definition to the latter term. I couldn't quite figure out if I needed to change the hierarchy (see screenshot below):

image

@bmeldal let me know if you need other corrections.

Thanks for your help and patience, Pascale

bmeldal commented 7 years ago

GO:0090544 BAF-type complex has the following Subclasses: Should these all be moved over to 'GO:0070603 SWI/SNF superfamily-type complex'?

No, they need to be moved to the old BAF-type complex Children: image

as they are not applicable to all of the other children (CB-WICH..., HD-type..., INO80-type..., ISWI-type...).

They are applicable to the RSC-type complex as that's been merged with pBAF and therefore falls into the same old category.

Thank you for your patience! These complexes are never easy!

pgaudet commented 7 years ago

Done

bmeldal commented 7 years ago

Hi @pgaudet this came as email notification but not visible here:

Thanks for the clarification. Just one inconsistent point: above you asked to merge GO:0070604 PBAF complex with GO:0016586 RSC complex and now you talk about RSC-type complex having been merged with pBAF - which one did you mean?

It's the RSC complex, I added the "-type" bit accidentally as the other terms were all ...-type... And yes, pBAF to be merged with RSC complex retaining RSC complex as term name (as per above).

Birgit
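A rough sketch of what that merge amounts to, using a plain dictionary rather than any real ontology tooling. Assumptions, labelled as such: the merged term's ID is kept on the surviving term as an `alt_id` (the usual OBO convention for secondary IDs), its name becomes a synonym, and the EXACT scope follows the suggestion made earlier in this thread rather than a confirmed edit. The helper name `merge_terms` is hypothetical.

```python
import copy


def merge_terms(terms, merged_id, kept_id, scope="EXACT"):
    """Fold terms[merged_id] into terms[kept_id] and return a new term dict.

    The kept term retains its own name; the merged term's ID is recorded as
    an alt_id and its name as a synonym with the given scope (an assumption).
    """
    result = copy.deepcopy(terms)
    gone = result.pop(merged_id)
    kept = result[kept_id]
    kept.setdefault("alt_id", []).append(merged_id)
    kept.setdefault("synonym", []).append((gone["name"], scope))
    return result


terms = {
    "GO:0016586": {"name": "RSC complex"},
    "GO:0070604": {"name": "PBAF complex"},
}

merged = merge_terms(terms, merged_id="GO:0070604", kept_id="GO:0016586")
print(merged["GO:0016586"])
```

Existing annotations to GO:0070604 would then resolve to GO:0016586 through the secondary ID; whether the synonym should stay EXACT is a curation decision this sketch does not settle.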

pgaudet commented 7 years ago

ok :) got it (I hope!)

pgaudet commented 7 years ago

It's the RSC complex, I added the "-type" bit accidentally as the other terms were all ...-type... And yes, pBAF to be merged with RSC complex retaining RSC complex as term name (as per above).

Right I noticed my mistake - I think all is sorted now. The changes have been merged.

Pascale

bmeldal commented 6 years ago

Thanks @pgaudet