geneontology / go-releases

Tasks and notes for monthly GO releases

Complex portal makes GO:protein-containing complexes for SGD complexes #59

Open pgaudet opened 7 months ago

pgaudet commented 7 months ago

This is not allowed according to GO rules (gorule-0000039, see http://snapshot.geneontology.org/reports/assigned-by-ComplexPortal-report.html#gorule-0000039)

Emailed Sandra.

pgaudet commented 7 months ago

I am writing because the GO rules pick up a violation coming from the Complex Portal annotations to yeast (SGD) complexes: http://snapshot.geneontology.org/reports/assigned-by-ComplexPortal-report.html#gorule-0000039 The rule is that CPX cannot be annotated to a GO protein complex term, since this is a mapping rather than an annotation. We had discussed this with Birgit before she left, but for some reason these are still being generated for SGD complexes, as listed in the page I linked above. Can these be removed from the files generated from Complex Portal?

suzialeksander commented 7 months ago

Note, SGD put in a fix to remove GO:0032991 & all children from the Complex Portal annotations ingested by SGD, but this does mean we exclude potentially useful annotations like "chromatin". The gorule might be slightly too inclusive but we are now filtering by the rule as-is.
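
For readers following along, a filter of the kind described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not SGD's actual pipeline: the `TOY_PARENTS` table is a hand-written stand-in for the real GO graph, the dictionary keys `object_type` and `go_term` are hypothetical field names, and the function names are made up for this example.

```python
from collections import defaultdict

PROTEIN_COMPLEX_ROOT = "GO:0032991"  # protein-containing complex

# Toy child -> parents edges: a simplified stand-in, NOT the real GO graph.
TOY_PARENTS = {
    "GO:0031262": ["GO:0032991"],  # Ndc80 complex
    "GO:0000785": ["GO:0032991"],  # chromatin (currently caught by the rule)
    "GO:0005737": ["GO:0110165"],  # cytoplasm, under cellular anatomical entity
}


def descendants(root, parents):
    """Return every term reachable from root by following child edges."""
    children = defaultdict(set)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(child)
    seen, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def drop_rule_39_violations(annotations, parents=TOY_PARENTS):
    """Keep annotations unless a protein_complex entity is annotated to the
    root term or any of its descendants."""
    blocked = descendants(PROTEIN_COMPLEX_ROOT, parents) | {PROTEIN_COMPLEX_ROOT}
    return [
        a for a in annotations
        if not (a["object_type"] == "protein_complex" and a["go_term"] in blocked)
    ]
```

Because the blocked set is the whole subtree under GO:0032991, a term such as chromatin is dropped for as long as it sits under that root, which is the over-inclusiveness mentioned above.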

pgaudet commented 7 months ago

Right - we are struggling to classify chromatin and ribosomes as either protein-containing complexes or cellular anatomical entities.

pgaudet commented 6 months ago

There are still 12 annotations failing this rule @srengel

ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000218024 CPX-548 located_in GO:0031262 PMID:15809444 IPI C NDC80 complex|Tid3 complex|Nuf2-Ndc80 complex|NDC80-NUF2-SPC24-SPC25 complex|2ftx|2FV4 protein_complex taxon:559292 20141016 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000217935 CPX-1162 located_in GO:0005835 PMID:18725634 IPI C Fatty-acyl-CoA synthase|FAS|Fatty acid synthase complex|holo-[acyl-carrier-protein] synthase complex|acyl-CoA:malonyl-CoA C-acyltransferase (decarboxylating, oxoacyl- and enoyl- reducing)|2.3.1.86|2uv8|2vkz|3hmj|2pff|2.3.1.41|1.1.1.100|2.3.1.38|2.3.1.39|4.2.1.59|1.3.1.9|3.1.2.14 protein_complex taxon:559292 20191007 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000217937 CPX-1186 located_in GO:0000818 PMID:20723757 IPI C Kinetochore MIS12 complex|MIND complex|Mtw1p including Nnf1p-Nsl1p-Dsn1p complex|MTW1 complex protein_complex taxon:559292 20161031 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000217970 CPX-1648 located_in GO:0071004 PMID:22314233 IPI C SF3A complex|4dgw protein_complex taxon:559292 20151202 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000218057 CPX-1670 located_in GO:1905347 PMID:15590332 IPI C MUS81-MMS4 structure-specific endonuclease complex|MUS81-MMS4 endonuclease complex|Mms4-Mus81 endonuclease complex|3.1.22|Deoxyribonuclease complex MUS81-MMS4 protein_complex taxon:559292 20160804 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000217765 CPX-1865 located_in GO:0008287 PMID:19749176 IPI C SIT4-SAP185 phosphatase complex|SIT4 holoenzyme complex|3.1.3.16 protein_complex taxon:559292 20200902 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000217790 CPX-1895 located_in GO:0005849 PMID:22026644 IPI C mRNA cleavage factor complex CFIA|Cleavage factor IA|CF IA|2npi|2l9b protein_complex taxon:559292 20201026 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000218127 CPX-2910 located_in GO:1990302 PMID:19531475 IPI C BRE1-RAD6 ubiquitin ligase complex|BRE1-RAD6 complex|BRE1-UBC2 complex|BRE1-RAD6/UBC2 complex|BRE1-UBC2 ubiquitin ligase complex|BRE1-RAD6/UBC2 ubiquitin ligase complex|2.3.2.27|2.3.2.23 protein_complex taxon:559292 20141016 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`SGD S000218016 CPX-3139 located_in GO:0033062 PMID:22020281 IPI C RAD55-RAD57 complex|Rhp55-Rhp57 complex protein_complex taxon:559292 20141112 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`ComplexPortal CPX-1844 pp4_human-2b part_of GO:0000785 PMID:18614045 IDA C PPP4C-PPP4R2-PPP4R3B protein phosphatase 4 complex PPP4C-PPP4R2-PPP4R3B PP4 complex|PPP4C:PPP4R2:SMEK2 protein_complex taxon:9606 20180307 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`ComplexPortal CPX-6263 fa_human part_of GO:0000785 PMID:22343915 IDA C Fanconi anemia ubiquitin ligase complex Fanconi anaemia nuclear complex|FA complex|2xFAAP100:FAAP20:FANCA:2xFANCB:FANCC:FANCE:FANCF:FANCG:2xFANCL protein_complex taxon:9606 20211013 ComplexPortal `
ERROR - Violates GO Rule:GORULE:0000039: Protein complexes can not be annotated to GO:0032991 (protein-containing complex) or its descendants--`ComplexPortal CPX-6266 fancm-faap24-1 part_of GO:0000785 PMID:20347429 IDA C Fanconi anemia FANCM-FAAP24-MHF anchoring complex 2xCENPS:2xCENPX:FAAP24:FANCM protein_complex taxon:9606 20211013 ComplexPortal `

srengel commented 6 months ago

ugh that's weird, not sure why these are still slipping through.... ??

they are all source=ComplexPortal. only 9 of them are annotations to yeast complexes, the others are human (SGD can't do anything about those 3).
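
A quick way to get this kind of per-species breakdown from the report is to tally the `taxon:` token in each violating line. The snippet below is a rough sketch, assuming the report lines are available as plain strings in the format pasted above; `count_by_taxon` is a hypothetical helper name, not part of any GO tooling.

```python
import re
from collections import Counter

# Matches the taxon column as it appears in the pasted report lines.
TAXON_RE = re.compile(r"\btaxon:(\d+)\b")


def count_by_taxon(report_lines):
    """Count GORULE:0000039 violation lines per NCBI taxon ID."""
    counts = Counter()
    for line in report_lines:
        if "GORULE:0000039" not in line:
            continue  # skip lines that report other rules
        match = TAXON_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Run over the 12 lines quoted above, this should split them into taxon 559292 (yeast) and taxon 9606 (human), in line with the 9 and 3 counted here.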

srengel commented 5 months ago

I took a look at these 12 annotations. they are not coming from SGD, which we can see because:

  1. they are in the GO report entitled: "GORULE violations assigned by ComplexPortal"
  2. they are not in the GPAD that SGD produces.

therefore we are unassigning SGD from this ticket.

what was the response from CP when they were contacted? i don't see their response here in this ticket.

pgaudet commented 5 months ago

Here's the email exchange: (in reverse-chronological order)


Hi

If they were coming from us, there would be ~25 other species represented in the set.

Sandra

On 29/11/2023 08:36, Pascale Gaudet wrote:

Thanks for the clarification! We assumed they were coming from you. Stacia, can you check the SGD pipeline?

Best wishes,

Pascale


From: Sandra Orchard orchard@ebi.ac.uk
Date: Wednesday, November 29, 2023 at 9:35 AM
To: Pascale Gaudet Pascale.Gaudet@sib.swiss
Cc: Stacia R Engel stacia@stanford.edu
Subject: Re: SGD complex annotations to GO complexes

Hi Pascale

We don't need to export them to GO but we still want to make the mapping as this is the way we group orthologous complexes and enable people to search for them across species.

I assume, if you are not receiving them from the Complex Portal, Birgit already stopped our export - I wasn't involved in the conversation but can find out. If they are only coming from SGD, you will have to ask SGD to stop the export from their end. Or we can work on their import so they don't appear in SGD either. Your call, SGD.

Sandra

On 29/11/2023 08:29, Pascale Gaudet wrote:

Hi Sandra,

How are you? I am writing because the GO rules pick up a violation coming from the Complex Portal annotations to yeast (SGD) complexes:

http://snapshot.geneontology.org/reports/assigned-by-ComplexPortal-report.html#gorule-0000039

The rule is that CPX cannot be annotated to a GO protein complex term, since this is a mapping rather than an annotation. We had discussed this with Birgit before she left, but for some reason these are still being generated for SGD complexes, as listed in the page I linked above.

Can these be removed from the files generated from Complex Portal?

Thanks, Pascale