geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Rationalise 'respiratory chain complex' terms #27253

Closed sjm41 closed 4 months ago

sjm41 commented 8 months ago

Following the similar efforts to rationalise the alpha-ketoglutarate dehydrogenase complexes and TCA cycle complexes (see #26122 and #26882), I propose we do something similar for respiratory chain complexes. We currently have separate terms for each of these (sub)complexes for eukaryotes (labeled as 'mitochondrial X complex' and sometimes also 'chloroplast X complex') and for bacteria (labeled as 'plasma membrane X complex'). And then we have orthogonal grouping systems based on the complex type or the subcellular location ('X respirasome'). This all results in many interconnected terms and an overly complicated tree. These complexes are homologous across species, so I think we can simplify these terms and make things easier for curators/users.

Currently, we have complexes I, II, III and IV grouped under 'respiratory chain complex' like this:

respiratory chain complex
    |__fumarate reductase complex
    |   |__mitochondrial respiratory chain complex II, succinate dehydrogenase complex (ubiquinone)
    |   |__plasma membrane fumarate reductase complex
    |__mitochondrial respiratory chain complex I, membrane segment
    |__mitochondrial respiratory chain complex I, peripheral segment
    |__NarGHI complex
    |__respiratory chain complex I
    |   |__mitochondrial respiratory chain complex I
    |   |    |__mitochondrial respiratory chain complex I, membrane segment
    |   |    |__mitochondrial respiratory chain complex I, peripheral segment
    |   |__plasma membrane respiratory chain complex I
    |__respiratory chain complex II
    |   |__fumarate reductase complex
    |   |   |__mitochondrial respiratory chain complex II, succinate dehydrogenase complex (ubiquinone)
    |   |   |__plasma membrane fumarate reductase complex
    |   |__plasma membrane respiratory chain complex II
    |   |   |__plasma membrane fumarate reductase complex
    |   |   |__plasma membrane succinate dehydrogenase complex
    |   |       |__plasma membrane succinate dehydrogenase complex (ubiquinone)
    |   |__succinate dehydrogenase complex
    |       |__plasma membrane succinate dehydrogenase complex
    |       |   |__plasma membrane succinate dehydrogenase complex (ubiquinone)
    |       |__succinate dehydrogenase complex (ubiquinone)
    |           |__mitochondrial respiratory chain complex II, succinate dehydrogenase complex (ubiquinone)
    |           |__plasma membrane succinate dehydrogenase complex (ubiquinone)
    |__respiratory chain complex III
    |   |__mitochondrial respiratory chain complex III
    |   |__plasma membrane respiratory chain complex III
    |__respiratory chain complex IV
    |   |__mitochondrial respiratory chain complex IV
    |   |__plasma membrane respiratory chain complex IV
    |__succinate dehydrogenase complex
        |__plasma membrane succinate dehydrogenase complex
            |__plasma membrane succinate dehydrogenase complex (ubiquinone)
        |__succinate dehydrogenase complex (ubiquinone)
            |__mitochondrial respiratory chain complex II, succinate dehydrogenase complex (ubiquinone)
            |__plasma membrane succinate dehydrogenase complex (ubiquinone)

And then we have a (completely separate?) tree for complex V under 'proton-transporting ATP synthase complex':

|__proton-transporting ATP synthase complex
    |__chloroplast proton-transporting ATP synthase complex
    |   |__chloroplast proton-transporting ATP synthase complex, catalytic core CF(1)
    |   |__chloroplast proton-transporting ATP synthase complex, coupling factor CF(o)
    |__mitochondrial proton-transporting ATP synthase complex
    |   |__mitochondrial proton-transporting ATP synthase complex, catalytic sector F(1)
    |   |   |__mitochondrial proton-transporting ATP synthase, catalytic core
    |   |   |__mitochondrial proton-transporting ATP synthase, central stalk
    |   |__mitochondrial proton-transporting ATP synthase complex, coupling factor F(o)
    |       |__mitochondrial proton-transporting ATP synthase, stator stalk
    |__plasma membrane proton-transporting ATP synthase complex
    |   |__plasma membrane proton-transporting ATP synthase complex, catalytic core F(1)
    |   |   |__plasma membrane proton-transporting ATP synthase, catalytic core
    |   |   |__plasma membrane proton-transporting ATP synthase, central stalk
    |   |__plasma membrane proton-transporting ATP synthase complex, coupling factor F(o)
    |       |__plasma membrane proton-transporting ATP synthase, stator stalk
    |__proton-transporting ATP synthase complex, catalytic core F(1)
    |   |__chloroplast proton-transporting ATP synthase complex, catalytic core CF(1)
    |   |__mitochondrial proton-transporting ATP synthase complex, catalytic sector F(1)
    |   |   |__mitochondrial proton-transporting ATP synthase, catalytic core
    |   |   |__mitochondrial proton-transporting ATP synthase, central stalk
    |   |__plasma membrane proton-transporting ATP synthase complex, catalytic core F(1)
    |   |   |__plasma membrane proton-transporting ATP synthase, catalytic core
    |   |   |__plasma membrane proton-transporting ATP synthase, central stalk
    |   |__proton-transporting ATP synthase, catalytic core
    |   |   |__mitochondrial proton-transporting ATP synthase, catalytic core
    |   |   |__plasma membrane proton-transporting ATP synthase, catalytic core
    |   |__proton-transporting ATP synthase, central stalk
    |       |__mitochondrial proton-transporting ATP synthase, central stalk
    |       |__plasma membrane proton-transporting ATP synthase, central stalk
    |__proton-transporting ATP synthase complex, coupling factor F(o)
        |__chloroplast proton-transporting ATP synthase complex, coupling factor CF(o)
        |__mitochondrial proton-transporting ATP synthase complex, coupling factor F(o)
        |   |__mitochondrial proton-transporting ATP synthase, stator stalk
        |__plasma membrane proton-transporting ATP synthase complex, coupling factor F(o)
        |   |__plasma membrane proton-transporting ATP synthase, stator stalk
        |__proton-transporting ATP synthase, stator stalk
            |__mitochondrial proton-transporting ATP synthase, stator stalk
            |__plasma membrane proton-transporting ATP synthase, stator stalk

I suggest we greatly simplify all of the above to this:

respiratory chain complex
|__NarGHI complex*
|__fumarate reductase complex*
|__respiratory chain complex I
    |__mitochondrial respiratory chain complex I, membrane segment
    |__mitochondrial respiratory chain complex I, peripheral segment
|__respiratory chain complex II (succinate dehydrogenase)
|__respiratory chain complex III
|__respiratory chain complex IV
|__proton-transporting ATP synthase complex
    |__proton-transporting ATP synthase complex, catalytic core F(1)
    |   |__proton-transporting ATP synthase, catalytic core
    |   |__proton-transporting ATP synthase, central stalk
    |__proton-transporting ATP synthase complex, coupling factor F(o)
        |__proton-transporting ATP synthase, stator stalk

I'm assigning this to you Raymond, since I know you enjoyed working on the related tickets ;-). If the proposals look good to folks, I can provide a more useful list of the individual terms to obsolete and their 'replace by' terms.
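For illustration only, the naming side of this proposal can be sketched in Python. The prefix list and the `generic_label` helper are assumptions made for this sketch, not part of any GO tooling, and the real obsoletions would rely on curated 'replaced by' terms rather than string matching.

```python
from collections import defaultdict

# Assumption: the location prefixes used by the taxon-specific labels in the trees above.
LOCATION_PREFIXES = ("mitochondrial ", "plasma membrane ", "chloroplast ")


def generic_label(label: str) -> str:
    """Drop a leading location prefix, if present, to get the generic label."""
    for prefix in LOCATION_PREFIXES:
        if label.startswith(prefix):
            return label[len(prefix):]
    return label


# A few labels copied from the current hierarchy.
labels = [
    "mitochondrial respiratory chain complex I",
    "plasma membrane respiratory chain complex I",
    "mitochondrial respiratory chain complex III",
    "plasma membrane respiratory chain complex III",
    "respiratory chain complex IV",
]

# Group the location-specific labels under the generic label they would merge into.
merged = defaultdict(list)
for label in labels:
    merged[generic_label(label)].append(label)

for generic, members in merged.items():
    print(f"{generic}: {len(members)} label(s)")
```

Labels such as 'chloroplast ... catalytic core CF(1)' would not map cleanly this way, which is one reason an explicit term-by-term list is still needed.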

sjm41 commented 8 months ago

These wikipedia links are useful: https://en.wikipedia.org/wiki/Electron_transport_chain https://en.wikipedia.org/wiki/Oxidative_phosphorylation

raymond91125 commented 8 months ago

Noticed an incorrect relationship GO:0098803 respiratory chain complex PART_OF GO:0070469 respirasome PMID:23828195 "Most of Complex II was found in a free, non-associated form in plant as well as mammalian mitochondria, while only a small proportion associated with supercomplex I/III/IV (Eubel et al., 2003; Acín-Pérez et al., 2008; Muster et al., 2010) (Fig. 1A). While Complex V as dimer co-migrates with other supercomplexes but rarely as part of supercomplexes. " It seems that our definition of respirasome is more expansive than that of wikipedia:Supercomplex or the literature, see https://github.com/geneontology/go-ontology/issues/12846.

raymond91125 commented 6 months ago

As it stands, all children of GO:0098803 respiratory chain complex are inferred. I added a relationship: proton-transporting ATP synthase complex part_of respirasome. The resulting hierarchy looks like: [Screenshot from 2024-05-15 17-27-12]

@sjm41 Is this good enough, albeit not as succinct as you typed out?

sjm41 commented 6 months ago

Hi @raymond91125 Thanks for working on this. My main proposal here is to obsolete all of the mitochondrial/chloroplast/bacterial ('plasma membrane')-specific complexes that are included under the headings shown in your Protege screenshot. I'm not sure if you've done/considered that suggestion?

raymond91125 commented 6 months ago

My main proposal here is to obsolete all of the mitochondrial/chloroplast/bacterial ('plasma membrane')-specific complexes that are included under the headings shown in your Protege screenshot. I'm not sure if you've done/considered that suggestion?

I see. So your proposal is to merge taxon-restricted terms up to the taxon-general parents. Many of these terms carry the reference GOC:mtg_sensu. I wonder if we should make sure there is a consensus to undo taxon considerations on these. Thanks.

sjm41 commented 6 months ago

Yep. It would definitely be good to check consensus on this. There are 2 thumbs-up on my original comment, but we could bring this up at the next editors' meeting to check wider consensus, and then also see if we feel further outreach is required before making a final decision. Thanks.

pgaudet commented 5 months ago

I think we have decided to not create sensu-specific terms unless we describe widely different CC or BPs. In this case the merge seems correct.

Pinging @cmungall and @thomaspd for confirmation.

Thanks, Pascale

cmungall commented 5 months ago

The proposed merge is valid but would lead to incomplete results - for eukaryotes the complex would no longer be found under mitochondrion. For gene searches this may be less important since most genes are likely independently annotated to the mitochondrial membrane, and we could have a rule to check for this, but this is complex. We could also have a taxon GCI for this but this is also complex.

(analogous issues of course for proks)

I appreciate the need to simplify things for curators, but we should just filter out taxonomically irrelevant terms in the curation UI. We need to balance the proposed simplifications for us with effects on our users.

sjm41 commented 5 months ago

for eukaryotes the complex would no longer be found under mitochondrion. For gene searches this may be less important since most genes are likely independently annotated to the mitochondrial membrane, and we could have a rule to check for this, but this is complex.

Yep, that's true. I guess we have to decide whether it's more useful to users & curators to simplify the current terms/connections as proposed (~70 terms/branches reduced to ~15), or if it's better to keep the current situation to allow direct, taxon-specific annotations to mitochondrial/chloroplast/bacterial versions of complexes.

I guess that's a general question that is just exemplified by this ticket....

pgaudet commented 5 months ago

Maybe one way to assess this is to see which terms are being used. Out of the current set of 70, only 15 are being used (all evidence codes included):

If we do keep the eukaryotes versus prokaryote distinction, we should make the parent 'do not annotate' and have taxon constraints on the children, shouldn't we?

cmungall commented 5 months ago

Yep, that's true. I guess we have to decide whether it's more useful to users & curators to simplify the current terms/connections as proposed (~70 terms/branches reduced to ~15), or if it's better to keep the current situation to allow direct, taxon-specific annotations to mitochondrial/chloroplast/bacterial versions of complexes

For curator complexity, do we anticipate new annotations to these well-studied complexes?

user complexity - I agree, I dislike taxonomic lattices, but overall I suspect most non-bioinformatics users don't care about the graph display and having complete gene sets for terms like MM is more important (it would be good to proactively survey our users about this though).

If we do keep the eukaryotes versus prokaryote distinction, we should make the parent 'do not annotate' and have taxon constraints on the children, shouldn't we?

Yes and yes, this should be the standard pattern when we create taxonomically variable subclasses to accommodate different relationships.

pgaudet commented 5 months ago

Ontology call:

@cmungall best solution is the simplification, in conjunction with rules that would make sure that the correct cellular anatomical entity is also annotated, in cases where this can be determined (as in this case).
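A minimal sketch of what such a rule could look like, assuming annotations are available as simple (gene, GO term) pairs. The function name, the gene names, and the choice of location terms are hypothetical; a real rule would also need to use the ontology closure and taxon information rather than exact term matches.

```python
from collections import defaultdict


def genes_missing_location(annotations, complex_term, location_terms):
    """Return genes annotated to complex_term but to none of location_terms."""
    terms_by_gene = defaultdict(set)
    for gene, term in annotations:
        terms_by_gene[gene].add(term)
    return sorted(
        gene
        for gene, terms in terms_by_gene.items()
        if complex_term in terms and not (terms & location_terms)
    )


COMPLEX_III = "GO:0045275"  # respiratory chain complex III (ID from this ticket)
# Assumption: location terms that would satisfy the rule for a eukaryotic gene.
MITO_LOCATIONS = {"GO:0005739", "GO:0005743"}  # mitochondrion, mitochondrial inner membrane

# Hypothetical annotations, for illustration only.
annotations = [
    ("geneA", "GO:0045275"),
    ("geneA", "GO:0005743"),
    ("geneB", "GO:0045275"),
    ("geneC", "GO:0005739"),
]

flagged = genes_missing_location(annotations, COMPLEX_III, MITO_LOCATIONS)
print(flagged)  # geneB has the complex annotation but no location annotation
```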

raymond91125 commented 5 months ago

@ValWood These terms respiratory chain complex I, mitochondrial respiratory chain complex I have a taxon constraint never_in_taxon 'Schizosaccharomyces pombe' and 'Saccharomyces cerevisiae'. Is that an error?

raymond91125 commented 5 months ago

@sjm41 Should all 3 terms merge: succinate dehydrogenase complex (ubiquinone), succinate dehydrogenase complex, and respiratory chain complex II?

sjm41 commented 5 months ago

@sjm41 Should all 3 terms merge: succinate dehydrogenase complex (ubiquinone), succinate dehydrogenase complex, and respiratory chain complex II?

Yes, I believe so.

sjm41 commented 5 months ago

Below, I've tried to list the full set of proposed obsoletions/replace_by. We may want to review the definitions of the remaining terms, especially for complex II.

==

OBSOLETE:

REPLACED BY: respiratory chain complex I (GO:0045271)

==

OBSOLETE:

REPLACED BY: respiratory chain complex II (GO:0045273) (and rename to 'respiratory chain complex II (succinate dehydrogenase)'?)

==

OBSOLETE:

REPLACED BY: fumarate reductase complex (GO:0045283)

==

OBSOLETE:

REPLACED BY: respiratory chain complex III (GO:0045275)

==

OBSOLETE:

REPLACED BY: respiratory chain complex IV (GO:0045277)

==

OBSOLETE:

REPLACED BY: proton-transporting ATP synthase complex (GO:0045259)

==

OBSOLETE:

REPLACED BY: proton-transporting ATP synthase complex, catalytic core F(1) (GO:0045261)

==

OBSOLETE:

REPLACED BY: proton-transporting ATP synthase, catalytic core (GO:0045267)

==

OBSOLETE:

REPLACED BY: proton-transporting ATP synthase, central stalk (GO:0045269)

==

OBSOLETE:

REPLACED BY: proton-transporting ATP synthase complex, coupling factor F(o) (GO:0045263)

==

OBSOLETE:

REPLACED BY: proton-transporting ATP synthase, stator stalk (GO:0045265)

==

OBSOLETE:

REPLACED BY: respiratory chain complex (GO:0098803)

==
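As a rough sketch of how annotations could be migrated once 'replaced by' links like these exist: the `EX:` identifiers below are placeholders, not real GO IDs, and the `resolve` helper is hypothetical.

```python
def resolve(term_id, replaced_by):
    """Follow replaced_by links until reaching a term that has no replacement."""
    seen = set()
    while term_id in replaced_by:
        if term_id in seen:
            raise ValueError(f"replaced_by cycle at {term_id}")
        seen.add(term_id)
        term_id = replaced_by[term_id]
    return term_id


# Placeholder IDs (EX:...) stand in for obsoleted terms; they are not real GO IDs.
REPLACED_BY = {
    "EX:0000001": "GO:0045271",  # replaced by respiratory chain complex I
    "EX:0000002": "EX:0000001",  # a chained replacement
}

# Hypothetical annotations as (gene, term) pairs.
annotations = [("geneA", "EX:0000002"), ("geneB", "GO:0045275")]
remapped = [(gene, resolve(term, REPLACED_BY)) for gene, term in annotations]
print(remapped)
```

Terms without a replacement pass through unchanged, so the same function can be applied to every annotation.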

ValWood commented 5 months ago

Hi Raymond, no that is correct. Weirdly some (but not all) yeast have no complex I - they use a single unrelated NADH dehydrogenase. The taxon restriction is added because we constantly get mapping errors. v
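To make the effect of the restriction concrete, here is a small sketch of a never_in_taxon check. The dictionary layout and function name are assumptions for illustration; real checks also propagate constraints through the ontology and the taxonomy, which this exact-match version does not.

```python
S_POMBE = "NCBITaxon:4896"       # Schizosaccharomyces pombe
S_CEREVISIAE = "NCBITaxon:4932"  # Saccharomyces cerevisiae
HUMAN = "NCBITaxon:9606"         # Homo sapiens

# Assumption: constraints stored as term ID -> set of taxa the term must never be used for.
NEVER_IN_TAXON = {
    "GO:0045271": {S_POMBE, S_CEREVISIAE},  # respiratory chain complex I
}


def violates_never_in_taxon(term_id, taxon_id, constraints=NEVER_IN_TAXON):
    """True if annotating a gene product from taxon_id to term_id breaks a constraint."""
    return taxon_id in constraints.get(term_id, set())


print(violates_never_in_taxon("GO:0045271", S_POMBE))
print(violates_never_in_taxon("GO:0045271", HUMAN))
```

A mapped annotation that fails this check is the kind of mapping error the restriction is meant to catch.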

sjm41 commented 5 months ago

One more merge (I've added it to the list above):

id: GO:0045320 [0 annotations]
name: chloroplast proton-transporting ATP synthase complex
namespace: cellular_component
def: "A proton-transporting ATP synthase complex found in the chloroplast thylakoid membrane; it catalyzes the phosphorylation of ADP to ATP during photo-phosphorylation." [GOC:mtg_sensu, GOC:pj, ISBN:0716743663]
synonym: "chloroplast hydrogen-translocating F-type ATPase complex" EXACT []
synonym: "chloroplast proton-transporting F-type ATPase complex" EXACT []
synonym: "hydrogen-translocating F-type ATPase complex" BROAD []
intersection_of: GO:0045259 ! proton-transporting ATP synthase complex
intersection_of: part_of GO:0009507 ! chloroplast
relationship: part_of GO:0009535 ! chloroplast thylakoid membrane

id: GO:0009544 [3 annotations]
name: chloroplast ATP synthase complex
namespace: cellular_component
def: "The protein complex that catalyzes the phosphorylation of ADP to ATP in chloroplasts." [ISBN:0198547684]
is_a: GO:0032991 ! protein-containing complex
relationship: part_of GO:0009507 ! chloroplast
relationship: part_of GO:0009535 ! chloroplast thylakoid membrane
relationship: part_of GO:0009579 ! thylakoid
relationship: part_of GO:0016020 ! membrane
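When comparing two terms like these, it can help to load their tag-value lines into a dictionary. The parser below is a deliberately small sketch, not a full OBO reader: it assumes one tag per line and ignores escaping, and the stanza text is abridged from the second term above without the annotation count.

```python
from collections import defaultdict

# Abridged stanza, one tag-value pair per line.
STANZA = """\
id: GO:0009544
name: chloroplast ATP synthase complex
namespace: cellular_component
is_a: GO:0032991 ! protein-containing complex
relationship: part_of GO:0009507 ! chloroplast
relationship: part_of GO:0009535 ! chloroplast thylakoid membrane
"""


def parse_stanza(text):
    """Collect tag -> list of values from a single term stanza."""
    tags = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        # Drop the trailing "! label" comment carried by ID-valued tags.
        if tag in {"is_a", "relationship", "intersection_of"}:
            value = value.split(" ! ", 1)[0]
        tags[tag].append(value.strip())
    return dict(tags)


term = parse_stanza(STANZA)
print(term["name"][0])
print(term["relationship"])
```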

doughowe commented 5 months ago

ZFIN done. I updated the Google sheet to reflect this as well.

pgaudet commented 5 months ago

Fixed some relations to address issue reported by @alexsign :

GO:0045282 GO:0045258 P (part_of), but they are already related by is_a in the other direction

also affects: GO:0045281, GO:0045257, GO:0045273, GO:0045274

raymond91125 commented 5 months ago

@sjm41 I've completed the list of tasks and simplified the hierarchy. Please let me know if you spot any error. Thanks.

dsiegele commented 5 months ago

I think there are additional GO terms that could be included as children of 'respiratory chain complex.'

Although the respiratory chain in mitochondria is derived from a bacterial ancestor, not all bacteria capable of respiration utilize the 'canonical' respiratory chain of complexes I-IV. The example I am most familiar with is E. coli which uses different respiratory chains to capture energy released by catabolic pathways depending upon the compound serving as the energy source and the available terminal electron acceptor.

When growing in the presence of O2 and utilizing an energy source whose oxidation yields NADH, E. coli will utilize one or more of the following respiratory chains.

NADH to cytochrome bo oxidase electron transfer I: NADH -> NDH-1 -> ubiquinone -> cytochrome bo3 ubiquinol:oxygen oxidoreductase (GO:0009319)

NDH-1 is NADH:ubiquinone oxidoreductase I (H+ transporting). NDH-1 is homologous to complex I so I think it would be annotated to GO:0045271.

NADH to cytochrome bo oxidase electron transfer II: NADH -> NDH-2 -> ubiquinone -> cytochrome bo3 ubiquinol:oxygen oxidoreductase (GO:0009319)

NDH-2 is NADH:ubiquinone oxidoreductase II (doesn't translocate protons). NDH-2 is a homodimer so I think it would be annotated to GO:0030964.

NADH to cytochrome bd oxidase electron transfer I: NADH -> NDH-1 -> ubiquinone -> cytochrome bd-1 ubiquinol:oxygen oxidoreductase

NADH to cytochrome bd oxidase electron transfer II: NADH -> NDH-2 -> ubiquinone -> cytochrome bd-1 ubiquinol:oxygen oxidoreductase

I will add information about the anaerobic respiratory chains in another post.

dsiegele commented 5 months ago

Reply to @sjm41 @raymond91125 about fumarate reductase

Yes, E. coli quinol:fumarate oxidoreductase complex aka fumarate reductase primarily functions to reduce fumarate to succinate. It is essential for anaerobic growth on glycerol, lactate, or formate when fumarate serves as the terminal electron acceptor. So GO:0045283 fumarate reductase complex should remain a child of GO:0098803 respiratory chain complex.

FYI: Not relevant to GO, but you might like to know that "The functions of QFR and SQR are partially interchangeable - a plasmid containing the frd genes is able to compensate for the growth deficiency of an sdh mutant [Guest81] while anaerobic expression of succinate dehydrogenase supports the growth of an frd mutant [Maklashina98]."

sjm41 commented 4 months ago

Thanks a lot @raymond91125 ! Looks good to me. As proposed, here's what we have now:

respiratory chain complex
|__NarGHI complex
|__fumarate reductase complex
|__respiratory chain complex I
    |__mitochondrial respiratory chain complex I, membrane segment
    |__mitochondrial respiratory chain complex I, peripheral segment
|__respiratory chain complex II (succinate dehydrogenase)
|__respiratory chain complex III
|__respiratory chain complex IV
|__proton-transporting ATP synthase complex
    |__proton-transporting ATP synthase complex, catalytic core F(1)
    |   |__proton-transporting ATP synthase, catalytic core
    |   |__proton-transporting ATP synthase, central stalk
    |__proton-transporting ATP synthase complex, coupling factor F(o)
        |__proton-transporting ATP synthase, stator stalk

I notice that the two children (subcomplexes) of "respiratory chain complex I" are the only remaining mito-specific terms - "mitochondrial respiratory chain complex I, membrane segment" has been used in 0 annotations, and "mitochondrial respiratory chain complex I, peripheral segment" has been used in 1 annotation.

sjm41 commented 4 months ago

Thanks for the feedback @dsiegele !

ValWood commented 4 months ago

Hi @sjm41, will you also consider obsoleting/merging

GO:0010257 | NADH dehydrogenase complex assembly

I think this refers to Complex I assembly. Yeast have PAINT annotations to this term, but we don't have complex I and our dehydrogenase is a single subunit.

raymond91125 commented 4 months ago

  • So I propose those two annotations are merged into their parent 'respiratory chain complex I'.

@sjm41 We could also make these two terms generic, since the membrane/peripheral domain structures are conserved, per PMID:19732833, PMID:32058886: 'respiratory chain complex I, membrane segment' and 'respiratory chain complex I, peripheral segment'. (Those are action items above that I missed.)

raymond91125 commented 4 months ago

@ValWood How about adding a 'never in taxon' constraint for Schizosaccharomyces pombe and Saccharomyces cerevisiae on GO:0030964 NADH dehydrogenase complex, which is used in the logical definition (LD) of GO:0010257 NADH dehydrogenase complex assembly?

raymond91125 commented 4 months ago

For reference, an illustration-rich article describing mito NADH dehydrogenases https://www.degruyter.com/document/doi/10.1515/hsz-2020-0254/html.

ValWood commented 4 months ago

Hi @raymond91125, there are NADH dehydrogenases, but are they monomers or complexes? In GO, GO:0030964 NADH dehydrogenase complex currently has the exact synonym 'Complex I', so are respiratory complex I and NADH dehydrogenase complex separable?

sjm41 commented 4 months ago

@sjm41 We could also make these two terms generic, since the membrane/peripheral domain structures are conserved, per PMID:19732833, PMID:32058886: 'respiratory chain complex I, membrane segment' and 'respiratory chain complex I, peripheral segment'. (Those are action items above that I missed.)

Looking at this now, I'd still recommend merging these two into the parent. As I said, they have only ever been used in a single annotation, and moreover, the guidelines say that subcomplexes are out of scope for GO (https://wiki.geneontology.org/Protein_complexes#Out_of_scope)

sjm41 commented 4 months ago

In GO, GO:0030964 NADH dehydrogenase complex currently has the exact synonym 'Complex I', so are respiratory complex I and NADH dehydrogenase complex separable?

As noted above, GO:0030964 is a grouping term for NADH complexes involved in either respiration or photosynthesis. And as Debbie said, the 'Complex I' synonym on it should be described as a narrow synonym instead of an exact synonym (or better just removed, given that Complex I is a child).

ValWood commented 4 months ago

In that case, if it's a grouping term for complex I the existing taxon restriction can move up from complex I.

raymond91125 commented 4 months ago

@sjm41 Is it all done except the NTR, which is best started in a new ticket? Thanks all!

sjm41 commented 4 months ago

Closing. Thanks for all your work on this @raymond91125

raymond91125 commented 4 months ago

Thanks, @sjm41