geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

merge fatty-acyl-CoA synthase activity (is a process) #8102

Closed gocentral closed 8 years ago

gocentral commented 13 years ago

This is a compound function term.
EC 4.2.1.61 : IUBMB
EC 2.3.1.39 : IUBMB
EC 2.3.1.86 : IUBMB
EC 2.3.1.38 : IUBMB
EC 1.3.1.9 : IUBMB

Perhaps it should merge with fatty acid biosynthetic process?

Reported by: ValWood

Original Ticket: geneontology/ontology-requests/7887

gocentral commented 13 years ago

Cindy Krieger is busy at a workshop this week. She should comment before any action is taken on this.

-Karen

Original comment by: krchristie

gocentral commented 13 years ago

This is exclusively a "process" in most bacteria, where the enzymatic activities are all in separate, discrete enzymes. However, in mammals, a single polypeptide encodes all of the enzymatic activities (the enzyme is a homodimer, but only one active subunit is required for activity), so there really is a single protein that has the function of complete fatty-acyl-CoA synthase. I'd suggest that we keep it as it is, so that more-complex organisms can be properly annotated.

Original comment by: diannafisk

gocentral commented 13 years ago

Er, doesn't that mean it should be a process? The compound term is just a gene product specific function?

Val

Original comment by: ValWood

gocentral commented 13 years ago

The EC number 2.3.1.86 actually specifies the function of fatty-acyl-CoA synthase, and is the equivalent of Molecular Function (GO:0004312). As written it's a complete synthesis of fatty-acyl-CoA, not just one of the enzymatic reactions. I don't know why they've written "yeast fatty acid synthase" as an alternative name, since the S. cerevisiae complex has subunits with different functions. It would be more appropriate to have "mammalian fatty acid synthase" as an alternative name.

Do we necessarily break down functions into individual reactions? So you can't have a function that involves more than one chemical reaction? This seems overly reductionist.

Original comment by: diannafisk

gocentral commented 13 years ago

I thought that the individual activities were represented and that a group of terms is a process (even if performed by a single molecule). Maybe I'm wrong, Amelia will know....

val

Original comment by: ValWood

gocentral commented 13 years ago

Val is correct - a series of reactions is a process, regardless of how many gene products are involved in it. A function is a step in the process, and if you find your function is a combination of several other functions, you are either representing a process (as this FA synthesis reaction would be), or you are trying to incorporate gene product or protein class information into the ontology (e.g. if you had a term 'actin activity', which combined ATPase activity and protein binding - you're trying to encode information about the actin protein in the ontology).

Original comment by: girlwithglasses

gocentral commented 13 years ago

I would leave fatty-acyl-CoA synthase activity (GO:0004321) as a MF term separate from the BP term for fa biosynthetic process.

Yes, fatty-acyl-CoA synthase activity (GO:0004321) is an enzymatic activity comprised of many different reactions (EC 2.3.1.38 [acyl-carrier-protein] S-acetyltransferase, EC 2.3.1.39 [acyl-carrier-protein] S-malonyltransferase, EC 2.3.1.41 3-oxoacyl-[acyl-carrier-protein] synthase, EC 1.1.1.100 3-oxoacyl-[acyl-carrier-protein] reductase, EC 1.1.1.279 (R)-3-hydroxyacid ester dehydrogenase, EC 4.2.1.61 3-hydroxypalmitoyl-[acyl-carrier-protein] dehydratase and EC 1.3.1.9 enoyl-[acyl-carrier-protein] reductase (NADH)). However, as Dianna mentioned, animal and yeast fatty acid synthases (FAS) are multicomplex enzymes that contain all of the catalytic activities to catalyze the overall reaction on either one polypeptide as in mammals or two different polypeptides as in yeast.

Thus, I think you should keep the overall fatty-acyl-CoA synthase activity as a molecular function activity for all type 1 FAS systems (animal, fungal, some bacterial) b/c they catalyze this overall activity. Animal FAS systems can be directly curated to this term, and genes from the yeast FAS systems would only be annotated to it with a contributes_to qualifier, since each one only catalyzes a subset of the reactions.

Type 2 FAS systems, which belong to plants, bacteria, and mitochondria, are comprised of small independent enzymes that usually catalyze one step of the multi-step cyclic reaction. I think that the genes from type 2 FAS could still be annotated to the MF term fatty-acyl-CoA synthase activity with a contributes_to qualifier, but I don't feel as strongly. Hope this is clear.

Original comment by: cjkrieger

gocentral commented 13 years ago

The more accepted, and least redundant, way to deal with this would be to annotate the individual enzyme activities, and to capture the information about the complex with a complex annotation. With the existing arrangement groups can easily use the "fatty acid synthase" and omit the contributing activities, or vice versa, leading to annotation inconsistency.

Using contributes_to with the existing terms would also be another slightly different use of contributes_to than the already existing uses, so we should avoid this.

The decomposing of "fatty acid synthase" is analogous to the recent decomposing of the transcription factors to represent their elemental activities. The existing arrangement isn't ontologically correct. For instance, the FAS activity has a bunch of children, none of which are the enzyme activities above. I haven't looked at these children closely, but they may also be processes. If it is warranted, specific process terms which describe the 2 FAS systems should be created.

Whatever we do, it should match exactly the existing practices employed elsewhere in the ontology....

Original comment by: ValWood

gocentral commented 13 years ago

I do not agree that it is accepted practice that this type of situation should be dealt with by annotating to each of the individual activities. That idea represents the opinion of some, but definitely not of all. Please also look at the older SF item Cindy started for "fatty acid synthase activity": GO term changes related to Fatty acid synthesis https://sourceforge.net/tracker/?func=detail&aid=3011267&group_id=36855&atid=440764

Personally, I think that it would be a real mistake to put eukaryotic annotators into the situation that when they find a paper that shows that an enzyme has "fatty acid synthase activity" (or "fatty-acyl-CoA synthase activity" which is exactly the same situation), which is a quite reasonable thing to assay, and the paper may not make any mention of the individual steps which can be distinguished if you mutate the gene or if you are able to halt the reaction by some other means, that they must go find which multiple different steps are part of fatty acid synthase activity. In addition, as a single gene product that contains all of these component parts and has catalytic activity, I would like to be able to annotate the fatty acid synthase gene to a single term that is a descendant of "catalytic activity", and not have to go dig around to figure out what 5 or 6 individual functions this corresponds to. As Dianna and Cindy have already said, in mammals, or even in yeast where the gene products split up the activities slightly differently but where the active entity is also a complex that performs "fatty acid synthase activity", the individual steps are not separable in these organisms, they occur as a unit.

Since you've invoked the transcription reorganization and suggested that we should do the same thing, I'll say what I think would be analogous. The term "sequence specific DNA binding transcription factor activity" has a has_part relationship to another function term that represents a single portion of its overall function, i.e. "RNA polymerase II regulatory transcription factor site sequence-specific DNA binding". To represent "fatty acid synthase activity" in a way that is analogous to this, I would keep "fatty acid synthase activity" as a function because to regular mammalian, or yeast, biologists this represents a catalytic activity. Then this overall term should have a has_part relationship to a function term for each of the steps that makes up a portion of "fatty acid synthase activity". Cindy and I have already proposed this for "fatty acid synthase activity" (the current is_a relationships are clearly not the correct relationship). "fatty-acyl-CoA synthase activity" is almost exactly the same situation as "fatty acid synthase activity" and should be dealt with in the same manner.

-Karen

Original comment by: krchristie

gocentral commented 13 years ago

This is the same argument that comes up again and again... what is a GO function? If you are going to have functions that can be decomposed into different parts, how do you distinguish them from (a) processes, and (b) gene product / protein complex annotations that you're hard-coding into the ontology? We need to get away from thinking about the language used in papers ("such-and-such a protein was shown to have fatty acid synthase activity" ==> they used "activity", therefore it must be a function!) and start thinking about what is actually happening. The majority of function terms represent single elemental reactions or events that cannot be broken down into other functions. Why should this term be an exception? How is it distinct from 'fatty acid biosynthesis'?

It seems to me that the reason for keeping this term is to allow eukaryotic annotators to have a function annotation without having to find out what fatty acid synthase actually does in their species. This propagates confusion, not information. Somewhere along the line, someone is going to have to do the hard work and find out what reactions FAS actually performs. And which FAS are we going to choose? Have a look at MetaCyc's data for FAS: the reactions vary widely between different eukaryotic species. Which one are we going to choose as the GO FAS?

With regard to the divisibility or otherwise of a process into steps, this is usually a question of scale. For example, you might say that you cannot divide a process like walking up into steps (ha!). If you consider a body moving, it is indeed difficult to do so. However, when you start looking at the level of the body parts, you can start to divide it up: here we have the nervous impulse from the brain to the leg muscles, then the contraction of the leg muscles, etc. Similarly an enzymatic reaction can be broken up into parts, e.g. substrate A binds to the enzyme, forming a charged intermediate; substrate B is attracted to the intermediate and binds, etc. It is a question of deciding the appropriate level at which to divide the sequence of events.

Original comment by: girlwithglasses

gocentral commented 13 years ago

Yes, I agree this comes up over and over again, and I think we need to resolve it before you obsolete, or merge into process, terms like this. Perhaps it should be put on the agenda for the next GO meeting.

Original comment by: krchristie

gocentral commented 13 years ago

Yes, there is no urgency. Let's put it on the agenda for the next meeting; I will forward it to Judy.

Val

Original comment by: ValWood

gocentral commented 13 years ago

I am reluctant to postpone this until the next GO meeting because of past experiences where people agree in the meeting and then afterwards, a whole lot of dissenters pipe up and we end up doing nothing.

IMHO these 'complex functions' need an annotation-based solution, not a hard-coding of biological data into the ontology.

My suggestion for a solution would be for databases to have local aliases for certain commonly used phrases or gene product/complex names which would create annotations to a set of GO terms, thus saving the annotator from having to find the terms individually. In this example, a multi-functional gene product like FAS would map to a set of GO terms including fatty acid biosynthesis, various function terms, component terms (if applicable), etc. These could be edited if required. MODs could set these aliases up as required, based on common terms for their species. This saves curators some of the work of having to find terms, and obviates the need for potentially confusing terms like 'fatty acid synthase activity', which are ripe for misinterpretation.
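As a rough sketch of the alias idea only: the table layout, the `expand_alias` helper, the gene product name `example_gene` and the particular terms listed are all assumptions made up for illustration, not an existing MOD or GO tool. It shows one phrase expanding to several annotations that a curator can then trim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    aspect: str  # "F" (function), "P" (process) or "C" (component)
    term: str    # GO term name; a real system would store GO IDs


# Hypothetical MOD-local alias table: one common phrase -> several GO terms.
# The term list is illustrative and deliberately incomplete.
ALIASES = {
    "fatty acid synthase": [
        ("P", "fatty acid biosynthetic process"),
        ("F", "[acyl-carrier-protein] S-acetyltransferase activity"),
        ("F", "[acyl-carrier-protein] S-malonyltransferase activity"),
    ],
}


def expand_alias(gene_product, alias, exclude=()):
    """Return one annotation per term behind the alias, minus any the curator excluded."""
    return [
        Annotation(gene_product, aspect, term)
        for aspect, term in ALIASES[alias]
        if term not in exclude
    ]


annotations = expand_alias("example_gene", "fatty acid synthase")
for annotation in annotations:
    print(annotation)
```

The `exclude` argument stands in for the "these could be edited if required" step: a curator drops terms that do not apply to their species.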

The ability to create richer, more complex annotations is something that GO needs to implement ASAP because its lack leads to fudgy, ontology-based solutions which weaken the integrity of the ontology and cause us to argue over the same points again and again and again.

Original comment by: girlwithglasses

gocentral commented 13 years ago

Taking the case of "fatty acid synthase" which I have discussed with both Cindy Krieger and Peter D'Eustachio such that I feel I understand the situation reasonably well, I do NOT agree that this is an annotation issue, that would be best solved by having more complex annotations. I do agree that we need to be able to make complex annotations, but I do not think that this is the solution to the question of how to represent a complex enzymatic function like fatty acid synthase.

It is true in prokaryotes that there are a bunch of separate enzymes that catalyze individual steps, that collectively form a pathway. However, in eukaryotes the functional unit is a complex whose normal function is to catalyze the complete reaction. In normal eukaryotic cells, the parts of the reaction that correspond to individual reactions in E. coli are not separable, but rather are integrated into a single catalytic complex. This is true in both mammals, where it is a single gene product, and in S. cerevisiae, where the individual functions are combined into a couple of gene products differently than in either mammals or prokaryotes.

Regarding "complex functions", we have already started encoding them into the Function ontology, using the has_part relationship. Here are a couple examples, one of which has 2 has_part relationships to indicate that this function encompasses both binding to the RNA polymerase and binding to DNA.

- sequence-specific regulatory transcription factor site binding RNA polymerase transcription factor activity (GO:0000982) -- has_part RNA polymerase II regulatory transcription factor site sequence-specific DNA binding (GO:0000978)

- sigma factor activity (GO:0016987) -- has_part bacterial-type RNA polymerase core promoter sequence-specific DNA binding (GO:0000985) -- has_part bacterial-type RNA polymerase core enzyme binding (GO:0001000)
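A minimal sketch of how the two has_part examples above could be held and queried in code. The GO IDs are the ones quoted in the list; the dictionary layout and the `parts_of` helper are assumptions for illustration, not how OBO-Edit or any GO tool stores relations.

```python
# has_part links taken from the two examples above (whole -> list of parts).
HAS_PART = {
    "GO:0000982": ["GO:0000978"],
    "GO:0016987": ["GO:0000985", "GO:0001000"],
}


def parts_of(term, has_part=HAS_PART):
    """Collect every term reachable from `term` through has_part links, in discovery order."""
    found = []
    queue = list(has_part.get(term, []))
    while queue:
        part = queue.pop(0)
        if part not in found:
            found.append(part)
            # Follow has_part transitively in case a part is itself a compound function.
            queue.extend(has_part.get(part, []))
    return found


print(parts_of("GO:0016987"))
```

A compound "fatty acid synthase" function would be one more key in the same table, with its component reactions as the values.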

Following on from these examples, I think that the appropriate way to represent fatty acid synthase is to do basically the same thing and have a MF term for "fatty acid synthase" that is an is_a descendant of "catalytic activity" which has has_part relationships to the parts of the reaction that are individual steps in prokaryotes.

In terms of providing sensible annotations to our user community, I think it makes sense to recognize that the functional unit in eukaryotes is the fatty acid synthase complex, such that there is a term "fatty acid synthase" that is_a "catalytic activity", and then use the ontology to encode that we recognize that the "fatty acid synthase" activity in eukaryotes appears to be composed of a number of parts that correspond to single genes in prokaryotes.

Regarding solving this issue, I agree that just putting something on the GO meeting agenda without anyone thinking about it in advance does not tend to provide lasting solutions. However, the working groups for annotation issues for the last meeting worked well, such that the working group was responsible for either coming up with a recommendation or identifying broader questions to get GOC input on possible directions to go for a solution.

Original comment by: krchristie

gocentral commented 13 years ago

It is unfortunate that the examples of complex functions that you have chosen both have the word 'factor' in the term name, and sound like a class of gene products with 'activity' added to the end. Taking the 'sigma factor activity' example:

sigma factor activity --hP GO:0000985 --hP GO:0001000

I believe that this information would be better off captured with an annotation to GO:0000985 AND GO:0001000 [i.e. these annotations are linked], plus a textual description or term from another vocabulary indicating that the GP is a sigma factor. Just look at the definition of sigma factor:

"A sigma factor is the promoter specificity subunit of eubacterial-type multisubunit RNA polymerases, those whose core subunit composition is often described as alpha(2)-beta-beta-prime. (This type of multisubunit RNA polymerase complex is known to be found in eubacteria and plant plastids). Although sigma does not bind DNA on its own, when combined with the core to form the holoenzyme, this binds specifically to promoter sequences, with the sigma factor making sequence specific contacts with the promoter elements. The sigma subunit is released from the elongating form of the polymerase and is thus free to act catalytically for multiple RNA polymerase core enzymes."

This doesn't describe an activity or an event that takes place, it describes a physical entity.

With regards arguments centred around processes that can or can't be split up into steps: splitting things into steps is an entirely artificial way of looking at a set of events; reality is not like a reel of film in which we can see the individual frames. We choose to divide events up into parts based on what is convenient and useful to us, and the level at which we look at the event. I think that from the perspective of the gene product, the set of reactions catalyzed by FAS could be considered a single 'step'. If we look at FAS at the molecular level, though, we can identify a number of separate conversions that take place sequentially. This is why many databases--KEGG, MetaCyc, EC, RHEA, and others--split up the reactions catalyzed by FAS. You should note that Reactome tend not to split up these reactions, which is why Peter D'E. would concur that eukaryotic FAS should be a single "function". GO has followed the MetaCyc/EC/KEGG route for many other multi-reaction enzymes, so making an exception for this one seems a little odd.

Apart from all these other concerns, I think that maintaining a FAS term in function is asking for trouble because there will be confusion over when it should be used and incorrect annotations are almost inevitable. GO is supposed to be a species-neutral controlled vocabulary, and keeping FAS activity violates both the species neutrality AND the controlled nature of the vocabulary.

Finally (!!), I believe that with a richer annotation paradigm, we would be able to capture all these "complex functions" without having to use ontology work-arounds. Why would it not be possible to capture what the euk FAS complex does with a more powerful annotation system?

Original comment by: girlwithglasses

gocentral commented 13 years ago

I don't think there's any point to continuing this discussion on SourceForge. You have stated your position. I have stated that of SGD. Continuing to restate the arguments here does not solve the issue, which clearly does need to be solved, and in a way that the GOC as a whole buys into so that we don't keep coming back to this same issue.

Original comment by: krchristie

gocentral commented 13 years ago

Yup, agree. If you have any advice or suggestions on the best way to proceed with regards working groups, etc., please let us know.

Original comment by: girlwithglasses

gocentral commented 13 years ago

Original comment by: mah11

gocentral commented 13 years ago

We've been discussing this at our SF jamboree. In an attempt to get this issue resolved, would it work to have the generic fatty-acyl-CoA synthase activity ; GO:0004321 term, with HAS_PART links to the GO terms for the 5 individual EC activities?

We realise this would be accepting that GO:0004321 is a 'compound function'. We've been moving towards the view that functions are essentially mini-processes anyway.

We can also create a P-F link to make GO:0004321 part of fatty acid biosynthetic process.

thanks, GOEd

Original comment by: rebeccafoulger

gocentral commented 13 years ago

Original comment by: rebeccafoulger

gocentral commented 13 years ago

Sorry for my long comment, but unfortunately there wasn't a quick answer.

In theory, I agree with your proposal to keep the compound fas activity terms as a MF term and to include its subset of individual reactions as has_part relationships, however there are some complications, which I will mention below.

The first thing is that there need to be two different GO terms for the composite reactions: fatty-acyl-CoA synthase activity (EC 2.3.1.86) and fatty acid-CoA synthase activity (EC 2.3.1.85). I just noticed that GO is representing both of these EC reactions with one GO term (GO:0004321). They should be distinct because not all organisms produce fatty-acyl-CoAs as an intermediate in fatty acid biosynthesis.

While yeast, some fungi and protists produce fatty-acyl-CoAs, mammals, bacteria, and plants do not. The latter group directly convert the fatty-acyl-ACP intermediates to free fatty acids (EC 3.1.2.14), while the former group convert the fatty-acyl-ACPs to fatty-acyl-CoAs (EC 2.3.1-) which are then hydrolysed to free fatty acids (EC 3.1.2.2) (PMID 7007046). So while the fatty-acyl-CoA synthase activity (EC 2.3.1.86) and fatty acid-CoA synthase activity (EC 2.3.1.85) are both applicable to yeast, some fungi and some protists, only the fatty acid synthase activity (EC 2.3.1.85) is applicable to mammals, plants and bacteria.

So we could have:

MF: Fatty-acyl-CoA synthase activity (EC 2.3.1.86)

- Has_part: EC 2.3.1.38 [acyl-carrier-protein] S-acetyltransferase
- Has_part: EC 2.3.1.39 [acyl-carrier-protein] S-malonyltransferase
- Has_part: EC 2.3.1.41 3-oxoacyl-[acyl-carrier-protein] synthase
- Has_part: EC 1.1.1.100 3-oxoacyl-[acyl-carrier-protein] reductase
- Has_part: EC 1.3.1.10 enoyl-[acyl-carrier-protein] reductase (NADPH)
- Has_part: EC 4.2.1- hydroxyacyl-[acyl-carrier-protein] dehydratase

EC 1.3.1.10 should replace EC 1.3.1.9 b/c yeast use NADPH as a reactant and not NADH (PMID 338601) and the comment on the IUBMB website for EC 2.3.1.86 states, “The yeast and Escherichia coli enzymes are B-specific with respect to NADP”.

I don’t think we need to include EC 1.1.1.279, (R)-3-hydroxyacid ester dehydrogenase. It appears to be covered by EC 1.1.1.100.

Another difficulty is how to deal with the hydroxyacyl-acp dehydratase reaction (EC 4.2.1.-). EC 4.2.1.61 3-hydroxypalmitoyl-[acyl-carrier-protein] dehydratase is noted to work on C12 – C16 compounds, which probably deals with most of the fatty acids but there are different EC numbers for C4-C8 (EC 4.2.1.58) and C6 to C12 (EC 4.2.1.59). For this reason I suggest we just have a Has_part relationship to EC 4.2.1-

The term for fatty acid CoA synthase activity (EC 2.3.1.85) is a little more difficult because some organisms use one has_part reaction while others use another, and some species have multiple pathways which can use one or another.

I spoke to Karen Christie and I believe she said that has_part relationships always have to hold true. If we wanted to follow this route we would have to create separate fatty acid CoA synthase activity terms for different species, and have a general fatty acid synthase activity term with only has_part relationships to reactions shared by all of the organisms.

However, I don’t know how to deal with E. coli, which may use 1.3.1.9 or 1.3.1.10, and 2.3.1.41 or 2.3.1.180.

I was thinking possibly, something like this, but I’m not in love with it. Since I was unable to paste in the image, I have put it in my public html directory which you can obtain by going to http://fafner.stanford.edu/~cindy/GO_fas_activity.pptx

Cindy

Original comment by: cjkrieger

gocentral commented 13 years ago

Thank you for the detailed response Cindy! You have the crux of the issue when you say,

"If we wanted to follow [the "has part"] route we would have to create separate fatty acid CoA synthase activity terms for different species, and have a general fatty acid synthase activity term with only has_part relationships to reactions shared by all of the organisms".

GO was set up to remove species- (or biologist-) specific ambiguity. It seems here that we are clinging on to using a specific name that means different things to different people, when it would be better to give the term a different name and have 'fatty acid synthase' as a synonym so that searches will bring it up. My suggestion would be to create a set of terms that specify the different pathways of FA synthesis (of the form "FA synth via X, Y and Z" or "FA synth from A") and have as synonyms "yeast fatty acid synthesis", "E. coli fatty acid synthesis", etc.

I am not sure what the issue is with the terms being in process vs function. The fact that these are pathways comprising multiple reactions with individual EC numbers dictates they should be in process. As long as they can still be found using 'fatty acid synthase' as a search string, I don't see an issue.

Re: "has part": if X has part Y, ALL Xs have some Y as a part, but not all Ys are part of X. See the extended ontology relations guide for more info: http://www.geneontology.org/GO.ontology-ext.relations.shtml
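To make the consequence of that rule concrete, here is a small sketch: if has_part must hold for every instance, a general term can only carry the parts that all of its variants share. The two variant sets below reuse EC numbers mentioned in this thread, but which variant contains which reaction is an assumption for illustration, not curated data, and `shared_parts` is a hypothetical helper.

```python
def shared_parts(variants):
    """Return the parts common to every variant: the only safe has_part targets for a general term."""
    part_sets = [set(parts) for parts in variants.values()]
    return set.intersection(*part_sets) if part_sets else set()


# Illustrative variants only; the grouping of EC numbers is assumed, not curated.
VARIANTS = {
    "variant_a": {"EC 2.3.1.38", "EC 2.3.1.39", "EC 2.3.1.41", "EC 1.1.1.100", "EC 1.3.1.9"},
    "variant_b": {"EC 2.3.1.38", "EC 2.3.1.39", "EC 2.3.1.180", "EC 1.1.1.100", "EC 1.3.1.10"},
}

general_term_parts = shared_parts(VARIANTS)
print(sorted(general_term_parts))

# Parts that differ between variants must live on more specific terms.
for name, parts in VARIANTS.items():
    print(name, sorted(parts - general_term_parts))
```

Anything outside the intersection (here the alternative reductases and condensing enzymes) would need its own, more specific term, which is the proliferation of terms described in the quoted passage.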

My suggestion would be to basically do what Cindy has suggested, but have the parent terms as 'fatty acid synthesis via X, Y and Z' (instead of 'fatty acid synthase in yeast'), with a description of the series of reactions and intermediates in the term def, "has part" links to the reactions as appropriate, and synonyms that reference the appropriate species.

If you check out 'GO:0052704 : ergothioneine biosynthesis from histidine via N-alpha,N-alpha,N-alpha-trimethyl-L-histidine', you can see a pathway that has been defined in this explicit way; if you look at it in OBO-Edit, you can see the "has part" links to the functions involved in the pathway.

Original comment by: girlwithglasses

gocentral commented 12 years ago

Looking at this again, we (GO Eds) feel that the two terms are being confused a bit here: GO:0004321 fatty-acyl-CoA synthase activity and GO:0004312 fatty acid synthase activity

Original comment by: paolaroncaglia

gocentral commented 10 years ago

Hi David,

This was assigned to Amelia - I'll hand it over to you if you don't mind, as it relates to the same areas as these two that you also own:

https://sourceforge.net/p/geneontology/ontology-requests/10586/ https://sourceforge.net/p/geneontology/ontology-requests/10634/

Thanks, Paola

Original comment by: paolaroncaglia

gocentral commented 10 years ago

Diff:


--- old
+++ new
@@ -1,4 +1,3 @@
-

 This is a compound function term.
 EC 4.2.1.61 :  IUBMB

Original comment by: paolaroncaglia

gocentral commented 10 years ago

Diff:


--- old
+++ new
@@ -1,4 +1,3 @@
-
 This is a compound function term.
 EC 4.2.1.61 :  IUBMB
 EC 2.3.1.39 : IUBMB

Original comment by: ukemi

ukemi commented 8 years ago

I'm closing this as being out of date since 1) we are moving towards the annotation of complexes with molecular functions, 2) relations now exist that link molecular functions and processes, and 3) we have accepted many other compound molecular functions, receptors etc. as valid, with corresponding has_part relations if possible.