fix: lipid biomass composition

SysBioChalmers / yeast-GEM

The consensus GEM for Saccharomyces cerevisiae

http://sysbiochalmers.github.io/yeast-GEM/

Creative Commons Attribution 4.0 International


fix: lipid biomass composition #21

Closed edkerk closed 6 years ago

edkerk commented 6 years ago

The expansive description of lipid metabolism is confusing, so I might be wrong in the following. To my understanding, the model doesn't specify a distribution of different FA chain lengths, but demands all FA chains in equal amounts (for the majority of lipid metabolism; sterol esters seem to be specific):

There are all these individual reactions that build up for instance TAG(16:0, 18:1, 18:1).

oleoyl-CoA[erm] + diglyceride (1-16:0, 2-18:1)[erm] => coenzyme A[erm] + triglyceride (1-16:0, 2-18:1, 3-18:1)[erm]

They use specific acyl-CoAs (so no pooled pseudometabolite), and nowhere in those reactions is there any specification of how abundant each fatty acid is. With that in mind, it would be cheapest to make TAG(16:0, 16:1, 16:0), and this is what I actually see when I run FBA and minimize the number of fluxes.

The model has so-called 'ISA' reactions that 'convert' FA-chain-specific TAG species into a generic TAG species:

triglyceride (1-16:0, 2-18:1, 3-18:1)[erm] => 0.67901 triglyceride[erm]

I don't understand what these coefficients mean, but they seem to be connected to the chain length (e.g. triglyceride (1-18:0, 2-18:1, 3-16:0) gets the same coefficient, even though it has a different number of saturations).

These generic lipid species are then used in the lipid pseudoreaction:

[...] + 0.000206 fatty acid[c] + [...] + 0.000781 triglyceride[c] + 1.5e-05 zymosterol[c] => lipid[c]

So nowhere along that path is there any specification of distribution of FA chain lengths & saturation, all TAG species are as likely to be made, with some correction for the amount of carbons (but not hydrogens, as the two species mentioned above with similar coefficients do have different molecular weights).
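To make the point about hydrogens concrete, here is a minimal sketch that derives the elemental formula and molecular weight of the two TAG species mentioned above. It assumes plain, unbranched acyl chains and uses rounded standard atomic masses; the function names are hypothetical and not part of the model or any toolbox.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def tag_formula(chains):
    """Return the elemental composition of a triglyceride.

    `chains` holds three (carbons, double_bonds) tuples, e.g. (18, 1) for 18:1.
    Assumes unbranched, unmodified fatty acids.
    """
    # Start from glycerol, C3H8O3.
    formula = {"C": 3, "H": 8, "O": 3}
    for carbons, double_bonds in chains:
        # A free fatty acid n:d has the formula C(n) H(2n - 2d) O2.
        formula["C"] += carbons
        formula["H"] += 2 * carbons - 2 * double_bonds
        formula["O"] += 2
        # Each ester bond releases one water molecule.
        formula["H"] -= 2
        formula["O"] -= 1
    return formula


def molecular_weight(formula):
    """Sum atomic masses over the elemental composition (g/mol)."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


tag_a = tag_formula([(16, 0), (18, 1), (18, 1)])  # triglyceride (1-16:0, 2-18:1, 3-18:1)
tag_b = tag_formula([(18, 0), (18, 1), (16, 0)])  # triglyceride (1-18:0, 2-18:1, 3-16:0)
mw_difference = molecular_weight(tag_b) - molecular_weight(tag_a)
```

Both species have 55 carbons, but they differ by two hydrogens, so a coefficient based only on carbon count cannot distinguish them.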

The ISA reactions for fatty acids have no influence in this for two reasons:

palmitate[c] => 0.61538 fatty acid[c]

1: the coefficients again just represent the number of carbons, not any measured abundance.
2: fatty acid[c] is only used in the lipid pseudoreaction to represent free fatty acids; it is not incorporated into any other lipid species.

edkerk commented 6 years ago

Possible solution:

- Choose a reference condition with measured fatty acid chain length and saturation to be included in the model (this can of course be adjusted later, but the repository should be in some reference state).
- As we typically only know the FA profile after hydrolysis, we don't know how the fatty acid chains are distributed over, for instance, the 3 positions on TAG. So assume that each position is just as likely (or is the middle one always saturated?).
- With some calculations, adjust the coefficients of all ISA reactions to represent the measured distribution.

Would be most versatile when implemented in either MATLAB or Excel.
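As a rough illustration of the calculation proposed above (in Python rather than MATLAB or Excel), the sketch below turns a fatty acid profile into expected mole fractions for every ordered TAG species, under the stated assumption that each chain is equally likely at each of the 3 positions. The profile values are hypothetical placeholders, not measured data, and `tag_distribution` is a made-up helper name.

```python
from itertools import product


def tag_distribution(fa_fractions, positions=3):
    """Map each ordered acyl-chain combination to its expected mole fraction.

    Assumes chains are assigned to positions independently, with
    probabilities given by the (normalised) fatty acid profile.
    """
    total = sum(fa_fractions.values())
    distribution = {}
    for combo in product(fa_fractions, repeat=positions):
        probability = 1.0
        for chain in combo:
            probability *= fa_fractions[chain] / total
        distribution[combo] = probability
    return distribution


# Hypothetical placeholder profile (mole fractions), NOT measured data.
fa_profile = {"16:0": 0.2, "16:1": 0.4, "18:0": 0.1, "18:1": 0.3}
dist = tag_distribution(fa_profile)
```

The resulting fractions could serve as the coefficients of a pooled reaction. If the model contains only some of these combinations, the fractions of the species that are present would need to be renormalised first.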

edkerk commented 6 years ago

Pushed the wrong button.. Just wanted to add/clarify that we should probably end up with one ISA reaction per lipid species, so:

0.25 TAG(16:0,16:1,16:0) + 0.10 TAG(18:0,16:1,16:0) + [...] = triglyceride

where the coefficients represent the measured composition of fatty acid chains, instead of having individual ISA reactions:

TAG(16:0,16:1,16:0) = triglyceride
TAG(18:0,16:1,16:0) = triglyceride

As is the case now.

hongzhonglu commented 6 years ago

There are about 176 ISA reactions in total in the present yeast model. These reactions come from Yeast 5. In general, they are difficult to understand because they lack evidence in the databases (I have not searched every reaction database). @edkerk we are now collecting the latest annotation information for each metabolite in the yeast model, so based on this we can correct the coefficients as you suggested.

BenjaSanchez commented 6 years ago

@edkerk as you point out, all these rxns convert chain-specific species to general species. Also, you are correct in the observation that the stoich. coeff. that these rxns get is based on chain length. The relationship is actually linear; as an example, I will focus on triglycerides (even though it should equally apply to all other species). Here are the 4 possible stoich. coeff. for all 32 triglyceride ISA rxns, based on total chain length from all three F.A. tails:

So the more carbons, the higher the assigned stoich. coeff. The current idea in Yeast7 is to allow the model to choose any triglyceride from the 32 options (through the ISA rxns), and to correct for the chain length, so that it is "equally attractive" carbon-wise for the cell to produce any of them. However, as you say, we should adjust these coefficients to account for saturations as well, so making them proportional to the molecular weight would solve the issue; in that sense it's good that @hongzhonglu is working on including chemical formulas for all these species.

That being said, I think there are still mistakes in how the total abundance for each lipid is calculated: We know that we can get the abundance of each species in the model if we take the stoich. coeff. that is in the lipid pseudo-rxn, because the stoich. coeff. of the species lipid is = 1 in the biomass pseudo-rxn. For instance, for triglycerides this is = 0.000781 mmol/gDW. However, even if we would use the "cheapest" TAG (16:16:16) to produce the totality of this species, we would need a total of:

0.000781/0.62963 = 0.0012 mmol/gDW

of that specific TAG (as 0.62963 is the stoich. coeff. assigned to 16:16:16 species in the ISA rxns). However, we know from literature that the amount of TAGs in a cell is around 0.007 mmol/gDW, so the TAG content is largely underestimated. To simplify this mess, I would instead directly redefine the triglyceride abundance in the lipid rxn as 0.007 mmol/gDW, assume that this corresponds to a specific TAG distribution, and work back from there. As an example, if we use 16:16:16 species (= 48 carbons) as a baseline, then the coeff. in the ISA rxns for any of those would be = 1, and then we can re-scale the rest. For instance, any 16:16:18 species would get a stoich. coeff. of:

50/48 = 1.0417

This of course should be done with molecular weights, as stated before, to improve precision, but the idea is the same. I hope this clears up some of the confusion.
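The arithmetic in this comment can be reproduced with a short sketch. The numbers are the ones quoted above; the variable and function names are hypothetical, and the carbon-based rescaling is only the simplified version (a molecular-weight-based one would follow the same pattern).

```python
tag_in_lipid_rxn = 0.000781  # mmol/gDW, triglyceride coefficient in the lipid pseudo-rxn
isa_coeff_48c = 0.62963      # current ISA coefficient for 16:16:16 species
literature_tag = 0.007       # mmol/gDW, approximate TAG content quoted from literature

# Amount of the cheapest TAG needed to satisfy the lipid pseudo-rxn.
required_tag = tag_in_lipid_rxn / isa_coeff_48c

# How far the current requirement falls short of the literature value.
shortfall_factor = literature_tag / required_tag


def rescaled_coeff(total_carbons, baseline_carbons=48):
    """ISA coefficient relative to a 16:16:16 (48 carbon) baseline."""
    return total_carbons / baseline_carbons


rescaled = {carbons: round(rescaled_coeff(carbons), 4) for carbons in (48, 50, 52)}
```

Here `required_tag` comes out at about 0.0012 mmol/gDW, and `rescaled[50]` matches the 1.0417 computed above.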

BenjaSanchez commented 6 years ago

Answering some other comments in the discussion:

@edkerk

> we should probably end up with one ISA reaction per lipid species, so:
> 0.25 TAG(16:0,16:1,16:0) + 0.10 TAG(18:0,16:1,16:0) + [...] = triglyceride

Here I am not so sure. The TAG distribution can vary considerably between strains, so I think it might be safer to leave it up to the modeler if he/she has specific data, and otherwise just allow the model to choose any TAG, making of course the corrections that I mentioned in my previous post. Or at the very least, let's first fix the mistakes in the composition, and maybe then we can try forcing the model to specific TAG distributions. How well are these distributions studied btw? For TAGs probably well enough, but for phospholipids?

@hongzhonglu

> it is difficult to understand these reactions as they lack evidences in the database

Note that ISA rxns are actually pseudo-rxns, so they are not expected to appear in any database. Let me know what you think about the solution that I presented in the previous post :)

BenjaSanchez commented 6 years ago

Finally, to add some more to the discussion, here is a breakdown for all lipids created through the 176 ISA rxns:

| Compound | In lipid pseudo-rxn? | # ISA rxns that can create it |
| --- | --- | --- |
| complex sphingolipid | yes | 3 in Golgi + 3 in mitochondrion |
| dolichol | no | 9 in lipid particle |
| inositol-P-ceramide | yes* | 10 in Golgi + 10 in ER + 10 in mitochondrion |
| inositol phosphomannosylinositol phosphoceramide | yes* | 10 in Golgi + 10 in ER + 10 in mitochondrion |
| mannosylinositol phosphorylceramide | yes* | 10 in Golgi + 10 in ER + 10 in mitochondrion |
| 1-phosphatidyl-1D-myo-inositol | yes | 8 in cytoplasm |
| ergosterol ester | yes | 2 in ER membrane |
| fatty acid | yes | 5 in cytoplasm |
| phosphatidyl-L-serine | yes | 8 in ER membrane |
| phosphatidylcholine | yes | 8 in ER membrane |
| phosphatidylethanolamine | yes | 8 in ER membrane |
| triglyceride | yes | 32 in ER membrane |

*partially: the only pseudo-metabolites that go into the lipid pseudo-rxn are the 3 in the Golgi, as they get pooled into the complex sphingolipid pseudo-metabolite through isa rxns:

inositol-P-ceramide [Golgi] -> complex sphingolipid [Golgi]
inositol phosphomannosylinositol phosphoceramide [Golgi] -> complex sphingolipid [Golgi]
mannosylinositol phosphorylceramide [Golgi] -> complex sphingolipid [Golgi]

The pooled pseudo-metabolite is later transported to the cytoplasm (where it is used in the lipid pseudo-rxn):

complex sphingolipid [Golgi] -> complex sphingolipid [cytoplasm]

In the mitochondrion, even though equivalent isa rxns also exist, there is no transport to the cytoplasm, so those 3 pseudo-metabolites are dead-end metabolites.

In the ER, there are no isa rxns to begin with, so all 30 pseudo-metabolites are dead ends as well.
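As a quick consistency check, the per-compound counts from the table above can be tallied to confirm that they add up to the 176 ISA rxns. This is a minimal sketch; the dictionary simply transcribes the table, and the variable names are hypothetical.

```python
from collections import Counter

# ISA rxn counts per compound and compartment, transcribed from the table above.
isa_counts = {
    "complex sphingolipid": {"Golgi": 3, "mitochondrion": 3},
    "dolichol": {"lipid particle": 9},
    "inositol-P-ceramide": {"Golgi": 10, "ER": 10, "mitochondrion": 10},
    "inositol phosphomannosylinositol phosphoceramide": {"Golgi": 10, "ER": 10, "mitochondrion": 10},
    "mannosylinositol phosphorylceramide": {"Golgi": 10, "ER": 10, "mitochondrion": 10},
    "1-phosphatidyl-1D-myo-inositol": {"cytoplasm": 8},
    "ergosterol ester": {"ER membrane": 2},
    "fatty acid": {"cytoplasm": 5},
    "phosphatidyl-L-serine": {"ER membrane": 8},
    "phosphatidylcholine": {"ER membrane": 8},
    "phosphatidylethanolamine": {"ER membrane": 8},
    "triglyceride": {"ER membrane": 32},
}

# Total number of ISA rxns across all compounds.
total_isa = sum(sum(by_compartment.values()) for by_compartment in isa_counts.values())

# Number of ISA rxns per compartment.
per_compartment = Counter()
for by_compartment in isa_counts.values():
    per_compartment.update(by_compartment)
```

The per-compartment view makes it easy to see how many of the ISA rxns sit in the mitochondrion and ER, where the dead ends described above occur.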

BenjaSanchez commented 6 years ago

@edkerk maybe this is something to fix? Should complex sphingolipids only be produced in the Golgi, or can they also be produced in the mitochondrion and ER?

edkerk commented 6 years ago

@BenjaSanchez I assume you're referring to the sphingolipids. There are two issues here:

Localization: if there is no proof in literature that they are produced in the mitochondrion and/or ER, these pathways should be deleted; or, if it is known in what amounts they are produced in each compartment, they should be connected to the lipid pseudoreaction. My gut feeling tells me that this is not known, so we will likely end up just removing these dead ends.
Stoichiometry / logical 'OR' (I think this is the trickier problem): the 'isa' reactions allow alternative sphingolipids to be labelled 'complex sphingolipid'. Have a look at Figure 1 of the Yeast 5.0 paper where they discuss 'isa' reactions. They specify

> A model user is free to constrain the fluxes which produce specific complex sphingolipids to model an observed lipid composition, or may leave the model unconstrained if the more general biomass definition is sufficient for their needs.

but this is a very bad solution, as you'd have to adjust these boundaries for every slight change in growth rate. For most of the lipids we will have detailed information, so we can truly specify the lipid component of biomass. But can we find itemized quantities of complex sphingolipids (mmol/gDCW)?

One strength of these ISA reactions is that gene essentiality simulations will have fewer false positives, as the cell will have the choice to make different (complex sphingo)lipids, which apparently is the case in reality. So, instead of deleting ISA reactions, perhaps we should leave them in but set their boundaries to 0. If one wants to do gene essentiality simulations, one has to switch those reactions on.
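A minimal sketch of this on/off idea, assuming nothing about the actual model or toolbox: `Reaction` is a hypothetical stand-in class (not a COBRA object), the reaction IDs are made up for illustration, and 1000 is an assumed default upper bound.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    """Minimal stand-in for a model reaction (hypothetical)."""
    rxn_id: str
    lower_bound: float = 0.0
    upper_bound: float = 1000.0


def set_isa_reactions(reactions, isa_ids, enabled, open_bound=1000.0):
    """Open or close the listed ISA reactions in place; return how many were set."""
    touched = 0
    for rxn in reactions:
        if rxn.rxn_id in isa_ids:
            rxn.lower_bound = 0.0
            rxn.upper_bound = open_bound if enabled else 0.0
            touched += 1
    return touched


# Hypothetical reaction IDs, for illustration only.
model_reactions = [Reaction("isa_TAG_alt1"), Reaction("isa_TAG_alt2"), Reaction("TAG_synthesis")]
alternative_isa = {"isa_TAG_alt1", "isa_TAG_alt2"}

# Reference state: alternative ISA reactions are kept in the model but blocked.
set_isa_reactions(model_reactions, alternative_isa, enabled=False)
closed_bounds = [(r.lower_bound, r.upper_bound) for r in model_reactions]

# Gene essentiality simulations: switch the alternatives back on.
set_isa_reactions(model_reactions, alternative_isa, enabled=True)
open_bounds = [(r.lower_bound, r.upper_bound) for r in model_reactions]
```

Passing the set of IDs explicitly avoids relying on any naming convention for ISA reactions.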

BenjaSanchez commented 6 years ago

@edkerk thanks for your feedback :)

Regarding localization: For now I will leave them in the Golgi then.
Regarding stoichiometry: I agree with you on this. The 3 rxns I showed above were actually the case for the sphingolipids:
```
inositol-P-ceramide [Golgi] -> complex sphingolipid [Golgi]
inositol phosphomannosylinositol phosphoceramide [Golgi] -> complex sphingolipid [Golgi]
mannosylinositol phosphorylceramide [Golgi] -> complex sphingolipid [Golgi]
```
However, in literature data (PJ Lahtvee et al. 2016) we don't see sphingolipids at all, so I'm not sure what to do with them for now. Keep the original abundance values? Or remove them entirely?

BenjaSanchez commented 6 years ago

update: PR #112 fixes this issue by using the newly defined SLIME reactions: lipids are now split into their 2 basic components, backbone and acyl-chains:

lipid -> sB backbone + sC1 acyl-chain1 + sC2 acyl-chain2 + ...

With this, separate lipid pseudoreactions are defined later for backbones and for acyl-chains. The stoichiometric coefficients represent molecular weights, as the data used comes in g/gDW.

More info can be found at SysBioChalmers/SLIMEr. This issue will be closed when the changes are merged to master.