franciscozorrilla / EMBOMicroCom

Metabolic modeling tutorial for EMBO practical course: Metabolite and species dynamics in microbial communities
https://www.embl.org/about/info/course-and-conference-office/events/mcd22-01/
MIT License
7 stars 1 forks source link

Question about assessing growth rates based on media composition #2

Open alexmsalmeida opened 1 month ago

alexmsalmeida commented 1 month ago

Hi Francisco,

Thanks so much for taking the time to develop these great tutorials.

In my project I am comparing two groups of species (Entero co-colonizers vs co-excluders) and have identified a few metabolites that differ between them in terms of uptake and secretion.

One thing I wanted to check now is if it would be possible to simulate the effect of having or not those metabolites in the growth of several Enterobacteriaceae species. With cobrapy from what I understand you are able to assess this by modifying the medium and then checking the resulting growth rate. However, before embarking on that I did a quick check of the growth rates between the models I had simulated with carveme (gapless, M3 and M5) and I am getting identical values for all three. I assume this wouldn’t be expected, as the different media compositions would likely result in some variation. To get the growth rate values I am simply running:

model = cobra.io.read_sbml_model(sys.argv[1])
model_opt = model.optimize()
genome_name = os.path.basename(sys.argv[1]).split(".xml")[0]
print("%s\t%s" % (genome_name, model_opt.objective_value))

So this brings me to my questions:

1) Is this the correct way to compare growth rates between different CarveMe models?

2) If I want to test the effect of the presence/absence of a metabolite on the growth rate should I generate new models by changing the medium composition with CarveMe or stick to one model and then modify the medium within cobrapy? What would be the pros and cons of each?

This is all still pretty new to me, so many thanks in advance for the help and any specific tips you might have.

Best wishes, Alex

franciscozorrilla commented 1 month ago

Hi Alex,

Thanks for taking the time to repost your questions here and making this discussion publicly available.

First, it would probably be interesting for you to see exactly which reactions are getting gap filled in across the different media. You can extract this information from the model XML files (e.g. see this script, but note that it is for an older CarveMe version, so might require tweaking), as gapfilled reactions will be specified with a tag e.g. GAP_FILL: M8 (see this thread). This will give you an idea if any/which reactions are getting gap filled in based on media variation, and perhaps indicate whether or not it is reasonable that growth rates are the same across different gapfilled media variants. We plot this in the cheese study in sup fig 6b.

Second, note that while CarveMe produces gap filled FBA ready models, they are not constrained to any specific media composition by default (you can optionally do this with the -i flag, see docs). Therefore, even though you reconstructed those models to be able to grow in different specific media (i.e. gapfilling medium), the models have “open” exchanges, meaning that they can uptake whatever metabolites can be transported across the membrane given the metabolic network, not just the gapfilled media components. So even though they could have a few different reactions based on gapfilling, it is possible that they might not affect the growth rate since they could have the same set of open exchange reactions when you simulate them. You could either regenerate the models specifying the -i flag, or more ideal would be to simply specify the media composition before running FBA. For example, in this script I do exactly that for the cheese study media, although note that I used reframed for this analysis (also developed by Daniel Machado of CarveMe/SMETANA), but you could do something similar with cobrapy as I have done in this FBA tutorial. See sup fig 6d for results overview of gapfilling/simulation media conditions for FBA.

Regarding your specific questions:

  1. Yes, that is an appropriate method for comparing growth rates, just make sure that you are properly constraining the growth media as explained above. More generally, you might also want to consider some additional/complementary analysis to convince a particularly stubborn metabolic modeler reviewer. FBA only gives you one possible set of flux distributions that satisfies the linear programming constraints imposed by cellular metabolism and the growth objective function. For example, you could explore this space by carrying out flux variability analysis (FVA), which essentially calculates the feasible range of allowable fluxes across reactions. This is not a strictly necessary analysis, but good to be aware of FBA limitations and additionaly approaches.

  2. As suggested above, I think it makes more sense to simply modify the media composition before each simulation with cobrapy/reframed, as it would be overkilll to regenerate the model just to change one simulation parameter. Just so you know, when you modify media composition of the model, you are just updating the reaction lower and upper bounds. Furthermore, you should know that it is possible for CarveMe to produce slightly different models from the same input ORF-annotated protein fasta file, and this has to do with the underlying implementation. Since the model reconstruction process is set up as a linear programming task, just as we saw before with a flux distribution solution space, we here have a metabolic model solution space that satisfies the imposed constraints. Usually these are negligible/small differences, but it can vary depending on genome quality/completeness. CPLEX (the solve most commonly used) has some non-open source proprietary heuristics that can result in slightly different outcomes. Daniel Machado is moving towards replacing CPLEX with open source solvers, but since this is work in progress I am still using CPLEX.

If you are distributed and/or interested by this network uncertainty that underlies metabolic model reconstruction, you can quantify it by generating ensemble models with CarveMe and plotting pairwise jaccard distances as I have done in this tutorial + visualization. You definitely do not need to carry out this analysis for all your reconstructed GEMs, but good for you to be aware of it.

You might also be interested in this reframed function which returns a list of auxotrophies from a given metabolic model.

Hope this clears things up, and please let me know if you have any further questions!

Best wishes, Francisco

alexmsalmeida commented 1 month ago

Hi Francisco,

That’s great, very useful feedback. As a quick follow-up I now looked at the growth rates in the constrained models (carve -i) and compared with the reframed approach and am getting consistent results (with growth rates now much lower than using the previous universal model). Seems like a nice sanity check. Will move forward with reframed as this approach seems rather intuitive.

I guess the only consideration now from my perspective is for the other analyses where I am comparing metabolic competition/complementary indices, number and types of metabolites consumed/produced, etc. which models make more sense to use? The universal+gapfilled models or a constrained model on a specific media (e.g., rich gut media). The question is which scenario would be more realistic and better represent the biochemistry of these species in the human gut? There probably isn’t a straightforward answer, but would love to hear your thoughts.

Thanks again for the input!

Best, Alex

franciscozorrilla commented 1 month ago

Hi Alex, glad to hear that things are making sense!

Just to clarify some terms and concepts: in CarveMe when we talk about the universal model we are referring to the underlying metabolic network from which your model is “carved” out of. So for your gut bacteria analysis you should certainly not use the universal model, and conceptually it should not be possible to fill gaps in the universal model since it already includes all possible reactions. Of course, in practice it is likely that there are unknown reactions/metabolisms missing from the universal model, but CarveMe would not be able to gapfill them since they are missing from the database used for gapfilling :) To better understand the underlying process I would recommend having a quick read through the methods section of the CarveMe paper.

Regarding the calculation of SMETANA global metrics, i.e. metabolic interaction potential (MIP) and metabolic resource overlap (MRO), I believe that you should get the same result for a given set of models regardless of the constraints at the time of input, since the calculation of those metrics requires the models to be re-constrained under interacting vs non-interacting scenarios. I would also suggest you have a look at implementation of these global metrics as outlined in the methods section of the original SMETANA paper.

Regarding the calculation of SMETANA detailed metrics, i.e. species coupling score (SCS), metabolite uptake score (MUS), and metabolite production score (MPS), for this you will want to use models that have been gapfilled (e.g. M3: dGMM + LAB) and simulated (e.g. M11: dGMM + LAB - aromatic amino acids) under different biologically meaningful conditions. By gapfilling the model in M3 media CarveMe ensures that the model can achieve growth in that environment, so if you try to simulate SMETANA detailed interactions under M3 media for a community where each member has been gapfilled in M3 media, then no exchanges should be predicted since they can each grow individually. On the other hand, if we gapfill under M3 and simulate under M11, then exchanges will be predicted to support the growth of aromatic amino acid auxotrophs as we did in the metaGEM paper (refs: [1], [2]). Also check out these discussions in the metaGEM repo where other users had similar questions (https://github.com/franciscozorrilla/metaGEM/issues/112, https://github.com/franciscozorrilla/metaGEM/issues/139). The implementation of these metrics are outlined in the supplementary materials of the SMETANA paper linked above, also worth having a look at in my opinion.

Regarding FBA simulations to understand which metabolites are produced/consumed, here you could make a choice to explore the varying effect of many media conditions on metabolic fluxes of one/few models, or stick to one/few media conditions and compare across many models. In your case I think the second option makes more sense, since you are exploring the metabolism of Enterobacteriaceae species in the gut. For these kind of simulations you could use models gapfilled in the rich M3 media, and the run FBA under the different media variants from the paper Nutritional preferences of human gut bacteria reveal their metabolic idiosyncrasies we discussed (Figure 1d).

Best wishes, Francisco

alexmsalmeida commented 1 month ago

Amazing! Thanks so much Francisco, this clears up a lot of things. I will experiment a bit on my side to see the effect of these variables in the results. I also need to read a bit more the references you listed to better understand the fundamentals behind these methods.

Will keep in touch!

Ale