SysBioChalmers / Human-GEM

The generic genome-scale metabolic model of Homo sapiens
https://sysbiochalmers.github.io/Human-GEM-guide/
Creative Commons Attribution 4.0 International

Why are two models generated using the same data and code inconsistent? #730

Open yangjingyu1108 opened 8 months ago

yangjingyu1108 commented 8 months ago

I used tINIT with Hart2015_RNAseq.txt to generate a model, then ran it again in the same way. Why don't the two runs yield the same model?

yangjingyu1108 commented 8 months ago

As shown in the figures.

[Image 1 and Image 2: screenshots of the two generated models]
mihai-sysbio commented 8 months ago

Just to make sure, you ran the same code, including the same data (DLD1 from Hart2015, from within the data/datasets folder)? Was this tINIT or the newer ftINIT? Also, are you sure the Human-GEM version didn't change in the meantime (e.g., different branches of the repository)?

JonathanRob commented 8 months ago

I agree with @mihai-sysbio to first check for any changes in model version, repo version, etc.

If everything is exactly the same, there is still a possibility that tINIT can yield different results. This is due to some randomness that is introduced during the optimization process. The difference in the models should be very minor, but can still change the outcome of certain analyses if they involve the differing components.
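To see how minor the differences actually are, one could diff the reaction content of the two runs. A minimal sketch, assuming both outputs are RAVEN-style model structs with a `rxns` field (a cell array of reaction identifiers); `model1` and `model2` stand in for the two tINIT results:

```matlab
% Compare the reaction content of two tINIT runs.
% model1 and model2 are assumed to be model structs with a .rxns field
% (cell array of reaction identifiers), as returned by RAVEN.
onlyIn1 = setdiff(model1.rxns, model2.rxns);   % reactions unique to run 1
onlyIn2 = setdiff(model2.rxns, model1.rxns);   % reactions unique to run 2
shared  = intersect(model1.rxns, model2.rxns);
fprintf('%d shared; %d only in run 1; %d only in run 2\n', ...
    numel(shared), numel(onlyIn1), numel(onlyIn2));
```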

I believe you can obtain an identical model if you seed the random number generator with the same value prior to running tINIT, but I am not entirely certain whether the solver uses MATLAB's random number generator or its own. If it uses its own, that approach may not work; it probably also depends on the solver being used.
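A minimal sketch of that seeding idea, assuming the randomness flows through MATLAB's global generator (a solver's internal seed, e.g., Gurobi's Seed parameter, would not be affected by this):

```matlab
% Reset MATLAB's global RNG to the same state before each tINIT run.
% This only helps if tINIT's randomness is drawn from MATLAB itself,
% not from the solver's internal generator (e.g., Gurobi's 'Seed'
% parameter, which would need to be fixed separately).
rng(42, 'twister');   % deterministic seed before run 1
% ... run tINIT with your usual arguments ...

rng(42, 'twister');   % identical RNG state before run 2
% ... run tINIT again; outputs should match if MATLAB's RNG is the
%     only source of randomness ...
```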

It is possible that in some cases ftINIT is less likely to yield different models across repeated runs, if you want to give that a try. Otherwise, if this difference matters for your analyses or conclusions, it may be worth generating several copies of the model and analyzing them collectively to determine the impact of the algorithm's stochasticity on the results (see the sketch below).
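A minimal sketch of that multi-run approach; `runTINIT` here is a hypothetical wrapper around your own tINIT invocation, and each returned model is again assumed to have a `rxns` field:

```matlab
% Generate several copies of the model and summarize how much the
% reaction content varies across runs.
runTINIT = @() getINITModel2(refModel, 'DLD1');   % hypothetical wrapper;
                                                  % substitute your actual call
nRuns = 5;
models = cell(nRuns, 1);
for i = 1:nRuns
    models{i} = runTINIT();
end

% Stable core (reactions in every run) vs. union (reactions in any run).
core    = models{1}.rxns;
allRxns = models{1}.rxns;
for i = 2:nRuns
    core    = intersect(core, models{i}.rxns);
    allRxns = union(allRxns, models{i}.rxns);
end
fprintf('%d of %d observed reactions appear in all %d runs\n', ...
    numel(core), numel(allRxns), nRuns);
```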

yangjingyu1108 commented 8 months ago

> Just to make sure, you ran the same code, including the same data (DLD1 from Hart2015, from within the data/datasets folder)? Was this tINIT or the newer ftINIT? Also, are you sure the Human-GEM version didn't change in the meantime (e.g., different branches of the repository)?

I'm sure I used the same code and data. This was tINIT. I only ran the code twice within 24 hours, and I believe that the Human-GEM version didn't change in the meantime.

mihai-sysbio commented 8 months ago

> I'm sure I used the same code and data. This was tINIT. I only ran the code twice within 24 hours, and I believe that the Human-GEM version didn't change in the meantime.

Even if there were no code changes and no git fetch was run to pull the latest updates, simply switching branches from main to develop could cause differences to appear. That said, since you ran everything within 24 hours, it seems safe to assume nothing changed, so I think @JonathanRob's suggestions above should be followed.
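For future runs, one way to rule out version drift is to record the exact repository state next to each model. A minimal sketch, assuming git is on the system path and MATLAB's working directory is inside the Human-GEM repository:

```matlab
% Log the current Human-GEM commit and branch alongside each run, so
% that any difference between runs can be traced back to the repo state.
[~, commit] = system('git rev-parse HEAD');
[~, branch] = system('git rev-parse --abbrev-ref HEAD');
fprintf('Human-GEM commit: %s', commit);   % cmdout already ends in a newline
fprintf('Human-GEM branch: %s', branch);
```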

mihai-sysbio commented 7 months ago

@yangjingyu1108, here is a starting point for a longer discussion on the "consistency" of model contextualization/extraction algorithms that, for example, prune reactions in random order or use random seeds in MILP algorithms:

> Some MEMs, such as iMAT and MBA, utilize mixed integer linear programming (MILP), and as a result, the MEMs can yield multiple solutions for a model extraction that would be equally "fit" to the input data. We tested here how different the equivalently optimal models are from each other, and demonstrated that they exhibit similar levels of accuracy. Specifically, we generated equivalent optimal models of the A375 cell line constructed using iMAT or MBA (with the p25 threshold) by running the algorithms ten different times, using different random seeds to start the algorithm. We found that the ability to predict gene essentiality did not drastically change (Figure S3). Indeed, for example, the analyses in which we predicted gene essentiality for the 10 different optimal MBA models yielded similar results. Furthermore, while the models show some differences in reaction content, we found the reaction content to vary far less among the alternative optimal models than among models generated with different MEMs (Figure S4).

source: A Systematic Evaluation of Methods for Tailoring Genome-Scale Metabolic Models https://doi.org/10.1016/j.cels.2017.01.010

There is also a follow-up: Guidelines for extracting biologically relevant context-specific metabolic models using gene expression data https://doi.org/10.1016/j.ymben.2022.12.003