GeomScale / dingo

A python library for metabolic networks sampling and analysis
GNU Lesser General Public License v3.0
44 stars 28 forks source link

Error using fva after sampling #96

Open vfisikop opened 6 months ago

vfisikop commented 6 months ago

The following code

import dingo
dingo_model = dingo.MetabolicNetwork.from_sbml("ext_data/e_coli_core.xml")
sampler = dingo.PolytopeSampler(dingo_model)
samples = sampler.generate_steady_states()
fva_output = dingo_model.fva()

returns an error

Traceback (most recent call last):
  File "workspace/dingo_vfisikop/script.py", line 16, in <module>
    fva_output = dingo_model.fva()
  File "workspace/dingo_vfisikop/dingo/MetabolicNetwork.py", line 111, in fva
    return fast_fva(
  File "workspace/dingo_vfisikop/dingo/gurobi_based_implementations.py", line 162, in fast_fva
    max_biomass_flux_vector, max_biomass_objective = fast_fba(lb, ub, S, c)
  File "workspace/dingo_vfisikop/dingo/gurobi_based_implementations.py", line 112, in fast_fba
    optimum_sol.append(v[i].x)
UnboundLocalError: local variable 'v' referenced before assignment
vfisikop commented 6 months ago

the issue holds for both slow and fast modes

hritikb commented 5 months ago

the issue holds for both slow and fast modes

I tried with the slow mode, it seems to work fine.

phase 1: number of correlated samples = 500, effective sample size = 4, ratio of the maximum singilar value over the minimum singular value = 3123.85
phase 2: number of correlated samples = 500, effective sample size = 15, ratio of the maximum singilar value over the minimum singular value = 73.5605
phase 3: number of correlated samples = 500, effective sample size = 93, ratio of the maximum singilar value over the minimum singular value = 3.10414
phase 4: number of correlated samples = 500, effective sample size = 123, ratio of the maximum singilar value over the minimum singular value = 2.43669
phase 5: number of correlated samples = 1900, effective sample size = 795
[5]total ess 1030: number of correlated samples = 3900
[5]maximum marginal PSRF: 1.00293

max_biomass_objective: 0.8739215069684304

To verify, I just added print(f"max_biomass_objective: {fva_output[3]}") at the end of the code in https://github.com/GeomScale/dingo/issues/96#issue-2225436186

hariszaf commented 5 months ago

If you have

dingo_model.parameters['fast_computations'] = False

then indeed it's working fine.

For some reason, in the fast mode, the lb changes:

in the

>>> model = dingo.MetabolicNetwork.from_sbml("e_coli_core.xml")
>>> model.lb
array([    0.  ,     0.  , -1000.  , -1000.  ,     0.  , -1000.  ,
       -1000.  , -1000.  , -1000.  , -1000.  , -1000.  , -1000.  ,
           0.  , -1000.  , -1000.  ,     8.39,     0.  , -1000.  ,
           0.  , -1000.  ,     0.  , -1000.  , -1000.  ,     0.  ,
           0.  , -1000.  , -1000.  , -1000.  ,     0.  , -1000.  ,
           0.  ,     0.  , -1000.  , -1000.  ,     0.  , -1000.  ,
           0.  , -1000.  , -1000.  ,     0.  , -1000.  , -1000.  ,
       -1000.  ,     0.  ,     0.  ,     0.  , -1000.  ,     0.  ,
           0.  ,     0.  ,     0.  ,   -10.  ,     0.  ,     0.  ,
       -1000.  , -1000.  ,     0.  ,     0.  , -1000.  , -1000.  ,
       -1000.  ,     0.  ,     0.  , -1000.  ,     0.  ,     0.  ,
       -1000.  ,     0.  ,     0.  , -1000.  ,     0.  , -1000.  ,
       -1000.  ,     0.  ,     0.  ,     0.  , -1000.  ,     0.  ,
           0.  , -1000.  ,     0.  , -1000.  , -1000.  ,     0.  ,
       -1000.  ,     0.  ,     0.  , -1000.  ,     0.  ,     0.  ,
           0.  ,     0.  , -1000.  , -1000.  ,     0.  ])

then

>>> sampler = dingo.PolytopeSampler(model)
>>> samples = sampler.generate_steady_states()
>>> model.lb
array([-1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
        8.39000000e+000,  0.00000000e+000, -1.79769313e+308,
        0.00000000e+000, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
        0.00000000e+000, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
        0.00000000e+000, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308,  0.00000000e+000,  0.00000000e+000,
        0.00000000e+000, -1.79769313e+308,  0.00000000e+000,
        0.00000000e+000, -1.79769313e+308, -1.79769313e+308,
       -1.00000000e+001, -1.79769313e+308,  0.00000000e+000,
       -1.79769313e+308, -1.79769313e+308,  0.00000000e+000,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308,  0.00000000e+000,  0.00000000e+000,
       -1.79769313e+308,  0.00000000e+000,  0.00000000e+000,
       -1.79769313e+308,  0.00000000e+000, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308,  0.00000000e+000,
        0.00000000e+000, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308,  0.00000000e+000, -1.79769313e+308,
       -1.79769313e+308,  0.00000000e+000,  0.00000000e+000,
       -1.79769313e+308,  0.00000000e+000, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308])

The issue seems to be with the fast_remove_redundant_facets.

import dingo
from dingo.gurobi_based_implementations import fast_remove_redundant_facets

model = dingo.MetabolicNetwork.from_sbml("ext_data/e_coli_core.xml")
sampler = dingo.PolytopeSampler(model)
A, b, Aeq, beq = fast_remove_redundant_facets(sampler._metabolic_network.lb,
                                              sampler._metabolic_network.ub, 
                                              sampler._metabolic_network.S, 
                                              sampler._metabolic_network.biomass_function, 
                                              sampler._parameters["opt_percentage"])
model.lb

would return

array([-1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,

An easy work-around would be something like this

@vissarion let me know if that's ok or else we should edit the fast_remove_redundant_facets

hariszaf commented 5 months ago

I think the issue is actually here where we edit the lb and ub. @TolisChal I guess we could just make a deep copy, we do not need to alter the lb and ub for some reason, right ?

vissarion commented 5 months ago

Indeed if the user set dingo_model.parameters['fast_computations'] = False then there is no error. However, if the user sets dingo_model.set_slow_mode then the issue persist. A side problem here is that the user has two ways for setting slow/fast mode.

The workaround with the deep copy fixes the instance above, thanks @hariszaf. The shortcoming is first that it has a performance penalty and second that still the sampler performs a destructive operation on lb e.g. run print(sampler.metabolic_network.lb) thus sampler.metabolic_network.fva() fails with the same error.

hariszaf commented 5 months ago

Regarding the two ways to change the modes: the only thing we can is to remove the function; the function is just edits the parameters dictionary , so it's the same thing, just a bit easier for the user.

With the changes in PR #98 I was able to have the following:

>>> model = dingo.MetabolicNetwork.from_json("e_coli_core.json")
>>> model.parameters
{'opt_percentage': 100, 'distribution': 'uniform', 'nullspace_method': 'sparseQR', 'fast_computations': True, 'tol': 1e-06}
>>> model.set_slow_mode()
>>> sampler  = dingo.PolytopeSampler(model)
>>> samples = sampler.generate_steady_states()
phase 1: number of correlated samples = 500, effective sample size = 6, ratio of the maximum singilar value over the minimum singular value = 1393.54
phase 2: number of correlated samples = 500, effective sample size = 6, ratio of the maximum singilar value over the minimum singular value = 200.768
phase 3: number of correlated samples = 500, effective sample size = 4, ratio of the maximum singilar value over the minimum singular value = 297.122
phase 4: number of correlated samples = 500, effective sample size = 104, ratio of the maximum singilar value over the minimum singular value = 79.8253
phase 5: number of correlated samples = 500, effective sample size = 175, ratio of the maximum singilar value over the minimum singular value = 3.16506
phase 6: number of correlated samples = 500, effective sample size = 165, ratio of the maximum singilar value over the minimum singular value = 2.81058
phase 7: number of correlated samples = 2300, effective sample size = 990
[5]total ess 1450: number of correlated samples = 5300
[5]maximum marginal PSRF: 1.01828

>>> model.fva()
           minimum    maximum
PFK       7.339496   7.665214
PFL       0.000000   0.120636
PGI       4.444643   5.313224
PGK     -16.176700 -15.887174
PGL       4.507811   5.376392
...            ...        ...
NADH16   38.450441  38.739967
NADTRHD   0.000000   0.723817
NH4t      4.760294   4.773698
O2t      21.759349  21.839773
PDH       9.153288   9.877105

[95 rows x 2 columns]
>>> model.parameters
{'opt_percentage': 100, 'distribution': 'uniform', 'nullspace_method': 'sparseQR', 'fast_computations': False, 'tol': 1e-06}
>>> model.fba()
            fluxes
PFK       7.477382
PFL      -0.000000
PGI       4.860861
PGK     -16.023526
PGL       4.959985
...            ...
NADH16   38.534610
NADTRHD  -0.000000
NH4t      4.765319
O2t      21.799493
PDH       9.282533

[95 rows x 1 columns]
>>> model.set_fast_mode()
>>> fast_sampler  = dingo.PolytopeSampler(model)
>>> fast_samples = fast_sampler.generate_steady_states()
phase 1: number of correlated samples = 500, effective sample size = 4, ratio of the maximum singilar value over the minimum singular value = 2475.19
phase 2: number of correlated samples = 500, effective sample size = 23, ratio of the maximum singilar value over the minimum singular value = 98.2438
phase 3: number of correlated samples = 500, effective sample size = 12, ratio of the maximum singilar value over the minimum singular value = 254.737
phase 4: number of correlated samples = 500, effective sample size = 30, ratio of the maximum singilar value over the minimum singular value = 70.0693
phase 5: number of correlated samples = 500, effective sample size = 94, ratio of the maximum singilar value over the minimum singular value = 2.34115
phase 6: number of correlated samples = 2000, effective sample size = 886
[5]total ess 1049: number of correlated samples = 4500
[5]maximum marginal PSRF: 1.02775

>>> model.fva()
           minimum    maximum
PFK       7.339496   7.665214
PFL       0.000000   0.120636
PGI       4.444643   5.313224
PGK     -16.176700 -15.887174
PGL       4.507811   5.376392
...            ...        ...
NADH16   38.450441  38.739967
NADTRHD   0.000000   0.723817
NH4t      4.760294   4.773698
O2t      21.759349  21.839773
PDH       9.153288   9.877105

[95 rows x 2 columns]
>>> sampler.metabolic_network.lb[:10]
array([-1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308, -1.79769313e+308, -1.79769313e+308,
       -1.79769313e+308])
>>> sampler.metabolic_network.lb.shape
(95,)

I suggest we keep this as the cost of deepcopy is too low and we check on the fast_remove_redundant_facets in the first chance.