cdanielmachado / reframed

ReFramed: metabolic modeling package
Other
54 stars 15 forks source link

Use `set()` instead of `[]` for `var_ids` in Solver #28

Open Prunoideae opened 2 months ago

Prunoideae commented 2 months ago

https://github.com/cdanielmachado/reframed/blob/1927419e9d7d986e8c3e7f86a442c839373d8b8d/reframed/solvers/solver.py#L36-L40

Will make add_variable to have performance regression if over 40k variables are added into the problem, and adding 200k variables will have 10mins overhead due to the query of a python list is O(n):

image

While replacing it with set() will make it done within a second as the set is O(1) complexity:

image

cdanielmachado commented 2 months ago

Thank you, that is very useful feedback!

I'm actually working on refactoring that class (outside the main branch):

https://github.com/cdanielmachado/reframed/compare/solvers

And I just replaced lists with dicts... i wonder how they compare to sets.

Just out of curiosity, which solver are you using ?

Prunoideae commented 2 months ago

I think dict is the same as set as they both are based on a hash table to query or insert items. Languages like Java don't have a separate implementation of sets, where they are just dict with only keys and empty values.

I was using a fine-tuned Gurobi solver with an academic license. The MILP problem being solved was somehow infeasible due to the presence of a few metabolic models, so I reran the solving for several times to debug and noticed the performance problem.