LinExpr.add_var / LinExpr.add_expr / LinExpr.add_term are way too slow

jachymb commented 3 years ago

Consider the following function. It generates and solves a model structurally somewhat similar to a practical problem I am solving. (I tried to simplify it as much as possible)

def experiment_mip(n):
    start = time.time()

    model = mip.Model(sense=mip.MAXIMIZE, solver_name=mip.CBC)
    obj = []
    for i in range(n):
        var = model.add_var(f"var_{i}", lb=0.5, ub=1.0)
        a = random.random()  - 0.5  # Just a dummy value for illustration
        obj.append(var * a)
        constraint = mip.xsum(obj) <= 1.0
        model.add_constr(constraint, f"constraint_{i}")
    model.objective = mip.xsum(obj)

    construction_time = time.time() - start
    start = time.time()

    model.optimize()

    solution_time = time.time() - start
    return {"Model construction time": construction_time, "Solution time": solution_time}

Now, try running this for example for n=4000. The solution is much faster than the construction of the model. My measurements suggest that the solution time is asymptotically O(n^2) complexity but the construction is more than O(n^3). The profiler shows that that most time is spent in the LinExpr.add_var, LinExpr.add_expr and LinExpr.add_term functions. I think this is not acceptable and it should never take more time to construct than to solve a nontrivial problem.

I am not sure what the solution could be.

jachymb commented 3 years ago

Possibly related to #146

h-g-s commented 3 years ago

Hi,

I agree that this needs to be optimized, but remember that the model is created in an interpreted, dynamic language and it is solved using highly optimized native machine code.

If I remember correctly, the culprit here is

Xsum

LinExpr.add_term

More specifically lines

if isinstance(term, Var):

self.add_var(term, coeff)

elif isinstance(term, LinExpr):

self.add_expr(term, coeff)

elif isinstance(term, numbers.Real):

self.add_const(term)

else:

in entities.py 280

This is slow since before adding each term to the LinExpr it needs to check its type, it can be a constant, a variable, another LinExpr… This makes easy to build models but also makes it slow.

A faster approach would be to have a more “low level” way of creating constraints, i.e. using constraint indexes and coefficients both as numpy arrays (or cffi arrays). The code would be uglier and more verbose (like C code) but definitely much faster.

Cheers

From: Jáchym Barvínek notifications@github.com Reply-To: coin-or/python-mip reply@reply.github.com Date: Sunday, November 22, 2020 at 3:42 PM To: coin-or/python-mip python-mip@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [coin-or/python-mip] LinExpr.add_var / LinExpr.add_expr / LinExpr.add_term are way too slow (#163)

Consider the following function. It generates and solves a model structurally somewhat similar to a practical problem I am solving. (I tried to simplify it as much as possible)

def experiment_mip(n): start = time.time()

model = mip.Model(sense=mip.MAXIMIZE, solver_name=mip.CBC) obj = [] for i in range(n): var = model.addvar(f"var{i}", lb=0.5, ub=1.0) a = random.random() - 0.5 # Just a dummy value for illustration obj.append(var * a) constraint = mip.xsum(obj) <= 1.0 model.addconstr(constraint, f"constraint{i}") model.objective = mip.xsum(obj)

construction_time = time.time() - start start = time.time()

model.optimize()

solution_time = time.time() - start return {"Model construction time": construction_time, "Solution time": solution_time} Now, try running this for example for n=4000. The solution is much faster than the construction of the model. My measurements suggest that the solution time is asymptotically O(n^2) complexity but the construction is more than O(n^3). The profiler shows that that most time is spent in the LinExpr.add_var, LinExpr.add_expr and LinExpr.add_term functions. I think this is not acceptable and it should never take more time to construct than to solve a nontrivial problem.

I am not sure what the solution could be.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

rschwarz commented 3 years ago

I guess that in many situations, the modeler would input constraints in sum notation, knowing that no variable appears twice, and every term is of the form coefficient * variable. In this case, it would make sense to offer an alternative constructor to LinExpr that takes in a dict or a list of tuples.

This would not provide nice-looking algebraic syntax, but it avoids many dictionary operations as well as the isinstance calls.

christian2022 commented 3 years ago

With these minor changes (see ###), on my laptop construction time is 1/3 of solution time for n = 4000

def experiment_mip(n):
    start = time.time()

    model = mip.Model(sense=mip.MAXIMIZE, solver_name=mip.CBC)
    obj = LinExpr() ### do not build array of LinExpr but directly append to LinExpr
    for i in range(n):
        var = model.add_var(f"var_{i}", lb=0.5, ub=1.0)
        a = random.random()  - 0.5  # Just a dummy value for illustration
        obj.add_var(var, a) ### do not create intermediate LinExpr
        constraint = obj <= 1.0 ### no need to mip.xsum as obj is already the sum
        model.add_constr(constraint, f"constraint_{i}")
    model.objective = obj

    construction_time = time.time() - start
    start = time.time()

    model.optimize()

    solution_time = time.time() - start
    return {"Model construction time": construction_time, "Solution time": solution_time}

hadware commented 3 years ago

Super interesting ! That's a pretty nice solution, I'll try to find some time to hack this into the package.

tuliotoffolo commented 3 years ago

Nice, @christian2022! I like this code and that's why we reintroduced inplace operations for LinExpr, so that you could also write obj += a * var (of course this would create an intermediate LinExpr).

Anyhow, I agree that it would be useful to have something like an alternative LinExpr constructor that takes a dict or a list of tuples.

tuliotoffolo commented 2 years ago

I've just included a test considering 5 variations of the code @jachymb and @christian2022 posted here. The test example is available in this link and here are the 'construction' runtimes I got with n=10000:

using_addvar: 9.426069974899292
using_inplace_op: 9.878501176834106
using_dict: 11.112802028656006
using_lists: 14.583881855010986
recreating_linexpr: 74.6143479347229

@rschwarz: you can now create a LinExpr from a dictionary or two lists (variables and coeffs).

coin-or / python-mip

LinExpr.add_var / LinExpr.add_expr / LinExpr.add_term are way too slow #163