coin-or / pulp

A python Linear Programming API
http://coin-or.github.io/pulp/

PuLP returns None for variables that have zero coefficients both in objective and constraints. #331

Open andrea-taverna opened 4 years ago

andrea-taverna commented 4 years ago

Description: Consider a model where one or more variables are added to a constraint or the objective function but they all have zero coefficients everywhere in the model.

For example:

model.addConstraint(a * x + b * y >= c)

with a, b, c floats and x, y LpVariables, and assume that y does not appear in the objective or in any other constraint apart from the one above.

If someone sets b=0.0 before adding the constraint, then PuLP apparently omits y and invoking y.value() after solving the model will return None.

On the other hand, if one were to manually write an LP file with y listed among the variables but not appearing in any constraint or in the objective, CBC would still write a feasible value for y in the solution file, usually the lower bound (see MWE).

Context: I am writing a cutting-plane model for the dual optimization inside a Lagrangian decomposition scheme. At the first iteration of the scheme the cutting-plane model has only one constraint, the first cut, and it can happen that one (dual) variable in the cut has an exactly zero coefficient. Querying the value of that variable after solving the model then yields None.

Issue: In general, this behaviour requires the user to be aware of which variables have zero coefficients everywhere in the model to avoid putting None where valid floats are expected. On the other hand, for these "free" variables it is fine to return any feasible value, as it is already done by CBC and, IIRC, by other software like AMPL.
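In the meantime, a guard on the caller's side could look like this (just a sketch; value_or_bound is a made-up helper name, not part of PuLP):

def value_or_bound(var):
    """Return var.value(), falling back to a feasible default when PuLP
    returns None because the variable was dropped from the model."""
    val = var.value()
    if val is not None:
        return val
    if var.lowBound is not None:   # prefer the lower bound, as CBC does
        return var.lowBound
    if var.upBound is not None:
        return var.upBound
    return 0.0                     # free variable: any value is feasible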

I thank you for your formidable work on PuLP and also for any help you may provide!

MWE: Code:

# The MWE below is meant to be run in a Jupyter/IPython session:
# the lines starting with "!" are shell escapes.
from pulp import *
import sys
print("==> PuLP version:")
!{sys.executable} -m pip show pulp

problem = LpProblem(sense=LpMaximize)

z = LpVariable("z")
x = LpVariable("x",2,5)
y = LpVariable("y", 1,3)

problem += z  # objective (maximize z), matching the hand-written LP file shown in the output below
problem.addConstraint(z + x + 0.0 * y <= 1)

print("\n==>Solving problem as created in PuLP")
problem.writeLP("problem.lp")
problem.solve()

print("=>PuLP solution")
print(f"z={z.value()}")
print(f"x={x.value()}")
print(f"y={y.value()} <-- PROBLEM: this one is going to be None")

print("\n")

print("\n==> Solving problem manually written in LP format")
!cat problem2.lp
print("Calling CBC")
!cbc problem2.lp solve solu solution.txt
print("Solution computed by CBC")
!cat solution.txt

Output:

Name: PuLP
Version: 2.3
Summary: PuLP is an LP modeler written in python. PuLP can generate MPS or LP files and call GLPK, COIN CLP/CBC, CPLEX, and GUROBI to solve linear problems.
Home-page: https://github.com/coin-or/pulp
Author: J.S. Roy and S.A. Mitchell
Author-email: pulp@stuartmitchell.com
License: UNKNOWN
Location: /home/user/pyenv/experiments3.8/lib/python3.8/site-packages
Requires: amply
Required-by: 

==>Solving problem as created in PuLP
=>PuLP solution
z=-1.0
x=2.0
y=None <-- PROBLEM: this one is going to be None

==> Solving problem manually written in LP format
\* NoName *\
Maximize
OBJ: z
Subject To
_C1: x + z <= 1
Bounds
1 <= y <= 3
2 <= x <= 5
z free
End
Calling CBC
Welcome to the CBC MILP Solver 
Version: 2.9.9 
Build Date: Aug 21 2017 

command line - cbc problem2.lp solve solu solution.txt (default strategy 1)
### CoinLpIO::readLp(): Variable y does not appear in objective function or constraints
 CoinLpIO::readLp(): Maximization problem reformulated as minimization
Presolve 0 (-1) rows, 0 (-3) columns and 0 (-2) elements
Empty problem - 0 rows, 0 columns and 0 elements
Optimal - objective value 1
After Postsolve, objective 1, infeasibilities - dual 0 (0), primal 0 (0)
Optimal objective 1 - 0 iterations time 0.002, Presolve 0.00
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00

Solution computed by CBC
Optimal - objective value 1.00000000
      0 z                     -1                       0
      1 x                      2                       1
      2 y                      1                       0
pchtsp commented 4 years ago

Hello and thanks for the detailed report and MWE. Unfortunately, I think it may be hard to make it work like you expect. In PuLP, we only store the variables that are present at least once in the constraints or the objective function.

A couple of workarounds occur to me; I'm not sure if either fits your case:

  1. Initialize each variable right after creating it with the setInitialValue() method, using the lower bound, say. That way it will never be None even if it is not included in the solve. Example:
y = LpVariable("y", 1, 3)
y.setInitialValue(y.lowBound)
  2. Right before/after solving, check whether the variable is in the model and, if not, do something (set an initial value, for example); see the sketch after this list. Example:
problem_vars = problem.variables()
if y not in problem_vars:
    y.setInitialValue(y.lowBound)
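Putting workaround 2 together with your MWE, it could look like this (a sketch; I compare variables by name rather than with "in" to avoid relying on LpVariable's operator overloading):

# After problem.solve() in the MWE above: give every variable that PuLP did not
# keep a feasible fallback value, so that .value() never returns None.
kept_names = {v.name for v in problem.variables()}
for var in (x, y, z):
    if var.name not in kept_names and var.lowBound is not None:
        var.setInitialValue(var.lowBound)

print(y.value())  # now 1 (the lower bound) instead of None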
andrea-taverna commented 4 years ago

Hi Franco,

thanks for the quick reply.

I will use one of the workarounds you suggested as they are slightly clearer than checking directly for None values.

I know this is an open-source project, and, in particular, I have neither paid for it nor contributed to it, but I still find the current situation perplexing.

In the current implementation a variable is omitted from an affine expression whenever it has an exactly zero coefficient, i.e.

z = x + b * y + w
print(list(z.keys()))

prints [x, y, w] whenever b != 0. Even with b = 1e-100, y is still added to the constraint. This likely means there is an equality check against 0.0 somewhere in the code.

However, code-wise, the above statement is very different from completely omitting y:

z = x + w
print(list(z.keys()))

In the latter case I make it explicit that y is not in the expression, and that does not depend on the coefficients.
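To make the difference concrete, here is a small check (a sketch; the expected output follows the behaviour described above):

from pulp import LpVariable

x = LpVariable("x")
y = LpVariable("y")
w = LpVariable("w")

kept = x + 1e-100 * y + w    # tiny but non-zero coefficient
dropped = x + 0.0 * y + w    # exactly zero coefficient

print([v.name for v in kept])     # ['x', 'y', 'w'] -> y is kept
print([v.name for v in dropped])  # ['x', 'w']      -> y silently disappears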

On the other hand, the workarounds you proposed require the user to check every variable manually.

So I still think this is a potential "leaky abstraction" issue, whether one uses row- or column-wise modelling.

It seems to me that the condition "variable being used in a constraint or objective" may be detectable in the current interface already, without relying on the coefficient being exactly zero.

Indeed, a possible workaround could be to have the +/- operators for affine expressions always add a variable, independently of its coefficient, but I guess there are reasons why this would be bad.

stumitchell commented 4 years ago

Interesting. Somewhere in the code there is probably an if statement based on the coefficient of the variable.

Python's truthiness test would then exclude it.
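For example, a check like this (just illustrating the suspected pattern, not the actual PuLP source) would drop exact zeros while keeping arbitrarily small coefficients:

for coeff in (0.0, 1e-100, 1.0):
    if coeff:   # bool(0.0) is False, bool(1e-100) is True
        print(coeff, "-> term kept")
    else:
        print(coeff, "-> term dropped")
# 0.0 -> term dropped
# 1e-100 -> term kept
# 1.0 -> term kept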

I don't see any reason why we can't fix this, but then the question becomes what value we should set it to.

Perhaps we should leave the variable in and let the solver work out what it wants to return.

Some solvers will probably return nothing, and then variable.value() will be None, but at that point we can point to the solver implementation.

pchtsp commented 4 years ago

I was unsure whether it should be PuLP, the solver, or the user who should identify (and potentially extract) empty columns in the matrix. Many modelers do offer warnings when columns are empty or duplicated. I thought that PuLP's behavior was "intentional": I can see the usefulness of omitting variables with zero coefficients. On the other hand, I don't really see any reason why one would want to keep variables that are not relevant to the problem: if the column is empty, it should not be inserted. I believe (I may be wrong) that this check should be part of the user's implementation (be it a direct model or a decomposition).

On top of that, this does not solve the problem that arises when a user forgets to add a particular variable to any constraint: the value will then be None and there is nothing we can do. This may differ from other modelers that require an explicit declaration of all variables and constraints for a given problem.

Finally, another workaround (maybe a weirder one) would be to add an explicit constraint for each variable, like a redundant upper bound. For example:

y = LpVariable("y", 1, 3)
problem += y <= y.upBound

Of course, you would have to do that for each of your variables, which I guess is similar in effort to the other workarounds.
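Looping over the variables, it could look like this (a sketch; my_vars is a hypothetical list of your model's variables, and I assume they all have finite upper bounds):

# Force every variable into the model with a redundant upper-bound constraint,
# so that each one appears in at least one row and receives a value.
my_vars = [x, y]
for v in my_vars:
    if v.upBound is not None:
        problem += v <= v.upBound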

andrea-taverna commented 4 years ago

I guess it boils down to the type and level of abstraction you want.

If I assume (I) that PuLP allows me to create a model with math-like statements, then I should be able to look at those statements to assess the model I am creating. From the documentation it then appears obvious that, to actually add a variable to a model, I need to put it in a constraint or in the objective. So what I do is write code that creates a constraint or an objective where the variable is visibly present. The code becomes a "contract" that I can even unit test.

The "variable-omission" for zero columns is a useful optimization for the underlying LP file but I do not think it relates to how the library is used, as users are not expected to add variables by specifying whether all the coefficient in the model are exactly 0.0f or not.

The current implementation certainly violates assumption (I), namely being able to understand the model and its expected output by looking at the model-creation code. Whether assumption (I) is correct or important is up to the contributors. Another question is whether (I) is what the documentation leads users to expect.

I also found another potential problem. None can be returned when

In both cases it seems that the current implementation can introduce (silent) bugs, depending on the parameters of the model, which are often input data. These bugs would be hard to identify, or avoiding them would require using the library in an unnatural way, such as specifying bounds or initial values unnecessarily.

Either way, those cases do not respect assumption (I) and, IMHO, are not found in most other solver interfaces, which tend to adhere to assumption (I) AFAIK.

Example (PoC):

model = LpProblem()
x = LpVariable.dict("x", [0,1,2,3], 0,3)
y = LpVariable("y",0,5)

c = [1,2,0,2.1]
b = [2* j for j in c]

S = {0,1,3}
D1 = 5
D2 = 4

cons1 = lpSum(c[i] * x[i] for i in x.keys()) + y <= D1
cons2 = lpSum(b[i] * x[i] for i in S) + y <= D2

model.add(cons1)
model.add(cons2)

print(f"Coefficients of x in constraint 1: {list(cons1.values())} # [1, 2, 2.1] -> no mention of x[2] coefficient
print(f"Coefficients of x in constraint 2: {list(cons2.items())} # [2,4, 4.2]

# Suppose I am in another part of the code and I want to do parametric analysis and see what happens if I increase cons1 coefficients by 1 unit
# Then I create a modified constraint as follows
cons3 = lpSum((coef + 1) * var for var, coef in cons1.items()) <= D1  # D1 is cons1's right-hand side

# Actually, the coefficient of x[2] is not increased and cons3 won't have x[2] in its formulation.
# Had x[2] had a non-zero coefficient, the code would have worked as expected.
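A defensive variant of the parametric analysis (a sketch): iterate over the variables I intended to include and look up their coefficients with a default of 0, instead of iterating over whatever the constraint happens to store:

# Read coefficients with .get(var, 0) over the intended variables, so x[2]
# (which has no stored entry) is not silently skipped.
intended_vars = list(x.values()) + [y]
cons3_fixed = lpSum((cons1.get(v, 0) + 1) * v for v in intended_vars) <= D1
model.add(cons3_fixed)
print({v.name: coeff for v, coeff in cons3_fixed.items()})
# x[2] now appears with coefficient 0 + 1 = 1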