andrea-taverna opened this issue 4 years ago
Hello and thanks for the detailed report and MWE. Unfortunately, I think it may be hard to make it work like you expect. In PuLP, we only store the variables that are present at least once in the constraints or the objective function.
A couple of workarounds occur to me, I'm not sure if any satisfy your case:
1. Give each variable an initial value with the `setInitialValue()` method, using the lower bound, say. That way it will never be `None`, even if it is not included in the solve. Example:

```python
y = LpVariable("y", 1, 3)
y.setInitialValue(y.lowBound)
```

2. After building the model, set an initial value only for the variables that did not make it into the problem:

```python
problem_vars = problem.variables()
if y not in problem_vars:
    y.setInitialValue(y.lowBound)
```
Hi Franco,
thanks for the quick reply.
I will use one of the workarounds you suggested as they are slightly clearer than checking directly for None values.
I know this is an open-source project, and, in particular, I have neither paid for nor contributed to it, but I still find the current situation perplexing.
In the current implementation, variables are omitted from an affine expression only when they have an exactly zero coefficient, i.e.

```python
z = x + b * y + w
print(list(z.keys()))
```

prints `[x, y, w]` whenever `b != 0`. Even with `b = 1e-100`, `y` is still added to the constraint. This likely means there is an equality check against `0.0` somewhere in the code.

However, code-wise, the statement above is very different from completely omitting `y`:

```python
z = x + w
print(list(z.keys()))
```

In the latter case I make it explicit that `y` is not in the expression, and that does not depend on the coefficients.
On the other hand, the workarounds you proposed require the user to check every variable manually.

So I still think this is a potential "leaky abstraction" issue, whether one uses row-wise or column-wise modelling. It seems to me that the condition "the variable is used in a constraint or the objective" may already be detectable in the current interface, without relying on the coefficient being exactly zero.

Indeed, a possible fix could be to have `+`/`-` for affine expressions always add a variable to the expression, independently of its coefficient, but I guess there are reasons for which this would be bad.
Interesting: somewhere in the code there is probably an `if` statement based on the coefficient of the variable. Python's truthiness test would then exclude it.

I don't see any reason why we can't fix this, but then the question becomes what value we should set it to. Perhaps we should leave the variable in and let the solver work out what it wants to return. Some solvers will probably return nothing, and therefore `variable.value()` will be `None`, but then we can point to the solver implementation.
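For reference, the truthiness behavior can be shown without PuLP at all; a minimal sketch (the `coeffs` dictionary is purely illustrative, not PuLP internals):

```python
# A truthiness test on float coefficients drops exactly-zero terms
# but keeps arbitrarily small nonzero ones.
coeffs = {"x": 1.0, "y": 0.0, "z": 1e-100}

# `if c` is False only for exactly 0.0 (and -0.0), never for tiny nonzeros
kept = {name: c for name, c in coeffs.items() if c}

print(sorted(kept))  # ['x', 'z'] -- y is dropped, z survives despite being ~0
```

This is consistent with the observation above that `b = 1e-100` still keeps `y` in the constraint, while `b = 0.0` drops it.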
I was unsure whether it should be PuLP, the solver, or the user who should identify (and potentially extract) empty columns in the matrix. Many modelers do offer warnings when columns are empty or duplicated. I thought that PuLP's behavior was intentional: I can see the usefulness of omitting variables with zero coefficients. On the other hand, I don't really see any reason why one would want to keep variables that are not relevant to the problem: if the column is empty, it should not be inserted. I believe (I may be wrong) that this check should be part of the user's implementation (be it a direct model or a decomposition).

On top of that, this does not solve the problem that arises when a user forgets to add a particular variable to any constraint. Then the value will be `None` and there is nothing we can do. This may differ from other modelers that require an explicit declaration of all variables and constraints for a given problem.
Finally, another workaround (maybe weirder) would be to add an explicit constraint for each variable, like a redundant upper bound. For example:

```python
y = LpVariable("y", 1, 3)
problem += y <= y.upBound  # redundant: the bound already holds, but it forces y into the model
```

Of course, you would have to do that for each of your variables, which I guess is similar in effort to the other workarounds.
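Applied to every variable, that workaround could be sketched as follows (a toy model, not the reporter's actual code; only standard PuLP calls are assumed):

```python
from pulp import LpProblem, LpVariable

problem = LpProblem()
x = LpVariable("x", 0, 3)
y = LpVariable("y", 1, 3)
problem += x <= 2  # y appears in no "real" constraint

for v in (x, y):
    problem += v <= v.upBound  # redundant, but forces v into the model

print([v.name for v in problem.variables()])  # ['x', 'y']
```

Without the loop, `problem.variables()` would contain only `x` and `y.value()` would stay `None` after solving.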
I guess it boils down to the type and level of abstraction you want.
If I assume (I) that PuLP allows me to create a model with math-like statements, then I should be able to look at those statements to assess the model I'm creating. From the documentation it then appears obvious that, to actually add a variable to a model, I need to put it in a constraint or in the objective. So what I do is write code to create a constraint or the objective where the variable is visibly present. The code becomes a "contract" that I can even unit-test.

The "variable omission" for zero columns is a useful optimization for the underlying LP file, but I do not think it relates to how the library is used, as users are not expected to add variables by specifying whether all their coefficients in the model are exactly `0.0` or not.

The current implementation for sure violates assumption (I), i.e. being able to understand the model, or predict its expected output, by looking at the model-creation code. Whether assumption (I) is correct or important is a matter left to the contributors. Another question would be whether (I) is implied by the documentation or not.
I also found another potential problem: `None` can be returned when a coefficient happens to be exactly `0.0`, as it can be in some algorithms (what I reported at the beginning). In both cases it seems that the current implementation can introduce (silent) bugs depending on the parameters of the model, which are often input data; such bugs would be hard to identify, or would require using the library in an unnatural way, such as specifying bounds or initial values unnecessarily.

Either way, those cases do not respect assumption (I) and, IMHO, are not found in most other solver interfaces, which AFAIK tend to adhere to assumption (I).
Example (PoC):

```python
from pulp import LpProblem, LpVariable, lpSum

model = LpProblem()
x = LpVariable.dict("x", [0, 1, 2, 3], 0, 3)
y = LpVariable("y", 0, 5)
c = [1, 2, 0, 2.1]
b = [2 * j for j in c]
S = {0, 1, 3}
D1 = 5
D2 = 4
cons1 = lpSum(c[i] * x[i] for i in x.keys()) + y <= D1
cons2 = lpSum(b[i] * x[i] for i in S) + y <= D2
model.add(cons1)
model.add(cons2)
print(f"Coefficients in constraint 1: {list(cons1.items())}")  # no mention of x[2] (c[2] == 0)
print(f"Coefficients in constraint 2: {list(cons2.items())}")  # 2, 4, 4.2 on x, plus y

# Suppose I am in another part of the code and I want to do a parametric
# analysis to see what happens if I increase the cons1 coefficients by 1 unit.
# Then I create a modified constraint as follows
# (note: PuLP stores the right-hand side negated in .constant):
cons3 = lpSum((coef + 1) * var for var, coef in cons1.items()) <= -cons1.constant
# The coefficient of x[2] is not increased and cons3 won't have x[2] in its formulation.
# Had x[2] had a coefficient != 0, the code would have worked as expected.
```
Description: Consider a model where one or more variables are added to a constraint or the objective function but they all have zero coefficients everywhere in the model.
For example, take a constraint like `a*x + b*y <= c`, with `a`, `b`, `c` floats and `x`, `y` LpVariables, and assume that `y` does not appear in the objective or in any other constraint apart from that one.

If someone sets `b = 0.0` before adding the constraint, then PuLP apparently omits `y`, and invoking `y.value()` after solving the model will return `None`. On the other hand, if one were to manually write an LP file with `y` listed among the variables but without appearing in any constraint or in the objective, CBC would still write a feasible value for `y` in the solution file, usually the lower bound (see MWE).

Context: I am writing a cutting-plane model for dual optimization inside a Lagrangian decomposition scheme. At the first iteration of the scheme the cutting plane has only one constraint, the first cut, and it can happen that one (dual) variable in the cut has an exactly zero coefficient in the constraint. Then, querying the value of the variable after solving the model yields `None`.
Issue: In general, this behaviour requires the user to be aware of which variables have zero coefficients everywhere in the model, to avoid getting `None` where valid floats are expected. On the other hand, for these "free" variables it is fine to return any feasible value, as CBC already does and, IIRC, as other software like AMPL does.

I thank you for your formidable work on PuLP and also for any help you may provide!
MWE: Code:
Output:
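The MWE code and output are not preserved above; the following is a hypothetical minimal reconstruction consistent with the description (variable names and bounds are illustrative, not the reporter's):

```python
from pulp import LpProblem, LpVariable

model = LpProblem()
x = LpVariable("x", 0, 3)
y = LpVariable("y", 1, 3)

b = 0.0  # exact zero, e.g. produced by an earlier iteration of an algorithm
model += x + b * y <= 2  # y is silently dropped from the constraint
model += x  # objective

print([v.name for v in model.variables()])  # ['x'] -- y never entered the model
print(y.value())  # None
```

Since `y` is not among `model.variables()`, no solver ever assigns it a value, so `y.value()` stays `None` even after a successful solve.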