"Fixing" binary variables

etesio commented 5 years ago

Hello there,

I am writing a MILP model and I am trying to achieve the following:

in step 1, I solve the MILP problem
in step 2, I'd like to "fix" the binary variables obtained from the solution of step 1, and solve the corresponding (continuous) problem by changing some parameters. By "fixing" I mean: the binary variables should be constrained to take the optimal values stemming from step 1

The idea is that I want to solve a MILP problem and determine the cost for a given set of parameters. Then, using the "optimal" set of binary variables, I change some of the parameters to move along the cost curve of the continuous problem.

I can think of two solutions to achieve step 2:

1) perhaps obviously, add constraints to the problem so that binary variables are equal to the optimal values associated to the solution of step 1
2) extract the optimal values of the binary variables from step 1 and write a second problem (literally copy-pasting it, and removing the variable declarations) where these values enter as parameters in the model

In case (1) it seems to me that I am formally writing a mixed-integer problem, but then I constrain all the integer variables... which seems like a waste

In case (2) it seems to me that - although cumbersome - I actually write a continuous problem, and ompr should (?) be equipped to solve it more quickly/efficiently than the original mixed integer problem (I am using the cbc solver, btw)

So my question is: is either of the two approaches correct? Or is there a more direct way to achieve this in ompr?

hugolarzabal commented 5 years ago

When you are modeling a problem with ompr, there are 3 main concerns:
1) ompr/R efficiency at constructing the model (there can be time or memory issues with really big problems)
2) solver efficiency (too many useless/symmetric constraints may slow down solving)
3) human efficiency (not spending too much time writing the code and making it easy to read)

I think the best solution in your case depends on those main concerns.

If your problem is not too big or too hard to solve, case 1 should be best. You can also make it simpler by using the set_bounds function on your binaries instead of using constraints:

model %>% set_bounds(b[i,j], i = 1:n, j=1:m, lb = p(i,j), ub = p(i,j))

instead of

model %>% add_constraint(b[i,j] == p(i,j),i = 1:n, j=1:m)

That solution will be lighter for ompr and for the solver since it won't add additional constraints. Nevertheless you will still have some useless constraints between binaries in the model. In most cases, I would say that it is the best solution. It is easy to program and won't be a big problem for the solver.

If R or solver efficiency in step 2 is a real concern (either because the model is really big or very hard to solve), then case 2 is the best option.

In both cases, I would advise you to create a single function for the model which will add the necessary constraints depending on whether it is called for step 1 or step 2.
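
A minimal sketch of that last suggestion, assuming a hypothetical toy model: the variables x and y, the objective, the constraint and the fixed_x argument are invented here for illustration and are not the poster's model. A single builder function returns the MILP for step 1 and, when it is given the step 1 values, adds the fixing constraints for step 2.

```r
library(ompr)
library(magrittr)  # provides %>%

# Hypothetical builder: fixed_x = NULL gives the step 1 MILP,
# a 0/1 vector of length n gives the step 2 model with x fixed.
build_model <- function(n, fixed_x = NULL) {
  model <- MIPModel() %>%
    add_variable(x[i], i = 1:n, type = "binary") %>%
    add_variable(y, type = "continuous", lb = 0, ub = 10) %>%
    set_objective(sum_expr(i * x[i], i = 1:n) + y, "max") %>%
    add_constraint(sum_expr(x[i], i = 1:n) <= 2)

  if (!is.null(fixed_x)) {
    # Step 2 only: pin every binary to its step 1 value.
    model <- model %>%
      add_constraint(x[i] == fixed_x[i], i = 1:n)
  }
  model
}

step1_model <- build_model(3)
step2_model <- build_model(3, fixed_x = c(0, 1, 1))
```

Both models can then be passed to solve_model() in the usual way. The step 2 branch could use set_bounds instead of add_constraint, as described above.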

etesio commented 5 years ago

Thanks a lot @hugolarzabal, that makes perfect sense. I am leaning toward option 1 as you suggested, as

writing the model takes some time while adding bounds/constraints on the step 1 model is very quick, and
it definitely increases code clarity

So my situation is something like this:

model = build_model(...) # this takes some time
sol = ompr::solve_model(model, with_ROI(solver = "cbc", control = list("verbose"=T))) # this takes 0.4 - 0.5 sec
X = ompr::get_solution(sol, x[i,j]) # x is binary
Y = ompr::get_solution(sol, y[) # y is continuous
# modify Y
newY = Y + 5

cost_model = model %>% # I am not re-writing the full model, just copying the original one and adding a few constraints
    add_constraint(x[i,j] == X[i,j], i=1:N, j=1:M) %>% # or use set_bounds
    add_constraint(y == newY)
cost_sol = ompr::solve_model(cost_model, with_ROI(solver = "cbc", control = list("verbose"=T)))
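
One detail in the snippet above may need care: as far as I know, get_solution(sol, x[i,j]) returns a data frame with the columns variable, i, j and value rather than a matrix, so X[i,j] would not index it by variable indices. A minimal base R sketch of the conversion, using a made-up stand-in for the solution data frame (the name sol_x and its values are assumptions for illustration):

```r
# Stand-in for the output of ompr::get_solution(sol, x[i, j]);
# the values below are made up for illustration.
sol_x <- data.frame(
  variable = "x",
  i = c(1, 2, 1, 2),
  j = c(1, 1, 2, 2),
  value = c(1, 0, 0.9999999, 0)
)

N <- max(sol_x$i)
M <- max(sol_x$j)

# Fill an N x M matrix by (i, j) pairs; round() guards against
# solver values such as 0.9999999 that should be read as 1.
X <- matrix(0, nrow = N, ncol = M)
X[cbind(sol_x$i, sol_x$j)] <- round(sol_x$value)
print(X)
```

With X in matrix form, X[i,j] refers to the value of the variable x[i,j].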

As I expected, since the problem is now continuous (the binary variables x are constrained), the solver execution time is low (about 0.03 seconds). This is the cbc output (see the "No integer variables - nothing to do" message, which I guess comes from the pre-processing phase):

Continuous objective value is 7908.47 - 0.00 seconds
Cgl0002I 65 variables fixed
Cgl0004I processed model has 8 rows, 33 columns (0 integer (0 of which binary)) and 47 elements
Cbc3007W No integer variables - nothing to do
[...]
Total time (CPU seconds):       0.03   (Wallclock seconds):       0.03

Now my question is: why is the total execution time of the ompr::solve_model call much longer than the actual solver time? Is this expected? I ran a microbenchmark of the ompr::solve_model call and this is the result (about 10 seconds per call):

Unit: seconds
                                                                                 expr      min       lq     mean   median       uq      max neval
 ompr::solve_model(cost_model, with_ROI(solver = "cbc", control = list(verbose = T))) 9.794555 9.986063 10.12385 10.25004 10.28031 10.30826     5

Sorry if this is not the right place for the question, I guess it is more of an ompr/ROI issue (if any)? I noticed that the same is true when I call the solver for the original model instance, so I figured there might be something going on here, as the overhead is independent of the fact that no branch-and-cut is even needed.

hugolarzabal commented 5 years ago

ompr builds the model as a somewhat complex R object (list of list of list...) which can become quite heavy (several MB) if you have tens of thousands of variables and indexes.

When you solve it, the model is translated first from ompr to ROI format, then from ROI to rcbc format and then from R language to C++ language to be passed to the solver. The solution does the same trip back.

Because of that, a model can sometimes be quite large but very easy to solve, and take more time to pass to the solver than to solve. The solver also usually has a preprocessing time which doesn't count as solving time.

However, in your case, cbc says "8 rows, 33 columns (0 integer (0 of which binary)) and 47 elements", which means that your model is small, so I do not really know where the extra time could come from.

Are you using MIPModel() or MILPModel() to create your model?
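
One way to see where the time goes is to time the translation and the solve separately. The sketch below rests on several assumptions: it uses a small toy model rather than the poster's, it relies on as_ROI_model() from ompr.roi to do the ompr-to-ROI translation on its own, and it swaps in the GLPK plugin for cbc only so that the example depends on CRAN packages.

```r
library(ompr)
library(ompr.roi)
library(ROI)
library(ROI.plugin.glpk)  # assumption: GLPK instead of the cbc solver used in the thread
library(magrittr)         # provides %>%

# Toy model, made up for illustration: pick at most 10 of n items.
n <- 200
model <- MIPModel() %>%
  add_variable(x[i], i = 1:n, type = "binary") %>%
  set_objective(sum_expr(x[i], i = 1:n), "max") %>%
  add_constraint(sum_expr(x[i], i = 1:n) <= 10)

# Step A: translate the ompr model into a ROI problem object.
t_translate <- system.time(op <- as_ROI_model(model))

# Step B: hand the ROI problem to the solver.
t_solve <- system.time(res <- ROI_solve(op, solver = "glpk"))

print(c(translate = t_translate[["elapsed"]],
        solve = t_solve[["elapsed"]]))
```

If most of the elapsed time shows up in the translate step, the overhead is on the R side rather than in the solver.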

etesio commented 5 years ago

OK, thanks a lot, I see. Then I guess there is no way around this.

As for problem size, is it possible that the smaller problem (the "8 rows, 33 columns and 47 elements" one) is reached/processed by CBC starting from a larger instance (which would explain the additional time before reaching the solution)? Because the model is actually larger:

Mixed integer linear optimization problem
Variables:
  Continuous: 1572 
  Integer: 0 
  Binary: 289 
Model sense: minimize 
Constraints: 2059

where again, the binary variables are actually constrained so that the pre-processing phase of the solver already finds the correct solution.

The model is built via MILPModel

tjseydel commented 5 years ago

Hello there,

I have a similar problem where I also want to fix some of the variables, but not all of them since my solution is updated through multiple iterations. The following code illustrates my problem:

df <-  data.frame(request = c(1,2,3), timeslot = c(1,3,NA))

#nr of already scheduled requests
nr_scheduled <- 2 
model <- MIPModel() %>%
  #Create a variable that is 1 if request r is planned to start on time t
  add_variable(x[r,t], r= requests, t= timeslots,
               type = "binary") %>%

  #Bind the already scheduled appointments 
  set_bounds(x[r,t], r = df$request[1:nr_scheduled],  t = df$timeslot[1:nr_scheduled], lb = 1, ub = 1 )

However, setting the bounds like this sets all combinations of the request and timeslot numbers to one. So the values of x for (r,t) = (1,1), (2,1), (1,3) and (2,3) are all set to 1.

I would like the set_bounds function to only set the bound of the x[r,t] that are already scheduled and defined in df. So, only set the variables x[1,1] & x[2,3] to 1 (since these are the already scheduled requests). This will probably require a filter, but I can't get it to work myself. I hope someone can help me out.

tjseydel commented 5 years ago

Already found a method to solve my problem. I first constructed a matrix y_matrix[r,t] with values 0 and changed the values of r,t corresponding to already scheduled requests to 1. I then used the following set_bounds function:

 set_bounds(x[r,t], r = requests,  t = timeslots, lb = y_matrix[requests,timeslots], ub = 1 )
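
For reference, a minimal base R sketch of how such a y_matrix could be built from df. The index sets requests and timeslots are assumed to be 1:3 here, since the thread does not show how they are defined. It also shows why passing the two index vectors separately covers four combinations instead of the two scheduled pairs.

```r
df <- data.frame(request = c(1, 2, 3), timeslot = c(1, 3, NA))
nr_scheduled <- 2

# Assumed index sets; the thread does not show how they are defined.
requests <- 1:3
timeslots <- 1:3

scheduled <- df[seq_len(nr_scheduled), ]

# Passing the two index vectors separately yields their cross product:
# (1,1), (2,1), (1,3) and (2,3).
combos <- expand.grid(r = scheduled$request, t = scheduled$timeslot)
print(combos)

# Indexing with a two-column matrix addresses only the listed pairs,
# here x[1,1] and x[2,3].
y_matrix <- matrix(0, nrow = length(requests), ncol = length(timeslots))
y_matrix[cbind(scheduled$request, scheduled$timeslot)] <- 1
print(y_matrix)
```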

dirkschumacher / ompr

"Fixing" binary variables #255