dirkschumacher / ompr

R package to model Mixed Integer Linear Programs
https://dirkschumacher.github.io/ompr/
Other
264 stars 35 forks source link

add dependency matrix to constrains #450

Open RKonstantinR opened 3 months ago

RKonstantinR commented 3 months ago

Hello, thanks for the great package!

I'm trying to use your library to develop a model for creating an optimal project portfolio. I was able to set constraint on the total number of projects and write the resulting function that maximizes profit.

But in addition to standard constraints, I’m trying to add project dependency matrix into the model. A number of projects depend on each other. The implementation of two projects out of three does not make economic sense.

There is a way to set constraints based on a dependency matrix and check that all related projects are present in the resulting list of projects?

library(dplyr)
library(ROI)
library(ROI.plugin.symphony)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)

project_list <- tibble(name = c("a", "b", "c", "d"), profit = c(100, 10, 5, 50)) %>% 
  mutate(item = row_number())

max_project <- 3

# dependency matrix (if 1 - the project depends on another project)
my_matrix <- matrix(
  c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), 
  nrow = 4,   
  ncol = 4,        
  byrow = TRUE         
)

rownames(my_matrix) <- c("a", "b", "c", "d")
colnames(my_matrix) <- c("a", "b", "c", "d")

m <- MIPModel() %>%
  add_variable(x[i], i=project_list$item, type="binary") %>%
  add_constraint(sum_over(x[i], i=project_list$item) <= max_project) %>% 
  set_objective(sum_over(project_list$profit[i]*x[i], i=project_list$item),"max") %>% 
  solve_model(with_ROI(solver = "symphony", verbosity=1))

This model selects projects a, b, d, but given dependencies, projects a, b, c should be selected.

sbmack commented 3 months ago

With dependent binary variables you have to define the types of dependencies between the variables. The side constraints are formulated based on type. For example:

x1 requires x2: x1 <= x2 x1 – x2 <= 0

x1, x2 mutually dependent: x1 – x2 = 0

x1, x2 mutually exclusive: a + b <= 1

There are other cases.

Dependency constraints with more than 2 variables can be chained together. The specific constraint you need will be a function of the dependencies among a, b, c, d. Edit your question to include that info for more help.

RKonstantinR commented 3 months ago

Thanks for the reply!

Dependency constraints with more than 2 variables can be chained together. The specific constraint you need will be a function of the dependencies among a, b, c, d. Edit your question to include that info for more help.

In my case, projects that have a 1 in the dependency matrix are mutually dependent on each other. I need to add a constraint that the sum of dependencies for each row of a matrix containing only selected projects must be equal to 1.

The problem is that I don't understand how to properly include this constraint in the model.

full matrix (normalized)
  a b   c    d  Sum Profit
a 0 0.5 0.5  0    1 100
b 1 0   0    0    1 10
c 1 0   0    0    1 5 
d 0 0   0    0    0 50

subset matrix containing 3 projects with maximum profit without taking into account mutual dependence
  a b   d    Sum
a 0 0.5 0    0.5 < 1 - error
b 1 0   0    1 = 1 - ok
d 0 0   0    0 = 1 - ok

subset matrix containing 3 projects with maximum profit with taking into account mutual dependence
  a b   c    Sum
a 0 0.5 0.5  1 = 1 - ok
b 1 0   0    1 = 1 - ok
c 1 0   0    1 = 1 - ok
sbmack commented 3 months ago

In my case, projects that have a 1 in the dependency matrix are mutually dependent on each other. I need to add a constraint that the sum of dependencies for each row of a matrix containing only selected projects must be equal to 1.

The problem is that I don't understand how to properly include this constraint in the model.

full matrix (normalized)
  a b   c    d  Sum Profit
a 0 0.5 0.5  0    1 100
b 1 0   0    0    1 10
c 1 0   0    0    1 5 
d 0 0   0    0    0 50

subset matrix containing 3 projects with maximum profit without taking into account mutual dependence
  a b   d    Sum
a 0 0.5 0    0.5 < 1 - error
b 1 0   0    1 = 1 - ok
d 0 0   0    0 = 1 - ok

subset matrix containing 3 projects with maximum profit with taking into account mutual dependence
  a b   c    Sum
a 0 0.5 0.5  1 = 1 - ok
b 1 0   0    1 = 1 - ok
c 1 0   0    1 = 1 - ok

Again, you have to specify the type of each dependency because it determines the signs of the coefficients, the type of inequality/equality and the value of the right hand side (RHS). So for example row 1 in your matrix, what is the dependency relationship between b and c? And also the other rows?

RKonstantinR commented 3 months ago

Again, you have to specify the type of each dependency because it determines the signs of the coefficients, the type of inequality/equality and the value of the right hand side (RHS). So for example row 1 in your matrix, what is the dependency relationship between b and c? And also the other rows?

Projects depend on each other if the matrix at their intersection has a value greater than 0. For example:

First row (project a):
  a b   c    d   Sum
a 0 0.5 0.5  0   1 - project "a" depends on project "b" and "c"

Second row (project b):
  a b   c    d   Sum
b 1 0   0    0   1 - project "b" depends on project "a"

Last row (project d):
  a b   c    d  Sum
d 0 0   0    0  0 - project "d" does not depend on any project

For example, let’s take three projects: “a”, “b”, “d” and do this check manually. To do this, you need to take the original matrix, exclude the row and column with project “с” and calculate the sum for each row.

  a b   d    Sum
a 0 0.5 0    0.5 < 1 - error
b 1 0   0    1 = 1 - ok
d 0 0   0    0 = 1 - ok

Project "a" depends on project "c" and if it is excluded, the row sum for project "a" will be less than 1.

sbmack commented 3 months ago

Projects depend on each other if the matrix at their intersection has a value greater than 0. For example:

But exactly HOW do they depend on each other?

If project a requires project b and project c then that maps to case 1 above: x1 requires x2: x1 <= x2 x1 – x2 <= 0

So for the a requires b AND c case could be written as:

a - b <= 0 a - c <= 0 or 2*a - b - c <= 0

So you can see that the coefficients, inequality and RHS are dependent on the type of constraint.

RKonstantinR commented 3 months ago

So you can see that the coefficients, inequality and RHS are dependent on the type of constraint.

Thank you for your help and patience.

This is my first experience using this package, as well as designing MILP models, so a number of my questions may be quite primitive or I may be mistaken.

The inequality you proposed for project "b" correctly describes the dependencies.

But this is only true for one project; for other projects it will be different.

What if there are 1000 such projects? for each of them is it necessary to write down such an inequality?

Or should I specify a matrix as the decision variable?

sbmack commented 3 months ago

Or should I specify a matrix as the decision variable?

You could use matrices with 1 matrix block for each constraint type. The values in a matrix are the coefficients of the contraints. So with: the a requires b AND c case could be written as:

a - b <= 0 a - c <= 0 or 2*a - b - c <= 0

Better to use the 2 constraint version with the coefficients being:

a b c d 1, -1, 0, 0 1, 0, -1, 0

For the block of constraints the inequality would be <= and the RHS would be 0

RKonstantinR commented 3 months ago

For the block of constraints the inequality would be <= and the RHS would be 0

Thanks to your hint, I improved the coefficient matrix and constrain type.

The a requires b AND c and they are completely dependent on each other: a + b + c == 1 or 1 - a - b - c == 0

Coefficients would be:

coef_matrix <- matrix(
  c(1/3, 1/3, 1/3, 0, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0, 0, 0, 0, 1), 
  nrow = 4,   
  ncol = 4,        
  byrow = TRUE         
)

rownames(coef_matrix ) <- c("a", "b", "c", "d")

colnames(coef_matrix ) <- c("a", "b", "c", "d")

Then I try to set the constrain:

sum(coef_matrix[,1]) + sum(coef_matrix[,2]) + sum(coef_matrix[,3]) == max_project
[1] TRUE
sum(coef_matrix[,1]) + sum(coef_matrix[,2]) + sum(coef_matrix[,4]) == max_project
[1] FALSE

But I get an error Error in add_constraint(): ! Some constraints are not proper linear constraints or are not true.:

library(dplyr)
library(ROI)
library(ROI.plugin.symphony)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)

project_list <- tibble(name = c("a", "b", "c", "d"), profit = c(100, 10, 5, 50)) %>% 
  mutate(item = row_number())

max_project <- 3

coef_matrix <- matrix(
  c(1/3, 1/3, 1/3, 0, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0, 0, 0, 0, 1), 
  nrow = 4,   
  ncol = 4,        
  byrow = TRUE         
)

rownames(coef_matrix) <- c("a", "b", "c", "d")
colnames(coef_matrix) <- c("a", "b", "c", "d")

m <- MIPModel() %>%
  add_variable(x[i], i=project_list$item, type="binary") %>%
  add_constraint(sum_over(coef_matrix[,i], i = project_list$item) == max_project) %>% 
  add_constraint(sum_over(x[i], i=project_list$item) <= max_project) %>% 
  set_objective(sum_over(project_list$profit[i] * x[i], i=project_list$item),"max") %>%
  solve_model(with_ROI(solver = "symphony", verbosity=1))
sbmack commented 3 months ago

But I get an error Error

You have a bunch of errors in your model. Here's an update:

project_list <- tibble(name = c("a", "b", "c", "d"), profit = c(100, 10, 5, 50)) %>% 
  mutate(item = row_number())

nprojs <- nrow(project_list)

max_project <- 3

# coef_matrix <- matrix(
#   c(1/3, 1/3, 1/3, 0, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0, 0, 0, 0, 1), 
#   nrow = 4,   
#   ncol = 4,        
#   byrow = TRUE         
# )

# rownames(coef_matrix) <- c("a", "b", "c", "d")
# colnames(coef_matrix) <- c("a", "b", "c", "d")

# constraint vector:
cvec <- c(1, 1, 1, 0)

m <- MIPModel() %>%
  # add_variable(x[i], i=project_list$item, type="binary") %>%
  add_variable(x[i], i = 1:nprojs, type = 'binary') |> 
  # add_constraint(sum_over(coef_matrix[,i], i = project_list$item) == max_project) %>% 
  add_constraint(sum_over(cvec[i] * x[i], i = 1:nprojs) == 1) |> 
  # add_constraint(sum_over(x[i], i=project_list$item) <= max_project) %>% 
  add_constraint(sum_over(x[i], i = 1:nprojs) <= max_project) |> 
  # set_objective(sum_over(project_list$profit[i] * x[i], i=project_list$item),"max")
  set_objective(sum_over(project_list$profit[i] * x[i], i = 1:nprojs),"max")

  result <- solve_model(m, with_ROI(solver = "symphony", verbosity=1))
  1. You must index over a range. You indexed over a single value
  2. The range is set from 1 to the number of projects (nprojs)
  3. All constraints must be summed over the declared variable x. a, b, c are not variables
  4. The binary side constraint coefficients are contained in a vector (cvec) since there is only 1
  5. Note that your a + b + c = 1 constraint forces the solution to be less than 3 projects, so the max_project constraint in inactive
  6. I separated the model execution from the piping and set the value to result because it's easier to review post-optimization.
RKonstantinR commented 3 months ago

Thank you for your feedback.

I was able to add the required constraint to the model (https://github.com/dirkschumacher/ompr/issues/259#issue-445883878).

Here is the working code:

library(dplyr)
library(ROI)
library(ROI.plugin.symphony)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)

project_list <- tibble(name = c("a", "b", "c", "d"), 
                       profit = c(100, 10, 5, 50)) %>% 
  mutate(item = row_number(),
         num = 1)

max_project <- 3

coef_matrix <- matrix(
  c(1/3, 1/3, 1/3, 0, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0, 0, 0, 0, 1), 
  nrow = 4,   
  ncol = 4,        
  byrow = TRUE         
)

rownames(coef_matrix) <- c("a", "b", "c", "d")
colnames(coef_matrix) <- c("a", "b", "c", "d")

sum_func <- function(i){
  a_sub <- coef_matrix[, i] 
  a_vec <- sum(as.vector(a_sub)) 
  return(a_vec)
}

m <- MIPModel() %>%
  add_variable(x[i], i=project_list$item, type="binary") %>%
  add_constraint(sum_over(sum_func(i) * x[i], i = project_list$item) == sum_over(project_list$num[i] * x[i], i=project_list$item)) %>%
  add_constraint(sum_over(x[i], i=project_list$item) <= max_project) %>% 
  set_objective(sum_over(project_list$profit[i] * x[i], i=project_list$item),"max")

result <- solve_model(m, with_ROI(solver = "symphony", verbosity=1))
sbmack commented 3 months ago

Thank you for your feedback.

The use of the coef_matrix does not make sense since you have only a single constraint. Moreover you stated:

The a requires b AND c and they are completely dependent on each other: a + b + c == 1 or 1 - a - b - c == 0

What you are saying there is that a OR b OR c must be selected. That means only 1 solution variable of the 3. Since d has no dependencies, you do not have to include d in a constraint. Simple visual inspection of your profit vector shows that a and d should be selected since a = 100 and d = 50. However, run the code:

get_solution(result, x[i]) and you see that a, b, c are selected which violates your intention.

Please clarify what you really want.

RKonstantinR commented 3 months ago

The model generates the right solutions for my task.

Perhaps I didn't describe the constraints quite correctly:

a + b + c + d + ... + n == 1 - This condition must be met for each row of the matrix

I will try to describe the problem in simple words without mathematical equations:

It is required to select no more than three out of four projects that will bring maximum profit, and if the projects depend on each other, then they can only be included together.

In the given example, the solution of the model is projects a, b and с. These are three interdependent projects with a total profit of 115.

If change the conditions and assign a profit of 500 to the project d, then the solution of the model will be the project d. Thus, the model takes into account the dependency matrix.

project_list$profit[4] <- 500

get_solution(result, x[i])
  variable i value
1        x 1     0
2        x 2     0
3        x 3     0
4        x 4     1
sbmack commented 3 months ago

The model generates the right solutions for my task.

That constraint type is usually solved using auxiliary binary variables:

project_list <- tibble(name = c("a", "b", "c", "d"), profit = c(100, 10, 5, 50)) %>% 
  mutate(item = row_number())

nprojs <- nrow(project_list)

max_project <- 3

cvec <- c(1, 1, 1, 0)
abcsum <-3

m <- MIPModel() %>%
  add_variable(x[i], i = 1:nprojs, type = 'binary') |>
  add_variable(abc, type = 'binary') |> 
  add_constraint(sum_over(cvec[i] * x[i], i = 1:nprojs) - abcsum * abc == 0) |> 
  add_constraint(sum_over(x[i], i = 1:nprojs) <= max_project) |> 
  set_objective(sum_over(project_list$profit[i] * x[i], i = 1:nprojs),"max")

wd <- getwd()
file_out <- paste(wd, "test.txt", sep = "/")
ROI_write(as_ROI_model(m), file_out, "lp_cplex")

result <- solve_model(m, with_ROI(solver = "symphony", verbosity=1))

In the above code the defined coefficient abcsum is k of j linked projects permitted in the solution. In this case all 3, a, b, c. If you changed abcsum to 2, only 2 out of 3 would be selected if any. abc is the auxillary binary variable. The secondary constraint ensures that the linked projects are selected as a group or not at all. Note that no maxtrix as input or link to an external function are required.

RKonstantinR commented 3 months ago

Your code works on this data, but I tried to add another group of interdependent projects (e and f) and the solution did not match my expectations.

project_list <- tibble(name = c("a", "b", "c", "d", "e", "f"), 
                                 profit = c(100, 10, 5, 50, 100, 15)) %>% 
                       mutate(item = row_number())

nprojs <- nrow(project_list)
max_project <- 3

cvec <- c(1, 1, 1, 0, 1, 1)
abcsum <- 1

I tried to use the model on real data. I have a data set for 1800 projects and a dependency matrix of 1800x1800.

I waited about 4 hours and, without waiting for an answer, stopped the calculation.

Is there a way to estimate the approximate time it will take to find a solution? Or maybe it's worth trying to use a different solver?

sbmack commented 3 months ago

Your code works on this data, but I tried to add another group of interdependent projects (e and f) and the solution did not match my expectations.

It worked for me:

> result
Status: success
Objective value: 150
> get_solution(result, x[i])
  variable i value
1        x 1     1
2        x 2     0
3        x 3     0
4        x 4     1
5        x 5     0
6        x 6     0

Your constraint says that projects 1:3, 5:6 are linked. Only one project of those 5 can be selected, so project 1 and project 4 which has no dependencies are basic. The solution is degenerate since project 5 has the same profit coefficient as project 1. A MIP problem with 1800 variables and 1800 binary constraints is considered small and should solve very quickly with any solver.

RKonstantinR commented 3 months ago

Your constraint says that projects 1:3, 5:6 are linked. Only one project of those 5 can be selected, so project 1 and project 4 which has no dependencies are basic.

There are two groups of interdependent projects in the data set (c(a, b, c) & c(e, f) and one independent project (d).

One of the constrain is “all projects from the same group or none must be selected”.

The correct solution would be to select project d and a group of projects e and f. Changing the abcsum parameter does not produce the desired result.

Model calculation parameters

Starting Preprocessing...
Preprocessing finished...
     variables fixed: 569
     variables aggregated: 568
Problem has 
     2 constraints 
     1226 variables 
     2398 nonzero coefficients

Total Presolve Time: 0.000000...

Solving...

granularity set at 1.000000
solving root lp relaxation
The LP value is: -341194.000 [0,182]
sbmack commented 3 months ago

I

There are two groups of interdependent projects in the data set (c(a, b, c) & c(e, f) and one independent project (d).

One of the constrain is “all projects from the same group or none must be selected”.

It should be obvious that you have to add a second auxiliary constraint.

RKonstantinR commented 2 months ago

It should be obvious that you have to add a second auxiliary constraint.

I modified your solution to take into account the dependency matrix. This could only be achieved using a loop. Thanks for the help.

project_list <- tibble(name = c("a", "b", "c", "d", "e", "f"), 
                       profit = c(200, 10, 5, 50, 100, 15)) %>% 
  mutate(item = row_number())

nprojs <- nrow(project_list)

max_project <- 3

coef_matrix <- matrix(
  c(1, 1, 1, 0, 0, 0, 
    1, 1, 1, 0, 0, 0,
    1, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 0, 0,
    0, 0, 0, 0, 1, 1,
    0, 0, 0, 0, 1, 1), 
  nrow = 6,   
  ncol = 6,        
  byrow = TRUE         
)

m <- MIPModel() %>%
  add_variable(x[i], i = 1:nprojs, type = 'binary') |>
  add_constraint(sum_over(x[i], i = 1:nprojs) <= max_project) |> 
  set_objective(sum_over(project_list$profit[i] * x[i], i = 1:nprojs),"max")

for(j in 1:nprojs) {

  m <- add_constraint(m, sum_over(coef_matrix[i,j] * x[i], i = 1:nprojs) - sum(coef_matrix[,j]) * x[j] == 0)
}

result <- solve_model(m, with_ROI(solver = "symphony", verbosity=1))
sbmack commented 2 months ago

I modified your solution to take into account the dependency matrix.

It's not clear to me what you are trying to do. I ran the model and the solution is a, b, c = 1. In coef_matrix constraint rows 1:3 are identical and so are rows 5:6. If your intent is a + b + c <= 1 and e + f <= 1 it can be formulated like this:

project_list <- tibble(name = c("a", "b", "c", "d", "e", "f"), 
                       profit = c(200, 10, 5, 50, 100, 15)) %>% 
  mutate(item = row_number())

nprojs <- nrow(project_list)

max_project <- 3

coef_matrix <- matrix(
  c(1, 1, 1, 0, 0, 0, 
    0, 0, 0, 0, 1, 1), 
  ncol = 6,        
  byrow = TRUE         
)

crows = nrow(coef_matrix)

m <- MIPModel() %>%
  add_variable(x[i], i = 1:nprojs, type = 'binary') |>
  add_constraint(sum_over(x[i], i = 1:nprojs) <= max_project) |> 
  set_objective(sum_over(project_list$profit[i] * x[i], i = 1:nprojs),"max") |> 

  add_constraint(sum_over(coef_matrix[i, j] * x[j], j = 1:nprojs) <= 1, i = 1:crows)

result <- solve_model(m, with_ROI(solver = "symphony", verbosity=1))

You don't need a for loop. You index down rows of constraints by indexing on the RHS of the constraint set. In this case i = 1:crows.

You're welcome...