dr4kan / EmiR

Evolutionary Minimizer for R
GNU General Public License v3.0
0 stars 1 forks source link

feature request: specify a function for `constrained_method` parameter? #2

Open jeffreyhanson opened 1 week ago

jeffreyhanson commented 1 week ago

Hi,

Thank you very much for developing this package!

I'm working on solving a non-linear mixed integer programming problem, and I think this package would be very helpful. In particular, my problem has a single linear constraint and many binary variables (it's like a knapsack problem with a non-linear objective function). I'm finding that the solver generally has a difficult time generating feasible solutions, and so often ends up returning one of the initial solutions that I manually specify as the best solution. To address this, I was wondering if there was some way I could manually specify a function to tell the optimization process how to generate a feasible solution from an infeasible solution. This way I can use domain knowledge to help the optimization process generate better candidate solutions (i.e., at the step where it generates new candidate solutions in a new iteration). How does that sound?

That said, I'm only very new to using the package, so maybe I'm using the wrong settings or the wrong algorithm for my problem? For reference, I've been using the genetic algorithm (algorithm_id = "GA") solver. What do you think?

dr4kan commented 1 week ago

Hi,

I think there is no need for that as you can provide your own initial population via initial_population in minimize method. Please check ?MinimizerOpts.

Cheers, Davide.

jeffreyhanson commented 1 week ago

Thanks so much for the quick response @dr4kan!

Yeah, I've been specifying my own initial population to ensure that the starting solutions are feasible. This definitely helps! If I don't do this, then minimize() function crashes (with an error about C stack usage) -- which I'm guessing occurs because the optimization process has trouble generating feasible solutions for the initial population. However, when I manually specify an initial population -- which contains only feasible solutions -- it seems appears that the best solution found during the optimization process is often one of the starting solutions in the initial population. This suggests -- to me, but maybe I'm wrong? -- that the process for generating new solutions in a given iteration is having trouble generating feasible solutions (because an infeasible solution would receive an additional penalty that reduces the quality of the optimization process, meaning that it is unlikely to result in an improvement).

To help explain this, I've included a reprex below. Please let me know if you have any issues running it, or questions about it?

# Initialization
## set seed for reproducibility
set.seed(500)

## load required packages
library(surveyvoi)
library(assertthat)
library(EmiR)
library(scales)

## define parameters for simulating data for optimization problem
### number of planning units
n_pu <- 300
### number of species
n_f <- 10
### sparsity of species in planning units
sparsity <- 0.7
### target level of representation for species by solution
target <- 5
### specify budget as a proportion of total costs
budget_prop <- 0.05

## define functions

#' Generate a prioritization
#'
#' Generate a prioritization for protected area establishment.
#'
#' @param rij `matrix` containing the probability of each species occurring
#' within each planning unit.
#'
#' @param pu_costs `numeric` vector containing the planning unit costs.
#' within each planning unit.
#'
#' @param pu_locked_in `logical` vector indicating which planning units
#' are locked in to the solution.
#'
#' @param pu_locked_out `logical` vector indicating which planning units
#' are locked out from the solution.
#'
#' @param target `numeric` vector indicating the representation target for
#' each species.
#'
#' @param budget `numeric` value indicating the total budget for the
#' prioritization.
#'
#' @param control `list` containing parameters for configuring the
#' optimization algorithm. These are passed to `EmiR::config_algo()`.
#' Defaults to `default_algo_config()`.
#'
#' @details
#' Briefly, this function aims to identify a set of planning units (i.e.
#' candidate  locations for protection) to protect that have a high probability
#' of safeguarding each of the species. To ensure that solutions are
#' economically feasible, the planning units selected for protection should not #' exceed a total budget. The \pkg{EmiR} package is used to perform
#' the optimization process.
#'
#' @return A `list` containing (`x`) the solution and (`objval`) the objective
#' value.
#'
#' @export
generate_prioritization <- function(rij, pu_costs, pu_locked_in, pu_locked_out,
                                    target, budget,
                                    control = default_algo_config()) {
  # assert arguments are valid
  assertthat::assert_that(
    ## rij
    is.matrix(rij),
    ## pu_costs
    is.numeric(pu_costs),
    assertthat::noNA(pu_costs),
    identical(ncol(rij), length(pu_costs)),
    ## pu_locked_in
    is.logical(pu_locked_in),
    assertthat::noNA(pu_locked_in),
    identical(ncol(rij), length(pu_locked_in)),
    ## pu_locked_out
    is.logical(pu_locked_out),
    assertthat::noNA(pu_locked_out),
    identical(ncol(rij), length(pu_locked_out)),
    ## target
    is.numeric(target),
    assertthat::noNA(target),
    identical(length(target), nrow(rij)),
    ## budget
    assertthat::is.number(budget),
    assertthat::noNA(budget),
    isTRUE(budget > 0)
  )

  # validate problem feasibility
  assertthat::assert_that(
    sum(pu_costs[pu_locked_in > 0.5]) <= budget,
    msg = "cost of locked in planning units exceeds budget"
  )

  # define parameters for problem
  n_pu <- length(pu_costs)
  n_spp <- nrow(rij)
  param_idx <- which(!pu_locked_in & !pu_locked_in)
  n_params <- length(param_idx)
  params <- EmiR::parameters(
    matrix(c(0, 1, 1), nrow = 3, ncol = n_params)
  )

  # define objective function
  obj <- function(x) {
    ## generate solution (accounting for locked in planning units
    v <- logical(n_pu)
    v[pu_locked_in] <- TRUE
    v[param_idx[x > 0.5]] <- TRUE
    ## calculate results for each species and sum togeather, and
    ## then multiply by minus because EmiR only does minimization
    surveyvoi:::rcpp_expected_value_of_action(
      solution = v,
      pij = rij,
      target = target
    )
  }

  # define constraint
  constr <- EmiR::constraint(
    func = function(x) {
      sum(pu_costs[param_idx[x > 0.5]]) + sum(pu_costs[pu_locked_in]) - budget
    },
    inequality = "<="
  )

  # define config for optimization
  config <- do.call(EmiR::config_algo, control)

  # generate initial solutions
  init <- generate_initial_solutions(
    pu_costs, pu_locked_in, pu_locked_out, n = control$population_size
  )

  # validate initial solutions
  ## calculate cost of initial solutiosn
  init_cost <-
    rowSums(init * matrix(
        pu_costs[param_idx],
        nrow = control$population_size,
        ncol = length(param_idx),
        byrow = TRUE
      )
    ) + sum(pu_costs[pu_locked_in > 0.5])
  ## validation
  assertthat::assert_that(
    all(init_cost <= budget),
    msg = "cost of initial solutions exceeds budget"
  )
  assertthat::assert_that(
    all(
      vapply(
        seq_len(control$population_size), FUN.VALUE = numeric(1), function(i) {
          constr@func(init[i, ])
        }
      ) <= 0
    ),
    msg = "failed to generate valid constraint"
  )

  # run optimization
  res <- EmiR::minimize(
    algorithm_id = control$algorithm_id,
    obj_func = obj,
    config = config,
    parameters = params,
    maximize = TRUE,
    constraints = list(constr),
    constrained_method = "PENALTY",
    initial_population = init,
    seed = 500
  )

  # return result
  res
}

#' Default algorithm configuration
#'
#' This function provides default configuration settings for
#' running `EmiR::minimize()`.
#'
#' @return A `list` containing configuration settings.
#'
#' @export
default_algo_config <- function() {
  list(algorithm_id = "GA", population_size = 200, iterations = 1000)
}

#' Generate initial solutions
#'
#' @inheritParams generate_prioritization
#'
#' @param n `numeric` value indicating the number of desired solutions.
#'
#' @details
#' This is useful for generating
#' an initial population of solutions for `EmiR::minimize()`.
#'
#' @return A `matrix` containing the initial solutions.
#'
#' @export
generate_initial_solutions <- function(pu_costs, pu_locked_in,
                                       pu_locked_out, n = 1000
) {
  # initialization
  n_pu <- length(pu_costs)
  param_idx <- which(!pu_locked_in & !pu_locked_in)
  sols <- matrix(NA_real_, nrow = n, ncol = n_pu)
  sols[, which(pu_locked_in)] <- 1
  sols[, which(pu_locked_out)] <- 0

  # generate starting solutions
  for (i in seq_len(n)) {
    ## find remaining planning units
    cand_pu <- which(is.na(sols[i, ]))
    ## calculate cost of current solution
    curr_cost <- sum(sols[i, ] * pu_costs, na.rm = TRUE)
    ## remove candidate planning units that cost too much
    cand_pu <- cand_pu[which((pu_costs[cand_pu] + curr_cost) <= budget)]
    ## loop while candidates remain
    while (length(cand_pu) > 0) {
      ## randomly pick a planning unit
      curr_idx <- sample(seq_along(cand_pu), size = 1, replace = FALSE)
      curr_pu <- cand_pu[curr_idx]
      ## add it to the solution
      sols[i, curr_pu] <- 1
      ## remove selected unit
      cand_pu <- cand_pu[-curr_idx]
      ## update cost
      curr_cost <- sum(sols[i, ] * pu_costs, na.rm = TRUE)
      ## remove candidate planning units that cost too much
      cand_pu <- cand_pu[which(pu_costs[cand_pu] + curr_cost <= budget)]
    }
  }

  ## set remaining planning units with NAs to zeros
  sols[is.na(sols)] <- 0

  # return starting values for parameters
  sols[, param_idx, drop = FALSE]
}

# Preliminary processing
### these values denote the probability of each species occurring
### in each planning unit. This is generally zero-inflated and
### and skewed towards lower probability values
rij <- matrix(
  scales::rescale(exp(runif(n_pu * n_f)), to = c(0, 1)),
  nrow = n_f,
  ncol = n_pu
)
### note that we loop over each species individually to ensure that
### each species has at least one non-zero probability value
for (i in seq_len(n_f)) {
  rij[i, runif(ncol(rij)) < sparsity] <- 0
}

### these values denote the cost of each planning unit
pu_costs <- exp(runif(n_pu, 0.1, 3))

### this value denotes the total budget for the prioritization.
### the total cost of the selected planning units cannot exceed this
### threshold
budget <- sum(pu_costs) * budget_prop

### these values control if certain planning units should always
### be selected or not selected. Although this useful for real-world
### analyses, we will just set them to FALSE since we're just looking at
### simulated data
pu_locked_in <- rep(FALSE, n_pu)
pu_locked_out <- rep(FALSE, n_pu)

# Main processing
## generate prioritization by running the optimization process
res <- generate_prioritization(
  rij = rij, pu_costs = pu_costs,
  pu_locked_in = pu_locked_in, pu_locked_out = pu_locked_out,
  target = rep(target, n_f), budget = budget,
  control = list(algorithm_id = "GA", population_size = 200, iterations = 10000)
)
#> 0%   10   20   30   40   50   60   70   80   90   100%
#> [----|----|----|----|----|----|----|----|----|----|
#> **************************************************|
#> 163.815 sec elapsed

## view summary of best solution at each iteration
print(summary(res@cost_history))
#>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  3.394   3.394   3.394   3.394   3.394   3.394

## we can see that the cost history has the same minimum and maximum value,
## indicating that the best solution found over the course of the entire
## optimization is simply one of the initial solutions

## if we try running this with a higher `budget_prop` value
## (e.g., `budget_prop = 0.7), then the res@cost_history shows an improvement
## in the best solution over subsequent iterations. Since this issue
## only occurs when `budget_prop` is low (i.e., when the criteria for
## feasibility is more strict; e.g., a random generated solution is less likely
## to be feasible), this suggests that the issue is because the optimization
## process is unable to generate new feasible solutions when a low `budget_prop`
## parameter is used.

Also, here's my session information.

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /opt/R/R-4.3.3/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Pacific/Auckland
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scales_1.3.0          PoissonBinomial_1.2.6 EmiR_1.0.4           
 [4] assertthat_0.2.1      surveyvoi_1.0.6       nloptr_2.0.3         
 [7] sf_1.0-16             Matrix_1.6-5          testthat_3.2.1       
[10] devtools_2.4.5        usethis_2.2.2        

loaded via a namespace (and not attached):
 [1] class_7.3-22       KernSmooth_2.23-22 stringi_1.8.3      lattice_0.22-5    
 [5] digest_0.6.35      magrittr_2.0.3     grid_4.3.3         pkgload_1.3.4     
 [9] fastmap_1.1.1      pkgbuild_1.4.3     sessioninfo_1.2.2  e1071_1.7-14      
[13] brio_1.1.4         DBI_1.2.2          urlchecker_1.0.1   promises_1.2.1    
[17] purrr_1.0.2        Rdpack_2.6         cli_3.6.2          shiny_1.8.0       
[21] rlang_1.1.3        rbibutils_2.2.16   units_0.8-5        munsell_0.5.0     
[25] ellipsis_0.3.2     remotes_2.4.2.1    cachem_1.0.8       tools_4.3.3       
[29] memoise_2.0.1      colorspace_2.1-0   mathjaxr_1.6-0     httpuv_1.6.13     
[33] vctrs_0.6.5        R6_2.5.1           mime_0.12          proxy_0.4-27      
[37] lifecycle_1.0.4    classInt_0.4-10    stringr_1.5.1      tictoc_1.2        
[41] fs_1.6.3           htmlwidgets_1.6.4  miniUI_0.1.1.1     later_1.3.2       
[45] glue_1.7.0         profvis_0.3.8      Rcpp_1.0.12        xtable_1.8-4      
[49] htmltools_0.5.7    compiler_4.3.3 
dr4kan commented 4 days ago

Hi,

I see the problem now. I added the possibility of specifying a custom generator function for new solution. I will do some tests and upload to CRAN in few days.

Cheers, Davide

jeffreyhanson commented 4 days ago

Brilliant - thanks so much @dr4kan!! I really appreciate your efforts. If it would be helpful, I'm happy to help with testing before you submit to CRAN? Would you mind providing (or linking to) an an example on how to use it? Sorry, I couldn't find an example in the recent changes.