Open jeffreyhanson opened 1 week ago
Hi,
I think there is no need for that as you can provide your own initial population via initial_population
in minimize
method. Please check ?MinimizerOpts
.
Cheers, Davide.
Thanks so much for the quick response @dr4kan!
Yeah, I've been specifying my own initial population to ensure that the starting solutions are feasible. This definitely helps! If I don't do this, then minimize()
function crashes (with an error about C stack usage) -- which I'm guessing occurs because the optimization process has trouble generating feasible solutions for the initial population. However, when I manually specify an initial population -- which contains only feasible solutions -- it seems appears that the best solution found during the optimization process is often one of the starting solutions in the initial population. This suggests -- to me, but maybe I'm wrong? -- that the process for generating new solutions in a given iteration is having trouble generating feasible solutions (because an infeasible solution would receive an additional penalty that reduces the quality of the optimization process, meaning that it is unlikely to result in an improvement).
To help explain this, I've included a reprex below. Please let me know if you have any issues running it, or questions about it?
# Initialization
## set seed for reproducibility
set.seed(500)
## load required packages
library(surveyvoi)
library(assertthat)
library(EmiR)
library(scales)
## define parameters for simulating data for optimization problem
### number of planning units
n_pu <- 300
### number of species
n_f <- 10
### sparsity of species in planning units
sparsity <- 0.7
### target level of representation for species by solution
target <- 5
### specify budget as a proportion of total costs
budget_prop <- 0.05
## define functions
#' Generate a prioritization
#'
#' Generate a prioritization for protected area establishment.
#'
#' @param rij `matrix` containing the probability of each species occurring
#' within each planning unit.
#'
#' @param pu_costs `numeric` vector containing the planning unit costs.
#' within each planning unit.
#'
#' @param pu_locked_in `logical` vector indicating which planning units
#' are locked in to the solution.
#'
#' @param pu_locked_out `logical` vector indicating which planning units
#' are locked out from the solution.
#'
#' @param target `numeric` vector indicating the representation target for
#' each species.
#'
#' @param budget `numeric` value indicating the total budget for the
#' prioritization.
#'
#' @param control `list` containing parameters for configuring the
#' optimization algorithm. These are passed to `EmiR::config_algo()`.
#' Defaults to `default_algo_config()`.
#'
#' @details
#' Briefly, this function aims to identify a set of planning units (i.e.
#' candidate locations for protection) to protect that have a high probability
#' of safeguarding each of the species. To ensure that solutions are
#' economically feasible, the planning units selected for protection should not #' exceed a total budget. The \pkg{EmiR} package is used to perform
#' the optimization process.
#'
#' @return A `list` containing (`x`) the solution and (`objval`) the objective
#' value.
#'
#' @export
generate_prioritization <- function(rij, pu_costs, pu_locked_in, pu_locked_out,
target, budget,
control = default_algo_config()) {
# assert arguments are valid
assertthat::assert_that(
## rij
is.matrix(rij),
## pu_costs
is.numeric(pu_costs),
assertthat::noNA(pu_costs),
identical(ncol(rij), length(pu_costs)),
## pu_locked_in
is.logical(pu_locked_in),
assertthat::noNA(pu_locked_in),
identical(ncol(rij), length(pu_locked_in)),
## pu_locked_out
is.logical(pu_locked_out),
assertthat::noNA(pu_locked_out),
identical(ncol(rij), length(pu_locked_out)),
## target
is.numeric(target),
assertthat::noNA(target),
identical(length(target), nrow(rij)),
## budget
assertthat::is.number(budget),
assertthat::noNA(budget),
isTRUE(budget > 0)
)
# validate problem feasibility
assertthat::assert_that(
sum(pu_costs[pu_locked_in > 0.5]) <= budget,
msg = "cost of locked in planning units exceeds budget"
)
# define parameters for problem
n_pu <- length(pu_costs)
n_spp <- nrow(rij)
param_idx <- which(!pu_locked_in & !pu_locked_in)
n_params <- length(param_idx)
params <- EmiR::parameters(
matrix(c(0, 1, 1), nrow = 3, ncol = n_params)
)
# define objective function
obj <- function(x) {
## generate solution (accounting for locked in planning units
v <- logical(n_pu)
v[pu_locked_in] <- TRUE
v[param_idx[x > 0.5]] <- TRUE
## calculate results for each species and sum togeather, and
## then multiply by minus because EmiR only does minimization
surveyvoi:::rcpp_expected_value_of_action(
solution = v,
pij = rij,
target = target
)
}
# define constraint
constr <- EmiR::constraint(
func = function(x) {
sum(pu_costs[param_idx[x > 0.5]]) + sum(pu_costs[pu_locked_in]) - budget
},
inequality = "<="
)
# define config for optimization
config <- do.call(EmiR::config_algo, control)
# generate initial solutions
init <- generate_initial_solutions(
pu_costs, pu_locked_in, pu_locked_out, n = control$population_size
)
# validate initial solutions
## calculate cost of initial solutiosn
init_cost <-
rowSums(init * matrix(
pu_costs[param_idx],
nrow = control$population_size,
ncol = length(param_idx),
byrow = TRUE
)
) + sum(pu_costs[pu_locked_in > 0.5])
## validation
assertthat::assert_that(
all(init_cost <= budget),
msg = "cost of initial solutions exceeds budget"
)
assertthat::assert_that(
all(
vapply(
seq_len(control$population_size), FUN.VALUE = numeric(1), function(i) {
constr@func(init[i, ])
}
) <= 0
),
msg = "failed to generate valid constraint"
)
# run optimization
res <- EmiR::minimize(
algorithm_id = control$algorithm_id,
obj_func = obj,
config = config,
parameters = params,
maximize = TRUE,
constraints = list(constr),
constrained_method = "PENALTY",
initial_population = init,
seed = 500
)
# return result
res
}
#' Default algorithm configuration
#'
#' This function provides default configuration settings for
#' running `EmiR::minimize()`.
#'
#' @return A `list` containing configuration settings.
#'
#' @export
default_algo_config <- function() {
list(algorithm_id = "GA", population_size = 200, iterations = 1000)
}
#' Generate initial solutions
#'
#' @inheritParams generate_prioritization
#'
#' @param n `numeric` value indicating the number of desired solutions.
#'
#' @details
#' This is useful for generating
#' an initial population of solutions for `EmiR::minimize()`.
#'
#' @return A `matrix` containing the initial solutions.
#'
#' @export
generate_initial_solutions <- function(pu_costs, pu_locked_in,
pu_locked_out, n = 1000
) {
# initialization
n_pu <- length(pu_costs)
param_idx <- which(!pu_locked_in & !pu_locked_in)
sols <- matrix(NA_real_, nrow = n, ncol = n_pu)
sols[, which(pu_locked_in)] <- 1
sols[, which(pu_locked_out)] <- 0
# generate starting solutions
for (i in seq_len(n)) {
## find remaining planning units
cand_pu <- which(is.na(sols[i, ]))
## calculate cost of current solution
curr_cost <- sum(sols[i, ] * pu_costs, na.rm = TRUE)
## remove candidate planning units that cost too much
cand_pu <- cand_pu[which((pu_costs[cand_pu] + curr_cost) <= budget)]
## loop while candidates remain
while (length(cand_pu) > 0) {
## randomly pick a planning unit
curr_idx <- sample(seq_along(cand_pu), size = 1, replace = FALSE)
curr_pu <- cand_pu[curr_idx]
## add it to the solution
sols[i, curr_pu] <- 1
## remove selected unit
cand_pu <- cand_pu[-curr_idx]
## update cost
curr_cost <- sum(sols[i, ] * pu_costs, na.rm = TRUE)
## remove candidate planning units that cost too much
cand_pu <- cand_pu[which(pu_costs[cand_pu] + curr_cost <= budget)]
}
}
## set remaining planning units with NAs to zeros
sols[is.na(sols)] <- 0
# return starting values for parameters
sols[, param_idx, drop = FALSE]
}
# Preliminary processing
### these values denote the probability of each species occurring
### in each planning unit. This is generally zero-inflated and
### and skewed towards lower probability values
rij <- matrix(
scales::rescale(exp(runif(n_pu * n_f)), to = c(0, 1)),
nrow = n_f,
ncol = n_pu
)
### note that we loop over each species individually to ensure that
### each species has at least one non-zero probability value
for (i in seq_len(n_f)) {
rij[i, runif(ncol(rij)) < sparsity] <- 0
}
### these values denote the cost of each planning unit
pu_costs <- exp(runif(n_pu, 0.1, 3))
### this value denotes the total budget for the prioritization.
### the total cost of the selected planning units cannot exceed this
### threshold
budget <- sum(pu_costs) * budget_prop
### these values control if certain planning units should always
### be selected or not selected. Although this useful for real-world
### analyses, we will just set them to FALSE since we're just looking at
### simulated data
pu_locked_in <- rep(FALSE, n_pu)
pu_locked_out <- rep(FALSE, n_pu)
# Main processing
## generate prioritization by running the optimization process
res <- generate_prioritization(
rij = rij, pu_costs = pu_costs,
pu_locked_in = pu_locked_in, pu_locked_out = pu_locked_out,
target = rep(target, n_f), budget = budget,
control = list(algorithm_id = "GA", population_size = 200, iterations = 10000)
)
#> 0% 10 20 30 40 50 60 70 80 90 100%
#> [----|----|----|----|----|----|----|----|----|----|
#> **************************************************|
#> 163.815 sec elapsed
## view summary of best solution at each iteration
print(summary(res@cost_history))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 3.394 3.394 3.394 3.394 3.394 3.394
## we can see that the cost history has the same minimum and maximum value,
## indicating that the best solution found over the course of the entire
## optimization is simply one of the initial solutions
## if we try running this with a higher `budget_prop` value
## (e.g., `budget_prop = 0.7), then the res@cost_history shows an improvement
## in the best solution over subsequent iterations. Since this issue
## only occurs when `budget_prop` is low (i.e., when the criteria for
## feasibility is more strict; e.g., a random generated solution is less likely
## to be feasible), this suggests that the issue is because the optimization
## process is unable to generate new feasible solutions when a low `budget_prop`
## parameter is used.
Also, here's my session information.
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /opt/R/R-4.3.3/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Pacific/Auckland
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_1.3.0 PoissonBinomial_1.2.6 EmiR_1.0.4
[4] assertthat_0.2.1 surveyvoi_1.0.6 nloptr_2.0.3
[7] sf_1.0-16 Matrix_1.6-5 testthat_3.2.1
[10] devtools_2.4.5 usethis_2.2.2
loaded via a namespace (and not attached):
[1] class_7.3-22 KernSmooth_2.23-22 stringi_1.8.3 lattice_0.22-5
[5] digest_0.6.35 magrittr_2.0.3 grid_4.3.3 pkgload_1.3.4
[9] fastmap_1.1.1 pkgbuild_1.4.3 sessioninfo_1.2.2 e1071_1.7-14
[13] brio_1.1.4 DBI_1.2.2 urlchecker_1.0.1 promises_1.2.1
[17] purrr_1.0.2 Rdpack_2.6 cli_3.6.2 shiny_1.8.0
[21] rlang_1.1.3 rbibutils_2.2.16 units_0.8-5 munsell_0.5.0
[25] ellipsis_0.3.2 remotes_2.4.2.1 cachem_1.0.8 tools_4.3.3
[29] memoise_2.0.1 colorspace_2.1-0 mathjaxr_1.6-0 httpuv_1.6.13
[33] vctrs_0.6.5 R6_2.5.1 mime_0.12 proxy_0.4-27
[37] lifecycle_1.0.4 classInt_0.4-10 stringr_1.5.1 tictoc_1.2
[41] fs_1.6.3 htmlwidgets_1.6.4 miniUI_0.1.1.1 later_1.3.2
[45] glue_1.7.0 profvis_0.3.8 Rcpp_1.0.12 xtable_1.8-4
[49] htmltools_0.5.7 compiler_4.3.3
Hi,
I see the problem now. I added the possibility of specifying a custom generator function for new solution. I will do some tests and upload to CRAN in few days.
Cheers, Davide
Brilliant - thanks so much @dr4kan!! I really appreciate your efforts. If it would be helpful, I'm happy to help with testing before you submit to CRAN? Would you mind providing (or linking to) an an example on how to use it? Sorry, I couldn't find an example in the recent changes.
Hi,
Thank you very much for developing this package!
I'm working on solving a non-linear mixed integer programming problem, and I think this package would be very helpful. In particular, my problem has a single linear constraint and many binary variables (it's like a knapsack problem with a non-linear objective function). I'm finding that the solver generally has a difficult time generating feasible solutions, and so often ends up returning one of the initial solutions that I manually specify as the best solution. To address this, I was wondering if there was some way I could manually specify a function to tell the optimization process how to generate a feasible solution from an infeasible solution. This way I can use domain knowledge to help the optimization process generate better candidate solutions (i.e., at the step where it generates new candidate solutions in a new iteration). How does that sound?
That said, I'm only very new to using the package, so maybe I'm using the wrong settings or the wrong algorithm for my problem? For reference, I've been using the genetic algorithm (
algorithm_id = "GA"
) solver. What do you think?