cvxgrp / CVXR

An R modeling language for convex optimization problems.
https://cvxr.rbind.io/
Apache License 2.0
206 stars 32 forks source link

Inaccurate timing results #91

Open dbackenroth opened 3 years ago

dbackenroth commented 3 years ago

Thank you for developing the great CVXR package. The timing stats provided by CVXR seem to be drastically underestimating the amount of time required for problem solution. 

Here is my script:

library(CVXR)
library(MASS)

Test <- function(n) {
  n_ds <- 1
  r_sq <- runif(n_ds, min = 0.05, max = 0.15)
  prev <- sample(c(0.005, 0.01, 0.05, 0.1), size = n_ds, replace = T)
  C <- GetCostMatrix(prev = prev, r_sq = r_sq, n)
  P <- GetPMatrix(C = C, m = round(n * 2/3))
}

GetPMatrix <- function(C, m) {
  n <- nrow(C)
  P <- Variable(rows = n, cols = n, name = "P", symmetric = T)
  ones <- matrix(rep(1, n))
  constraints <- list(
    P <= 1,
    0 <= P,
    P %*% ones == m * ones
  )
  objective <- Minimize(sum(C * P))
  problem <- Problem(objective, constraints = constraints)
  a <- Sys.time()
  result <- psolve(problem, solver = "ECOS", verbose = F)
  print("Sys.time timing:")
  print(Sys.time() - a)
  print("psolve timing:")
  print(result$solve_time + result$setup_time)
  P_res <- result$getValue(P)
  return(P_res)
}

GetCostMatrix <- function(prev, r_sq, n) {
  n_ds <- length(prev)
  z_k <- qnorm(prev, lower.tail = F)
  scores <- mvrnorm(n = n, mu = rep(0, n_ds), Sigma = diag(r_sq, nrow = length(r_sq)))
  grid <- expand.grid(m = 1:n, w = 1:n)
  mean_scores <- (scores[grid$m, , drop = F] + scores[grid$w, , drop = F]) / 2
  # 1 - Phi((z_k - c) / sqrt(1 - r^2_ps / 2))

  p_d <- matrix(NA, nrow = nrow(grid), ncol = n_ds)
  for (i in 1:n_ds) {
    p_d[, i] <- 1 - pnorm((z_k[i] - mean_scores[, i, drop = F]) / sqrt(1 - r_sq[i] / 2))
  }
  p_no_d <- apply(1 - p_d, 1, prod)
  no_d_matrix <- matrix(p_no_d, nrow = n, ncol = n, byrow = F)
}

I print out the time using both Sys.time() and also using the solve_time and setup_time variables returned by CVXR::psolve. It appears that time is both much higher with Sys.time() than with solve_time and setup_time, and it also increases at a much faster rate with problem size (# rows in symmetric matrix which we want to optimize) as measured by Sys.time() then by using solve_time and setup_time. 

Is there some other operation done by CVXR::psolve other than solve and setup, and is there anything that can reduce how long that operation takes?

Test(100) [1] "Sys.time timing:" Time difference of 1.458917 secs [1] "psolve timing:" [1] 0.05601097 Test(200) [1] "Sys.time timing:" Time difference of 16.15535 secs [1] "psolve timing:" [1] 0.3218943 Test(400) [1] "Sys.time timing:" Time difference of 3.830424 mins [1] "psolve timing:" [1] 1.831462

bnaras commented 3 years ago

This is a valid point. The timings returned by the psolve refer to the solver setup and solver solution times. It does not tell anything about CVXR setup time etc. which can substantial as that is pure R code.