optimization: md_compute_poverty_stats()

tonyfujs commented 3 years ago

Here is some benchmarking of current performance. This function will run on the fly and is currently slow (a median of roughly 680 ms for 1,000,000 observations in the benchmark below), so optimizing it is a high priority.

set.seed(42)
welfare <- sort(sample(rlnorm(1000000, meanlog = 2)))
weight <- sample(runif(1000000, min = 1, max = 100))
povline_lcu <- 3

profvis::profvis({
  headcount <- 0
  gap <- 0
  severity <- 0
  watt8 <- 0

  for (i in seq_along(welfare)) {

    weight_i <- weight[i]
    welfare_i <- welfare[i]

    if (welfare_i <= povline_lcu) {

      headcount <- sum(headcount, weight_i)
      gap_i <- 1 - welfare_i / povline_lcu
      gap <- sum(gap, weight_i * gap_i)
      severity <- sum(severity, weight_i * gap_i ^ 2)
      if (welfare_i > 0) { # Is this check needed? No negative welfare value should make it to the application.
        watt8 <- sum(watt8, weight_i * log(povline_lcu / welfare_i))
      }

    }
  }

  # Compute the values to return
  sum_weight <- sum(weight)

  headcount <- headcount / sum_weight
  gap <- gap / sum_weight
  severity <- severity / sum_weight
  watt8 <- if (headcount > 0) {
    watt8 / sum_weight
  } else {
    0
  }
})

microbenchmark::microbenchmark(
  wbpip:::md_compute_poverty_stats(welfare     = welfare,
                                   povline_lcu = povline_lcu,
                                   weight      = weight)
)

Unit: milliseconds
 wbpip:::md_compute_poverty_stats(welfare = welfare, povline_lcu = povline,  weight = weight)
      min       lq     mean  median       uq    max neval
 633.8605 661.0954 739.2486 677.619 740.0478 1276.4   100

tonyfujs commented 3 years ago

Potential options to explore for optimization:

Reorganize the function to optimize it in base R
Use Rcpp to rewrite the function in C++
Use a package optimized for speed (collapse)
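
As a rough illustration of the first option, the loop in the benchmark above could be replaced with vectorized base R operations. This is only a sketch under stated assumptions: the function name `compute_poverty_stats_vec()` and the names of the returned list elements are hypothetical, and it is not the implementation in wbpip or in any PR. It assumes `welfare` and `weight` are numeric vectors of equal length with no missing values.

```r
# Hypothetical vectorized variant of the loop above (not the wbpip code).
# Assumes no NA values and length(welfare) == length(weight).
compute_poverty_stats_vec <- function(welfare, povline_lcu, weight) {
  # Select the poor once instead of testing each observation in a loop
  poor   <- welfare <= povline_lcu
  w_poor <- weight[poor]
  y_poor <- welfare[poor]

  sum_weight <- sum(weight)

  # Relative distance of each poor observation from the poverty line
  gap_i <- 1 - y_poor / povline_lcu

  headcount <- sum(w_poor) / sum_weight
  gap       <- sum(w_poor * gap_i) / sum_weight
  severity  <- sum(w_poor * gap_i^2) / sum_weight

  # Watts index: as in the loop, only strictly positive welfare contributes
  pos   <- y_poor > 0
  watts <- if (headcount > 0) {
    sum(w_poor[pos] * log(povline_lcu / y_poor[pos])) / sum_weight
  } else {
    0
  }

  list(
    headcount        = headcount,
    poverty_gap      = gap,
    poverty_severity = severity,
    watts            = watts
  )
}

# Small usage example with equal weights and a poverty line of 3
res <- compute_poverty_stats_vec(
  welfare     = c(1, 2, 4, 8),
  povline_lcu = 3,
  weight      = c(1, 1, 1, 1)
)
res$headcount    # 0.5: two of the four observations are at or below the line
res$poverty_gap  # 0.25: (2/3 + 1/3) / 4
```

The sketch subsets the poor observations once and lets `sum()` work on whole vectors, which avoids one million interpreted loop iterations. Whether it is fast enough for on-the-fly use would need to be checked with the same `microbenchmark` setup as above.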

randrescastaneda commented 3 years ago

Hi Tony,

I addressed this issue in PR #136. We can continue the discussion there, so I'll close this issue for now.

Best,

PIP-Technical-Team / wbpip

optimization: md_compute_poverty_stats() #128