andrjohns / StanEstimators

Estimate Parameters for Arbitrary R Functions using 'Stan'
Other
20 stars 1 forks source link

First run is always very slow to start (Windows) #1

Closed StaffanBetner closed 8 months ago

StaffanBetner commented 8 months ago

I am using R 4.3.2, and the latest version of the package.

This is how it looks when I run e.g. stan_sample for the first time in a session:

> library(StanEstimators)
> library(tictoc)
> 
> lnorm_loglik <- function(par, x){sum(dlnorm(x, meanlog = par[1], sdlog = par[2], log = TRUE))}
> x = rlnorm(100, meanlog = 1, sdlog = 1)
> tic()
> stan_sample(lnorm_loglik, par_inits = c(1,1), additional_args = list(x = x), lower = c(-Inf, 0), num_chains = 4) -> fit
method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 4
id = 1 (Default)
data
  file = C:\Users\staff\AppData\Local\Temp\Rtmp0WIYmd\file9fec6fe01b9b.json
init = C:\Users\staff\AppData\Local\Temp\Rtmp0WIYmd\file9fecc54156.json
random
  seed = 2412832692 (Default)
output
  file = C:\Users\staff\AppData\Local\Temp\Rtmp0WIYmd\file9fec72095f97.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)

### Main waiting time is here!
Gradient evaluation took 0.000363 seconds
1000 transitions using 10 leapfrog steps per transition would take 3.63 seconds.
Adjust your expectations accordingly!

Gradient evaluation took 0.000271 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.71 seconds.
Adjust your expectations accordingly!

Gradient evaluation took 0.000291 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.91 seconds.
Adjust your expectations accordingly!

Gradient evaluation took 0.000302 seconds
1000 transitions using 10 leapfrog steps per transition would take 3.02 seconds.
Adjust your expectations accordingly!

Chain [1] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [1] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [1] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [1] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [1] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [1] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [1] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [1] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [1] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [1] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [1] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [1] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [1] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [1] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [1] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [1] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [1] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [1] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [1] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [1] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [1] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [1] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 1.562 seconds (Warm-up)
               1.236 seconds (Sampling)
               2.798 seconds (Total)

Chain [2] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [2] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [2] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [2] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [2] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [2] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [2] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [2] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [2] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [2] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [2] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [2] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [2] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [2] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [2] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [2] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [2] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [2] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [2] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [2] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [2] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [2] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 2.233 seconds (Warm-up)
               2.02 seconds (Sampling)
               4.253 seconds (Total)

Chain [3] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [3] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [3] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [3] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [3] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [3] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [3] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [3] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [3] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [3] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [3] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [3] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [3] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [3] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [3] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [3] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [3] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [3] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [3] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [3] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [3] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [3] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 1.964 seconds (Warm-up)
               1.716 seconds (Sampling)
               3.68 seconds (Total)

Chain [4] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [4] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [4] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [4] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [4] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [4] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [4] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [4] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [4] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [4] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [4] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [4] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [4] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [4] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [4] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [4] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [4] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [4] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [4] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [4] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [4] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [4] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 1.531 seconds (Warm-up)
               1.661 seconds (Sampling)
               3.192 seconds (Total)

> toc()
432.25 sec elapsed

As the log indicates, it is not the sampling that takes 7 minutes, but something else. The main waiting time is before the first "Gradient evaluation took xxx". If I run any of the functions again, it doesn't have any waiting time like this, regardless of data and loglikelihood function.

andrjohns commented 8 months ago

I've reworked the handling of multiple/parallel chains to avoid the use of a mutex/locking when accessing the R session, can you try with the current main branch?

Also just to flag that I've changed the num_threads argument to parallel_chains

StaffanBetner commented 8 months ago

The problem still persists, unfortunately. But now I can kill the function without killing the R process.

andrjohns commented 8 months ago

Should be resolved in main now

StaffanBetner commented 8 months ago

Splendid! Can confirm!