biodiverse / ubms

Fit models to data from unmarked animals using Stan. Uses a similar interface to the R package 'unmarked', while providing the advantages of Bayesian inference and allowing estimation of random effects.
https://hmecology.github.io/ubms/
GNU General Public License v3.0

Speed up gof for large models #61

Closed jniedballa closed 2 years ago

jniedballa commented 2 years ago

Hi Ken, I have a very large and complex model for which gof() takes over 5 minutes per posterior sample. Below I suggest a way to speed up gof() considerably for such large models.

The model contains >50000 sites and >50 occasions. There are about 16000 unique cohorts for the gof test; more than half of the cohorts have only one site, and the mean cohort size is 3.5.

I found that in my example, this line in mb_chisq(): https://github.com/kenkellner/ubms/blob/5cfb4e78d35186f601d0b542dc2c546bcad34ff3/R/mb_chisq.R#L15

took about 10 ms per iteration. There were 16000 iterations, and mb_chisq() is called twice per posterior sample, so that really adds up.
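For context, the line in question subsets p by matching a per-observation site index against the cohort's sites with %in%. Roughly reconstructed from the description here (see the link above for the exact code; n_sites is a stand-in name):

# site_idx has one entry per site x occasion, so with >50000 sites and
# >50 occasions it holds over 3 million values; every cohort then
# triggers a full scan of it via %in%
site_idx <- rep(seq_len(n_sites), each = object@response@max_obs)
p_sub <- p[site_idx %in% inds[[i]]]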

If the line is replaced with:

p_sub <- p[Reduce(c, lapply(inds[[i]], function(k) {
  seq(((k - 1) * object@response@max_obs) + 1, k * object@response@max_obs)
}))]

that line only took about 40 microseconds instead of 10 milliseconds in my example (and it makes the object site_idx obsolete). Of course the speed gains don't fully translate to gof() because of the other functions (get_exp_counts / get_obs_counts: 110/130 microseconds), but overall gof() was still much faster (10 posterior samples: 4 minutes vs. >1 hour).

Please also note that it is slower than the current code for the example in your README (20s instead of 12s for 1000 draws). When I added some NAs to the detection matrix (resulting in 8 cohorts), it was 24s instead of 17s.

So it's not a universal solution, but it may help with large models (in my case, site_idx had over 3 million entries, so naturally %in% got a bit slow).
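To illustrate the scaling, here is a small self-contained comparison outside of ubms, with sizes chosen to mimic the example above:

library(microbenchmark)

J <- 60                                      # occasions per site (max_obs)
n_sites <- 50000
site_idx <- rep(seq_len(n_sites), each = J)  # 3 million entries
cohort <- c(17L, 203L, 4188L)                # a typical small cohort

microbenchmark(
  # current approach: every cohort scans all 3 million entries
  in_op  = site_idx %in% cohort,
  # suggested approach: build only the length(cohort) * J indices needed
  direct = unlist(lapply(cohort, function(k) seq((k - 1) * J + 1, k * J)))
)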

Before that, I also wrote another little hack of sim_gof() which allows parallel computation of gof across the posterior samples. But the speed gains were less noticeable than with the modification above, and the parallelization overhead was immense for the huge model (exporting a >1 GB model object to multiple cores). If it is of interest I can share it too.

kenkellner commented 2 years ago

Thanks, I haven't done much testing on really gigantic models so it's good to have some feedback and suggestions for that situation.

Can you try the following modification of mb_chisq() on your model and see if it improves on your benchmarks:

setMethod("mb_chisq", "ubmsFitOccu", function(object, state, p){
  cohorts <- ubms:::get_cohorts(object)
  dat <- cohorts$data
  inds <- cohorts$inds
  chisq <- 0
  J <- object@response@max_obs
  for (i in seq_along(inds)){
    state_sub <- state[inds[[i]]]
    # build the observation indices for this cohort directly,
    # instead of scanning a site index vector with %in%
    idx <- c(sapply(inds[[i]], function(k) seq.int((k - 1) * J + 1, length.out=J)))
    p_sub <- p[idx]
    N <- length(state_sub)
    obs <- ubms:::get_obs_counts(dat[[i]])
    expect <- ubms:::get_exp_counts(object, obs, state_sub, p_sub)
    # Pearson chi-square over observed detection histories
    chisq <- chisq + sum((obs - expect)^2 / expect, na.rm=TRUE)
    # plus the expected count for histories never observed in this cohort
    chisq <- chisq + max(0, N - sum(expect, na.rm=TRUE))
  }
  chisq
})

It is a bit more readable to me and also seems to be a bit faster, but I've only tested it on a small dataset.
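For reference, with a fitted model (say fit), timing a handful of draws should be enough to compare versions; gof() takes a draws argument:

system.time(gof(fit, draws = 10))  # compare elapsed time between versions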

Parallel processing is a good idea as well, but I see what you mean about passing around the giant model object. Right now the relevant methods are written to act on the model object, mainly for simplicity, but a potential solution here would be to re-write the functions so that only the inputs actually needed (y, state, p, etc.) have to be copied. Presumably if you are using forks (i.e., on Linux) instead of sockets, this is less of an issue, since the model object is shared?
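As a toy illustration of the fork vs. socket difference (nothing ubms-specific here):

library(parallel)

big <- rnorm(5e7)               # stand-in for a large model object (~400 MB)
f <- function(i) mean(big) + i  # worker that only reads the shared object

# forks (Linux/macOS): workers share 'big' copy-on-write, nothing is exported
res_fork <- mclapply(1:4, f, mc.cores = 2)

# sockets (any OS): 'big' must be serialized and sent to every worker
cl <- makeCluster(2)
clusterExport(cl, "big")
res_sock <- parLapply(cl, 1:4, f)
stopCluster(cl)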

jniedballa commented 2 years ago

Hi Ken, apologies for the late reply, I got distracted with other things. The code you shared worked and was about 10% faster than the code I suggested above (in my example, 200 instead of 220 seconds). The original code would have taken over an hour for the same task, so it's a great improvement. Thank you very much for incorporating this, it's very helpful. Best regards!

kenkellner commented 2 years ago

Excellent, thanks again for sharing the original code. I did not realize that %in% would be such a time sink.