Closed jniedballa closed 2 weeks ago
This definitely can be a problem. It's not quite as simple as just dropping log_lik from the parameter list, since it is required for loo/waic and many functions as you note are expecting that output. You might want to take a look at this pull request - as part of work on K-fold cross validation I wrote code to calculate the log likelihood outside Stan (meaning it doesn't need to be stored in the model object). I haven't merged it because I wanted to test it a bit more but got distracted with other things.
Thank you. Sampling concluded, but then stan_occu() failed with an error:
Error in lp + Z %*% t(b) : non-conformable arrays
I only tried with the big model containing several random effects and did not investigate further. Please let me know if I can help find the issue.
Thanks for checking it out, good thing I didn't merge it! I'll do some tests on my end first to see if I can replicate.
I can't replicate it. I've tried models with multiple random effects, random intercepts and slopes, etc. Can you describe the model structure and/or maybe show the formula setup you are using so I can try to match it more closely?
The error seems to only occur when there is at least one NA in the detection history and there is a random effect in the detection part of the model. Example below:
library(ubms)
set.seed(123)
dat_occ <- data.frame(x1=rnorm(500))
dat_p <- data.frame(x2=rnorm(500*5))
y <- matrix(NA, 500, 5)
z <- rep(NA, 500)
b <- c(0.4, -0.5, 0.3, 0.5)
re_fac <- factor(sample(letters[1:26], 500, replace=T))
dat_occ$group <- re_fac
re <- rnorm(26, 0, 1.2)
re_idx <- as.numeric(re_fac)
idx <- 1
for (i in 1:500){
z[i] <- rbinom(1,1, plogis(b[1] + b[2]*dat_occ$x1[i] + re[re_idx[i]]))
for (j in 1:5){
y[i,j] <- z[i]*rbinom(1,1,
plogis(b[3] + b[4]*dat_p$x2[idx]))
idx <- idx + 1
}
}
umf <- unmarkedFrameOccu(y=y, siteCovs=dat_occ, obsCovs=dat_p)
# add NA
y[1,1] <- NA
dat_p$x2[1] <- NA
umf_na <- unmarkedFrameOccu(y=y, siteCovs=dat_occ, obsCovs=dat_p)
# works when no NAs present
(fm4 <- stan_occu(~ (1|group) ~1, umf, chains=3, cores=3, iter = 200))
# fails with one NA present
(fm4 <- stan_occu(~ (1|group) ~1, umf_na, chains=3, cores=3, iter = 200))
Error in lp + Z %*% t(b) : non-conformable arrays
That's very helpful, thanks. Of course it's an NA thing.
This should be fixed in 4cef59d0d2.
Thank you, it works now. The saved models are also much smaller (about 20% of what it was in the big model mentioned above).
Great. I've also now added an argument log_lik
which controls whether the posteriors of the parameter are saved.
Nice, thank you very much. Does log_lik being FALSE by default now affect other functions in the package, or break existing code?
setting log_lik to FALSE causes ubms to calculate the log-likelihood from the posteriors, rather than saving and using the values from Stan. So, it shouldn't affect other functions (although there may be very small differences in values due to rounding etc). Models with a spatial random effect will ignore the argument since they aren't supported yet, same thing with stan_distsamp.
Hi Ken, saving a large stan_occu() model (>100k sites, 6 occasions, 2500 iterations without warmup) results in an .RData file of almost 10Gb size, a bit unwieldy on an average computer and also very slow to process (even printing the model summary can take several minutes). It is caused by the posterior samples for log_lik in each chain within model@stanfit@sim$samples. There are around 100 actual model parameters in my example, but almost 90,000 log_lik parameters.
Returning log_lik is currently hard-coded and one cannot pass the pars argument to stan() via ... since stan_occu() defines it already. Removing log_lik from the posterior samples manually breaks print() and possibly other methods, so that's not an option either.
So, could computing / returning log_lik be made optional please to avoid such huge model files? From what I understand Stan does not automatically compute and store the log-likelihood, so it might be nice if ubms would also give that option (returning log_lik should be on by default though in my opinion).
Thank you!