florianhartig / BayesianTools

General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
https://cran.r-project.org/web/packages/BayesianTools/index.html
115 stars 29 forks source link

Issue with calling summary on a model with only a single parameter #207

Closed aejensen closed 2 years ago

aejensen commented 3 years ago

Hi,

I'm experiencing an error with the summary function when the log-likelihood only contains a single parameter. Below is a simple example where I wish to estimate the mean of a normal distribution. In the first model (ll1 and out1) I fix the standard deviation to its true value, hence I only have one parameter in the posterior. In the second model (ll2 and out2) I sample the posterior of both mu and sigma.

The former model with a single parameter produces an error when calling summary on the MCMC-output while the latter returns the correct output. Specifically the error is:

> summary(out1)
# # # # # # # # # # # # # # # # # # # # # # # # # 
## MCMC chain summary ## 
# # # # # # # # # # # # # # # # # # # # # # # # # 

# MCMC sampler:  Metropolis 
# Nr. Chains:  1 
# Iterations per chain:  5000 
Error in UseMethod("rejectionRate") : 
  no applicable method for 'rejectionRate' applied to an object of class "c('double', 'numeric')"

Here is a short example that should reproduce the error. I'm using R version 4.0.1 (2020-06-06) and BayesianTools version 0.1.7.

library(BayesianTools)

set.seed(1234)
y <- rnorm(1000, 25, 0.5)

ll1 <- function(param){
  sum(dnorm(y, mean = param[1], sd = 0.5, log = TRUE))
}

ll2 <- function(param){
  sum(dnorm(y, mean = param[1], sd = param[2], log = TRUE))
}

settings = list(iterations = 5*1000,  message = TRUE)

setup1 <- createBayesianSetup(ll1, lower = c(-500), upper = c(500))
set.seed(1234)
out1 <- runMCMC(bayesianSetup = setup1, sampler = "Metropolis", settings = settings)

setup2 <- createBayesianSetup(ll2, lower = c(-500, 0), upper = c(500, 1))
set.seed(1234)
out2 <- runMCMC(bayesianSetup = setup2, sampler = "Metropolis", settings = settings)

summary(out1) #this one fails
summary(out2) #this one is okay

A similar error occurs if I use a different sampler. If I e.g., use

out1 <- runMCMC(bayesianSetup = setup1, sampler = "DE", settings = settings)
out2 <- runMCMC(bayesianSetup = setup2, sampler = "DE", settings = settings)

instead of Metroplis the error becomes the following

> summary(out1)
Error in if (plot == T & !is.na(diag$mpsrf)) do.call(gP, as.list(match.call())) : 
  argument is of length zero

Thanks a lot for a great package!

florianhartig commented 2 years ago

Hey, sorry, thanks for reporting this, and sorry for the late reply. Yes, there are a few issues with 1-d functions. We will fix this in the next release!

@RobertBosek, as an "exercise" about how to deal with issues

a) check with the development version (from GitHub) if this is still creating an error (if so, confirm here). b) try to find out what the reason is (use the traceback in Rstudio to estimate the approximate location, set a breakpoint, and then use debugger in Studio to go through the code line by line) c) describe the reason here d) suggest a possible fix

RobertBosek commented 2 years ago

In development version 0.1.7.1 the bug for Metropolis sampler is fixed, also the bug concerning gelman Diagnostics with one parameter and DE-sampler, but there still remains a bug for the latter, reproducible with @aejensen 's out1 with DE-sampler on development version 0.1.7.1.

Due to multiple chains coda::rejectionRate.mcmc.list function is called (in summary.mcmcSampler l.64) and the apply function within, responsible for applying rejectionRate.mcmc on every single chain, cant be executed. Problem is that with n one parameter mcmc chains getting passed to apply(sapply(..)) the X within the apply function is 1xn dimensions and seems to get reduced in advance to a list without dimensions, making the dimension check fail, which then stops the execution. (line 4 of base::apply).

suggested bugfix: wrapper in summary function that handles passing the single chains directly to coda::recetionRate.mcmc

florianhartig commented 2 years ago

OK, thanks for looking at this. So if I understand correctly, the bug is actually in coda::rejectionRate.mcmc.list, which fails for 1-d MCMC lists?

If so, I agree with the solution, but I would prefer to introduce a new S3 function getRejectionRate that handles the switch, instead of doing this in summary(). Reason: we might have other functions that will want to call the rejection rate, it's preferable to have one interface for this.

So, introduce a new S3 function getRejectionRate as in https://github.com/florianhartig/DHARMa/blob/a65d52a1f685f917091aca26540ecbdc79105f46/DHARMa/R/compatibility.R#L57, create a default which calls Coda, and then create getRejectionRate.mcmc.list and apply the solution you suggested there.

Alternatively, we could implement the calculation of the rejection rate ourselves. This reduces the dependency on coda and might be more efficient. Have a look at how coda is calculating this. I suppose it just boils down to do

length(unique(x))/length(x)

for each parameter dimension? Not sure if there is a trick to make this more efficient for large vectors, see also https://stackoverflow.com/questions/18056465/count-number-of-distinct-values-in-a-vector

florianhartig commented 2 years ago

Note that this also works, surprisingly

settings = list(iterations = 5*1000, message = TRUE, nrChains = 2) setup1 <- createBayesianSetup(ll1, lower = c(-500), upper = c(500)) out1 <- runMCMC(bayesianSetup = setup1, sampler = "DEzs", settings = settings) summary(out1) #this one fails

florianhartig commented 2 years ago

The current implementation is

Object -> getSample -> codaRejection

it is also possible that the issue is in getSample()

florianhartig commented 2 years ago

Find out why the code above works, it shouldn't work if coda is really the problem. Also this works

settings = list(iterations = 5*1000, message = TRUE, nrChains = 1)
setup1 <- createBayesianSetup(ll1, lower = c(-500), upper = c(500))
out1 <- runMCMC(bayesianSetup = setup1, sampler = "Metropolis", settings = settings)
summary(out1)

x = getSample(out1, coda = T)
str(x)