florianhartig / BayesianTools

General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
https://cran.r-project.org/web/packages/BayesianTools/index.html
119 stars 29 forks source link

How to forward thin argument in the samplers? #39

Open florianhartig opened 7 years ago

florianhartig commented 7 years ago

The thin argument in the samplers is not properly passed on, e.g.

library(BayesianTools)
ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))
settings = list(iterations = 10000, thin = 10)
mcmcOutput <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
plot(mcmcOutput)

we should fix this.

As a general note: together with taking care of the thinning in getSample, coda compatibility and so on, this is getting all rather cumbersome. I wonder if we were not better off to create our own mcmc chain class for the 0.2 release.

florianhartig commented 7 years ago

It would be easy to fix this, simply adding multiplying mcmcOutput$settings$thin to start / end / thin values in the return of getSample.

However, then the question is - if we do this, the "real" iteration numbers will be displayed in the plot, e.g. we have a sampler that has been thinned by 10 during sampling, so that we have 10.0000 samples, but the plot displays values up to 100.000 , and the user now looks at this and requests a start values of 10.000, then we should also divide inputs in (start, end, thin) in getSample by mcmcOutput$settings$thin and so on, which could get rather confusing, because according to this logic, a request of thin = 10 in getSample would have no effect, because the sample is already thinned at 10 ... or is this actually only confusing if one is not used to it?

In general, I was wondering if we should move towards using row names or an extra column for recording the iteration of the MCMC sample, the whole business with having several opportunities for thinning is getting rather confusing.

@stefan-paul , what do you think?

stefan-paul commented 7 years ago

Hmm tricky question. I would prefer having the real iteration number displayed in the plot even for a thinned sample. I think the confusing part is mostly in the functions so the user doesn't see this. I guess we only have to take care that we're consistent here. And maybe we should add a line or two in the help page.

For the general question I'm still a bit undecided what the best way would be to handle the thinning options.... I'll think on it .

florianhartig commented 6 years ago

Max, Tankred, opinions?

MaximilianPi commented 6 years ago

On the long run we should move away from coda and just provide a toCoda function. Regarding the thinning, displaying the real iterations should avoid confusion and might be more intuitive.

A solution would be to give the coda::as.mcmc.list function the start, end, and thin values (at the end of the samplers). This way the real iterations should be always displayed and we could additionally thin out with the plot function through getSample. The start/end arguments passed to getSample have to be divided then by the settings$thin as you mentioned above @florianhartig.

TankredO commented 6 years ago

I think getting the real iteration number displayed make more sense.

Maybe we should introduce a "bayesianSample" object or something like this as output of getSample which has all this information?

florianhartig commented 6 years ago

I don't think an own object, or moving away from coda, is really an option, as a lot of coda objects are used internally. We had a https://github.com/florianhartig/BayesianTools/issues/5 a while ago to actually move away from our own internal structure and do everything as coda from the start.

OK, the way I see it

  1. We should have the correct iteration numbers displayed, so this should be fixed. For this issue, the only question is how to do this technically, i.e. either we just have to take care internally to always record start and thin, and display accordingly, or we should actually introduce a column for the iteration number in the sample (more storage space, possibly more versatile because we could also have non-even thinning). Regarding the latter - it should be noted that coda can't handle non-evenly spaced thinning anyway, so if we move to coda, we have to discard this info.

--> I think my preference would be to simply introduce start, end, thin argument with the internal MCMC chain, to have this compliant with coda

  1. The more tricky question is how getSample should now handle a chain that is already thinned. I guess the most stringent way would be to take over start, end, etc. arguments from the previous chain, so if we have a sampler that starts at 1000, is thinned with 10, getSample with

We will have to take care that this all works smoothly also with the numSamples argument, which is frequently used in the plots.

Alternative would be to make the getSample work relative to the current chain, i.e. thin = 2 thins the existing chain by 2, regardless of the original thinning interval

I'm not sure, both options are possibly confusing to a user. How does coda handle all this? They also allow thinning, right?