Open florianhartig opened 7 years ago
It would be easy to fix this, simply adding multiplying mcmcOutput$settings$thin to start / end / thin values in the return of getSample.
However, then the question is - if we do this, the "real" iteration numbers will be displayed in the plot, e.g. we have a sampler that has been thinned by 10 during sampling, so that we have 10.0000 samples, but the plot displays values up to 100.000 , and the user now looks at this and requests a start values of 10.000, then we should also divide inputs in (start, end, thin) in getSample by mcmcOutput$settings$thin and so on, which could get rather confusing, because according to this logic, a request of thin = 10 in getSample would have no effect, because the sample is already thinned at 10 ... or is this actually only confusing if one is not used to it?
In general, I was wondering if we should move towards using row names or an extra column for recording the iteration of the MCMC sample, the whole business with having several opportunities for thinning is getting rather confusing.
@stefan-paul , what do you think?
Hmm tricky question. I would prefer having the real iteration number displayed in the plot even for a thinned sample. I think the confusing part is mostly in the functions so the user doesn't see this. I guess we only have to take care that we're consistent here. And maybe we should add a line or two in the help page.
For the general question I'm still a bit undecided what the best way would be to handle the thinning options.... I'll think on it .
Max, Tankred, opinions?
On the long run we should move away from coda and just provide a toCoda function. Regarding the thinning, displaying the real iterations should avoid confusion and might be more intuitive.
A solution would be to give the coda::as.mcmc.list function the start, end, and thin values (at the end of the samplers). This way the real iterations should be always displayed and we could additionally thin out with the plot function through getSample. The start/end arguments passed to getSample have to be divided then by the settings$thin as you mentioned above @florianhartig.
I think getting the real iteration number displayed make more sense.
Maybe we should introduce a "bayesianSample" object or something like this as output of getSample which has all this information?
I don't think an own object, or moving away from coda, is really an option, as a lot of coda objects are used internally. We had a https://github.com/florianhartig/BayesianTools/issues/5 a while ago to actually move away from our own internal structure and do everything as coda from the start.
OK, the way I see it
--> I think my preference would be to simply introduce start, end, thin argument with the internal MCMC chain, to have this compliant with coda
We will have to take care that this all works smoothly also with the numSamples argument, which is frequently used in the plots.
Alternative would be to make the getSample work relative to the current chain, i.e. thin = 2 thins the existing chain by 2, regardless of the original thinning interval
I'm not sure, both options are possibly confusing to a user. How does coda handle all this? They also allow thinning, right?
The thin argument in the samplers is not properly passed on, e.g.
we should fix this.
As a general note: together with taking care of the thinning in getSample, coda compatibility and so on, this is getting all rather cumbersome. I wonder if we were not better off to create our own mcmc chain class for the 0.2 release.