Closed atulag0711 closed 2 years ago
The posterior predictive (4) is something we should look into in more detail at a later stage. For me, this is computationally not cheap, requires much more information, and might already be integrated into the sampling (if the quantity of interest can be defined a priori, the overhead of computing it might be insignificant), but then it is not really postprocessing. All the rest looks good to me. As for the questions:
Both seaborn and arviz are based on matplotlib and, I guess, you can take the plots of both libraries "as is" for debugging, but also highly customize the style (by modifying the returned matplotlib axes/figure).
The arviz developers already made up their mind about reasonable data structures, especially the `InferenceData` class, which can be conveniently created from the output of various samplers as shown here. This `InferenceData` is the data source of all their plotting functions. So my suggestion for a first design would be that all samplers can create an `InferenceData` instance and the user then calls arviz manually:
```python
inference_data = our_emcee_wrapper.run(....)
arviz.plot_trace(inference_data, var_names=["E", "nu"], ...)
arviz.plot_pair(inference_data, ...)
```
However, I could not find any way to provide the posterior sample weight (AFAIK an important result of nested samplers). @rozsasarpi, how could we deal with that?
Thanks everyone for contributing! I prepared a sub-module at probeye/postprocessing (next to definition and inference) where we can put all of these inference-engine-independent post-processing routines. Btw, Thomas' proposal looks like a good approach to me.
> However, I could not find any way to provide the posterior sample weight (AFAIK an important result of nested samplers). @rozsasarpi, how could we deal with that?
Yes, for nested sampling methods we will need to work with weighted samples. The relevant outputs are two numerical vectors: one is the sample, the other one is the weight vector.
I haven't yet looked under the hood of arviz, but in general, every function that does something with the samples will need a version that can handle weighted samples, e.g. to compute the mean, standard deviation, credible regions, etc. If arviz does not support this, it may require quite an effort to extend their functions and methods (assuming that they would agree with such an extension).
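To make the requirement concrete, here is a minimal pure-numpy sketch of the weighted-sample statistics such an extension would need. All function names are illustrative, not arviz API; the only assumption from the thread is that a nested sampler hands us two vectors, the samples and their weights.

```python
# Minimal weighted-sample statistics for nested-sampling output.
# Assumption: we receive two vectors, `samples` and importance `weights`,
# as e.g. dynesty returns them. Names are illustrative, not arviz API.
import numpy as np

def weighted_mean(samples, weights):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize the weights
    return float(np.sum(w * np.asarray(samples, dtype=float)))

def weighted_std(samples, weights):
    s = np.asarray(samples, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = np.sum(w * s)
    return float(np.sqrt(np.sum(w * (s - mu) ** 2)))

def weighted_quantile(samples, weights, q):
    # Credible bounds from the weighted empirical CDF.
    s = np.asarray(samples, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(s)
    s, w = s[order], w[order]
    cdf = np.cumsum(w) / np.sum(w)
    return float(np.interp(q, cdf, s))

# With equal weights this reduces to the ordinary statistics:
assert abs(weighted_mean([1.0, 2.0, 3.0, 4.0], [1.0] * 4) - 2.5) < 1e-12
```

If arviz turns out to support weights internally, these would of course be superfluous.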
Thanks Arpad! Some options:
Maybe we should contact the developers of arviz (@atulag0711 ?) to ask that question, maybe we just overlooked something or it is already on their todo list.
I don't know much about dynesty. So aren't we implementing inference-tool-specific visualization, or maybe I am mistaken? Because arviz has a built-in method to identify pyro, so it will be easier.
Also, how do you envision the visualization implementation, and what does the user need to provide? Do we expect the user to know arviz, or does the user, for example, just need to provide the MCMC object and say "I want a posterior KDE plot for parameter x"? If the latter is true, then essentially we will just be adding a layer of abstraction over arviz. A functionality/ease-of-use trade-off will be at play here then. I think we also need to provide a way to override our methods. This will be beneficial for users who need more flexibility or who know arviz or matplotlib well enough.
All inference procedures should get the same output. If this is already included in arviz and we decide on using it, then this is fine. However, if we want to use other inference engines like dynesty, we should at least try to allow a visualization of these results as well. That would mean either writing a similar tool-specific visualization or trying to adapt arviz to incorporate that. Maybe it is even incorporated already (weighted samples), and we would just have to write a wrapper that converts the output of dynesty into something that arviz understands, see e.g.. (I would not prefer to have one postprocessing per inference engine, but rather agree on a joint format that is passed to a single postprocessing module.)

As for what is plotted, I would use this issue for a discussion. For me, prior and posterior plots (for single variables), pair plots, the samples over time/sampling (trace plots, individually for the different chains) to see burn-in, and approximated posteriors with parameterized pdfs (whose type could either be chosen by the user or determined automatically) would be nice. I would provide a standard plot, but certainly the user can add individual plots, having direct access to arviz. This might even mean that we just provide a standard list of arviz commands the user can copy-paste, and if they want something else, we just refer to arviz. At least I was not fully satisfied with the output of arviz, but maybe I did not try long enough.
Just some findings:
- seaborn's `kdeplot` accepts a `weights` argument that seems to work
- I could not find a way to pass `weights` into more complex combined plots like `pairplot`
- `lightkde` is faster (only 0.04 seconds)
- seaborn provides the `PairGrid` class. This can be used to produce the pairplot like
```python
grid = seaborn.PairGrid(some_data)
grid.map_diag(plotting_function_for_diagonals)         # e.g. seaborn.kdeplot
grid.map_offdiag(plotting_function_for_off_diagonals)  # e.g. seaborn.scatterplot
```
Maybe there is a way to:

1) include custom KDE plots (e.g. from lightkde for performance)
2) include custom analytical plots for known, continuous distributions (e.g. for the priors)
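If a given plotting routine turns out not to accept weights, a weighted KDE itself is small enough to sketch by hand and feed into a `PairGrid` diagonal. A pure-numpy sketch with a naive fixed bandwidth (illustrative only, not lightkde's actual API):

```python
# Pure-numpy weighted Gaussian KDE as a fallback for plotting routines
# that do not accept a weights argument. Bandwidth handling is naive.
import numpy as np

def weighted_kde(samples, weights, grid, bandwidth):
    s = np.asarray(samples, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize weights
    diff = (grid[:, None] - s[None, :]) / bandwidth   # (n_grid, n_samples)
    kernel = np.exp(-0.5 * diff**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel @ w                                 # weighted mixture of kernels

# Density from two samples, one weighted three times the other:
grid = np.linspace(-5.0, 5.0, 1001)
density = weighted_kde([-1.0, 1.0], [3.0, 1.0], grid, bandwidth=0.5)
```

The resulting `density` array can be drawn with any matplotlib `plot` call, so it would slot into the `map_diag` customization above.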
In my current work, I came across a plot like the one shown. The first two parameters are latent model parameters; Lambda is a latent precision parameter. [VB = analytical variational Bayes, blue = dynesty nested sampling] Just by looking at the individual 1D plots, it seems that the posterior and prior are almost equal, so the data has no influence. Obviously, when looking at the 2D plot, the model parameters are highly correlated. Luckily, I am able to visualize that in 2D.
Now imagine a higher dimensional problem, where the correlation in the samples cannot be shown in 2D. That could happen, if all samples lie in a plane with its normal pointing to [1,1,1] and each projection onto axis-aligned plots never reduces to a line. In that case, you would not see the correlation in the 2D pair plots. (Right??)
I am wondering, if we should provide a function or a plot that discovers those correlations...?
I think that kind of plot is quite interesting because that is often the reason why these inverse problems are ill-posed.
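One cheap way to "discover" such hidden correlations is an eigendecomposition of the sample covariance (plain PCA): a near-zero eigenvalue means the samples concentrate near a hyperplane whose normal is the corresponding eigenvector, which covers exactly the [1,1,1] example above. A sketch with made-up names and tolerances:

```python
# Sketch: detect near-degenerate directions in posterior samples via an
# eigendecomposition of the sample covariance (a simple PCA). A near-zero
# eigenvalue means the samples lie close to a hyperplane whose normal is
# the corresponding eigenvector. Names and tolerances are illustrative.
import numpy as np

def degenerate_directions(samples, rel_tol=1e-3):
    """Return (eigenvalues, normals of near-degenerate directions).

    samples: (n_samples, n_params) array of posterior draws.
    """
    cov = np.cov(samples, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    mask = eigvals < rel_tol * eigvals.max()
    return eigvals, eigvecs[:, mask]         # columns = hyperplane normals

# Example: 3D samples constrained to the plane x + y + z = 0
rng = np.random.default_rng(0)
u = rng.normal(size=(1000, 3))
u -= u.sum(axis=1, keepdims=True) / 3.0      # project onto the plane
eigvals, normals = degenerate_directions(u)
# normals[:, 0] is (up to sign) parallel to [1, 1, 1] / sqrt(3)
```

Reporting these normals (or warning when one exists) could be the "correlation discovery" function, independent of any 2D pair plot.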
I was wondering if it would be possible to provide a general plot function for the posterior predictive, or at least an approximation of it.
In variational Bayes we usually compute the mean and std of the model response by error propagation (propagation of uncertainty): we use a linear approximation of the model and compute the standard deviation of the model response from the model Jacobian and the identified covariance of the model parameters. Since in the sampling approaches the covariance (computed from the samples) could also show a correlation between the noise and the model parameters, I'm not sure how to do the same here.
Another way would be to re-compute the model response at samples (either the known ones occurring during the sampling algorithm, or new ones selected based on the identified posterior) and from that prepare a plot like this
For that, I recompute the model response at all samples saved in `inference_data` and plot the responses with an alpha filter, as well as compute the mean and the std of these response data. Of course, this is not exact for all problems, because in some way I assume a normal distribution here ...
Thanks for your input Annika! I'm not quite sure yet how to realize such a plot for a general case. Because the model response can be computed for each experiment that is added, and this could be many. However, if you have an idea how to approach this, feel free to start a new branch with it. Currently, I don't really have the time to work on this topic unfortunately.
> plot the response with a alpha filter as well as computed the mean and the std of these response data
As we discussed yesterday: I'd argue that this is not valid for all models (maybe only linear ones?). So you created the dotted line via

```
mean[f(x)] +- std[f(x)]
```

where `f` is the physical model and `x` are the posterior samples. And it should really be

```
f[mean(x) +- std(x)]
```

right? But I may be wrong...
I think the first one is correct: the posterior predictive is computed by using all the parameter samples, computing the pdf of f(x) at all these parameter points and adding a normal distribution (at least for the additive-noise case).
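For what it's worth, the two expressions really do differ for nonlinear models, which is why the ordering matters. A tiny numerical check with a quadratic toy model (all values illustrative):

```python
# mean[f(x)] +- std[f(x)]  vs  f[mean(x) +- std(x)] for a nonlinear model.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=100_000)   # stand-in "posterior samples"

def f(x):
    return x ** 2                        # nonlinear toy "physical model"

# Sample-based (first expression): mean of the pushed-forward samples.
mean_fx = f(x).mean()                    # close to E[x^2] = 1 for a standard normal

# Plug-in (second expression): model evaluated at the sample mean.
f_mean_x = f(x.mean())                   # close to 0

# The two disagree strongly here, so they are not interchangeable.
```

For a linear `f` the two coincide, which is presumably why the plug-in version looks fine in simple cases.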
I actually compute the model response as

```python
response = model(x_i) + normal(0, sigma_i)
```

and then plot these and compute the mean and std from that. But sure, this is open for discussion.
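That re-computation can be sketched end to end with a cheap toy model. Everything here, the model, the number of samples, the noise level, is made up for illustration; the point is only the shape of the computation:

```python
# Sampling-based posterior predictive band: evaluate a (cheap, made-up) model
# at every posterior sample, add the sampled noise, and summarize per point.
import numpy as np

rng = np.random.default_rng(42)

def model(theta, x):
    # Placeholder "physical model": a line with slope theta[0], offset theta[1].
    return theta[0] * x + theta[1]

x = np.linspace(0.0, 1.0, 20)                                # sensor positions
theta_samples = rng.normal([2.0, 1.0], 0.05, size=(500, 2))  # fake posterior draws
sigma_samples = np.full(500, 0.1)                            # noise std per draw

# response = model(x_i) + normal(0, sigma_i), evaluated at every sample:
responses = np.stack([
    model(t, x) + rng.normal(0.0, s, size=x.size)
    for t, s in zip(theta_samples, sigma_samples)
])                                                           # (n_samples, n_points)

pred_mean = responses.mean(axis=0)   # solid line of the band plot
pred_std = responses.std(axis=0)     # dotted mean +- std bounds
```

The individual `responses` rows are what would be drawn with the alpha filter; `pred_mean` and `pred_std` give the summary band.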
Nevertheless, if the model is computationally expensive, it's not useful to compute the responses. Maybe there is a nicer semi-analytic way to do it.
At least for the existing points x_i, the measurements, and sensor j, the model responses f_j(x_i, theta) for all samples of theta should be part of the output. So the additional effort of computing this might not be too large.
As per the discussions over the call, we can use this space to ideate on efficient ways to incorporate post-processing routines. We would need the following plots to diagnose the sampling process (approximation-based inference methods will be discussed later):

- Trace plots
- Pair plots
- Posterior distributions with bounds
- Posterior predictive plots
Have to decide the following IMO: