BAMresearch / probeye

A general framework for setting up parameter estimation problems.
MIT License

Post-processing requirements #8

Closed atulag0711 closed 2 years ago

atulag0711 commented 3 years ago

As per the discussions over the call, we can use this space to ideate on efficient ways to incorporate post-processing routines. We would need the following plots to diagnose the sampling process (approximation-based inference methods will be discussed later):

  1. Trace plots:

    • User should be able to specify the sampled parameter(s) for which a trace plot is needed.
    • Should have a separate flag for burn-in.
  2. Pair plot

    • User should be able to choose the parameters for which the pair plots are needed.
    • Should have a provision to overlay the corresponding prior distributions.
  3. Posterior distribution with bounds

    • User chooses the sampled parameter(s)
    • User chooses what kind of bound is needed (± 1 s.d., ± 2 s.d., etc.)
  4. Posterior Predictive plots

    • User needs to provide the forward solver and the required input for it.
    • User needs to specify the QoI (quantity of interest) for which the predictive distribution is needed.
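As a concrete, purely hypothetical sketch of what item 1 could look like (all names are illustrative, not probeye API), independent of the inference engine:

```python
import numpy as np

def trace_summary(samples, names, burnin=0):
    """Summarize MCMC chains after discarding burn-in.

    samples: array of shape (n_steps, n_params); names: parameter labels.
    Hypothetical helper; the real probeye interface may look different.
    """
    kept = samples[burnin:]  # the separate burn-in flag from the wish list
    return {name: {"mean": float(kept[:, i].mean()),
                   "std": float(kept[:, i].std(ddof=1))}
            for i, name in enumerate(names)}

# toy chain: two parameters, first 100 steps treated as burn-in
rng = np.random.default_rng(0)
chain = rng.normal(loc=[2.0, -1.0], scale=0.1, size=(1000, 2))
summary = trace_summary(chain, ["E", "nu"], burnin=100)
print(summary["E"]["mean"])  # close to 2.0
```

The actual trace *plot* would then just draw `kept[:, i]` per parameter; the summary is only there to show the data flow.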

We have to decide the following, IMO:

  1. Common post-processing for multiple inference wrappers (taralli, torch, etc.), or an individual one for each of them?
  2. How should the posterior samples be returned? Big dict, pandas dataframe etc.? Stored in a triple store and then post processing queries the triple store?
  3. What should we be using for plotting: arviz, seaborn, matplotlib, Plotly, etc.?
joergfunger commented 3 years ago

The posterior predictive (4) is something we should look into in more detail at a later stage. For me, this is computationally not cheap, requires much more information, and might already be integrated into the sampling (if the quantity of interest can be defined a priori, the overhead of computing this might not be significant), but then it is not really post-processing. All the rest looks good to me. As for the questions:

  1. The goal would be a wrapper such as arviz (or we directly use arviz) for everything together, unless there are really specific items that need to be addressed individually. Maybe we could define a base class for all outputs, and then derive individual classes with tool-specific features.
  2. I would suggest that the samples be returned in a pandas df, where the columns correspond to our latent variables defined in the global inference problem (but as mentioned above, I would not directly return the df, but a base class that allows adding tool-specific features). The samples would also be stored in the database, either directly or with a reference to the raw data file.
  3. As for the plotting library, I'm very flexible, I would suggest using the tools that already provide most of the functionality, but there I'm not the expert. We might spend some time to look into different options.
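The base-class idea from point 2 could be sketched like this (all class and attribute names are hypothetical, not probeye code):

```python
import numpy as np
import pandas as pd

class PosteriorSamples:
    """Base wrapper around posterior samples; columns correspond to the
    latent variables of the global inference problem. Tool-specific
    subclasses can add features on top. Names are illustrative only."""

    def __init__(self, samples, var_names):
        self.df = pd.DataFrame(np.asarray(samples), columns=var_names)

    def mean(self):
        return self.df.mean()

class EmceeSamples(PosteriorSamples):
    """Example subclass that could expose emcee-specific diagnostics."""
    pass

rng = np.random.default_rng(1)
post = EmceeSamples(rng.normal(size=(500, 2)), ["E", "nu"])
print(list(post.df.columns))
```

Code that only needs the samples works with the base class; engine-specific post-processing can type-check for the subclass.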
TTitscher commented 3 years ago

Both seaborn and arviz are based on matplotlib and, I guess, you can take the plots of both libraries "as is" for debugging, but also highly customize the style (by modifying the returned matplotlib axes/figure).

The arviz developers have already made up their minds about reasonable data structures, especially the InferenceData class, which can be conveniently created from the output of various samplers as shown here. This InferenceData is the data source for all their plotting functions.

So my suggestion for a first design would be that all samplers can create an InferenceData instance and the user then calls arviz manually.

inference_data = our_emcee_wrapper.run(....)
arviz.plot_trace(inference_data, var_names=["E", "nu"], ...)
arviz.plot_pair(inference_data, ...)

However, I could not find any way to provide the posterior sample weight (AFAIK an important result of nested samplers). @rozsasarpi, how could we deal with that?

aklawonn commented 3 years ago

Thanks everyone for contributing! I prepared a sub-module at probeye/postprocessing (next to definition and inference) where we can put all of these inference-engine-independent post-processing routines. Btw, Thomas' proposal looks like a good approach to me.

rozsasarpi commented 3 years ago

However, I could not find any way to provide the posterior sample weight (AFAIK an important result of nested samplers). @rozsasarpi, how could we deal with that?

Yes, for nested sampling methods we will need to work with weighted samples. The relevant outputs are two numerical vectors: one is the sample, the other one is the weight vector.

I haven't yet looked under the hood of arviz but in general for all functions that do something with the sample you will need a version that can handle weighted samples, e.g. to compute the mean, standard deviation, credible regions, etc. If arviz does not support this then it may require quite an effort to extend their functions and methods (assuming that they would agree with such an extension).
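Computing the basic moments from a weighted sample is straightforward in itself; a minimal numpy sketch (function name hypothetical):

```python
import numpy as np

def weighted_mean_std(samples, weights):
    """Mean and standard deviation of a weighted 1D sample, as produced
    by nested samplers (e.g. importance weights from dynesty)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the weight vector
    mean = np.sum(w * samples)
    var = np.sum(w * (samples - mean) ** 2)
    return mean, np.sqrt(var)

# toy weighted sample: values 0 and 2, with 3x more weight on 2
vals = np.array([0.0, 2.0])
m, s = weighted_mean_std(vals, np.array([1.0, 3.0]))
print(m)  # 1.5
```

The hard part, as noted above, is not these formulas but making every *plotting* routine weight-aware.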

TTitscher commented 3 years ago

Thanks Arpad! Some options:

joergfunger commented 3 years ago

Maybe we should contact the developers of arviz (@atulag0711 ?) to ask that question, maybe we just overlooked something or it is already on their todo list.

atulag0711 commented 3 years ago

Maybe we should contact the developers of arviz (@atulag0711 ?) to ask that question, maybe we just overlooked something or it is already on their todo list.

I don't know much about dynesty. So aren't we implementing inference-tool-specific visualization then, or maybe I am mistaken? Because arviz has built-in methods to recognize e.g. pyro output, so that will be easier.

Also, how do you envision the visualization implementation, and what does the user need to provide? Do we expect the user to know arviz, or does the user, for example, just need to provide the MCMC object and say "I want a posterior KDE plot for parameter x"? If the latter is true, then essentially we will just be adding a layer of abstraction over arviz. A functionality/ease-of-use trade-off will be at play here then. I think we also need to provide a way to override our methods. This will be beneficial for users who need more flexibility or who know arviz or matplotlib well enough.

joergfunger commented 3 years ago

All inference procedures should get the same output. If this is already included in arviz and we decide on using it, then this is fine. However, if we want to use other inference engines like dynesty, we should at least try to allow a visualization of those results as well. That would mean either writing a similar tool-specific visualization, or trying to adapt arviz to incorporate it; maybe weighted samples are even incorporated already, and we would just have to write a wrapper that converts the output of dynesty into something that arviz understands (see e.g.). I would prefer not to have one post-processing per inference engine, but rather to agree on a joint format that is passed to a single post-processing module.

As for what is plotted, I would use this issue for a discussion. For me, the following would be nice: prior and posterior plots (for single variables), pair plots, the samples over time/sampling (trace plots, individually for the different chains) to see burn-in, and approximated posteriors with parameterized pdfs (the type could either be chosen by the user or determined automatically). I would provide a standard plot, but the user can certainly add individual plots, having direct access to arviz. This might even mean that we just provide a standard list of arviz commands the user can copy and paste, and if they want something else, we just refer them to arviz. At least I was not fully satisfied with the output of arviz, but maybe I did not try long enough.
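A conversion along these lines could start by resampling the weighted dynesty output to equal weights, so that downstream tools expecting unweighted chains can be reused. A sketch (dynesty itself ships a similar utility, `dynesty.utils.resample_equal`; the version below is a generic stand-in):

```python
import numpy as np

def resample_equal(samples, weights, rng=None):
    """Draw an equal-weight sample from weighted (nested-sampling) output
    via multinomial resampling. Sketch only, not probeye code."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(samples), size=len(samples), p=w)
    return np.asarray(samples)[idx]

# a zero-weight point is never drawn, so only 1.0 entries survive
vals = np.array([1.0, 1.0, 1.0, 100.0])
eq = resample_equal(vals, [1.0, 1.0, 1.0, 0.0], rng=0)
print(eq)
```

The resampled array could then be fed into the joint post-processing format discussed above.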

TTitscher commented 2 years ago

Just some findings:

TTitscher commented 2 years ago

[image: pair plot of two latent model parameters and a latent precision parameter Lambda; VB = analytical variational Bayes, blue = dynesty nested sampling]

In my current work, I came across a plot like the one shown. The first two parameters are latent model parameters; Lambda is a latent precision parameter. [VB = analytical variational Bayes, blue = dynesty nested sampling] By just looking at the individual 1D plots, it seems that the posterior and prior are almost equal, so the data has no influence. But when looking at the 2D plot, it is obvious that the model parameters are highly correlated. Luckily, I am able to visualize that in 2D.

Now imagine a higher-dimensional problem, where the correlation in the samples cannot be shown in 2D. That could happen if, say, all samples lie in a plane whose normal points along [1, 1, 1], so that each projection onto the axis-aligned planes never reduces to a line. In that case, you would not see the correlation in the 2D pair plots. (Right??)

I am wondering, if we should provide a function or a plot that discovers those correlations...?
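One option, assuming the degeneracy is roughly linear: look for small eigenvalues of the sample covariance, whose eigenvectors are the correlated parameter combinations. A sketch with illustrative names:

```python
import numpy as np

def degenerate_directions(samples, tol=1e-2):
    """Return eigenvectors of the sample covariance whose eigenvalue is
    small relative to the largest one, i.e. near-degenerate linear
    combinations of the parameters. Hypothetical helper."""
    cov = np.cov(np.asarray(samples).T)
    evals, evecs = np.linalg.eigh(cov)
    rel = evals / evals.max()
    return [evecs[:, i] for i in range(len(rel)) if rel[i] < tol]

# 3D samples squashed onto a plane with normal ~ [1, 1, 1] / sqrt(3),
# mimicking the scenario described above
rng = np.random.default_rng(2)
pts = rng.normal(size=(2000, 3))
normal = np.ones(3) / np.sqrt(3.0)
pts -= 0.999 * np.outer(pts @ normal, normal)
dirs = degenerate_directions(pts)
print(len(dirs))  # one near-degenerate direction, aligned with the normal
```

This only catches linear degeneracies; curved ones would need something like a nonlinear dimensionality estimate.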

joergfunger commented 2 years ago

I think that kind of plot is quite interesting because that is often the reason why these inverse problems are ill-posed.

aradermacher commented 2 years ago

I was wondering if it would be possible to provide a general plot function for the posterior predictive, or at least an approximation of it.

In variational Bayes we usually compute the mean and std of the model response by error propagation (propagation of uncertainty): we use a linear approximation of the model and compute the standard deviation of the model response from the model Jacobian and the identified covariance of the model parameters. Since in the sampling approaches the covariance (computed from the samples) could also show a correlation between the noise and the model parameters, I'm not sure how to do the same here.
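For reference, that first-order propagation is var[f] ≈ J Σ Jᵀ for Jacobian J and parameter covariance Σ; a minimal generic sketch (not probeye code):

```python
import numpy as np

def linear_propagation_std(jacobian, param_cov):
    """Std of the model response from first-order error propagation:
    var[f] ~= J @ Sigma @ J.T, taking the diagonal for multiple outputs."""
    J = np.atleast_2d(jacobian)
    var = J @ param_cov @ J.T
    return np.sqrt(np.diag(var))

# toy linear model f(theta) = 2*theta1 + 3*theta2, independent unit variances
J = np.array([[2.0, 3.0]])
Sigma = np.eye(2)
print(linear_propagation_std(J, Sigma))  # [sqrt(13)]
```

For a linear model this is exact; for a nonlinear one it is only a local approximation around the expansion point.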

Another way would be to re-compute the model response at samples (either the known ones occurring during the sampling algorithm, or new ones selected based on the identified posterior) and from that prepare a plot like this:

[image: model responses at the posterior samples, plotted with transparency, together with their mean and standard deviation]

For that I recompute the model response at all samples saved in inference_data and plot the responses with an alpha filter, as well as compute the mean and the std of these response data. Of course, this is not exact for all problems, because in some way I assume a normal distribution here ...
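That sample-based construction could be sketched as follows (hypothetical helper, with an optional additive-noise term as in the additive noise model mentioned later in this thread):

```python
import numpy as np

def predictive_band(model, theta_samples, noise_std=0.0, rng=None):
    """Mean and std of the model response evaluated at posterior samples,
    optionally perturbed by the identified additive noise. Sketch only."""
    rng = np.random.default_rng(rng)
    responses = np.array([model(t) for t in theta_samples])
    if noise_std > 0:
        responses = responses + rng.normal(0.0, noise_std, responses.shape)
    return responses.mean(axis=0), responses.std(axis=0, ddof=1)

# toy model: straight line y = a * x on a fixed grid, posterior on slope a
x = np.linspace(0.0, 1.0, 5)
rng = np.random.default_rng(3)
a_samples = rng.normal(2.0, 0.1, size=1000)
mean, std = predictive_band(lambda a: a * x, a_samples)
print(mean[-1])  # close to 2.0
```

As noted below in the thread, evaluating `model` at every sample is the expensive part; caching the responses already computed during sampling would avoid it.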

aklawonn commented 2 years ago

Thanks for your input Annika! I'm not quite sure yet how to realize such a plot for the general case, because the model response would have to be computed for each experiment that is added, and there could be many of them. However, if you have an idea how to approach this, feel free to start a new branch with it. Unfortunately, I don't really have the time to work on this topic at the moment.

TTitscher commented 2 years ago

plot the response with a alpha filter as well as computed the mean and the std of these response data

As we discussed yesterday: I'd argue that this is not valid for all models (maybe only linear ones?). So you created the dotted line via

mean[f(x)] ± std[f(x)]

where f is the physical model and x are the posterior samples. And it should really be

f[mean(x) ± std(x)]

right? But I may be wrong...

joergfunger commented 2 years ago

I think the first one is correct: the posterior predictive is computed by using all the parameter samples, computing the pdf of f(x) at all these parameter points, and adding a normal distribution (at least for the additive noise case).
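A quick numerical check shows that the two constructions really do differ for nonlinear models (using f(x) = x² with x ~ N(0, 1) as an illustrative example):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=100_000)  # posterior samples of x
f = x ** 2                              # nonlinear model response

band_a = f.mean() + f.std()          # mean[f(x)] + std[f(x)], ~ 1 + sqrt(2)
band_b = (x.mean() + x.std()) ** 2   # f[mean(x) + std(x)],    ~ 1

print(round(band_a, 2), round(band_b, 2))
```

For a linear f the two coincide, which is presumably why the distinction is easy to miss.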

aradermacher commented 2 years ago

I actually compute the model response as response = model(x_i) + normal(0, sigma_i), then plot these and compute the mean and std from that. But sure, this is open for discussion.

Nevertheless, if the model is computationally expensive, it's not useful to re-compute the responses. Maybe there is a nicer semi-analytic way to do it.

joergfunger commented 2 years ago

At least for the existing sample points x_i, the measurements, and sensor j, the model responses f_j(x_i, theta) for all samples of theta should already be part of the output. So the additional effort of computing this might not be too large.