BAMresearch / probeye

A general framework for setting up parameter estimation problems.

Parallelization of sampling #64

Open aklawonn opened 2 years ago

aklawonn commented 2 years ago

This issue should address the question of how to solve an inference problem using sampling routines (for example MCMC) in a parallelization framework. At the core of a sampling algorithm stands the repeated evaluation of a parameterized forward model (for example a finite element model), where each evaluation uses a different set of parameters. Characteristic of these algorithms is that one cannot define all parameter vectors (a parameter vector being the set of parameters used for a single forward model evaluation) in advance. That is, one cannot define, say, a thousand jobs (a job being the evaluation of the forward model with a defined parameter vector) and simply submit them. Instead, these sampling algorithms choose the next parameter vector based on the forward model evaluation of the previous parameter vector, so there is a strictly sequential character to these algorithms (see also here).
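To make the sequential dependency concrete, here is a minimal, generic Metropolis loop (illustrative only, not probeye code): the proposal for step k+1 can only be formed once step k has been evaluated, which is why the forward model calls cannot all be scheduled up front.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta):
    # stand-in for log-likelihood + log-prior; in practice this is where
    # the expensive forward model evaluation happens
    return -0.5 * np.sum(theta**2)

theta = np.zeros(2)
chain = [theta]
for _ in range(1000):
    proposal = theta + 0.5 * rng.normal(size=2)  # depends on the current state
    # accept/reject based on the evaluation at the proposed parameter vector
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    chain.append(theta)
```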

Here are our thoughts on possible ways to use parallel forward model evaluation to speed up this process. Please feel free to comment on them or to add your own proposals.

  1. **Parallel evaluation of multiple chains managed from within probeye.** A chain is a number of evaluations conducted in the sequential fashion described above. One can run multiple such chains. Since these chains are independent, they can be run in parallel. They could be prepared and submitted by probeye using some probeye-routines yet to be written.
  2. **Parallel evaluation of multiple chains managed from outside probeye.** This is the same idea as in the first point, but here probeye itself would not know anything about the parallelization. Instead, probeye would simply be run in parallel, with each run generating its own chain. The results of these chains can then be evaluated together after each probeye run has finished (see the sketch after this list).
  3. **Using a surrogate model.** In this case, a response surface of the forward model is prepared before probeye is run. This response surface (the surrogate model) is based on a number of forward model evaluations in the considered parameter space, which can be run in parallel. Instead of evaluating the forward model, probeye would then simply evaluate the response surface, which (once it is obtained) is very cheap to evaluate, so no parallelization is required in probeye itself.
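As a rough illustration of option 2 (`run_single_chain` is a placeholder, not probeye API), independent runs could be launched with standard multiprocessing and their samples pooled afterwards:

```python
import numpy as np
from multiprocessing import Pool

def run_single_chain(seed):
    # placeholder: set up the inference problem and run one complete
    # sampling run here, returning its chain of posterior samples
    rng = np.random.default_rng(seed)
    return rng.normal(size=(1000, 2))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        chains = pool.map(run_single_chain, range(4))  # 4 independent chains
    samples = np.concatenate(chains, axis=0)  # combined posterior samples
```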
joergfunger commented 2 years ago

Just a comment: in case a response surface (e.g. a GP) is used as proposed in 3., we would also have to take the uncertainty of the surrogate forward model into account in the evaluation (which would mean changing the forward model interface), and maybe even perform an adaptive procedure to concentrate the samples of the response surface close to the region where the posterior is important. In addition, we might face challenges in higher dimensions (I would guess the maximum is somewhere between 3 and 8 parameters, depending on the individual contribution of each variable). @JanKoune it would be interesting to know how you plan on using this for the Moerdijk bridge, and also whether the response surface building should be a submodule in probeye (e.g. by knowing the priors and the forward model, it might be easy to set up a response surface directly by just providing a number of samples that is drawn from the parameter priors and for which the forward model is then computed). A challenge there is probably also the dimension of the model output (e.g. the number of data points being used, e.g. in a time series). Each data point would have to be a separate GP (or, alternatively, another time dimension could be added to the GP's input), but e.g. with 50 sensors we would "train" 50 GPs (and also evaluate those in the inference).
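A minimal sketch of the "additional time dimension" alternative, assuming a scikit-learn GP and a toy forward model (everything below is illustrative, not probeye's interface): the output coordinate t becomes an extra GP input, so a single GP covers the whole time series instead of one GP per data point.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

n_samples, n_times = 50, 20
theta = rng.uniform(0, 1, size=(n_samples, 2))  # parameter samples
times = np.linspace(0, 1, n_times)              # output coordinate (time)

# training inputs: (theta_1, theta_2, t), one row per (sample, time) pair
X = np.array([[*th, t] for th in theta for t in times])
y = np.array([np.sin(2 * np.pi * (t + th[0])) * th[1]  # toy forward model
              for th in theta for t in times])

gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

# predict the full time series for a new parameter vector,
# including the surrogate's own uncertainty
theta_new = np.array([0.3, 0.7])
X_new = np.array([[*theta_new, t] for t in times])
y_pred, y_std = gp.predict(X_new, return_std=True)
```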

TTitscher commented 2 years ago

As the most suitable approach is problem-dependent, there is, in my opinion, no single best approach. Regarding the surrogate model, I have some thoughts though:

Gezzellig commented 2 years ago

Thijs van der Knaap here from TNO; Alex asked me to respond to this issue. I can't give any insights into the properties of smart choices within probeye's algorithms, as that is not my expertise. But after hearing what Jörg was looking for, namely being able to scale easily on your HPC cluster, Dask might be an interesting option. This works best if you are using numpy/pandas/scikit-learn, but maybe it fits somehow. If it does, then you will have a really powerful tool without having to rack your brain over implementing the actual parallel execution yourself. This article explains how you can run a Dask cluster on an HPC system that uses job scheduling: https://blog.dask.org/2019/08/28/dask-on-summit
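For reference, a minimal sketch of that setup with dask-jobqueue, assuming a SLURM scheduler (`run_chain` is a placeholder for a complete probeye/MCMC run; the resource values are illustrative):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# each SLURM job becomes one Dask worker
cluster = SLURMCluster(cores=4, memory="8GB", walltime="02:00:00")
cluster.scale(jobs=8)
client = Client(cluster)

def run_chain(seed):
    # placeholder: run one independent sampling chain here
    return seed

futures = client.map(run_chain, range(8))  # one independent chain per task
chains = client.gather(futures)
```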

JanKoune commented 2 years ago

With respect to the surrogating, I think it would be interesting to test the AGP and BAPE algorithms (approxposterior), as these algorithms directly surrogate the log-likelihood + log-prior, eliminating the problem of high-dimensional output, e.g. in the case of time series as @joergfunger mentioned. It is likely also possible to change the underlying GP surrogate, and we could try to find a formulation of the GP surrogate that works for a large number of parameters (assuming there is literature on scaling GPs to high-dimensional problems). The IJsselbridge case would also be a convenient way to investigate the performance of the algorithm, e.g. for N > 10 parameters. There are some disadvantages compared to directly surrogating the forward model, mainly:

We are also looking into adaptive surrogating that focuses the sampling on regions where the forward model is highly nonlinear; however, this must also be tested for higher dimensions (in terms of the number of parameters).

I think it would be convenient to include the surrogating as part of probeye, taking advantage of the interface for defining the parameters and the forward model, as long as it can be made general enough to be useful for the range of problems (in terms of the number of parameters and output dimensions) that we expect. Another note on this: if the approxposterior approach turns out to be useful, it may be relatively simple to modify the existing code used for inference with emcee (see this example) and implement it as an additional solver.
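To illustrate the idea of surrogating the log-posterior directly (a generic sketch, not approxposterior's actual API): fit a GP to a set of (theta, log-posterior) pairs and let emcee sample the cheap GP mean instead of the expensive forward model.

```python
import numpy as np
import emcee
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def log_posterior(theta):
    # stand-in for log-likelihood + log-prior via the expensive forward model
    return -0.5 * np.sum(theta**2)

# design points in parameter space (could be drawn from the priors)
theta_train = rng.uniform(-3, 3, size=(100, 2))
y_train = np.array([log_posterior(t) for t in theta_train])

gp = GaussianProcessRegressor(normalize_y=True).fit(theta_train, y_train)

def surrogate_log_prob(theta):
    # the sampler only ever sees the GP mean, never the forward model
    return gp.predict(theta.reshape(1, -1))[0]

sampler = emcee.EnsembleSampler(16, 2, surrogate_log_prob)
sampler.run_mcmc(rng.normal(size=(16, 2)), 500)
samples = sampler.get_chain(flat=True)
```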

JanKoune commented 2 years ago

An additional note on parallelizing MCMC: I am not sure how this would work with emcee (if the aim is to also use the offloader), because the chains of the individual walkers are not independent (see the emcee docs on autocorrelation analysis & convergence). It is also mentioned there that the different processes must be able to communicate with each other (see the emcee docs on parallelization). Maybe I am misunderstanding the documentation, so I will look into this further. A possible workaround would be to perform different emcee runs in parallel, each with multiple walkers, and recombine the samples afterwards (i.e. point 2 in the first comment).
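For completeness, emcee's documented parallelization happens within a single ensemble via a pool: the walkers' log-probability evaluations at each step are distributed over processes, which is why those processes need to communicate. A minimal version, following the emcee documentation (`log_prob` is a placeholder):

```python
import numpy as np
import emcee
from multiprocessing import Pool

def log_prob(theta):
    return -0.5 * np.sum(theta**2)  # placeholder log-probability

if __name__ == "__main__":
    ndim, nwalkers = 5, 32
    p0 = np.random.randn(nwalkers, ndim)
    with Pool() as pool:
        # the pool evaluates all walkers' proposals in parallel per step
        sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
        sampler.run_mcmc(p0, 1000)
```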