aklawonn opened 2 years ago
Just a comment: in case a response surface (e.g. a GP) is used as proposed in 3., we would also have to take the uncertainty of the surrogate into account when evaluating the forward model (which would mean changing the forward model interface), and maybe even perform an adaptive procedure that concentrates the response-surface samples in the region where the posterior is important. In addition, we might face challenges in higher dimensions (I would guess the maximum is somewhere between 3 and 8 parameters, depending on the individual contribution of each variable).

@JanKoune it would be interesting to know how you plan on using this for the Moerdijk bridge, and also whether the response surface construction should be a submodule in probeye (e.g. since the priors and the forward model are known, it might be easy to set up a response surface directly by just providing a number of samples: probeye would draw that many samples from the parameter priors and then compute the forward model for each). A challenge there is probably also the dimension of the model output (e.g. the number of data points used, say in a time series). Each data point would need its own GP (or alternatively, time could be added as another input dimension of the GP), but with e.g. 50 sensors we would have to "train" 50 GPs (and also evaluate all of them during inference).
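On that last point, a minimal sketch of the "extra time dimension" alternative, assuming scikit-learn's `GaussianProcessRegressor` and a made-up one-parameter forward model: instead of training one GP per time point (or per sensor), time is appended as an additional input dimension, so a single GP covers the whole series.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# hypothetical forward model: one parameter theta, output is a time series
def forward_model(theta, t):
    return theta * np.sin(t)

rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 10.0, 20)        # time points of the series
theta_samples = rng.uniform(0.5, 2.0, 30)  # samples drawn from the prior

# time as a second input dimension -> one GP for all time points
X = np.array([(th, t) for th in theta_samples for t in t_grid])
y = np.array([forward_model(th, t) for th, t in X])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=[1.0, 1.0]))
gp.fit(X, y)

# evaluate the whole predicted time series for a new parameter value
theta_new = 1.3
X_pred = np.column_stack([np.full_like(t_grid, theta_new), t_grid])
y_pred, y_std = gp.predict(X_pred, return_std=True)  # mean + uncertainty
```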
Since the most suitable approach is problem-dependent, there is IMO no single best approach. Regarding the surrogate model, I do have some thoughts though:
- Only the expensive part of the model could be surrogated, e.g. `forward_model(theta0, theta1, ..., thetaN) = theta0 + theta1 * heavy_computation(theta2..N)`, where only `heavy_computation` is surrogated.
- A surrogate can also provide the derivatives `d_surrogate/d_theta` with respect to the parameters.
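A minimal sketch of that first point (all names are hypothetical, and the GP stands in for any response surface): only `heavy_computation` is replaced by a surrogate trained on prior samples, while the cheap affine part of the model stays exact.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# hypothetical expensive sub-model, e.g. a finite element computation
def heavy_computation(theta_rest):
    return np.sum(np.sin(theta_rest))  # placeholder for the real FE call

# train a GP surrogate for heavy_computation only
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(100, 3))  # samples of theta2..theta4
y_train = np.array([heavy_computation(x) for x in X_train])

gp = GaussianProcessRegressor(normalize_y=True).fit(X_train, y_train)

def forward_model(theta):
    # theta = (theta0, theta1, theta2, ..., thetaN); only the expensive
    # part is surrogated, the affine part remains exact
    return theta[0] + theta[1] * gp.predict(theta[2:].reshape(1, -1))[0]
```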
Thijs van der Knaap here from TNO, Alex asked me to respond to this issue. I can't give any insights into the properties of smart choices within probeye's algorithms, as that is not my expertise. But after hearing what Jörg was looking for, namely being able to scale easily on your HPC cluster, Dask might be an interesting option. This only works if you are using numpy/pandas/scikit-learn, but maybe it fits somehow. If it does, then you will have a really powerful tool without having to rack your brains over implementing the actual parallel execution yourself. This article explains how you can run a Dask cluster on an HPC system that uses job scheduling: https://blog.dask.org/2019/08/28/dask-on-summit
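For concreteness, a minimal sketch of what this could look like, assuming a SLURM scheduler and the dask-jobqueue package (which the linked blog post builds on); `forward_model` and the toy parameter vectors are placeholders, not probeye API:

```python
import numpy as np
from dask.distributed import Client
from dask_jobqueue import SLURMCluster  # assumes a SLURM-based HPC cluster

# each Dask worker is submitted to the queue as a regular batch job
cluster = SLURMCluster(cores=4, memory="8GB", walltime="01:00:00")
cluster.scale(jobs=10)  # request 10 worker jobs from the scheduler
client = Client(cluster)

def forward_model(theta):
    # placeholder for the actual (expensive) model evaluation
    return float(np.sum(theta ** 2))

# evaluate a whole batch of parameter vectors in parallel, e.g. all
# walker positions of a single step of an ensemble sampler
parameter_vectors = [np.random.rand(5) for _ in range(100)]
futures = client.map(forward_model, parameter_vectors)
results = client.gather(futures)
```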
With respect to the surrogating, I think it would be interesting to test the AGP and BAPE algorithms (approxposterior), since these algorithms directly surrogate the log-likelihood + log-prior, which eliminates the problem of high-dimensional output, e.g. in the case of time series as @joergfunger mentioned. It is likely also possible to change the underlying GP surrogate, and we could try to find a formulation of the GP surrogate that works for a large number of parameters (assuming there is literature on scaling GPs to high-dimensional problems). The IJsselbridge case would also be a convenient way to investigate the performance of the algorithm, e.g. for N > 10 parameters. There are, however, some disadvantages compared to directly surrogating the forward model.
We are also looking into adaptive surrogating that concentrates the sampling in regions where the forward model is highly nonlinear; however, this still has to be tested in higher dimensions (in terms of the number of parameters).
I think it would be convenient to include the surrogating as part of probeye, taking advantage of its interface for defining the parameters and the forward model, as long as it can be kept general enough to be useful for the range of problems (in terms of the number of parameters and output dimensions) that we expect. Another note on this: if the approxposterior approach turns out to be useful, it may be relatively simple to modify the existing code used for inference with emcee (see this example) and implement it as an additional solver, as sketched below.
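To make the idea concrete, here is a deliberately simplified, non-adaptive sketch of what such a solver would do (this is not approxposterior's actual API; the Gaussian toy log-likelihood and all design choices are assumptions): the log-likelihood + log-prior is evaluated on a set of design points, a GP is fitted to those values, and emcee then samples the cheap GP mean instead of the expensive model. AGP/BAPE additionally add design points iteratively where a utility function indicates the surrogate is most informative.

```python
import numpy as np
import emcee
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

ndim, rng = 2, np.random.default_rng(0)

def log_likelihood(theta):  # stands in for the expensive model + data
    return -0.5 * np.sum((theta - 1.0) ** 2)

def log_prior(theta):       # assumed N(0, 3) priors on all parameters
    return np.sum(norm.logpdf(theta, loc=0.0, scale=3.0))

# design points drawn from the prior; one expensive evaluation per point
X = rng.normal(0.0, 3.0, size=(50, ndim))
y = np.array([log_likelihood(x) + log_prior(x) for x in X])

# fit the GP directly to the log-posterior values (the AGP/BAPE idea)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              normalize_y=True).fit(X, y)

def log_prob_surrogate(theta):
    # emcee only ever sees the cheap GP mean, never the expensive model
    return gp.predict(theta.reshape(1, -1))[0]

nwalkers = 8
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob_surrogate)
p0 = rng.normal(0.0, 1.0, size=(nwalkers, ndim))
sampler.run_mcmc(p0, 2000)
samples = sampler.get_chain(discard=500, flat=True)
```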
An additional note on parallelizing MCMC: I am not sure how this would work with emcee (if the aim is to also use the offloader), because the chains are not independent (see "Autocorrelation analysis & convergence" in the emcee docs). Also, it is mentioned that the different processes must be able to communicate with each other (see "Parallelization" in the emcee docs). Maybe I am misunderstanding the documentation, so I will look into this further. A possible workaround would be to perform several emcee runs in parallel, each with multiple walkers, and recombine the samples afterwards (i.e. point 2 in the first comment); a sketch of this follows below.
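A minimal sketch of that workaround, assuming a toy log-probability (in practice this is where the forward model is called): each process runs a fully independent ensemble, and the flattened chains are concatenated afterwards.

```python
import numpy as np
import emcee
from multiprocessing import Pool

ndim, nwalkers, nsteps = 2, 16, 2000

def log_prob(theta):  # toy target; the real one would call the model
    return -0.5 * np.sum(theta ** 2)

def run_single_chain(seed):
    # one fully independent emcee run with its own set of walkers
    rng = np.random.default_rng(seed)
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    p0 = rng.normal(size=(nwalkers, ndim))
    sampler.run_mcmc(p0, nsteps)
    return sampler.get_chain(discard=500, flat=True)

if __name__ == "__main__":
    with Pool(4) as pool:  # four independent runs in parallel
        chains = pool.map(run_single_chain, [0, 1, 2, 3])
    # recombining is only valid once each run has individually converged
    samples = np.concatenate(chains, axis=0)
```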
This issue should address the question of how to solve an inference problem using sampling routines (for example MCMC) in a parallelization framework. At the core of a sampling algorithm stands the repeated evaluation of a parameterized forward model (for example a finite element model), where at each evaluation another set of parameters is chosen. Characteristic for these algorithms is that one cannot define all parameter vectors (a parameter vector being the set of parameters used for a single forward model evaluation) in advance. That is, one cannot define, say, a thousand jobs (a job being the evaluation of the forward model with a given parameter vector) and simply submit them. Instead, these sampling algorithms choose the next parameter vector based on the forward model evaluation of the previous one, so there is a strictly sequential property to these algorithms (see also here). The sketch below illustrates this dependency.
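For illustration, a minimal random-walk Metropolis loop (toy target; all numerical choices are assumptions) that makes the dependency explicit: each proposal, and hence each forward model evaluation, depends on the outcome of the previous one, so the evaluations cannot be precomputed as independent jobs.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta):
    # placeholder: in practice this triggers the expensive forward model
    return -0.5 * np.sum(theta ** 2)

theta = np.zeros(2)  # current state of the chain
logp = log_posterior(theta)
samples = []

for _ in range(10_000):
    # the proposal depends on the CURRENT state -> strictly sequential
    proposal = theta + 0.5 * rng.normal(size=theta.shape)
    logp_prop = log_posterior(proposal)    # one forward model run
    if np.log(rng.uniform()) < logp_prop - logp:
        theta, logp = proposal, logp_prop  # accept
    samples.append(theta)
```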
Here are our thoughts on possible ways to use parallel forward model evaluation to speed up this process. Please feel free to comment on them or to add your own proposals.