Closed: JohannesBuchner closed this issue 5 years ago.
Hi @JohannesBuchner, thanks for your questions. I will try to answer them the best I can:
Re 1): You can use the evaluate() method of the Pipeline class to evaluate a sample (set) in the emulator.
Providing a single sample will print the results, while providing a sample set will return the results in a dict.
This will give you the adjusted expectation and variance values, the implausibility values and the emulator iteration at which every sample was last evaluated.
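For example, a minimal sketch of what that could look like; the Pipeline construction arguments and the two-parameter sample values are made up for illustration, and only the evaluate() call itself comes from the description above:

```python
import numpy as np
from prism import Pipeline

# my_modellink is assumed to be an already constructed ModelLink subclass instance
pipe = Pipeline(my_modellink, working_dir='prism_run')

# A single sample prints the results
pipe.evaluate([0.5, 1.2])

# A sample set returns the results in a dict
sam_set = np.array([[0.5, 1.2],
                    [0.7, 0.9]])
results = pipe.evaluate(sam_set)
print(results)  # adjusted expectation/variance, implausibility values, last emulator iteration
```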
Using the evaluate() method does perform a lot of checks, as it is a user-method.
Therefore, for advanced use, it may be better to use the _evaluate_sam_set() method, supplying the emul_i, your sam_set and using exec_code='evaluate'.
This will return a tuple with all the results (adj_exp_val, adj_var_val, uni_impl_val, emul_i_stop, impl_check).
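A sketch of that advanced call, assuming the signature matches the description above (being a private method, it may change between PRISM versions):

```python
# Advanced use: bypass the user-level checks of evaluate().
# emul_i is the emulator iteration to evaluate at; sam_set as in the snippet above.
emul_i = 1
adj_exp_val, adj_var_val, uni_impl_val, emul_i_stop, impl_check = \
    pipe._evaluate_sam_set(emul_i, sam_set, exec_code='evaluate')
```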
Re 3): Have a look at the hybrid sampling functions given by the utils module.
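For illustration, a sketch of how such hybrid sampling could be wired into an external sampler; the names get_walkers and get_hybrid_lnpost_fn and their return values are assumptions based on the PRISM documentation, not confirmed in this thread:

```python
# Assumed utility names; check the PRISM docs for the exact API.
from prism.utils import get_walkers, get_hybrid_lnpost_fn

def my_lnpost(par_set):
    # Hypothetical external log-posterior over the model parameters
    return 0.0

# Plausible starting positions for an external sampler, drawn from the emulator
n_walkers, p0_walkers = get_walkers(pipe)

# Wrap the external log-posterior so implausible samples are filtered out via the emulator
hybrid_lnpost = get_hybrid_lnpost_fn(my_lnpost, pipe)
```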
Well, basically, most interpolation methods are already Gaussian-based, so that would be quite similar to this.
I do not personally know of any public packages that implement a system like PRISM does.
In PRISM, one can turn off the regression process, leaving only the Gaussian processes (by setting method to 'gaussian').
The reason why PRISM uses polynomial functions is that every model should have some underlying structure.
Identifying this structure by using polynomial functions provides the user with much more information than explaining the covariance entirely with Gaussian processes.
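As an illustration, a sketch of how that setting might be supplied when constructing the pipeline; passing it through a prism_par dict is an assumption about PRISM's configuration interface, and only the 'method' parameter and its 'gaussian' value come from the text above:

```python
# Turn off the regression step, leaving only the Gaussian processes.
# Supplying the setting via prism_par is assumed here; it could equally
# live in a PRISM parameters file.
pipe_gauss = Pipeline(my_modellink, working_dir='prism_run_gauss',
                      prism_par={'method': 'gaussian'})
```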
I hope that answers your questions.
Cheers, Ellert
Thank you for your answers!
Re 2): I was wondering if one could obtain the derivatives of the emulated model output w.r.t. the model parameters at an evaluated parameter set, i.e. df(x)/dx_i. Some exploration methods can benefit from gradients, and having approximate gradients could be helpful.
Maybe I am confused: I was thinking of a model that returns a single number at each position in parameter space (e.g., a log-likelihood function). Or is the model meant to be the prediction in data space?
Hi @JohannesBuchner,
Ah, now I understand what you mean. PRISM does not calculate the gradients (derivatives) of the emulated model output, as it does not require them. However, given that it does provide the user with the polynomial terms and their corresponding coefficients, I guess it would not be very difficult to obtain their derivatives. For that to be accurate enough, however, it would be advisable to make sure that the emulator has converged as far as it can, to avoid reading structure into the emulator that is not really there. I do realize now that some MCMC methods (like Hamiltonian Monte Carlo) require the gradient field of a model to exist, and for those it may be useful to have an approximate gradient field. I might think about that actually.
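To illustrate the idea, a self-contained sketch of differentiating such a polynomial surrogate analytically; the terms and coefficients below are hypothetical stand-ins for whatever the emulator reports, not the output of an actual PRISM call:

```python
import numpy as np

# Hypothetical polynomial part of an emulated output for 2 parameters (x1, x2):
# f(x) = 0.3 + 1.2*x1 - 0.7*x2 + 0.05*x1*x2 + 0.9*x1**2
# Each term is described by its exponents per parameter and its coefficient.
exponents = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 0]])
coeffs    = np.array([0.3, 1.2, -0.7, 0.05, 0.9])

def poly_gradient(x, exponents, coeffs):
    """Analytic gradient of the polynomial surrogate at parameter set x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        for exps, c in zip(exponents, coeffs):
            if exps[i] == 0:
                continue
            # d/dx_i of c * prod_j x_j**e_j
            term = c * exps[i] * x[i]**(exps[i] - 1)
            for j, e in enumerate(exps):
                if j != i:
                    term *= x[j]**e
            grad[i] += term
    return grad

print(poly_gradient([0.5, 1.2], exponents, coeffs))
```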
One has to be very careful here with the definitions of 'model', 'comparison data' and 'emulator'.
A 'model' is any black box wrapped by a ModelLink subclass that takes a parameter set and returns a list/array of data values corresponding to the requested data points (which are given by data identifiers).
The 'comparison data' are the "real" values of these data points: the user wants to find out what part of parameter space can generate model realizations that produce values very close to these "real" values.
An 'emulator' is an approximation of the model, made to replace the process of evaluating the model in order to significantly speed up the convergence process.
Therefore, the emulator gives an expectation (prediction) of the value that the model would return if it were evaluated there. It thereby approximates the model for the specified data points, and this approximation becomes more and more accurate in the regions of parameter space where the probability is high that a model realization can explain the comparison data. In regions of parameter space where that probability is low, the emulator's approximation will be very rough and inaccurate.
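To make the 'model' wrapper described above more concrete, here is a rough sketch of a ModelLink subclass; the method names call_model and get_md_var, their signatures, and the straight-line model itself are assumptions for illustration rather than something taken from this thread:

```python
import numpy as np
from prism.modellink import ModelLink

class LineLink(ModelLink):
    """Hypothetical ModelLink wrapping a straight-line model y = a*x + b."""

    def call_model(self, emul_i, par_set, data_idx):
        # par_set holds the requested parameter set; data_idx the data identifiers
        # (here assumed to be the x-positions at which data values are requested).
        return np.array([par_set['a']*x + par_set['b'] for x in data_idx])

    def get_md_var(self, emul_i, par_set, data_idx):
        # Model discrepancy variance for every requested data point (assumed required).
        return np.full(len(data_idx), 0.01)
```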
Does that answer your questions?
Hi,
this is a very cool project. I am interested in using some parts of it, in particular the emulator. If I understand correctly, I can use the pipeline to construct and regress the emulator for a d-dimensional model from N existing model evaluations. My questions are:
1) How do I call the emulator myself to obtain the estimated model output and its uncertainty?
2) Would it be possible for the emulator to compute gradients (since it is a Gaussian approximation)?
3) If I later obtain more samples (through my own external sampling process), how can I inform/update the pipeline?
I was also wondering if you know of any packages of emulators using Gaussian processes? Have you thought of implementing them, and what would the pros and cons be compared to the polynomial approach?
Cheers, Johannes