Is it straightforward to implement the algorithm described in https://www.sciencedirect.com/science/article/pii/S0167715222000967?
I've had some discussion with @eviatarbach about this.

The key point is that, in the notation of the linked paper, they view the forward map H as deterministic. In our code, the EKP forward map G can be nonlinear and/or non-deterministic (and non-Gaussian). For example, in the paper's notation you could view our forward map G as an evaluation theta -> z ~ p_D(z | H(theta)), where D is some unknown distribution and H is an unknown deterministic map. If you do so, with perfect observation y = z, then you obtain exactly Algorithm 1 in the paper. (NB: we don't yet have the timestepping implementation for h.)
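As a purely hypothetical illustration of this viewpoint, here is a minimal Julia sketch in which H is a deterministic map and a single evaluation of G draws from an assumed noise distribution D (Poisson here, just for illustration; this is not EKP API):

```julia
# Minimal sketch (not EKP API): a stochastic forward map G, viewed as
# theta -> z ~ p_D(z | H(theta)), with an illustrative H and Poisson noise
# standing in for the unknown distribution D.
using Distributions

H(theta) = exp.(theta)                  # hypothetical deterministic map
G(theta) = rand.(Poisson.(H(theta)))    # one draw z ~ p_D(z | H(theta))

# With perfect observation y = z, an ensemble member's predicted data is
# simply one such draw of G(theta).
```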
The only non-default option you therefore require, if you want to do something like this, is the ability to observe the non-deterministic map perfectly. For this we have a keyword argument `deterministic_forward_map::Bool = true` in `update_ensemble!` (at least for the EKI `Inversion()` update). If `true`, the form p(y|G(x)) is assumed Gaussian with the given noise covariance `obs_noise_cov`; if `false`, the form p(y|G(x)) is assumed to be a delta function, y = G(x).
You should be able to run the code with the following alteration: `update_ensemble!(..., deterministic_forward_map=false)`. You still pass an `obs_noise_cov = 1e-4*I` into `EnsembleKalmanProcess(...., Inversion())`; this can be viewed as a small algorithmic regularization, though you can try with `obs_noise_cov = 0*I`.
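For concreteness, here is a hedged sketch of such a loop; the prior, the stochastic map G, the data y, and the hyperparameters below are purely illustrative placeholders, and signatures should be checked against the current EKP docs:

```julia
using EnsembleKalmanProcesses
using EnsembleKalmanProcesses.ParameterDistributions
using LinearAlgebra, Random

rng = Random.MersenneTwister(42)

# Illustrative 1-parameter prior and non-deterministic forward map (placeholders, not your model)
prior = constrained_gaussian("theta", 0.0, 1.0, -Inf, Inf)
G(theta) = [exp(theta[1]) + randn(rng)]            # one noisy evaluation of the map

y = [2.0]                                          # observed data (illustrative)
obs_noise_cov = 1e-4 * Matrix(I, 1, 1)             # small algorithmic regularization

N_ens, N_iter = 50, 10
initial_ensemble = construct_initial_ensemble(rng, prior, N_ens)
ekp = EnsembleKalmanProcess(initial_ensemble, y, obs_noise_cov, Inversion())

for i in 1:N_iter
    theta = get_ϕ_final(prior, ekp)                       # n_par × N_ens (constrained)
    G_ens = hcat([G(theta[:, j]) for j in 1:N_ens]...)    # n_obs × N_ens
    update_ensemble!(ekp, G_ens; deterministic_forward_map = false)
end
```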
Let us know how you get on!
Regarding the timestepping implementation for h in the paper: it's on our radar, and we will hopefully get to it later in the year.

Thanks @odunbar, this is super helpful. I have to deal with count (integer) data (e.g. the number of new infections per day for an infectious disease), which I normally transform via log(x+1). In the standard framework I get biased parameter estimates (populations are small, so my statistics are far from Gaussian), though as an initial condition for another scheme, your EKP library is super useful.
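For reference, a minimal sketch of that transform, with the counts and forward map below as hypothetical placeholders:

```julia
# Minimal sketch of the log(x + 1) transform for count data; y_counts and G
# are stand-ins, not my actual data or model.
transform_counts(z) = log.(z .+ 1)                  # elementwise log(x + 1)

y_counts = [0, 3, 7, 12]                            # e.g. daily new infections
y_transformed = transform_counts(y_counts)          # data handed to EKP

G(theta) = rand(0:20, length(y_counts))             # stand-in stochastic count model
G_transformed(theta) = transform_counts(G(theta))   # wrapped forward map
```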
Thanks!
I should caveat that EKP still only has good theory for linear & Gaussian problems.
That is to say, although our parameter distribution setups and the algorithms do allow for more interesting priors and likelihoods by pushing nonlinearities/non-Gaussianities into the forward map, this can still yield biases, as EKP still only evolves a Gaussian approximation to perform optimization. In practice, though, it has proven effective in many cases.
PS: Regarding the count data, I am by no means an expert, but we do have some work in a filtering context where we converted infection test results into PPV/FOR data for data assimilation (cf. the "Synthetic Data" section). This is a different transformation that puts the data into [0,1], and it might be of interest to you, though the boundaries can still prove challenging.
Awesome! Thanks for the write-up; I'm pleased it worked out for you, and it's really great to hear.
If you ever work with a model you can't run >10^4 times for EKS, we also have an ML-based pipeline, CalibrateEmulateSample.jl (CES), currently just for moderate-dimensional parameter spaces (O(10)), to accelerate obtaining approximate posterior distributions based on the output of EKI. In fact, for CES the posteriors do not need to be based on the Gaussianity that EKS requires, so they may also prove more flexible in capturing correlations.
[Please let me know if you are interested in trying this out too. It should hopefully just require a short script taking the priors and EKI ensemble as input]
Closing as resolved.