CosmoStat / jax-lensing

A JAX package for weak gravitational lensing
MIT License

Posterior convolution for ODE sampling #51

Open · EiffL opened 2 years ago

EiffL commented 2 years ago

Ok, I've done some math to try to figure out what a convolved posterior might look like.

Let's start from two Gaussians, $f(x) = \mathcal{N}(\mu_f, \sigma_f^2)$ and $g(x) = \mathcal{N}(\mu_g, \sigma_g^2)$. Then the product of these two Gaussians is (up to normalization) a Gaussian with parameters

$$\sigma_{fg}^2 = \frac{\sigma_f^2 \sigma_g^2}{\sigma_f^2 + \sigma_g^2}, \qquad \mu_{fg} = \sigma_{fg}^2 \left( \frac{\mu_f}{\sigma_f^2} + \frac{\mu_g}{\sigma_g^2} \right).$$

Now, let's say we want to convolve this Gaussian with noise of width $\sigma_{tfg}$; this is the noise we want to add to the full posterior. Instead, we can try to add noise to $f$, and see how much noise $\sigma_{tf}$ we should add to $f$ to achieve the desired amount of noise on $fg$. After some math, you find that

$$\sigma_{tf}^2 = \frac{(\sigma_{fg}^2 + \sigma_{tfg}^2)\,\sigma_g^2}{\sigma_g^2 - (\sigma_{fg}^2 + \sigma_{tfg}^2)} - \sigma_f^2.$$

So notice here that this will only work if $\sigma_g$ is large enough compared to the noise we add (the denominator above must stay positive). I tested this in this notebook: https://colab.research.google.com/drive/1-900p399fHa4hnN8bd_OmDg4mlsbrzZZ?usp=sharing

[figure from the notebook: comparison of the three posteriors]

The blue line is the original $fg$, the solid line is the analytically tempered $fg$, and the dashed line is the original $g$ multiplied by the tempered $f$ with the above amount of noise.
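
To make the bookkeeping concrete, here is a quick numerical check of the relations above (a sketch with made-up numbers and my own variable names, not code from the repo or the notebook):

```python
# Toy check of the tempering formula: solve for the noise to add to f so that
# the product (tempered f) * g matches the convolved posterior variance.
sigma_f, sigma_g = 1.0, 2.0  # widths of f and g
sigma_tfg = 0.5              # noise we want to add to the full posterior fg

# Product of the two Gaussians: 1/sigma_fg^2 = 1/sigma_f^2 + 1/sigma_g^2
var_fg = (sigma_f**2 * sigma_g**2) / (sigma_f**2 + sigma_g**2)

# Target variance after convolving fg with N(0, sigma_tfg^2)
var_target = var_fg + sigma_tfg**2
assert sigma_g**2 > var_target  # the positivity constraint noted above

# Solve 1/(sigma_f^2 + var_tf) + 1/sigma_g^2 = 1/var_target for var_tf
var_tf = var_target * sigma_g**2 / (sigma_g**2 - var_target) - sigma_f**2

# Check: the product of the tempered f and the original g has the target variance
var_check = ((sigma_f**2 + var_tf) * sigma_g**2) / (sigma_f**2 + var_tf + sigma_g**2)
print(var_check, var_target)  # the two values should match
```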

So this works, but two annoying things remain:

EiffL commented 2 years ago

Luckily, in our case the mean of the prior is zero at least... that probably makes things a little simpler.
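
Concretely, with $\mu_g = 0$ the product mean above reduces to

$$\mu_{fg} = \sigma_{fg}^2 \, \frac{\mu_f}{\sigma_f^2} = \frac{\sigma_g^2}{\sigma_f^2 + \sigma_g^2} \, \mu_f,$$

so the $\mu_g$ term drops out of the bookkeeping entirely.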

b-remy commented 2 years ago

Here is the demo when adding noise to the two distributions (assuming zero means): https://colab.research.google.com/drive/193mj8kGPNrZTwHwheT_gWkURK6-duTOp?usp=sharing

and the formula I derived: [equation image]
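
Since the equation image above is not legible here, this is one way the zero-mean, two-noise version can be set up numerically (the particular split between the two noises below is my assumption for illustration, not necessarily the derived formula):

```python
# Zero-mean case, adding noise to both f and g. Any split (var_tf, var_tg)
# satisfying 1/(sigma_f^2 + var_tf) + 1/(sigma_g^2 + var_tg) = 1/var_target
# reproduces the target variance; here we fix var_tg and solve for var_tf.
sigma_f, sigma_g = 1.0, 2.0
sigma_tfg = 0.5  # desired noise on the product

var_fg = (sigma_f**2 * sigma_g**2) / (sigma_f**2 + sigma_g**2)
var_target = var_fg + sigma_tfg**2

var_tg = 0.3  # assumed split; any value keeping var_tf >= 0 works
var_tf = var_target * (sigma_g**2 + var_tg) / (sigma_g**2 + var_tg - var_target) - sigma_f**2

# With zero means, the product of the two tempered Gaussians has the target variance
num = (sigma_f**2 + var_tf) * (sigma_g**2 + var_tg)
den = (sigma_f**2 + var_tf) + (sigma_g**2 + var_tg)
print(num / den, var_target)  # the two values should match
```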

EiffL commented 2 years ago

Sooooo, I've done some additional experimenting, and made a few observations:

  1. Averaging samples from the HMC posterior stopped before zero temperature doesn't converge to the correct posterior mean. Here is an example with `min_temp=0.03`: [figure]

  2. If we use HMC sampling with only the prior term, it looks like taking the mean of the samples behaves correctly and converges to zero. [figure]

=> This would mean we need the HMC sampling to go all the way to zero temperature, because if we stop before that, we are not actually sampling the expected convolved posterior.
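
A toy 1-D illustration of observation 1 (my own construction, not the repo's sampler): because the tempering noise sits on $f$ while the prior $g$ stays zero-mean, the tempered product mean is pulled toward zero, and it only matches the true posterior mean in the zero-noise limit.

```python
# Mean of the tempered product N(mu_f, sigma_f^2 + var_tf) * N(0, sigma_g^2)
# versus the true posterior mean, as the tempering noise var_tf shrinks.
mu_f, sigma_f = 1.0, 1.0  # "likelihood" factor f
sigma_g = 2.0             # zero-mean prior g

mu_post = mu_f * sigma_g**2 / (sigma_f**2 + sigma_g**2)  # true posterior mean

for var_tf in [1.0, 0.1, 0.03, 1e-4, 0.0]:
    mu_temp = mu_f * sigma_g**2 / (sigma_f**2 + var_tf + sigma_g**2)
    print(var_tf, mu_temp, mu_post)
# mu_temp -> mu_post only as var_tf -> 0, consistent with the observation above.
```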

EiffL commented 2 years ago

Small question @b-remy: what is the smallest temperature we can reach with the full non-Gaussian prior?

b-remy commented 2 years ago

It seems that we can reach any temperature we want at the end of the annealing; at least the annealing trace returns the input min_temp (e.g. 1e-4, 1e-5, or 1e-6) if we let the chain run for a sufficiently long time.

However, reaching a very low temperature does not seem to be enough to perfectly recover the Wiener estimate. For instance, here are the results where, in the first experiment, we reach a temperature of 1e-4, and in the second, 1e-6. Both posterior means agree while not perfectly matching the Wiener estimate.

[figures: posterior means from the two experiments, each computed from 50 samples]

@EiffL, you may notice that the power spectrum is much closer to what we observed last week, which is due to:
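
For reference, the Wiener estimate being compared against here can be written per Fourier mode as follows (a generic sketch assuming a diagonal signal power spectrum `P_k` and noise power `N_k` on a periodic map; the names are mine, and this is not the repo's implementation):

```python
import jax.numpy as jnp

def wiener_filter(data_map, P_k, N_k):
    """Apply x_hat(k) = P(k) / (P(k) + N(k)) * d(k), mode by mode."""
    d_k = jnp.fft.fft2(data_map)
    x_k = d_k * P_k / (P_k + N_k)
    return jnp.fft.ifft2(x_k).real
```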

EiffL commented 2 years ago

So, what is the smallest temperature we can reach with the full non-Gaussian prior?

b-remy commented 2 years ago

The smallest temperature I managed to reach when fine-tuning one chain is 0.007. I tried the same parameters with a batch size of 10 and also got pretty good samples, both visually and in terms of their power spectra. I don't have the plot yet because the notebook is still running, but the average map turns out to reach better metrics: rmse = 2.24 × 10⁻² and r = 0.65.

I will update here with the results for 20 samples.
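
For clarity, here are the two metrics quoted above as I understand them (assumed definitions, following the usual conventions):

```python
import jax.numpy as jnp

def rmse(x, x_true):
    """Root-mean-square error between the reconstructed and true maps."""
    return jnp.sqrt(jnp.mean((x - x_true) ** 2))

def pearson_r(x, x_true):
    """Pearson correlation coefficient between the two maps."""
    a, b = x.ravel(), x_true.ravel()
    a = a - a.mean()
    b = b - b.mean()
    return jnp.sum(a * b) / jnp.sqrt(jnp.sum(a**2) * jnp.sum(b**2))
```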

b-remy commented 2 years ago

Well, same metrics with 20 samples; here are the plots:

[three result screenshots]

EiffL commented 2 years ago

That looks really, really good, no? :-)

b-remy commented 2 years ago

Yes, pretty good :-), but as seen in both the Gaussian and full-prior examples, we still cannot "perfectly" sample the whole posterior.

Maybe we will want to smooth the very small scales as you suggested, assuming that the SNR there is way too low to contain any information. That's what I'm thinking about currently.
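
One possible implementation of that small-scale smoothing is a Gaussian cut in Fourier space (a sketch; the smoothing scale `sigma_pix` is a free parameter, and none of these names come from the repo):

```python
import jax.numpy as jnp

def smooth_small_scales(kappa_map, sigma_pix):
    """Suppress modes above ~1/sigma_pix with a Gaussian kernel in Fourier space."""
    ny, nx = kappa_map.shape
    kx = jnp.fft.fftfreq(nx)  # frequencies in cycles per pixel
    ky = jnp.fft.fftfreq(ny)
    k2 = kx[None, :] ** 2 + ky[:, None] ** 2
    # Fourier transform of a Gaussian of width sigma_pix (in pixels)
    kernel = jnp.exp(-2.0 * (jnp.pi * sigma_pix) ** 2 * k2)
    return jnp.fft.ifft2(jnp.fft.fft2(kappa_map) * kernel).real
```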

EiffL commented 2 years ago

OK cool, then we don't necessarily need to do anything. We can just use that sampling strategy, apply it in the Gaussian case, and state that we can recover the expected posterior mean down to some given scale.