andreufont / LyaCosmoParams

Notes on cosmological parameters and Lyman alpha simulations

Choice of emulator kernel #64

Open Chris-Pedersen opened 4 years ago

Chris-Pedersen commented 4 years ago

Simeon and Keir's papers used a kernel that was a combination of a linear and a squared exponential (also known as radial basis function, or rbf):

![kernel](https://user-images.githubusercontent.com/16047009/86925730-2104bd00-c129-11ea-9ffc-6cfa1722fbc5.png)

The linear term seems odd, as it is defined with respect to the origin, and it isn't clear why you would want that property. One effect of this was that the hyperparameters had to be optimised for a particular ordering of the emulator parameters, which also doesn't seem necessary. So I've been looking at whether we can drop the linear term and work with a squared-exponential kernel alone.
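To make the comparison concrete, here is a toy sketch of the two kernel choices using scikit-learn (not the repo's actual emulator code, which I believe is GPy-based): `DotProduct()` plays the role of the linear term about the origin, and `RBF()` the squared exponential. The data and thresholds here are invented for illustration.

```python
# Toy comparison of a linear+RBF kernel against an RBF-only kernel.
# sklearn's DotProduct is a linear kernel about the origin, analogous
# to the linear term discussed in this thread.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = 2.0 * X[:, 0] + 0.3 * np.sin(6 * X[:, 0])  # mostly-linear toy signal

combined = DotProduct() + RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4)
rbf_only = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4)

gp_comb = GaussianProcessRegressor(kernel=combined, normalize_y=True).fit(X, y)
gp_rbf = GaussianProcessRegressor(kernel=rbf_only, normalize_y=True).fit(X, y)

X_test = np.linspace(-1, 1, 50).reshape(-1, 1)
pred_comb, std_comb = gp_comb.predict(X_test, return_std=True)
pred_rbf, std_rbf = gp_rbf.predict(X_test, return_std=True)
print(gp_comb.kernel_)  # optimised hyperparameters after fitting
```

Comparing `pred_comb` against `pred_rbf` (and the two fitted `kernel_` attributes) is the toy analogue of the prediction comparison below.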

The first place I looked was at the emulator predictions, where the combined kernel is shown in solid white lines and the RBF-only kernel in dashed black:

![rbf_comparison](https://user-images.githubusercontent.com/16047009/86927214-f6b3ff00-c12a-11ea-98d7-504d7d09db22.png)

The predictions here look very similar, so the next step was to run a sampler:

![rbf_sampler](https://user-images.githubusercontent.com/16047009/86927627-7c37af00-c12b-11ea-8cbd-f693d0e13d45.png)

![rbf_sampler_cosmo](https://user-images.githubusercontent.com/16047009/86927636-7f329f80-c12b-11ea-8407-8fc9227a9e7b.png)

The RBF-only kernel performs significantly worse, for reasons I don't fully understand right now. Perhaps the best choice is to stick with the combined kernel for the first paper.

keirkwame commented 4 years ago

You could also try floating the offset for the linear kernel, i.e. k(x_i, x_j) = (x_i - c) . (x_j - c). But I don't understand why the ordering matters: the dot product of two vectors is independent of the ordering of their components (assuming you've ordered both the same). The linear kernel helps a lot because at the scales you're looking at, the parameter dependence is rather linear. How do the hyperparameter values change between the two cases?
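A quick numerical check of the two points above (purely illustrative, not from the repo): the dot product is unchanged under a consistent reordering of the parameter components, and the linear kernel can be floated about an offset c instead of the origin.

```python
# Dot-product ordering invariance, and a linear kernel with a floated offset.
import numpy as np

def shifted_linear(x, y, c):
    """Linear kernel about an offset c instead of the origin: (x-c).(y-c)."""
    return np.dot(x - c, y - c)

x = np.array([0.3, -1.2, 0.7])
y = np.array([1.1, 0.4, -0.5])
c = np.array([0.5, 0.5, 0.5])   # hypothetical offset; could be a hyperparameter

perm = np.array([2, 0, 1])      # same reordering applied to both vectors
print(np.isclose(np.dot(x, y), np.dot(x[perm], y[perm])))            # True
print(np.isclose(shifted_linear(x, y, c), shifted_linear(y, x, c)))  # True
```

So any ordering sensitivity seen in training must come from somewhere else in the pipeline, not from the kernel value itself.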



Chris-Pedersen commented 4 years ago

The correlation length in the RBF kernel is very similar in the two cases (1.05 with the linear term vs 1.12 without), but the sigma coefficient roughly doubles (from 30 to 65) when I train without the linear term. I think this is connected to the theoretical uncertainty on the predictions, which would explain the wider contours.

I think having only 2 kernel hyperparameters just isn't enough to describe the entire parameter space, so the linear kernel's extra degree of freedom lets the emulator achieve a better global optimisation, even if the form of the kernel isn't great. I still don't understand why it should be defined with respect to the origin: if the parameter dependence is linear, wouldn't you want some linear dependence on the distance between the parameters, rather than on their dot product?
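The distinction raised here can be made explicit with a small demo (toy numbers, not from the repo): the RBF kernel depends only on the distance between two points, while the linear (dot-product) kernel depends on where the pair sits relative to the origin.

```python
# RBF is translation-invariant; the linear kernel is not.
import numpy as np

def rbf(x, y, ell=1.0):
    """Squared-exponential kernel: depends only on |x - y|."""
    return np.exp(-0.5 * np.sum((x - y) ** 2) / ell ** 2)

def linear(x, y):
    """Linear kernel about the origin: depends on absolute position."""
    return np.dot(x, y)

x = np.array([0.0, 0.0])
y = np.array([1.0, 0.0])
shift = np.array([10.0, 0.0])

# Translating both points leaves the RBF value unchanged...
print(np.isclose(rbf(x, y), rbf(x + shift, y + shift)))  # True
# ...but changes the linear kernel dramatically.
print(linear(x, y), linear(x + shift, y + shift))        # 0.0 110.0
```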

Chris-Pedersen commented 4 years ago

Just to document some of the plots I've made looking at the anisotropic RBF-only kernel. In every test I've run at the level of the emulator, this kernel returns significantly better predictions. First we look at the results of the leave-one-out test:

one_out_old one_out_asym

where the anisotropic RBF-only kernel clearly does much better. To demonstrate this on a specific example, sim #29 is notoriously difficult to predict, since its points lie on the edge of the convex hull of our full sim training set, predominantly in the thermal parameters:

sim29

Training points from sim #29 are shown as red crosses in these two planes. When I compare predictions from the isotropic linear+RBF emulator with the new ones for this simulation:

sim29

once again the anisotropic kernel does better, though not so much better that it looks like a bug. It just appears to learn the P1D better.
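For reference, the leave-one-out test above can be sketched as follows (toy data and sklearn stand-ins, not the repo's actual code): drop one training point, refit the GP with an anisotropic kernel, and predict the held-out point back.

```python
# Leave-one-out sketch with an anisotropic (ARD) RBF kernel.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(20, 3))     # stand-in for emulator parameters
y = X @ np.array([1.0, -0.5, 2.0])      # stand-in for a P1D summary statistic

abs_errors = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    # one length scale per dimension makes the RBF kernel anisotropic (ARD)
    kernel = RBF(length_scale=np.ones(3)) + WhiteKernel(noise_level=1e-4)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X[mask], y[mask])
    pred = gp.predict(X[i:i + 1])
    abs_errors.append(abs(pred[0] - y[i]))
print(max(abs_errors))  # worst held-out prediction error
```

The repo's version of this would predict the full P1D per held-out simulation rather than a scalar, but the loop structure is the same.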

For reference, the optimised hyperparameters are:

[23.12839004169575, 0.47887435116481913, 3.6986473066226564, 3.2879986102659253, 1.3633969075896872, 1.9415496945918296, 11.459874249535781, 8.997481293770624e-06]

With the order being: sigma_rbf, , sigT, gamma, kF, Delta2_p, n_p, noise_var
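Read under the assumption (consistent with the thread) that these numbers are a signal variance, six per-parameter length scales, and a noise variance for an anisotropic squared-exponential kernel, they would be used roughly as below. This is a sketch: the actual emulator code's exact parameterisation (e.g. variance vs standard deviation) may differ, and the second length scale's label is missing above, so it is left unnamed here.

```python
# Sketch of an anisotropic (ARD) squared-exponential kernel built from the
# hyperparameter list above: one length scale per emulator parameter.
import numpy as np

sigma_rbf = 23.12839004169575
length_scales = np.array([
    0.47887435116481913,   # (unnamed in the thread)
    3.6986473066226564,    # sigT
    3.2879986102659253,    # gamma
    1.3633969075896872,    # kF
    1.9415496945918296,    # Delta2_p
    11.459874249535781,    # n_p
])
noise_var = 8.997481293770624e-06

def ard_rbf(x1, x2):
    """Each dimension is scaled by its own length scale before the exponential."""
    r2 = np.sum(((x1 - x2) / length_scales) ** 2)
    return sigma_rbf * np.exp(-0.5 * r2)

print(ard_rbf(np.zeros(6), np.ones(6)))
```

Note the spread in length scales: n_p (11.5) contributes little curvature compared to the first parameter (0.48), which is exactly the flexibility an isotropic kernel cannot express.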

Everything looks great from the emulator, now I look at several plots from the sampler with the new and old emulators.

sim15 sim8_1 sim8_2 sim22 central

Out of these 4 test sims, the emulator with less accurate and less precise predictions gives tighter constraints. The only exception is sim 22, which is near the edge of the prior volume in the cosmological parameters; it could be that the isotropic kernel's predictions are so bad there that the anisotropic kernel gives better constraints. But it still seems odd that the worse kernel gives tighter constraints deeper inside the convex hull. To check whether the emulator covariance plays a role, I repeat the last test, recovering cosmology from the central simulation with the emulator's contribution to the covariance matrix turned off.

central_noemucov

The worse emulator still does better, and we now know this is not due to different behaviour of the non-stationary covariance matrix. Pat suggested that the improved emulator might be more accurately modelling degeneracies that lead to inflated constraints, which could be the explanation. For completeness, below I show results from the central simulation with and without emulator covariance, for both the anisotropic and isotropic kernels:

emucov_test emucov_test_iso
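The "emulator covariance on/off" test amounts to whether the GP's predictive variance is added to the data covariance inside the likelihood. A hedged sketch of that mechanism (function and variable names are illustrative, not the repo's):

```python
# Gaussian log-likelihood with the emulator's predictive variance
# optionally added to the diagonal of the data covariance.
import numpy as np

def gaussian_loglike(data, model, data_cov, emu_var=None):
    """If emu_var is given, inflate the covariance diagonal with it."""
    cov = data_cov.copy()
    if emu_var is not None:
        cov[np.diag_indices_from(cov)] += emu_var
    resid = data - model
    _, logdet = np.linalg.slogdet(cov)
    chi2 = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (chi2 + logdet + len(data) * np.log(2 * np.pi))

# toy 3-bin example
data = np.array([1.0, 2.0, 1.5])
model = np.array([1.1, 1.9, 1.6])
data_cov = 0.01 * np.eye(3)

with_emu = gaussian_loglike(data, model, data_cov, emu_var=np.full(3, 0.05))
without = gaussian_loglike(data, model, data_cov)
print(with_emu, without)
```

Turning the `emu_var` term off, as in the test above, isolates whether the tighter contours come from this extra variance or from the mean predictions themselves.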

I still want to proceed with the paper using the anisotropic kernel, as it's just a far better emulator, and we want to model the P1D as well as possible. So I will continue with this, but it would be nice to eventually understand what is going on here.

andreufont commented 4 years ago

I like Pat's comment. We often talked about how, by using a poor interpolation (or extrapolation!) technique, you might artificially break degeneracies and get tighter constraints than you should.

However, the constraints from the "worse emulator" are also unbiased, at least in the tests you showed; maybe that is just a consequence of the simplified setup here (all sims share the same treecool file, the same random seed, etc.).

Let's leave it for the next paper. It is good that you documented this well here, thanks!