LSSTDESC / CCL

DESC Core Cosmology Library: cosmology routines with validated numerical accuracy
BSD 3-Clause "New" or "Revised" License

Too simple extrapolation of power spectra in cosmology calculator mode #876

Open nikfilippas opened 3 years ago

nikfilippas commented 3 years ago

This is similar to #816 and #708.

The cosmology calculator is an excellent way to experiment with passing matter power spectra produced by emulators into CCL, which speeds up the calculations. Emulators built on neural networks have some unavoidable scatter (i.e. their derivatives are not very smooth).

In the current implementation of the cosmology calculator, when the user wishes to probe scales (and scale factors) outside the boundaries of the P(k,a) passed, CCL extrapolates quadratically at high k and linearly at low k.
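To make the failure mode concrete, here is a minimal sketch of edge-based polynomial extrapolation in log-log space. It is not CCL's code: `extrapolate_low_k` is a hypothetical helper, the power law is a toy stand-in for P(k), and the 1% scatter level is an assumption chosen only for illustration.

```python
import numpy as np

def extrapolate_low_k(k, pk, k_new, order=1):
    """Extrapolate P(k) below k[0] with a polynomial in log-log space.

    Hypothetical helper, not CCL's code: the polynomial passes exactly
    through the first order+1 samples, so any scatter in those few
    points goes straight into the extrapolated slope and curvature.
    """
    x0 = np.log(k[0])
    x = np.log(k[: order + 1]) - x0
    y = np.log(pk[: order + 1])
    coeffs = np.polyfit(x, y, order)  # exact fit: order+1 points, degree=order
    return np.exp(np.polyval(coeffs, np.log(k_new) - x0))

# Toy power law sampled on the emulator range quoted in this issue.
k = np.logspace(-2, np.log10(5.0), 100)
pk_clean = 2.0e4 * k**0.96

# 1% multiplicative scatter standing in for neural-net noise (assumed level).
rng = np.random.default_rng(0)
pk_noisy = pk_clean * (1.0 + 0.01 * rng.standard_normal(k.size))

k_new = np.array([1e-3])
truth = 2.0e4 * k_new**0.96
print("clean / truth:", extrapolate_low_k(k, pk_clean, k_new) / truth)
print("noisy / truth:", extrapolate_low_k(k, pk_noisy, k_new) / truth)
```

With only `order + 1` edge points, the slope error scales roughly as the scatter divided by the log-spacing of the grid, so a finely sampled but slightly noisy input can extrapolate worse than a coarse one.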

While this has some physical motivation (large scales are linear), extrapolation doesn't work very well if a neural-net emulator's boundaries are not quite in the linear regime, and the emulator's scatter makes it worse. While this is mostly on the emulator, I think CCL could be a bit more elaborate with extrapolation.

I have been testing baccoemu (arXiv:2104.14568) in neural-net mode, using the nonlinear matter power spectrum. The NN has been trained on k \in [1e-2, 5] h/Mpc. At the large-scale (low-k) end, this is roughly at the knee of the power spectrum.

This showcases the failed extrapolation: fig1

Unfortunately, changing the order of the extrapolation from 1 to 2 or 3 doesn't do much, mainly because of this NN-related scatter. This can be more easily demonstrated using the derivative of P(k). In the figures below, extrapolation should aim for the emu curve to approach the CCL curve.

  1. Here's the linear extrapolation: fig2
  2. The quadratic (effectively gives the same result as the linear): fig3
  3. The cubic (well...): fig4

In all cases, CCL fails to correctly predict the power spectrum even slightly outside the input ranges.

nikfilippas commented 3 years ago

Potential solutions:

  1. Instead of only taking into account the n points at the edges for the n-th order extrapolation, we could try a savgol_filter or something similar near the edges so the scatter is smoothed out.
  2. Fit a straight line to the n-th derivative using N > n points near the edges.
  3. Use a template power spectrum to estimate the derivative at these high and low scales.
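
Option 2 could look roughly like the sketch below for n = 1: fit a straight line to the first log-derivative over the N lowest-k points, then integrate that line to extrapolate. The function names, the default `n_edge=15`, and the choice to work in log-log space are all assumptions made for illustration.

```python
import numpy as np

def fit_edge_derivative(k, pk, n_edge=15):
    """Fit dlogP/dlogk ~ a + b * (logk - logk[0]) over the first n_edge points.

    Hypothetical helper: returns (logP at the edge, a, b).
    """
    x = np.log(k[:n_edge])
    y = np.log(pk[:n_edge])
    dydx = np.gradient(y, x)              # noisy first derivative
    b, a = np.polyfit(x - x[0], dydx, 1)  # polyfit returns [slope, intercept]
    return y[0], a, b

def extrapolate_low_k_fit(k, pk, k_new, n_edge=15):
    """Integrate the fitted derivative line to extrapolate below k[0]."""
    y0, a, b = fit_edge_derivative(k, pk, n_edge)
    dx = np.log(k_new) - np.log(k[0])
    return np.exp(y0 + a * dx + 0.5 * b * dx**2)

# Toy check on a pure power law, where the fitted slope should be the index.
k = np.logspace(-2, 0, 60)
pk = 5.0 * k**1.5
print(fit_edge_derivative(k, pk)[1:])
print(extrapolate_low_k_fit(k, pk, np.array([1e-3])))
```

Because the line is fitted to N > n points, scatter in individual samples is averaged rather than propagated directly into the slope. The edge value `y[0]` is still taken from a single sample in this sketch.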

nikfilippas commented 3 years ago

Solution 1: for example, here is what you get with an extrapolation order of 2 applied to the Savitzky-Golay-filtered power spectrum prediction, using a window of 15. Extrapolation works a lot better. fig7
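
A minimal sketch of this approach, using `scipy.signal.savgol_filter` with the window of 15 and extrapolation order of 2 quoted above. The filter's polynomial order (3 here), the log-log smoothing, and the helper name `smooth_then_extrapolate` are assumptions. The filter also assumes uniform spacing, so k is taken to be log-spaced.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_then_extrapolate(k, pk, k_new, window=15, polyorder=3, extrap_order=2):
    """Savitzky-Golay smooth log P(log k), then extrapolate below k[0].

    Hypothetical helper. Assumes k is uniformly spaced in log k, since
    savgol_filter treats the samples as equally spaced.
    """
    x0 = np.log(k[0])
    logp_smooth = savgol_filter(np.log(pk), window_length=window, polyorder=polyorder)
    # Polynomial through the first extrap_order+1 *smoothed* points.
    x = np.log(k[: extrap_order + 1]) - x0
    coeffs = np.polyfit(x, logp_smooth[: extrap_order + 1], extrap_order)
    return np.exp(np.polyval(coeffs, np.log(k_new) - x0))

# Toy input: power law with 1% scatter (assumed level) on a log-spaced grid.
k = np.logspace(-2, np.log10(5.0), 100)
pk_clean = 2.0e4 * k**0.96
rng = np.random.default_rng(0)
pk_noisy = pk_clean * (1.0 + 0.01 * rng.standard_normal(k.size))

k_new = np.array([1e-3])
print("smoothed / truth:", smooth_then_extrapolate(k, pk_noisy, k_new) / (2.0e4 * k_new**0.96))
```

With the default `mode="interp"`, the filter fits a polynomial to the outermost `window` samples, so the smoothed edge points already carry information from 15 samples rather than 3.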

elisachisari commented 3 years ago

Yes, this is a known issue; see #499, #708 and #816. It would be great to have a better method implemented.

nikfilippas commented 3 years ago

Apart from the potential solutions I've listed, I struggle to see anything obvious and computationally cheap that will solve the issue. Extrapolating accurately is difficult. Let me know if you have anything else in mind that I could try and test.

nikfilippas commented 3 years ago

Update: Sol1: Savgol filter, Sol2: best-fit, Sol3: use template.

Overall I think a good method would be a C-level implementation of the Savitzky-Golay filter to extrapolate power spectra that have been sampled at sufficiently large/small scales. Probably also add a warning if the first point is > 1e-2 and the last one is < 10.
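
The suggested warning could be prototyped on the Python side along these lines. `check_k_range` and its argument names are hypothetical, the thresholds are the ones quoted above (assumed to be in the same units as the input k array), and the sketch warns separately for each end, which is one possible reading of the suggestion.

```python
import warnings
import numpy as np

def check_k_range(k, k_min_safe=1e-2, k_max_safe=10.0):
    """Warn if the sampled k range looks too narrow to extrapolate safely.

    Hypothetical helper: returns True if both ends pass the check.
    """
    k = np.asarray(k, dtype=float)
    ok = True
    if k[0] > k_min_safe:
        warnings.warn(f"First k sample {k[0]:g} > {k_min_safe:g}: "
                      "low-k extrapolation may be unreliable.")
        ok = False
    if k[-1] < k_max_safe:
        warnings.warn(f"Last k sample {k[-1]:g} < {k_max_safe:g}: "
                      "high-k extrapolation may be unreliable.")
        ok = False
    return ok

# Example: a grid that stops short at both ends triggers both warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_k_range(np.array([0.05, 1.0, 5.0]))
print(len(caught), "warning(s) raised")
```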