LSSTDESC / CCL

DESC Core Cosmology Library: cosmology routines with validated numerical accuracy
BSD 3-Clause "New" or "Revised" License

HOD based Halo Model predictions of correlation functions #828

Open mShuntov opened 3 years ago

mShuntov commented 3 years ago

Hi CCL team,

Thanks for developing and maintaining this exceptionally powerful and versatile code. I am using it in my work to fit auto- and cross-correlation measurements. My data are measurements of the 2pt angular correlation function for two samples of galaxies: a lens sample (z = 1.15) and a source sample (z = 3.35). I measure the auto-correlation of the lenses (due to clustering) and the cross-correlation between lenses and sources (due to magnification, since there is no redshift overlap between the samples).

The goal is to fit these measurements using the halo model and an HOD parameterization (following Nicola et al. 2019 parameterizations).

In writing the code to perform this, I am closely following the examples given here. However, the predictions I get from CCL (using the default HOD parameter values, and the defaults for most of the parameters controlling the numerical performance of the code) show unstable behaviour and fail to agree with the measurements. The figure below shows the lens-lens auto-correlation measurements (blue) and lens-source cross-correlation measurements (orange) along with the CCL predictions using the HOD power spectrum (solid) and the default 'boltzmann_camb' + halofit power spectrum (dashed).

[Figure `w_th`: measured w(theta) for lens-lens (blue) and lens-source (orange) with the HOD (solid) and default-Pk (dashed) predictions]

I fail to understand what causes the wiggles in the correlation function, or the offset and shape difference with respect to the data. This behaviour prevents me from successfully fitting the data.

I have been tweaking many of the parameters that could drive this behaviour (the arrays holding the values of the scale factor, wavenumber k, multipole l, etc.), but in combination with the parameters controlling the numerical performance, the space of settings to explore is huge.

I would be grateful if you could have a look at this problem and help identify what drives the discrepancy between the model and my data.

In the meantime, I will keep investigating the parameters that control the splines, FFTLog precision, integration accuracy, etc.

Thank you!

For completeness, I am sharing with you the notebook that performs these computations.

import pyccl as ccl
import numpy as np
import os
from scipy.special import sici, erf
import matplotlib.pyplot as plt

Load data. The data are measurements of the auto-correlation (due to clustering) of galaxies at redshift ~1.15 (lenses) and of the cross-correlation (due to magnification) between the lenses and galaxies at high redshift ~3.35 (sources)

# cross-correlations lens-source
w_LS_data = {'theta'    : np.array([0.010818, 0.0197816, 0.0361722, 0.0661436, 0.1209489, 0.2211646, 0.404417, 0.7395084]),
             'w_th'     : np.array([0.0516967, 0.0305469, 0.0172719, 0.0128365, 0.006257, 0.0024252, 0.0002548, -8.9e-05]),
             'w_th_err' : np.array([0.0170539, 0.0134549, 0.0103374, 0.0071149, 0.0061504, 0.0044403, 0.0023535, 0.0021713])}

# auto-correlations lens-lens
w_LL_data = {'theta'    : np.array([0.0006046, 0.0008842, 0.001293, 0.0018908, 0.002765, 0.0040435, 0.005913, 0.0086468, 0.0126447, 0.018491, 0.0270403, 0.0395423, 0.0578247, 0.0845601, 0.1236564, 0.1808291, 0.2644355, 0.3866974, 0.5654872, 0.8269405]),
             'w_th'     : np.array([1.1310948, 1.0964981, 0.6036151, 0.4611592, 0.3166772, 0.2223176, 0.1674735, 0.1108344, 0.0809346, 0.0579251, 0.0456306, 0.0350482, 0.0298885, 0.0214808, 0.0146694, 0.0086748, 0.0030189, -0.0017086, -0.001826, 0.0002964]),
             'w_th_err' : np.array([0.0973727, 0.084832, 0.0389069, 0.0257532, 0.0222459, 0.0195117, 0.0129087, 0.0092477, 0.0083354, 0.0072959, 0.0070525, 0.0066702, 0.0060109, 0.0059983, 0.006447, 0.0055545, 0.0040846, 0.0031083, 0.0034477, 0.0038521])}
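
For a quick look at the measurements before any modelling, an illustrative plotting snippet (negative points are silently dropped on the log axis):

# Quick-look plot of the measured correlation functions, no model involved
fig, ax = plt.subplots(figsize=(5, 4))
ax.errorbar(w_LL_data['theta'], w_LL_data['w_th'], yerr=w_LL_data['w_th_err'],
            fmt='o', label='lens-lens (clustering)')
ax.errorbar(w_LS_data['theta'], w_LS_data['w_th'], yerr=w_LS_data['w_th_err'],
            fmt='s', label='lens-source (magnification)')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(r'$\theta$ [deg]')
ax.set_ylabel(r'$w(\theta)$')
ax.legend()
plt.show()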

Define global variables and parameter values


# Cosmology and CCL numerical-performance values

cosmo = ccl.Cosmology(  Omega_c = 0.27,
                        Omega_b = 0.045,
                        h = 0.67,
                        sigma8 = 0.83,
                        n_s = 0.96,
                        transfer_function = 'boltzmann_camb',
                        matter_power_spectrum = 'halofit',
                        mass_function = 'tinker10',
                        halo_concentration = 'duffy2008')

# Spline parameters for the angular correlation function
# (these must be set after the Cosmology object exists)
cosmo.cosmo.spline_params.ELL_MIN_CORR = 1
cosmo.cosmo.spline_params.ELL_MAX_CORR = int(1e9)
cosmo.cosmo.spline_params.N_ELL_CORR = int(1e7)

# Sampling arrays in wavenumber, multipole, scale factor, angle and redshift
k_arr = np.geomspace(1E-4, 1E4, 1*64)
l_arr = np.unique(np.geomspace(1, 1.0e6, 3*64).astype(int))
a_arr = np.linspace(1/(1+10), 1, 1*64)
th_arr = np.geomspace(6E-4, 3E0, 256)
z = np.linspace(0, 10, 3*256)
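
As a quick sanity check (illustrative, not part of the notebook logic), one can confirm that the cosmology and spline settings took effect:

# sigma8 should come back as the input value (~0.83), and the spline
# parameters should reflect the values set above
print("sigma8 =", ccl.sigma8(cosmo))
print("ELL_MAX_CORR =", cosmo.cosmo.spline_params.ELL_MAX_CORR)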

Define sample-specific values

mag_low_lens = 99
mag_upp_lens = 99
mag_low_source = 99
mag_upp_source = 99

mass_low_lens = 99
mass_upp_lens = 99
mass_low_source = 99
mass_upp_source = 99

z_mean_lens = 1.15
z_width_lens = 0.05
z_mean_source = 3.55
z_width_source = 0.09
nz_lens = np.exp(-((z-z_mean_lens)/z_width_lens)**2/2)
nz_source = np.exp(-((z-z_mean_source)/z_width_source)**2/2)

slope = 1.5
sz_magnif = np.ones_like(z)*slope
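
To make sure the tracers peak where intended, one can normalize the N(z)'s and check their mean redshifts (a small illustrative check):

# Normalize the redshift distributions and report their mean redshifts;
# the lens sample should sit near z~1.15 and the source sample near z~3.55
nz_lens_norm = nz_lens / np.trapz(nz_lens, z)
nz_source_norm = nz_source / np.trapz(nz_source, z)
print("mean z (lens)   =", np.trapz(z * nz_lens_norm, z))
print("mean z (source) =", np.trapz(z * nz_source_norm, z))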

Define the two tracers we're studying: galaxy number counts of lenses and sources

lens_tracer = ccl.NumberCountsTracer(   cosmo,
                    has_rsd = True,
                    dndz = (z, nz_lens),
                    bias = (z, np.ones_like(z)),
                    mag_bias = None)
source_tracer = ccl.NumberCountsTracer( cosmo,
                    has_rsd = True,
                    dndz = (z, nz_source),
                    bias = (z, np.ones_like(z)),
                    mag_bias = (z, sz_magnif))
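
Since the samples do not overlap in redshift, the lens-source signal should be dominated by magnification. One way to check this (an illustrative variant, not used below) is to build a source tracer with zero clustering bias, so that only the magnification (and RSD, if enabled) terms contribute:

# Illustrative: source tracer with zero clustering bias, isolating the
# magnification contribution in the lens x source cross-correlation
source_tracer_mag_only = ccl.NumberCountsTracer(cosmo,
                    has_rsd = False,
                    dndz = (z, nz_source),
                    bias = (z, np.zeros_like(z)),
                    mag_bias = (z, sz_magnif))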

Define HOD/CCL relevant quantities: halo mass definition, concentration-mass relation, mass function, halo bias, NFW profile...

hmd_200m = ccl.halos.MassDef200m()
c_M = ccl.halos.ConcentrationDuffy08(hmd_200m)
n_M = ccl.halos.MassFuncTinker08(cosmo, mass_def = hmd_200m)
b_M = ccl.halos.HaloBiasTinker10(cosmo, mass_def = hmd_200m)
prof_M_NFW = ccl.halos.profiles.HaloProfileNFW( c_M, 
                    fourier_analytic = True,
                    projected_analytic = False, 
                    cumul2d_analytic = False, 
                    truncated = True)
hmc = ccl.halos.HMCalculator(   cosmo, 
                    n_M, 
                    b_M, 
                    hmd_200m, 
                    log10M_min = 8.0, 
                    log10M_max = 16.0, 
                    nlog10M = 128, 
                    integration_method_M = 'simpson', 
                    k_min = 1e-05)
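
Before running the halo-model integrals, it can be useful to check that the mass function and halo bias behave sensibly (an illustrative check, assuming the standard get_mass_function / get_halo_bias calls of the CCL halo-model API):

# Evaluate dn/dlog10M [Mpc^-3] and the halo bias on a mass grid at z=0
M_grid = np.geomspace(1e10, 1e15, 32)          # halo masses in Msun
dndlog10M = n_M.get_mass_function(cosmo, M_grid, 1.)
b_halo = b_M.get_halo_bias(cosmo, M_grid, 1.)
print(dndlog10M[:3], b_halo[:3])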

Define halo profile characterizing the galaxy overdensity using a Halo Occupation Distribution (HOD) model


class HaloProfileHOD(ccl.halos.HaloProfileNFW):
    def __init__(   self, 
                    c_M_relation,
                    lMmin = 12.02, 
                    lMminp = 0,
                    lM0 = 6.6, 
                    lM0p = 0,
                    lM1 = 13.27, 
                    lM1p = 0,
                    sigmaLogM = 0.4,
                    alpha = 1.,
                    ap = 0.65):

        self.lMmin=lMmin
        self.lMminp=lMminp
        self.lM0=lM0
        self.lM0p=lM0p
        self.lM1=lM1
        self.lM1p=lM1p
        self.ap=ap
        self.a0 = 1./(1+ap)
        self.sigmaLogM = sigmaLogM
        self.alpha = alpha
        super(HaloProfileHOD, self).__init__(c_M_relation)
        self._fourier = self._fourier_analytic_hod

    def _lMmin(self, a):
        return self.lMmin + self.lMminp * (a - self.a0)

    def _lM0(self, a):
        return self.lM0 + self.lM0p * (a - self.a0)

    def _lM1(self, a):
        return self.lM1 + self.lM1p * (a - self.a0)

    def _Nc(self, M, a):
        # Number of centrals
        Mmin = 10.**self._lMmin(a)
        return 0.5 * (1 + erf(np.log(M / Mmin) / self.sigmaLogM)) # log or log10 ?!

    def _Ns(self, M, a):
        # Number of satellites
        M0 = 10.**self._lM0(a)
        M1 = 10.**self._lM1(a)
        return np.heaviside(M-M0,1) * ((M - M0) / M1)**self.alpha

    def _fourier_analytic_hod(self, cosmo, k, M, a, mass_def):
        M_use = np.atleast_1d(M)
        k_use = np.atleast_1d(k)

        Nc = self._Nc(M_use, a)
        Ns = self._Ns(M_use, a)
        # NFW profile fourier_analytic is taken from this class
        uk = self._fourier_analytic(cosmo, k_use, M_use, a, mass_def) / M_use[:, None]

        prof = Nc[:, None] * (1 + Ns[:, None] * uk)

        if np.ndim(k) == 0:
            prof = np.squeeze(prof, axis=-1)
        if np.ndim(M) == 0:
            prof = np.squeeze(prof, axis=0)
        return prof

    def _fourier_variance(self, cosmo, k, M, a, mass_def):
        # Fourier-space variance of the HOD profile
        M_use = np.atleast_1d(M)
        k_use = np.atleast_1d(k)

        Nc = self._Nc(M_use, a)
        Ns = self._Ns(M_use, a)
        # NFW profile
        uk = self._fourier_analytic(cosmo, k_use, M_use, a, mass_def) / M_use[:, None]

        prof = Ns[:, None] * uk
        prof = Nc[:, None] * (2 * prof + prof**2)

        if np.ndim(k) == 0:
            prof = np.squeeze(prof, axis=-1)
        if np.ndim(M) == 0:
            prof = np.squeeze(prof, axis=0)
        return prof
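
As a quick check of the HOD itself (illustrative; note that Zheng et al. 2005-style HODs typically use log10 in the central occupation, which is what the inline comment above is questioning), one can plot the occupation numbers as a function of halo mass:

# Central and satellite occupation numbers with the default HOD parameters:
# N_c(M) should switch on around M ~ 10^12 Msun and N_c*N_s(M) should grow
# roughly as a power law above M1
prof_test = HaloProfileHOD(c_M)
M_grid = np.geomspace(1e10, 1e15, 64)
Nc = prof_test._Nc(M_grid, 1.)
Ns = prof_test._Ns(M_grid, 1.)
plt.loglog(M_grid, Nc, label=r'$N_c(M)$')
plt.loglog(M_grid, Nc * Ns, label=r'$N_c\,N_s(M)$')
plt.xlabel(r'$M\ [M_\odot]$')
plt.ylabel('occupation number')
plt.legend()
plt.show()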

Define the 2pt correlator for the HOD halo profile


class Profile2ptHOD(ccl.halos.Profile2pt):
    # This class implements the Fourier-space 1-halo 2-point correlator for the HOD profile.
    # Takes the fourier variance from the HaloProfileHOD object

    def fourier_2pt(self, prof, cosmo, k, M, a,
                    prof2=None, mass_def=None):

        # if not isinstance(prof, HaloProfileHOD):
        #     raise TypeError("prof must be of type `HaloProfileHOD`")

        if prof2 is not None:
            if prof2 is not prof:
                raise ValueError("prof2 must be the same as prof")

        return prof._fourier_variance(cosmo, k, M, a, mass_def)
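
A short usage sketch (illustrative): the custom correlator plugs into halomod_power_spectrum, so the 1-halo and 2-halo contributions to the galaxy-galaxy power spectrum can be inspected separately, which helps pin down which term drives any unexpected feature.

# Split the galaxy-galaxy halo-model P(k) at a=1 into 1-halo and 2-halo terms
prof_tmp = HaloProfileHOD(c_M)
pk_gg_1h = ccl.halos.halomod_power_spectrum(cosmo, hmc, k_arr, 1., prof_tmp,
                                            prof_2pt=Profile2ptHOD(),
                                            normprof1=True,
                                            get_1h=True, get_2h=False)
pk_gg_2h = ccl.halos.halomod_power_spectrum(cosmo, hmc, k_arr, 1., prof_tmp,
                                            prof_2pt=Profile2ptHOD(),
                                            normprof1=True,
                                            get_1h=False, get_2h=True)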

Define functions that compute model predictions for the correlation functions and power spectra. For the cross-correlation due to magnification we use the galaxy-matter power spectrum; for the auto-correlation due to clustering we use the galaxy-galaxy power spectrum.


def get_ccl_hod_predictions_TESTER(lens_tracer, source_tracer, theta, lMmin, lM0, lM1):
    '''gets ccl model predictions'''

    prof_gal = HaloProfileHOD(c_M, lMmin=lMmin, lM0=lM0, lM1=lM1)
    HOD2pt = Profile2ptHOD()

    pk2D_gg = ccl.halos.halomod_Pk2D(cosmo, hmc, 
                    prof = prof_gal, 
                    prof2 = prof_gal, 
                    prof_2pt = HOD2pt,
                    normprof1 = True, normprof2 = True,
                    lk_arr = np.log(k_arr), a_arr = a_arr,
                    get_1h=True, 
                    get_2h=True, 
                    extrap_order_lok=1, 
                    extrap_order_hik=2)
    pk2D_gM = ccl.halos.halomod_Pk2D(cosmo, hmc, 
                    prof = prof_gal, 
                    prof2 = prof_M_NFW, 
                    normprof1 = True, normprof2 = True,
                    lk_arr = np.log(k_arr), a_arr = a_arr,
                    get_1h=True, 
                    get_2h=True, 
                    extrap_order_lok=1, 
                    extrap_order_hik=2)

    cl_LL = ccl.angular_cl(cosmo, lens_tracer, lens_tracer, l_arr, 
                    p_of_k_a=pk2D_gg, 
                    l_limber=-1.0, limber_integration_method='qag_quad')
    cl_LS = ccl.angular_cl(cosmo, lens_tracer, source_tracer, l_arr, 
                    p_of_k_a=pk2D_gM, 
                    l_limber=-1.0, limber_integration_method='qag_quad')

    w_LL = ccl.correlation(cosmo, l_arr, cl_LL, theta, type='NN', method='FFTLog')
    w_LS = ccl.correlation(cosmo, l_arr, cl_LS, theta, type='NN', method='FFTLog')

    return cl_LL, cl_LS, w_LL, w_LS

Next, we want to compare with predictions using the default CAMB (halofit) power spectrum. Here we define a galaxy bias b(z) = 0.95/D(z).


def get_ccl_default_predictions_TESTER():
    # gets the angular power spectrum and correlations using the default nonlinear power spectrum
    bz = 0.95/ccl.growth_factor(cosmo,1./(1+z))

    lens_tracer = ccl.NumberCountsTracer(   cosmo,
                    has_rsd = True,
                    dndz = (z, nz_lens),
                    bias = (z, bz),
                    mag_bias = None)
    source_tracer = ccl.NumberCountsTracer( cosmo,
                    has_rsd = True,
                    dndz = (z, nz_source),
                    bias = (z, bz),
                    mag_bias = (z, sz_magnif))

    cl_LL = ccl.angular_cl(cosmo, lens_tracer, lens_tracer, l_arr, 
            p_of_k_a=None, 
            l_limber=-1.0, limber_integration_method='qag_quad')
    cl_LS = ccl.angular_cl(cosmo, lens_tracer, source_tracer, l_arr, 
            p_of_k_a=None, 
            l_limber=-1.0, limber_integration_method='qag_quad')

    w_LL = ccl.correlation(cosmo, l_arr, cl_LL, th_arr, type='NN', method='FFTLog')
    w_LS = ccl.correlation(cosmo, l_arr, cl_LS, th_arr, type='NN', method='FFTLog') 

    return cl_LL, cl_LS, w_LL, w_LS

def get_power_spectra_TESTER():
    # gets the halo model power spectrum and a default non-linear power spectrum

    prof_gal = HaloProfileHOD(c_M, lMmin=12., lM0=6.6, lM1=13.2)
    HOD2pt = Profile2ptHOD()

    pk_gg = ccl.halos.halomod_power_spectrum(cosmo, hmc, k_arr, 1.,
                                         prof_gal, prof_2pt=HOD2pt,
                                         normprof1=True)

    pk_gM = ccl.halos.halomod_power_spectrum(cosmo, hmc, k_arr, 1.,
                                         prof = prof_gal, prof2 = prof_M_NFW,
                                         normprof1=True, normprof2=True)

    pk_MM = ccl.halos.halomod_power_spectrum(cosmo, hmc, k_arr, 1.,
                                         prof_M_NFW, normprof1=True)

    pk_nl_MM = ccl.nonlin_matter_power(cosmo, k_arr, 1.)

    return pk_gg, pk_gM, pk_MM, pk_nl_MM

cl_LL, cl_LS, w_LL, w_LS = get_ccl_hod_predictions_TESTER(lens_tracer, source_tracer, th_arr, 12., 6.6, 13.2)
cl_LL_d, cl_LS_d, w_LL_d, w_LS_d = get_ccl_default_predictions_TESTER()

Plotting

fig, ax = plt.subplots(1, figsize=(5,4), dpi=150)

# ax.title.set_text(title)
lfac = l_arr * (l_arr + 1) / (2 * np.pi)
ax.plot(l_arr, cl_LL*lfac, alpha=0.7, color='blue', label='LL')
ax.plot(l_arr, cl_LS*lfac, alpha=0.7, color='orange', label='LS')
ax.plot(l_arr, cl_LL_d*lfac, alpha=0.4, linestyle='--', color='blue', label='LL default Pk')
ax.plot(l_arr, cl_LS_d*lfac, alpha=0.4, linestyle='--', color='orange', label='LS default Pk')
ax.legend(fontsize=8)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(r'$\ell$')
ax.set_ylabel(r'$\ell(\ell+1)C_\ell/(2\pi)$')
plt.show()

[Figure `output_3_0`: angular power spectra l(l+1)C_l/(2pi) for LL and LS, HOD vs default Pk]

fig, ax = plt.subplots(1, figsize=(5,4), dpi=150)

ax.plot(th_arr, w_LL, alpha=0.7, color='blue', label='LL model')
ax.plot(th_arr, w_LL_d, alpha=0.4, linestyle='--', color='blue', label='LL model default Pk')
ax.plot(th_arr, w_LS, alpha=0.7, color='orange', label='LS model')
ax.plot(th_arr, w_LS_d, alpha=0.4, linestyle='--', color='orange', label='LS model default Pk')
ax.errorbar(w_LS_data['theta'], w_LS_data['w_th'], yerr=w_LS_data['w_th_err'], fmt='o',linewidth=1.2, 
         markersize = 8, capsize=3, alpha=0.5, label='LS data', color='orange')
ax.errorbar(w_LL_data['theta'], w_LL_data['w_th'], yerr=w_LL_data['w_th_err'], fmt='o',linewidth=1.2, 
         markersize = 8, capsize=3, alpha=0.5, label='LL data', color='blue')
ax.legend(fontsize=8)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('$\\theta$ [deg]')
ax.set_ylabel('$w(\\theta)$')
plt.show()

[Figure `w_th`: w(theta) model predictions (HOD and default Pk) compared with the data]

# Comparing the different 3D power spectra
pk_gg, pk_gM, pk_MM, pk_nl_MM = get_power_spectra_TESTER()

fig, ax = plt.subplots(1, figsize=(5,4), dpi=150)

# ax.title.set_text(title)
ax.plot(k_arr, pk_gg, alpha=0.7, color='blue', label='HM Pk gg')
ax.plot(k_arr, pk_gM, alpha=0.7, color='orange', label='HM Pk gM')
ax.plot(k_arr, pk_MM, alpha=0.7, color='green', label='HM Pk MM')
ax.plot(k_arr, pk_nl_MM, alpha=0.4, linestyle='--', color='black', label='default matter Pk')
ax.legend(fontsize=8)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('$k$')
ax.set_ylabel('$P(k)$')
plt.show()

[Figure `output_5_0`: halo-model P_gg, P_gM, P_MM compared with the default nonlinear matter P(k)]


# just testing the P(k) using different functions 

kmin=1e-4
kmax=1e4
nk=128
k = np.logspace(np.log10(kmin), np.log10(kmax), nk) 

# Scale factor
a = 1. 

# Calculate all these different P(k)
pk_li = ccl.linear_matter_power(cosmo, k, a)
pk_nl = ccl.nonlin_matter_power(cosmo, k, a)
pk_hm = ccl.halomodel_matter_power(cosmo, k, a)
pk_1h = ccl.halomodel.onehalo_matter_power(cosmo, k, a)
pk_2h = ccl.halomodel.twohalo_matter_power(cosmo, k, a)

plt.plot(k, pk_li, 'b-',  label='Linear')
plt.plot(k, pk_nl, 'r-',  label='Nonlinear')
plt.plot(k, pk_hm, 'g-',  label='Halo model')
plt.plot(k, pk_1h, 'g:',  label='One-halo term')
plt.plot(k, pk_2h, 'g--', label='Two-halo term')
plt.xscale('log')
plt.yscale('log')
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.xlabel(r'$k\quad[{\rm Mpc}^{-1}]$',fontsize=22)
plt.ylabel(r'$P(k)\quad[{\rm Mpc}^{3}]$',fontsize=22)
# plt.ylim([1e1,1e5])
plt.legend()
plt.show()

[Figure `output_6_1`: linear, nonlinear and halo-model matter P(k), with the one-halo and two-halo terms]

damonge commented 3 years ago

Hi @mShuntov

have you tried

pennalima commented 3 years ago

Hi @mShuntov ,

We have cross-checked CCL against other libraries (NumCosmo and Colossus), and found that only the analytic NFW projected functions show a good match (the relative difference ranges from 10^{-11} to 10^{-6}). When the projected density is computed numerically (NFW, Einasto or Hernquist profiles), we only get ~10% agreement for some R values.

So, for now, you should use the "analytic" NFW profile:

prof_M_NFW = ccl.halos.profiles.HaloProfileNFW(c_M, fourier_analytic = True, projected_analytic = True, cumul2d_analytic = True, truncated = False)

I ran your notebook using it, but the difference between data and the predicted curve is even bigger.

mShuntov commented 3 years ago

Hi,

Thanks for the suggestions. Indeed, changing the parameters controlling the splines and numerical accuracy, cosmo.cosmo.spline_params.ELL_MIN/MAX_CORR and cosmo.cosmo.spline_params.N_ELL_CORR, fixed this behaviour. Previously I was only changing the definition of l_arr.
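
For reference, this is the kind of change I mean (the values below are only illustrative):

# Illustrative spline settings for the correlation-function transform,
# set after the Cosmology object is created
cosmo.cosmo.spline_params.ELL_MIN_CORR = 0.1
cosmo.cosmo.spline_params.ELL_MAX_CORR = int(1e5)
cosmo.cosmo.spline_params.N_ELL_CORR = 5000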

[Figure: w(theta) predictions after adjusting the spline parameters]

The Bessel integrator crashes with a `CCLError: Error 18` message and I can't pin down what causes it.

Regarding the profiles, the pyccl.halos.profiles.HaloProfile object has a method to update the precision parameters used by FFTLog to compute Hankel transforms. I've been trying to tweak these but I don't see any change. This is the syntax I used:

prof_M_NFW.update_precision_fftlog( padding_lo_fftlog = 0.01,
                                    padding_lo_extra = 0.01,
                                    padding_hi_fftlog = 100.,
                                    padding_hi_extra = 100.,
                                    large_padding_2D = False,
                                    n_per_decade = 1000,
                                    extrapol = 'linx_liny',
                                    plaw_fourier = -1.5,
                                    plaw_projected = -1.)

Is this the wrong way of doing it?

Finally, usually, the correlation measurements are done using some cuts in the stellar mass of the lensing sample of galaxies. Is there a way to inform the model that we're interested in a certain stellar mass range; or is this governed by the HOD parameterization that is defined in the HaloProfileHOD class?

mShuntov commented 3 years ago

> Hi @mShuntov ,
>
> We have cross-checked CCL against other libraries (NumCosmo and Colossus), and found that only the analytic NFW projected functions show a good match (the relative difference ranges from 10^{-11} to 10^{-6}). When the projected density is computed numerically (NFW, Einasto or Hernquist profiles), we only get ~10% agreement for some R values.
>
> So, for now, you should use the "analytic" NFW profile:
>
> prof_M_NFW = ccl.halos.profiles.HaloProfileNFW(c_M, fourier_analytic = True, projected_analytic = True, cumul2d_analytic = True, truncated = False)
>
> I ran your notebook using it, but the difference between data and the predicted curve is even bigger.

Hi,

I'm coming back to you regarding this issue. In fact, when requesting a non-truncated NFW profile

prof_M_NFW = ccl.halos.profiles.HaloProfileNFW( c_M,
                    fourier_analytic = True,
                    projected_analytic = False,
                    cumul2d_analytic = False,
                    truncated = False)

a big difference appears in the predicted cross-correlation between the two samples at different redshifts, whereas the auto-correlation of the galaxies in the lower redshift bin does not change.

I'm attaching some figures showing this difference.

[Figures: predicted correlations with the truncated vs non-truncated NFW profile]
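
One way to pin down where the two settings differ is to compare the Fourier-space NFW profiles directly (a sketch, assuming the standard HaloProfile.fourier(cosmo, k, M, a, mass_def) call of the halo-model API):

# Compare the truncated and non-truncated NFW profiles in Fourier space
prof_trunc = ccl.halos.profiles.HaloProfileNFW(c_M, fourier_analytic=True,
                                               truncated=True)
prof_free = ccl.halos.profiles.HaloProfileNFW(c_M, fourier_analytic=True,
                                              truncated=False)
k_test = np.geomspace(1e-4, 1e2, 64)
M_test = 1e13   # example halo mass in Msun
uk_trunc = prof_trunc.fourier(cosmo, k_test, M_test, 1., mass_def=hmd_200m)
uk_free = prof_free.fourier(cosmo, k_test, M_test, 1., mass_def=hmd_200m)
plt.loglog(k_test, uk_trunc / M_test, label='truncated')
plt.loglog(k_test, uk_free / M_test, label='not truncated')
plt.xlabel(r'$k$')
plt.ylabel(r'$u(k|M)$')
plt.legend()
plt.show()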

Could you tell whether this is due to wrong usage of the code on my side, or whether it is a bug? The piece of code I'm using is the same as the one I posted earlier.

Thanks.