tmcclintock commented 4 years ago

Halo theory in CCL

(Tagging relevant people: @elisachisari @damonge @villarrealas @pennalima @aimalz @beckermr @alexander-mead @BenediktDiemer @cavestruz @matthewkirby @elikrause @KuanWang-Astro @zdu863 @c-d-leonard. I’m sure I forgot someone that will want to see this post. My bad.)

This issue is meant to subsume issues #656, #655, and #291. It is also relevant for #344, #235, #275, and #219, so maybe close those too.

As of now, halos are missing from CCL. This is a problem because a huge fraction of the community would benefit from these theory tools. Both cluster science and HOD models are unable to use CCL without these routines. The upside is that most of halo theory is very easy to write down!

This issue contains (my vision for) an outline for how a full halo model should be implemented in CCL. It includes where we need to make changes to the existing code structure, what new files we need to create and build tests for, what theory routines should go into CCL for now, and what pieces of code I think we should toss out/rewrite. I also point out routines that I think should exist elsewhere (namely Firecrown). At the end I also gave a wishlist for things that would be nice but tricky to implement (like how to generally accommodate emulators).

If you don’t spend most of your day thinking about halos and want some background reading, my suggestion is to look at Frank van den Bosch’s lectures. Lectures 4, 6, 8, 9, 10, and 13 would cover everything I mention here. The seminal review paper on halo theory is Cooray & Sheth (2002).

An aside: halo theory =/= cluster science

Cluster science is distinct from halo theory because the former includes a selection function. A ton of effort outside of CCL and DESC is spent understanding cluster (and galaxy) selection functions. Numerically, dealing with selection functions amounts to doing extra integrals over PDFs of your observables and any other quantities you need to consider to link your observable with halo mass. For instance, with redMaPPer clusters, to accurately model some observable (forgetting about redshift for a second) you have to integrate over a fixed bin of observed richness, an infinite integral over true richness, and an infinite integral over halo mass.

The selection function can be arbitrarily more complicated than that, so CCL should probably not try to support cluster science directly. Those routines are probably best left to Firecrown, and as CCL developers we should look to provide highly-optimized versions of functions of halo mass so that those higher-level routines won’t be bogged down.

What I am NOT talking about

When I say “halo theory” I don’t mean halofit. Halofit is an analytic fit to the nonlinear power spectrum. It got its name because it models the power spectrum with a “1-halo” and “2-halo” term, but it has nothing to do with halo theory w.r.t. CCL. This is because, for all practical purposes, halofit makes no effort to model halos themselves. It has no bearing on the halo bias, halo mass function, halo clustering, or anything else halo related. It can be thought of as a black box that spits out the matter power spectrum.

When I say “halo theory” I don’t mean @alexander-mead’s halo-based power spectrum, although this model admittedly takes a lot from actual halo theory. Again, for all practical purposes, Alex’s routine is a method for modeling the nonlinear power spectrum. It makes some assumptions about how matter is distributed in halos (in Fourier space) that are most stringent than in halofit. Like halofit however, it is a black box for predicting the power spectrum. If we wanted to be super-duper sticklers for theoretical consistency, then one could not mix Alex’s model with any parts of halo theory that he did not explicitly include (e.g. non-NFW profiles).

What I am talking about

CCL should accommodate the most basic science that can be done with halos (not clusters; see above). This includes in order of my opinion of priority:

Mass definitions
Mass conversions
Density profiles
Halo number counts
Mass-concentration relations
Halo bias
Halo correlation functions
Projected density profiles
Halo occupation distributions
Fourier-space versions of normalized density profiles

For the most part, the lower points on the list won’t function without the higher parts being implemented correctly if we want to support arbitrary halo mass definitions. I will now go through these one by one and outline what goes into implementing each part of halo theory fully. I will provide some background where I think it’s warranted, but I will probably just link a paper that I think would do the job.

I know that I cannot exhaustively list all the potential models that we could accommodate. There are at least a half dozen different mass-concentration relations. People that have read more of the literature than I should chime in when something is left out, and I can edit in more models later.

0. Mass definitions

Halo mass is written in terms of the background density of the Universe

CodeCogsEqn

In this equation, \bar{\rho} is the background density being considered, and is usually either the critical density or the matter density \bar{\rho} = rho_m = Omega_m * rho_crit. Delta refers to the overdensity relative to the background of the halo. In other words, the halo lives in a peak in the density field that is Delta times more dense than rho. Under this definition, a single halo mass M maps to a single halo radius R (R_\Delta in the equation). Sometimes people like to work in physical units rather than comoving, so then a (1+z) appears attached to R and rho becomes a physical density. In the literature you will see things like 200m, 500c, or vir. This indicates which background density is being considered (mean or critical) and which overdensity value is being used (200, 500, or the virial value for that cosmology at some redshift).

We need to decide what halo definitions we want to support. In simulations, people build models under certain halo definitions (e.g. my emulators are for 200m). There are also halo definitions that don’t assume sphericity, like friends-of-friends halo finders, that at fixed halo mass won’t match up to measurements made under a spherical halo definition.

In addition to deciding which halo definitions to support, we should decide how much work we want to put in to warn the user about potentially incompatible models. For instance, it’s one thing to warn the user that mass function A was calibrated for the 200m definition and is incompatible with halo bias B calibrated for 500c, but it is more work to say that two models calibrated under 200m used different halo finders, so they might not agree.

Finally, we also need to decide where mass definitions are specified. The most restrictive and simple way for the user to handle this is to specify a mass definition at the same time that they specify cosmology. This way, everything is done internally, and we can implement functions very cleanly like halo_bias(M, z) rather than halo_bias(M, z, definition). The other extreme is to perhaps make a halo_definition object which gets passed into things. This would have the upside of allowing for more flexibility, such as allowing the user to compute quantities under more than one halo definition at a time (which is of scientific interest).

To summarize, for halo definitions we need to decide the following:

[ ] decide which halo definitions to support
[ ] decide how much flexibility in combining models we allow and what kind of warnings should exist
[ ] decide where we let the user define halo definitions

1. Mass conversions

Once we know which halo definitions we want to support, we should implement a way to convert between mass definitions. To do this in the most general way, we need to also have density profiles (next subsection). Implementing mass conversions early on is nice because it can save us the trouble of rewriting code later on.

Performing a mass conversion involves solving:

CodeCogsEqn (1)

Given the left hand side, the halo mass and radius under definition Delta_1, what is the right hand side (note that any factors of Omega_m are absorbed into the Deltas)? We need a second equation to get at both the new halo mass and radius. Without getting into the weeds, one solves this by assuming a density profile of the halo and specifying the concentration (if it’s NFW). After some math and root finding, you can get the right hand side as well as the concentration (or any other profile parameters, given the assumed profile) under the new definition.

The steps for implementing this include:

[ ] write up the mass conversion equations in LaTeX form to put in the CCL note so that we don’t screw this up
[ ] implement the NFW profile as a base (see next section)
[ ] implement mass conversions assuming and NFW profile, but write it in such a way as to allow any profile definition (e.g. Einasto)
[ ] test the conversions against Colossus

2. Density profiles

Halo density profiles describe where the mass is located inside of a halo. First though, a bit of terminology via an equation:

CodeCogsEqn (2)

The halo density profile rho can be written in terms of the normalized density profile u, or the halo-matter cross correlation function 1-halo term xi. All of these descriptions are equivalent, but people prefer their own particular function. What we should really do is implement them once and then have routines all call the same function. My preference is that we should only be implementing correlation functions, since they are unitless, but I could see how working with density profiles would be more digestible.

Also, more often than not people are computing density profiles over a range of scales and masses. So our design should reflect this.

We should do the following to bring halo profiles in realspace into CCL:

[ ] decide which is the default representation (density profile, normalized density profile, or correlation function)
[ ] implement the NFW profile (so that mass conversions can be completed)

Then we can implement other profiles.

[ ] Einasto profile
[ ] isothermal sphere?
[ ] plummer profile?

Note for those that follow the literature, these would all constitute “1-halo” terms. The Diemer & Kravtsov model is a combined 1 and 2 halo term, so it should wait until after correlation functions are implemented.

Another note: @KuanWang-Astro implemented the NFW profile in ccl_haloprofile.c. Once we have steps 0 and 1 finished, we should return to that implementation and see if it needs updating. Kudos for taking the first leap into the great unknown :).

3. Halo number counts

Halo number counts would be the halo mass function and integrals of it. In other words, models like the Tinker et al. (2008) mass function provide a model for the comoving number density of halos (dn/dM). Equivalently, one can write the halo mass function in terms of the halo multiplicity f(M). Additional desired functionality would be integrals over the mass function over bins of mass and/or redshift.

Here is a checklist of what we would want:

[ ] Tinker 2008 & 2010 mass functions & multiplicity functions (the former calls the latter)
[ ] Press & Schecter mass functions & multiplicity functions

To do these things, we also need a fast implementation of halo peak height from mass. See, e.g. equation 3 in my halo mass function emulator paper. This is an integral over the power spectrum given some window function. For now we should go with convention and assume a spherical top-hat window function (see the wishlist later). Checkbox items are:

[ ] implement rms variance as a function of mass (i.e. sigma(M)) at fixed redshift. This is equivalent to inverse peak height
[ ] implement the derivative dsigma/dM as a function of mass (this goes into the mass function)

Right now, @villarrealas has implemented these things in ccl_massfunc.c along with the halo bias. However, I think it’s worth revisiting this implementation and modifying it. First, I think the computation the derivative of sigma(M) should be made analytic, rather than the numerical derivative in place now (ccl_massfunc.c lines 486+487). Second, it is worth pulling out the sigma(M) and its derivative out of the mass function file into its own file. This is because those quantities are used in, e.g. the halo bias, correlation functions, mass-concentration relationships, and other areas. Right now, the massfunc file is the halo-theory behemoth that must be tamed.

4. Mass-concentration

As mentioned before, under the NFW model one can describe the halo density with only two parameters (mass and concentration). People like single parameter models roughly 100% more than two parameter models, hence many people spend a lot of time developing mass-concentration relations. Thus, CCL should support them.

All jokes aside though, I think a mass--concentration file would be very beneficial. Additionally, not only should it have mass--concentration relations, but also mass--parameter relations under different models. For instance, the paper introducing the Diemer-Kravtsov profile gave fitting functions for parameters in their profile in terms of halo mass, so it would make sense to put them here.

It is likely that users will want to compute c(M) for an array of masses and/or redshifts, so we should keep this in mind when writing code.

To do items here:

[ ] identify mass-concentration relations that we should use (e.g. Bhattacharya, Duffy, Bullock, Ludlow, Child, Diemer).
[ ] pick one and implement it well
[ ] implement the rest
[ ] test them against Colossus

5. Halo bias

At large scales, halos cluster proportionally to matter. To first order, the halo bias is only mass and redshift dependent, and there exist a few models out there for the halo bias. We should implement them, since they appear in, e.g. cluster lensing, cluster clustering, HOD models, and probably more places. Also, they are very simple to implement.

Just like for concentration, users will want b(M,z) at arrays of masses and redshifts.

The to do list:

[ ] identify halo bias relations people want to use (Tinker 2010 main text and appendices, Sheth & Tormen)
[ ] implement one well
[ ] implement the rest well
[ ] find implementations to test against and do so. The cluster_toolkit has the Tinker halo bias implemented, and would be a good baseline for that.

@villarrealas has implemented some of these in the ccl_massfunc.c file, but the halo bias should be pulled out into its own file.

6. Halo correlation functions

Halos cluster. Therefore, we should implement halo-matter cross correlation functions (xi{hm}) and halo autocorrelation functions (xi{hh}). Thankfully, this is easy to do and we have all the tools already implemented. This is because, at small scales the halo-matter cross correlation function is proportional to the density profiles (\xi_{hm}(r,M) \propto rho(r,M)), while at large scales it is proportional to the halo bias times the matter autocorrelation function (b(M) xi_mm(r)). The halo autocorrelation is 0 at small scales (halo exclusion) and can usually be modeled as xi_hh(r, M1, M2) = b(M1)b(M2)*xi_mm(r).

Note: everything in the above paragraph has a redshift dependence that I’m dropping to save characters.

So, as long as we trust the Fourier transform of the matter power spectrum (which @zdu863 merged into master last year) to go from P(k) to xi(r), then we have the 2-halo term done already. It’s just a matter of putting all the pieces together.

The to-do items look like this:

[ ] implement the halo autocorrelation function
[ ] implement ways to combine 1-halo and 2-halo terms (take the max or add them)
[ ] test against the cluster_toolkit. Note that these tests would technically be superfluous, since we would already test the density profiles, halo bias, and matter autocorrelation functions themselves.

At this point it would be worth double checking that the benchmarks for xi_mm(r) are good.

7. Projected density profiles

Weak lensing and the like are measured in projection on the sky. For instance, the surface mass density (proportional to kappa) looks like

CodeCogsEqn (3)

From there it is simple to go to the differential surface mass density DeltaSigma. To see these equations written out, look at section 5.1 in this paper.

Note: in the above equation the arguments would also include any other properties of the halo, like concentration, but this is obviously model dependent and would be accommodated by our implementation of how the user specifies halo definitions.

The to-do list would be the following:

[ ] implement the line-of-sight integral to go from the halo-matter cross correlation function to projected surface mass density
[ ] implement the projected radial integral to go to differential surface mass density
[ ] implement azimuthal averaging, i.e. the ability to integrate over annuli projected on the sky (this is easy)

Systematics like miscentering and reduced shear would live in Firecrown.

Note that an alternative formalism to all this is to go from xi_hm to the halo-matter power spectrum P_hm(k), and then do an inverse transform (an integral w/ a Bessel function) to get DeltaSigma directly. This was done by our HSC collaborators, and might be faster computationally. This should be explored.

Another note is that it is my opinion that since CCL is a theory-only code we should not provide the means to compute projected clustering w(theta) or gamma(theta), or anything else that requires knowledge of source/lens redshift distributions. Open for discussion.

8. Halo occupation distributions (HODs)

HOD is a prescription to put galaxies into halos. An HOD model coupled with the halo model parts described above would allow us to do galaxy-galaxy lensing directly. It’s possible that this should live in Firecrown rather than CCL, but I’d like to hear the audience’s thoughts.

The todo list would include:

[ ] identify forms of HOD models that people use
[ ] implement them
[ ] find a good benchmark to test against and make tests

9. Fourier-space versions of correlation functions & density profiles

As I alluded to earlier when I was talking about Alex’s halo-based power spectrum model, one can also compute halo density profiles in Fourier space. Essentially, the normalized density profiles can be transformed according to

CodeCogsEqn (4)

I personally see this as low priority. BUT, implementing (3D) Fourier transforms of the profiles we have written in realspace is nice because it “completes the picture” and also might allow someone to build more complicated versions of the halo-based power spectrum model with different 1-halo terms (which is an actual research project).

The todo items would be

[ ] find analytic forms for some transformations and implement those
[ ] use fftlog to do numerical transforms for those that cannot be solved analytically
[ ] find benchmarks and test against them

Gorey details

IMO including the halo theory should be a milestone for V2. I think it would be a travesty to reach the LSST era and not have a proper halo theory in CCL. Nevertheless, getting the job done will result in growing pains.

New files

I think the following files should get torn up in order to be replaced by new files that are more sensibly organized:

ccl_correlation.c - the term “correlation” is too broad for what this file does. I think it should be broken down into smaller parts, or at a minimum have the 3d transform of the matter power spectrum separated away from anything done in projection. I also think anything RSD related should be pulled out and put into their own files.
ccl_massfunc.c - this file is a behemoth that does more than its name suggests. Mass functions should be isolated. sigma(M) and its derivatives should be isolated and put next to peak height. Halo bias should be isolated.

Here are my suggestions for new files that should exist and what should be put in them

ccl_mass_definitions.c - this file contains routines to do mass conversions. Also, if we go this design route, it would have the structure (an enum or whatever) for a halo definition (overdensity, mean vs crit, assumed density profile)
ccl_halo_profiles.c - this file would have halo density profiles as well as normalized density profiles. If I had my way, this file would just have wrappers that would call the corresponding correlation functions, but this can be discussed.
ccl_mass_concentration.c - this file would have mass-concentration relations as well as other mass-property relations mentioned above.
ccl_3d_halo_correlation_functions.c - this file would have correlation functions for 1-halo terms like the NFW and einasto profiles, as well as 2-halo terms that look like b(M)*xi_mm. It also has routines for combining them, so that one can call xi = xi_hm(r, M) and get back a curve that already has the 1 and 2 halo terms combined.
ccl_halo_statistics.c or maybe ccl_peak_statistics.c - this file would have the routines that compute sigma(M,z), its derivatives, and peak height.
ccl_halo_mass_functions.c - self explanatory
ccl_halo_bias.c - self explanatory
ccl_projected_halo_profiles.c - this file has all projected halo profiles described above.
ccl_fourier_space_halo_profiles.c - self explanatory

Development cycle

The hardest part of getting all this done is getting developers to work on it. The easiest way to do this is to make the development cycle easier. I’ll admit that one of the main reasons why I took a hiatus on contributing to CCL was because the process was difficult. I don’t have the skillz that the senior devs have. I think that largely thanks to @beckermr the workflow has been made easier.

Since my outline constitutes a huge addition to the codebase in CCL, it would be great if the senior devs could spend a little bit of time trying to think if the workflow suggested on the README accommodates this effort. Also, I only outlined what needs to be done on the C side. It would be good to know what would need to be done in Python to make everything work correctly.

Benchmarks

Some of the things I suggested implementing don’t have benchmarks. We also need to decide what halo definitions we make benchmarks for (since 200m is obvious, but 201m is probably not worth looking at). We should consult cluster people, especially those that work in other wavelengths, to see what definitions they would like. Off the top of my head I know we should benchmark 200m, 500c, and vir. Not sure what else.

Stretch goals

Here is a non-exhaustive list of things that would be super dope to have in CCL, but probably not worth including in the V2 benchmark

A seamless way to incorporate halo theory emulators. Things like DarkQuest and the Aemulus Project have emulators to compute various parts of the halo theory I outlined above. For instance, I have an emulator for b(M,z). In an ideal world, we implement things so that emulators (perhaps existing externally from CCL) can be hot-swapped in where appropriate.
Halo splashback, assembly bias, and other fun things that exist in simulations. These are all research topics, but some things, like splashback radii, have analytic fitting functions that could in principle be implemented in CCL. Not sure it’s worth the development time but this is the pipedream section.
Nonspherical halos. Cluster science today requires us to consider non-spherical halos, so this will be something that DESC folks will work on. It’s possible to kick this up to Firecrown, but it would be cool if we had it in CCL. I actually think this is the highest priority item out of our stretch goals.
Arbitrary window functions. Halos don’t have to be defined by a spherical top hat. We could have arbitrary window functions, like a Gaussian. This only effects the integrand when computing sigma(M) and its derivatives, so this might not actually be such a stretch goal and could be very easy to do.
Implement really tiny effects like the true critical overdensity for collapse \delta_{crit}. I think @alexander-mead already has this implemented in fortran, so it would be possible to port that over.

villarrealas commented 4 years ago

So, my first impression is that a lot of this makes sense, but it is going to be a substantial bit of work. The changes suggested involve pretty dramatic changes to underlying routine and probably involve disentangling a lot of routines from the cosmology object in order to allow for the mixed utilization of various parts. Thankfully, since the C API is depreciated, this may be a little easier (although our C code would become an abomination if unaltered).

Aside from that during the planning phase, a very serious concern I have is about model consistency. This is briefly mentioned in the above text, but (I believe) one of the stated goals of CCL in the early periods was to make certain that a user generating summary statistics would not accidentally create measurements that were inherently incompatible with each other. This probably motivated some of the intermediate steps for, say, the halo mass function not being publicly exposed. However, given a desire to open up more of the process from end to end as callable functions and being passed into each other, it is certainly worth considering how we are making sure any naive outputs are sane.

Aside from those two concerns that I think we need to address before settling into any extensive refactoring of this code, I think the idea is good. Unfortunately, I cannot offer time to help personally... ultimately, I think the big blocker is a halo expert who is happy coding in Python that has time to develop this out, and can put any slow pieces into the C.

beckermr commented 4 years ago

We should give up on model consistency. None of our models are consistent so demanding them to be so means we ship nothing.

tmcclintock commented 4 years ago

@villarrealas can you explain exactly what you mean by model consistency in this case, and what you mean by "inherently incompatible"?

villarrealas commented 4 years ago

@tmcclintock One easy to imagine case is somebody pulls halo mass function for an FOF based halo definition and then pulls halo bias for a SO based halo definition, and then uses both in some calculation. Since they're describing entirely different objects, that could cause some severe inconsistencies.

However I should note that I am inclined to agree with Matt that ensuring model consistency for everything inside of the halo model seems like a nightmare of checks and cross-checks for something that is accurate on only the 10% level anyway in most cases. I just wanted to bring up that (to my knowledge at least) this is a change of stated goals.

tmcclintock commented 4 years ago

It sounds like we are all in agreement on that. Changing goals is fine, I think. We can't idiot proof CCL.

beckermr commented 4 years ago

If we want to put in major changes to the python API, we need to do it ASAP. We can't change the function signatures or locations once we release v2.

I'd be ok stripping halo stuff out of v2 so that we can add it back later.

damonge commented 4 years ago

Hi all,

sorry it took me a while to get time to get to this. Thanks a lot to @tmcclintock for starting this. I agree it's absolutely necessary for us to have a robust halo model framework for DESC.

Here are a few thoughts:

I would be more ambitious and make this general enough that you can generate halo model prediction not just for galaxies or for matter, but for any other quantity. To this end, I would make a generic HaloProfile class that can accommodate both matter profiles, galaxy density profiles and anything else (e.g. pressure, gas density, n^2*T for X-rays etc.). I would then create specific subclasses for matter profiles and galaxy density profiles, and let users create their own profiles if they need them. Each of these classes should return the real-space profile, the projected profile, the fourier-space profile and, I would argue, at least the 2nd order cumulant in fourier space (which you need to compute the 1-halo term).
I would use python as much as possible. There are certain things where C makes sense. E.g. several things to do with mass conversion, matter density profiles, mass functions, halo biases etc. since those will be used by all halo profiles. The rest, I would code up in python.
I would create a routine/class/whatever that, given two HaloProfiles, spits out a Pk2D object corresponding to the halo model prediction for the cross-correlation between the quantities described by those profiles. This way you could generate predictions for g x g, g x m, m x m, g x tSZ, m x tSZ ...

This framework would then make it easy to add functionality in CCL to compute non-Gaussian covariances, bispectra etc.

What do you think? Do we have a timeline on this? I'd be happy to start implementing this in my spare time now (we actually have coded some of this up for the HSC project in LSS with @anicola).

damonge commented 4 years ago

@tmcclintock @beckermr let me know if you have any thoughts on the above. I'd be very happy to start implementing this kind of functionality soon.

beckermr commented 4 years ago

So we should decide if we want to do this work post v2 or before v2

damonge commented 4 years ago

I think it can wait until after v2 if we want to do that soon.

Tagging @elikrause

damonge commented 4 years ago

OK, so here's a concrete proposal for a possible API. Tagging the relevant people: @tmcclintock @elikrause @beckermr @elisachisari

Aims of this extension:

compute P(k,z) between any two quantities U and V using the halo model
provide functionality to compute all the intermediate quantities that go into the above:
- halo mass functions
- halo bias
- concentration-mass relation
- halo profiles

Some equations

Halo-model power spectra

The power spectrum is split between the 1-halo and 2-halo terms: $P_{UV}(k,z)=P_{UV}^{1h}(k,z)+P_{UV}^{2h}(k,z)$ The 2-halo term reads $P_{UV}^{2h}(k,z)=\int dM_1\,n(M_1)\,\langle u(k|M_1) \rangle \int dM_2\,n(M_2)\,\langle v(k|M_2) \rangle\,P_{hh}(k,z;M_1,M_2)$ where $P_{hh}(k,z;M_1,M_2)$ is the cross-spectrum between halos of masses M1 and M2, and $\langle u(k|M_1) \rangle$ is the mean profile of quantity U around a halo of mass M1 in Fourier space. Assuming a linear bias relation for halos, this simplifies to: $P_{UV}^{2h}(k,z)=I^1_{U}(k) I^1_{V}(k)P_L(k)$ where PL(k) is the linear matter power spectrum, and I've defined <a href="https://www.codecogs.com/eqnedit.php?latex=\dpi{150}&space;I^n{U_1U_2...U_m}(k)=\int&space;dM\,n(M)\,b_h(M)\,\langle&space;u_1(k|M)u_2(k|M)...u_m(k|M)\rangle" target="blank"><img src="https://latex.codecogs.com/gif.latex?\dpi{150}&space;I^n{U_1U_2...U_m}(k)=\int&space;dM\,n(M)\,b_h(M)\,\langle&space;u_1(k|M)u_2(k|M)...um(k|M)\rangle" title="I^n{U_1U_2...U_m}(k)=\int dM\,n(M)\,b_h(M)\,\langle u_1(k|M)u_2(k|M)...um(k|M)\rangle" /> The 1-halo term reads <a href="https://www.codecogs.com/eqnedit.php?latex=\dpi{150}&space;P{UV}^{1h}(k)=I^0_{UV}(k)" target="blank"><img src="https://latex.codecogs.com/gif.latex?\dpi{150}&space;P{UV}^{1h}(k)=I^0{UV}(k)" title="P{UV}^{1h}(k)=I^0_{UV}(k)" />

Mass definitions

Now, the really tricky part is gonna be translating between mass definitions. The most common way to define halo mass is through a spherical overdensity criterion. The halo mass is defined as the mass enclosed by a sphere inside which the density is a factor $\Delta$ higher than the average density $\rho_x$ , where $\rho_x$ is normally either the matter density (x=M) or the critical density (x=c): $M_{\Delta_x} = \frac{4\pi}{3}\Delta\,\rho\,R_{\Delta_x}^3$ The problem is that all the quantities above (mass functions, halo biases, halo profiles), are usually only provided for a discrete set of mass definitions $(\Delta,x)$ , and often you don't have all the pieces you need to build a halo model power spectrum for the same definition. A typical way to get around this difficulty is to assume that use the concentration-mass relation assuming an NFW profile for all halos.

For the NFW profile, the mass enclosed within a given radius is $M(<R)=4\pi\,\rho_0\,r_s^3\left[{\rm ln}\left(1+\frac{R}{r_s} \right ) - \frac{R/r_s}{1+R/r_s}\right ]$ . Defining the concentration parameter as $c_{\Delta_x}=R_{\Delta_x}/r_s$ we get $M_{\Delta_x}=\frac{4\pi}{3}\rho_c(z)\Omega_x(z)\Delta_x\,R_{\Delta_x}^3=4\pi\rho_0\,\left(\frac{R_{\Delta_x}}{c_{\Delta_x}}\right)^3\left[\ln\left(1+c_{\Delta_x}\right)-\frac{c_{\Delta_x}}{1+c_{\Delta_x}}\right]$ and therefore the concentrations for two different mass definitions $\Delta_x$ and $\Delta_y$ are related through the implicit equation: $\frac{c_{\Delta_x}^3\Delta_x\Omega_x}{c_{\Delta_y}^3\Delta_y\Omega_y}=\frac{\ln\left(1+c_{\Delta_x}\right)-\frac{c_{\Delta_x}}{1+c_{\Delta_x}}}{\ln\left(1+c_{\Delta_y}\right)-\frac{c_{\Delta_y}}{1+c_{\Delta_y}}}.$ So, you can translate between two mass definitions as follows:

Get a concentration-mass relation for $M_{\Delta_x}$ , compute $c_{\Delta_x}$ and $R_{\Delta_x}$
Solve the previous equation to find $c_{\Delta_y}$ .
From $c_{\Delta_y}$ and $R_{\Delta_x}$ compute $R_{\Delta_y}$ and then $M_{\Delta_y}$ .

This is not the only way to change between mass definitions, but it's the most generic one I know of. There are good reasons not to like it though. E.g. you can find that, even if you start from a concentration-mass relation that has been fit to simulations, the concentration-mass for a different mass definition you derive using this method does not actually match the one you would find from direct fit to simulations with this new mass definition. For this reason I propose making the interface generic enough that other ways of translating between mass definitions can be supported in the future.

Specific API proposal

The API will come in the form of the following classes and functions.

Mass definition

The MassDef class will look as follows:

class MassDef(object):
    def __init__(self, Delta, dens_type, c_m_relation=None):
        ''' Store Delta and dens_type (where dens_type is either 'matter' or 'critical').
           These two define this mass definition.
           If c_m_relation is provided, it would be a string that defines one of a set of
           concentration-mass relations we should support. If not provided, then an
           error will be raised if the `get_concentration` or `translate_mass` methods
           below are called.
        '''

    def get_concentration(self, M, z):
        ''' Return the concentration for a halo of mass M at redshift z'''

    def translate_mass(self, M, z, m_def_other):
        ''' Return the mass under definition `m_def_other` for a halo of mass M (under
           the current mass definition) and redshift z. `m_def_other` should be
           another `MassDef` object defining the output mass definition (which doesn't
           need to have an associated c-M relation).
        '''

There is key functionality that should live in C here. In particular, we want the non-linear equation solver that translates between c-M relations to be as fast as possible. Other possibly useful methods would be get_radius(M,z) and get_mass(R,z).

Mass functions

class MassFunc(object):
    def __init__(self, mf_parametrization, mass_def):
        ''' This defines a mass function. `mf_parametrization` will be a string that determines the
           mass function parametrization to use from a set of parametrizations we need to support.
           Alternatively, we can support generic mass functions (a la generalized Pks), but this
           can be discussed later. `mass_def` is a `MassDef` object defining the mass definition
           to use for this mass function. An error will be thrown if the chosen parametrization
           does not support this mass definition.
        '''

    def multiplicity_function(self, M, z, mass_def=None):
        ''' Return multiplicity function for a halo of mass M and redshift z. The mass would
           correspond to mass definition `mass_def`. If `mass_def` is not provided, it is
           understood that the mass definition is the same one that this `MassFunc` object was
           initialized with. Otherwise, unless the mass definitions coincide, `mass_def` should
           be able to translate masses into this object's mass definition.
        '''

    def mass_function(self, M, z, mass_def=None):
        ''' Same as above for the mass function'''

Things that should live in C: sigmaM and dsigmaMdlnM splines (as they are right now). Possibly some of the halo mass function parametrizations if they require a lot of interpolation. Emulators?

Halo bias

Same structure as the mass function, but for the halo bias.

class HaloBias(object):
...

Halo profile

class HaloProfile(object):
    def __init__(self, mass_def, name):
        ''' Initialize a given halo profile. We need to provide a mass definition and a
           name (for bookkeeping).
        '''

    def profile_r(self, r, M, z, mass_def):
        ''' Return the mean profile at comoving distance `r` from the halo centre. M, z
           and mass_def are the same as those passed to `MassFunc.multiplicity_function`.
        '''

    def profile_k(self, k, M, z, mass_def):
        ''' Return the mean Fourier-space profile at comoving wavenumber `k` from the
           halo centre. M, z and mass_def are the same as those passed to
           `MassFunc.multiplicity_function`.
        '''

    def profile_squared_k(self, k, M, z, mass_def):
        ''' Return the second-order moment of the Fourier-space profile at comoving
           wavenumber `k` (i.e. the quantity that goes into the 1-halo term). By
           default this would just be the square of the mean profile.
        '''

    def profile_norm(self, M, z, mass_def):
        ''' Return the k->0 limit of the Fourier-space profile (or, equivalently, the
           volume integral of the profile.
        '''

The idea would be to make it possible for people to code up their own halo profiles that inherit from this class (e.g. to include profiles for any other quantity), but we should provide a few basic ones (e.g. NFW, Einasto, anything else CL want).

Halo-model power spectra

The following set of functions should be supported:

def mean_field(z, profile, halo_bias, mass_function):
    ''' Return the k->0 limit of I^0_U (defined above) at redshift `z`. This
       correspond to the halo model prediction for the mean value of the
       field U. `profile` is a `HaloProfile` object, corresponding to the
       quantity U. `mass_function` is a `MassFunc` object and halo_bias is
       a `HaloBias` object.
    '''
def large_scale_bias(z, profile, halo_bias, mass_function):
    ''' Return the k->0 limit of I^1_U (defined above) at redshift `z`.
       `profile` is a `HaloProfile` object, corresponding to the quantity U.
       `mass_function` is a `MassFunc` object and halo_bias is a `HaloBias`
       object.
    '''

def scale_dependent_bias(k, z, profile, halo_bias, mass_function):
    ''' k-dependent version of `large_scale_bias`. '''

def pk_hm_1_halo(k, z, profile1, mass_function, profile2=None):
    ''' 1-halo power spectrum term for the cross-correlation between
       the quantities represented by `profile1` and `profile2`. If
       `profile2` is `None`, then this will compute the auto-correlation
       of `profile1`.
    '''

def pk_hm_2_halo(k, z, profile1, mass_function, profile2=None):
    ''' 2-halo power spectrum term for the cross-correlation between
       the quantities represented by `profile1` and `profile2`. If
       `profile2` is `None`, then this will compute the auto-correlation
       of `profile1`. This function could be extended with more optional
       arguments if we want to use emulators of the halo-halo power
       spectrum.
    '''

def pk_hm(k, z, profile1, mass_function, profile2=None):
    ''' Wrapper for 1-halo + 2-halo power spectrum'''

This could be extended to compute all integrals of the form I^n_{UVW...}, (e.g. for bi-spectra etc.) but not necessarily right now.

I am not sure this functionality needs to be wrapped into a class, but maybe it makes sense to do so (e.g. if we want to pass halo-model-specific precision parameters for the mass integrals etc.).

damonge commented 4 years ago

Alright, everything to do with mass definitions, n(M), b(M) and c(M) has been solved in #688 Left to do:

profiles
halo model power spectra

LSSTDESC / CCL