JulienPeloton commented 5 years ago

While looking at the halo mass distribution in cosmoDC2 (v1.1.4) (see #54 in the context of HackUrDC2), I found a rather odd feature. There are many different halos (i.e. different positions on the sky) with exactly the same mass!

#54 is using Spark to manipulate the data, but here is an example using GCR to access the data. Note that we select halo mass from only central galaxies to avoid double counting:

```python
import GCRCatalogs
import numpy as np

gc = GCRCatalogs.load_catalog('cosmoDC2_v1.1.4_small')

data = gc.get_quantities(['halo_mass'], filters=['halo_id > 0', 'is_central'])

print("Number of Halos", len(data["halo_mass"]))  # 31319288
print("Number unique Halo mass", len(np.unique(data["halo_mass"])))  # 19104
```

[Figure 1: halo_mass_distribution]

That means many halos with mass lower than $10^{13} {M}_\odot$ have exactly the same mass.

Question: Is that expected?

yymao commented 5 years ago

I think this is expected because halo mass should be a multiple of the particle mass, and hence among smaller halos there would be many that have the same mass.

JulienPeloton commented 5 years ago

Thanks for the reply. I'm not an expert in simulations, so could you elaborate a bit more on this?

cwwalter commented 5 years ago

I'm not an expert either but I think what Yao is saying is the following:

In N-body simulations, the "particles" that are tracked are quite large dark matter masses. Once the halo mass gets near the size of those minimum particle masses, the quantization means you will see the same halo masses repeated. In other words, a small halo can only contain one, two, three, etc. particle masses' worth of matter. You can't have (for example) 2.5, so for relatively low halo masses you expect to see several halos with exactly the same mass, where that value is N*(particle_mass).

From your graph above, since I see about two orders of magnitude before they are all unique, I guess this is important up to something like 100*particle_mass. I suppose the effect becomes more and more likely if there are lots of low-mass halos.
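To make the counting argument concrete, here is a minimal sketch with made-up numbers. The particle mass, the number of halos and the power-law draw of particle counts are all assumptions for illustration, not cosmoDC2 values.

```python
import random
from collections import Counter

# Hypothetical values for illustration only (not taken from cosmoDC2).
PARTICLE_MASS = 2.5e9  # solar masses; assumed quantization unit
N_HALOS = 100_000

rng = random.Random(42)

# Draw particle counts from a steep power law so that small halos dominate,
# then build each halo mass as an integer multiple of the particle mass.
particle_counts = [int(20 * rng.paretovariate(1.0)) for _ in range(N_HALOS)]
halo_masses = [n * PARTICLE_MASS for n in particle_counts]

counts = Counter(halo_masses)
print("Number of halos:", len(halo_masses))
print("Number of unique halo masses:", len(counts))


def unique_fraction(masses):
    """Fraction of entries in `masses` that are distinct values."""
    return len(set(masses)) / len(masses)


# Repeated masses pile up at the low-mass end, where few integers are available.
low = [m for m in halo_masses if m < 100 * PARTICLE_MASS]
high = [m for m in halo_masses if m >= 1000 * PARTICLE_MASS]
print("Unique fraction below 100 particles:", unique_fraction(low))
print("Unique fraction above 1000 particles:", unique_fraction(high))
```

In this toy setup the low-mass halos share a handful of allowed values, while the high-mass ones are mostly distinct, which is the same qualitative behavior as in the catalog counts above.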

I'm sure Yao etc can fill in more details, or correct my simple understanding of his comment.

rmandelb commented 5 years ago

Also, in case this helps you interpret the plot further: The particle mass in the simulations used for cosmoDC2 is a few x 10^9 solar masses, so that's the basic unit of quantization of halo masses in cosmoDC2 -- all halo masses must by definition be some integer times this particle mass value.

This is not a problem for any of the science that we are trying to do. I like to think of it as a small rounding error (i.e., halos that should ideally have had a mass that is some non-integer multiple of the particle mass got rounded to an integer multiple of the particle mass). Once we're above some minimal mass threshold, we're not missing halos, we're just slightly rounding their masses off - and that rounding error is still a lot smaller than the physical effects that cause scatter between observable galaxy properties and host dark matter halo properties. To be quantitative, for halo masses that are around 10^11, the halo mass is ~40 times the particle mass in cosmoDC2, so the rounding error due to this quantization effect is at most ~1% of the mass, typically less.
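As a rough check of that estimate, here is a small sketch. It assumes the "rounding" picture literally (mass rounded to the nearest whole number of particles) and infers the particle mass from the "10^11 is ~40 particle masses" statement; the function name is just for illustration.

```python
# Assumed particle mass, inferred from "10^11 is ~40 particle masses";
# the exact simulation value is not quoted in this thread.
PARTICLE_MASS = 1e11 / 40  # solar masses


def max_relative_rounding_error(halo_mass, particle_mass=PARTICLE_MASS):
    """Worst-case fractional error from rounding to the nearest whole particle."""
    n_particles = round(halo_mass / particle_mass)
    # Rounding to the nearest integer shifts the mass by at most half a particle.
    return 0.5 * particle_mass / (n_particles * particle_mass)


for halo_mass in (1e11, 1e12, 1e13):
    err = max_relative_rounding_error(halo_mass)
    print(f"M = {halo_mass:.0e} Msun: at most {100 * err:.3f}% rounding error")
```

The bound is half a particle divided by N particles, so it shrinks in proportion to 1/N as the halo mass grows.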

salmanhabib commented 5 years ago

To add to what Rachel said, this mass quantization is kind of a trivial thing -- the important point to keep in mind also is that at low halo mass (i.e., halo masses not too large compared to the simulation particle mass), the notion of halo mass becomes quite approximate anyway. These masses are FOF (friends of friends) masses, which can be thought of roughly as a mass enclosed by an isodensity boundary -- as the number of particles associated with the halo becomes smaller, the FOF density estimation becomes pretty noisy (as well as biased), so these masses should not be taken all that seriously in the first place. All of this is discussed in a number of references, which I can put here if people are interested.

yymao commented 5 years ago

@JulienPeloton sorry for my original short reply. Everyone above has already clarified my reply. I'll be happy to answer further questions.

JulienPeloton commented 5 years ago

Thanks @cwwalter @rmandelb @salmanhabib @yymao for your explanations! I'm glad this is a known effect and not a problem with the simulation. @salmanhabib, I would be interested in reading some references for completeness.

I hit this quantization problem while looking at the velocity dispersion–halo mass relation. Below is the 2D histogram of velocity dispersion versus halo mass (logarithmic values), with the 1D distributions on the sides (more descriptions and plots in the hackurdc2 notebook):

[Figure 2: mass-veldisp-relation]

You can clearly see stripes along the velocity axis for masses < 5e13 M_o. While this might not be harmful, it does not look so great... Would you then recommend keeping only halo masses above ~1e13 M_o to have a meaningful analysis? Or have you already concluded that one can safely keep the whole range of masses despite the quantization?

Thanks!

salmanhabib commented 5 years ago

@JulienPeloton , just to be careful, Figure 1 is not really a physical "effect", it's just an FOF mass "round-off" (as Rachel explained). I don't understand this second figure in comparison to your first figure in terms of the halo mass distribution. Can you explain further? Thanks!

Note that there are many different definitions of halo mass, since a "halo" is not a very well-defined object. For example, at a fixed SO mass, there could be a wide scatter for FOF masses and vice versa. Here are a couple of papers to help you: Lukic et al, arXiv:0803.3624 [astro-ph] and More et al, arXiv:1103.0005 [astro-ph.CO]. The FOF mass bias was first discussed by Warren et al (ref. is in the Lukic et al paper). There are several other papers but these are probably the most directly relevant.

JulienPeloton commented 5 years ago

@salmanhabib thanks for the references!

Concerning the second figure, here is what I've done:

Take the cosmoDC2 catalog.
Select only objects from non-synthetic halos (halo_id > 0).
Select only objects with stellar_mass > 5e10 M_o.

For those selected objects:

Compute the 3D velocity norm (sqrt(vx**2 + vy**2 + vz**2)).
Group data by halos and compute the velocity dispersion (1 number per halo).

Figure 2 is then the 2D histogram of velocity dispersion (as computed above) vs mass (as given in the catalog) for each halo that passes the steps above.
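The steps above can be sketched as follows on a few toy rows. The rows are made up, and taking the velocity dispersion to be the population standard deviation of the speeds within each halo is an assumption, as is skipping halos with a single selected galaxy.

```python
import math
from collections import defaultdict
from statistics import pstdev

# Toy rows standing in for catalog columns (all values are made up):
# (halo_id, stellar_mass [M_o], vx, vy, vz [km/s])
galaxies = [
    (1, 8e10, 100.0, 0.0, 0.0),
    (1, 6e10, 0.0, 200.0, 0.0),
    (1, 1e10, 50.0, 50.0, 50.0),   # fails the stellar mass cut
    (2, 9e10, 30.0, 40.0, 0.0),
    (2, 7e10, 0.0, 0.0, 150.0),
    (-5, 9e10, 10.0, 10.0, 10.0),  # synthetic halo (halo_id <= 0)
]

# Apply both selections, then collect the 3D velocity norm per halo.
speeds_by_halo = defaultdict(list)
for halo_id, stellar_mass, vx, vy, vz in galaxies:
    if halo_id > 0 and stellar_mass > 5e10:
        speeds_by_halo[halo_id].append(math.sqrt(vx**2 + vy**2 + vz**2))

# One number per halo; halos with a single selected galaxy are skipped here.
velocity_dispersion = {
    halo_id: pstdev(speeds)
    for halo_id, speeds in speeds_by_halo.items()
    if len(speeds) > 1
}
print(velocity_dispersion)
```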

The stripes in the figure, and the fact it is seen only for smallish masses, led me to link that to the quantization effect. But I might be wrong.

yymao commented 5 years ago

@JulienPeloton I am not sure that the stripes in your second figure are related to the halo mass round-off issue that we discussed above. For one thing, the width of the stripes is much wider than the particle mass (quantization unit). Even though there are halos that have exactly the same mass due to the round-off error, one usually won't see visible stripes in the halo mass function, as your first figure demonstrates.

Now, I am not sure what causes the stripes. If you don't apply the stellar mass cut, does the stripe pattern disappear? If so, the cause would have something to do with the stellar mass–halo mass relation.

salmanhabib commented 5 years ago

Yes, this is something else as @yymao says. Time for @dkorytov or @aphearin to weigh in --

rmandelb commented 5 years ago

For one thing, the width of the stripes is much wider than the particle mass (quantization unit).

That is one reason why it can't be related to the particle mass quantization. The other issue is that the stripes are perfectly evenly spaced in log(halo mass). Particle mass quantization results in effects that are evenly spaced in halo mass. For effects that are evenly spaced in log(halo mass) I would guess there is something with power law-ish behavior involved... so stellar vs. halo mass does seem like a good guess.

The fact that the 1D distribution of halo masses earlier in this thread shows no striping with a spacing of Delta(log(halo mass))~0.15 as in the second plot also reinforces the idea that this isn't something fundamental about the halo mass distribution itself, but rather is related to some derived quantity.
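A quick sketch of the linear versus log spacing argument, assuming a particle mass of 2.5e9 solar masses (an illustrative value, not an official one) and the ~0.15 dex stripe spacing read off the second plot:

```python
import math

PARTICLE_MASS = 2.5e9       # solar masses; assumed value for illustration
STRIPE_SPACING_DEX = 0.15   # approximate stripe spacing read off the second plot


def log_spacing(n_particles):
    """Gap in log10(mass) between halos of N and N + 1 particles."""
    return math.log10((n_particles + 1) / n_particles)


# Quantization steps are constant in mass but shrink quickly in log10(mass).
for n in (40, 400, 4000):
    print(f"N = {n:5d}: dM = {PARTICLE_MASS:.2e} Msun, "
          f"dlog10(M) = {log_spacing(n):.5f}")

print("Mass ratio between neighbouring stripes:", 10 ** STRIPE_SPACING_DEX)
```

The quantization gap in log10(mass) depends only on N and keeps shrinking, so it cannot produce features with a fixed spacing in log(halo mass).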

JulienPeloton commented 5 years ago

Thanks @yymao @rmandelb, I think I have enough to conclude this is likely unrelated to quantization. I will close this thread then and move the discussion on the second figure to another thread.

Thanks all!

dkorytov commented 5 years ago

I was able to reproduce the effect. It comes from how we sample Universe Machine (UM) galaxies into the Outer Rim (OR) halo light cone.

OR is a gravity-only simulation with a 3 Gpc/h box length. UM is a model on top of Multidark that produces an accurate galaxy population. To populate the OR lightcone with galaxies, we match halos from UM and OR and copy all the galaxies in a UM halo onto an OR lightcone halo. The matching is done by binning both sets of halos by mass and randomly assigning a UM halo to an OR halo within the same bin. You can see the bins if you plot the host halo mass vs central stellar mass (2nd figure). I guess with the Mstar>5e10 cut, the effect jumps out more.

Same Cuts as Above

[Figure: screenshot from 2018-12-28 16-57-44]

All Central Galaxies

[Figure: screenshot from 2018-12-28 16-57-29]
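For reference, the bin-based matching described above can be sketched roughly as below. This is a toy version, not the production code: the bin width, the mass ranges, the made-up stellar mass power law and all names are assumptions.

```python
import math
import random
from collections import defaultdict

rng = random.Random(0)
BIN_WIDTH_DEX = 0.15  # hypothetical bin width in log10(halo mass)


def mass_bin(halo_mass):
    """Index of the log10(mass) bin a halo falls into."""
    return math.floor(math.log10(halo_mass) / BIN_WIDTH_DEX)


# Toy "UM" halos: (halo mass, central stellar mass) from a made-up power law.
um_by_bin = defaultdict(list)
for _ in range(5000):
    m_um = 10 ** rng.uniform(11.0, 14.0)
    mstar = 1e10 * (m_um / 1e12) ** 0.5
    um_by_bin[mass_bin(m_um)].append((m_um, mstar))

# Toy "OR" halos: each one receives the galaxy of a random UM halo in its bin.
matched = []
for _ in range(5000):
    m_or = 10 ** rng.uniform(11.0, 14.0)
    candidates = um_by_bin.get(mass_bin(m_or))
    if candidates:
        m_um, mstar = rng.choice(candidates)
        matched.append((m_or, m_um, mstar))

# Within a bin the stellar mass no longer tracks the OR halo mass, so the
# mismatch between the two halo masses can be as large as the bin width.
offsets = [abs(math.log10(m_or) - math.log10(m_um)) for m_or, m_um, _ in matched]
print("Largest |log10 mass offset| within a bin:", max(offsets))
```

Because the assignment is random inside each bin, any cut on stellar mass maps onto whole bins of OR halo mass, which is one way bin edges could show up as stripes.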

aphearin commented 5 years ago

Nice work reproducing the effect, @dkorytov, and thanks especially to @JulienPeloton for discovering this feature. I guess this will put a limit on the accuracy with which cosmoDC2 observations could constrain the halo mass of a stacked galaxy sample. In a future implementation, we could switch to the same bin-free method we use to GalSample Galacticus galaxies (a noisy nearest-neighbor search), doing that for the halo-halo correspondence rather than the bin-based method here.
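To give a rough idea of what a bin-free alternative might look like, here is a sketch of one possible noisy nearest-neighbor lookup in log mass. It is only a guess at the general idea, not the actual GalSample implementation; the noise level, the toy masses and the function name are all hypothetical.

```python
import bisect
import random

rng = random.Random(1)
NOISE_DEX = 0.05  # hypothetical scatter added before the neighbor search

# Sorted log10 masses of toy source halos (made-up values).
source_log_mass = sorted(rng.uniform(11.0, 14.0) for _ in range(1000))


def noisy_nearest(target_log_mass):
    """Index of the source halo nearest to a noise-perturbed target mass."""
    noisy = target_log_mass + rng.gauss(0.0, NOISE_DEX)
    i = bisect.bisect_left(source_log_mass, noisy)
    # Compare the neighbors on either side of the insertion point.
    neighbors = [j for j in (i - 1, i) if 0 <= j < len(source_log_mass)]
    return min(neighbors, key=lambda j: abs(source_log_mass[j] - noisy))


idx = noisy_nearest(12.3)
print("Matched source halo log10 mass:", source_log_mass[idx])
```

Since there are no fixed bin edges, the matched mass varies smoothly with the target mass, with the added noise keeping the same source halo from being reused too often.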

dkorytov commented 5 years ago

@aphearin, this conversation has moved to #57. Just a heads up. :)

LSSTDESC / DC2-analysis

Halo masses in cosmoDC2: attack of the clones? #55
