Improve low-mass synthetic centrals

aphearin commented 6 years ago

We model synthetic low-mass galaxies in cosmoDC2 by sprinkling in subhalos until dn/dMpeak, the subhalo peak mass function, is a perfect power law down to a very low subhalo-mass completeness limit. We then paint stellar masses onto these fake subhalos using log-normal scatter about a power-law approximation to the median relation <M*|Mpeak> in UniverseMachine, and otherwise treat them like ordinary galaxies.
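A minimal sketch of these two steps, using only the Python standard library. It assumes the completion population follows dn/dMpeak ∝ Mpeak^(-alpha) between two hypothetical mass limits (sampled by inverting the cumulative distribution, valid for alpha ≠ 1), and that the median relation is a power law, log10 M* = log_norm + slope · (log10 Mpeak − log_pivot), with Gaussian scatter of sigma_dex in log10 M*. The function names and all numerical values are illustrative placeholders, not the UniverseMachine fit or the cosmoDC2 code.

```python
import math
import random


def sample_power_law_masses(n, m_min, m_max, alpha, rng):
    """Draw n masses from dn/dM ∝ M**(-alpha) on [m_min, m_max] by inverting the CDF.

    Assumes alpha != 1 (the alpha == 1 case would need a logarithmic CDF instead).
    """
    k = 1.0 - alpha
    lo, hi = m_min ** k, m_max ** k
    # The CDF is (M**k - lo) / (hi - lo); solving CDF = u for M gives the line below.
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / k) for _ in range(n)]


def paint_stellar_masses(mpeak, log_norm, slope, log_pivot, sigma_dex, rng):
    """Assign M* with log-normal scatter (sigma_dex, in dex) about a power-law median.

    Median relation (hypothetical parameterization):
        log10 M* = log_norm + slope * (log10 Mpeak - log_pivot)
    """
    mstar = []
    for m in mpeak:
        median_log = log_norm + slope * (math.log10(m) - log_pivot)
        mstar.append(10.0 ** rng.gauss(median_log, sigma_dex))
    return mstar


if __name__ == "__main__":
    rng = random.Random(42)
    mpeak = sample_power_law_masses(5, 1e8, 1e10, 1.9, rng)  # placeholder limits and slope
    mstar = paint_stellar_masses(mpeak, 7.0, 2.0, 10.0, 0.3, rng)  # placeholder median and scatter
    for mp, ms in zip(mpeak, mstar):
        print(f"Mpeak = {mp:.3e}  M* = {ms:.3e}")
```

Inverse-CDF sampling keeps the draw exact for a pure power law, which matches the goal of making dn/dMpeak a perfect power law below the completeness limit.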

We have two treatments for the spatial positions of these galaxies: a centrals implementation and a satellites implementation. The former distributes the galaxies at spatially random points within each populated healpixel; the latter distributes them according to a spherically symmetric NFW profile centered on a randomly chosen host halo. The two implementations give nice bracketing cases for the density dependence of blending, which will occur at much higher rates in the satellites implementation than in the centrals implementation, even though both are designed to produce identical luminosity functions by construction.
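The satellites treatment can be sketched as below, assuming an NFW profile truncated at the host's virial radius with a hypothetical concentration `conc`. The radius is drawn by bisecting the normalized enclosed-mass profile m(x)/m(c), with m(x) = ln(1 + x) − x/(1 + x) and x = r/r_s, and the direction is drawn isotropically. Hosts are plain (x, y, z, r_vir, conc) tuples here; this is an illustration, not the cosmoDC2 implementation.

```python
import math
import random


def nfw_mass_profile(x):
    """Dimensionless NFW enclosed mass m(x) = ln(1 + x) - x / (1 + x), with x = r / r_s."""
    return math.log1p(x) - x / (1.0 + x)


def sample_nfw_radius(conc, rng, tol=1e-10):
    """Draw r / r_vir for an NFW profile truncated at r_vir (x = conc), via bisection on the CDF."""
    target = rng.random() * nfw_mass_profile(conc)
    lo, hi = 0.0, conc  # bracket in x = r / r_s; m(x) increases monotonically
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nfw_mass_profile(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / conc


def sample_satellite_position(host, rng):
    """Place one synthetic satellite around host = (x, y, z, r_vir, conc) (hypothetical layout)."""
    x0, y0, z0, r_vir, conc = host
    r = sample_nfw_radius(conc, rng) * r_vir
    cos_t = rng.uniform(-1.0, 1.0)  # isotropic: cos(theta) uniform in [-1, 1]
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (x0 + r * sin_t * math.cos(phi),
            y0 + r * sin_t * math.sin(phi),
            z0 + r * cos_t)


if __name__ == "__main__":
    rng = random.Random(7)
    hosts = [(0.0, 0.0, 0.0, 0.3, 8.0), (50.0, 10.0, 5.0, 0.6, 5.0)]  # placeholder hosts
    for _ in range(3):
        print(sample_satellite_position(rng.choice(hosts), rng))
```

Because every synthetic galaxy is tied to a host, pairs cluster on small scales, which is why this treatment is the high-blending bracket.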

Currently, our centrals implementation falls short of the number of additional galaxies needed to complete the power law. This happens for two reasons:

1. We only generate enough fake galaxies to double the size of the mock, rather than enough to reach the desired dn/dmag. This was done to reduce the memory load for short-term purposes and should be trivial to fix.
2. The fake centrals are distributed randomly within a rectangle enclosing the healpixel, so the galaxies that land outside the healpixel are lost to masking. A sketch of the solution is demonstrated in the following gist: https://gist.github.com/aphearin/59a23dae6843c5d4674a56d523535001
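One possible approach (not necessarily what the gist does) is rejection sampling: draw candidates uniformly in solid angle over the RA/Dec box enclosing the healpixel (uniform in RA and in sin(Dec)), discard those outside the pixel, and keep looping until the requested number survives, so none are lost to masking. In practice the membership test would be a HEALPix lookup, for example comparing healpy's `ang2pix(nside, ra, dec, lonlat=True)` with the pixel index; to keep the sketch runnable without healpy, the hypothetical `in_cap` region below stands in for the pixel.

```python
import math
import random


def random_points_in_region(n, ra_range, dec_range, in_region, rng, max_draws=10_000_000):
    """Return exactly n (ra, dec) points in degrees, uniform on the sphere and inside a region.

    Candidates are drawn uniformly in solid angle over the enclosing RA/Dec box; those that
    fail in_region(ra, dec) are discarded and drawing continues until n points survive.
    """
    if n <= 0:
        return []
    ra_min, ra_max = ra_range
    s_min, s_max = (math.sin(math.radians(d)) for d in dec_range)
    kept = []
    for _ in range(max_draws):
        ra = rng.uniform(ra_min, ra_max)
        dec = math.degrees(math.asin(rng.uniform(s_min, s_max)))
        if in_region(ra, dec):
            kept.append((ra, dec))
            if len(kept) == n:
                return kept
    raise RuntimeError("acceptance rate too low: enlarge max_draws or tighten the box")


def in_cap(ra, dec, ra0=60.0, dec0=0.0, radius=3.0):
    """Hypothetical stand-in region: a spherical cap of the given radius (degrees)."""
    d1, d2 = math.radians(dec), math.radians(dec0)
    cos_sep = (math.sin(d1) * math.sin(d2)
               + math.cos(d1) * math.cos(d2) * math.cos(math.radians(ra - ra0)))
    return cos_sep >= math.cos(math.radians(radius))


if __name__ == "__main__":
    rng = random.Random(3)
    pts = random_points_in_region(1000, (57.0, 63.0), (-3.0, 3.0), in_cap, rng)
    print(len(pts), pts[0])
```

Looping until the target count is reached decouples the number of galaxies from the box-to-pixel area ratio, which is the failure mode described in the second item.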

@evevkovacs - can you take a look at the gist and try to implement the improvement?

aphearin commented 6 years ago

In cosmoDC2_v0.4, we decided to use the satellite treatment of synthetic low-mass galaxies, rather than hold up the pipeline for this fix. Based on this round of DESCQA tests, our current satellites implementation performs well on the dn/dmag and dn/dz tests, which are the primary tests this synthetic population was designed for; here is an example plot for convenience.

[Plot: hsc_dndmag, DESCQA dn/dmag test compared with HSC number counts]

Based on this result, I think we should just use the satellites treatment for the final cosmoDC2 release. We can improve the implementation of the centrals after the cosmoDC2 release. Before I move this issue to the post-cosmoDC2 milestone, I'm tagging @rmjarvis and @rmandelb to see whether they have an opinion on our final choice for the implied density dependence of blending.

rmandelb commented 6 years ago

First, just commenting on the plot: The number counts look very good! Is it similar in other bands? While we are not quite comparing apples to apples (because the HSC curve is after blending and other observational effects, and the other is before it), it is really nice to see this agreement, and hopefully the observed counts in the sims will be similar.

Now to answer your actual question: previous work suggests that blending in a survey as deep as LSST is dominated by pairs at different redshifts, which implies that in real life blending is only mildly density-dependent (except in the cores of massive clusters, which cover a small fraction of the sky). So for the effect you're describing to change the balance between blends at the same vs. different redshifts in a significant way, my guess is that you'd have to modify the small-scale galaxy correlation function in a way that would probably screw up other tests in DESCQA. For example, do we have a test of angular clustering as a function of scale in magnitude bins, or something else that would pick up an overly prominent 1-halo term? The redmapper richness-mass relation would also be a good test for this, unless these satellites are too faint to pass the luminosity cut used by redmapper. Essentially, I find it hard to imagine that none of our clustering or cluster-related tests would catch it if the satellite treatment were a serious problem.

Curious to hear @rmjarvis 's take.

aphearin commented 6 years ago

@rmandelb - these are very low-mass satellites of dwarf-galaxy mass: all have stellar masses M < 10^8, following a power law down to 10^6, so observational data on the statistics of this population are extremely sparse. I should have provided that context earlier in this issue. In DESCQA there are no two-point constraints in this regime, and these galaxies have zero impact on cluster richness, simply because they are so faint. These galaxies do impact the very faint end of the luminosity function probed by HSC (which, yes, looks good in other bands as well; there is another plot below in the y band, and see the above DESCQA link for the full gamut).

[Plot: hsc_y, the same comparison with HSC in the y band]

aphearin commented 6 years ago

@rmandelb - This DESCQA readiness test, recently run by @evevkovacs, shows that a number of problems remain to be solved for the centrals treatment, such as bimodality of the stellar mass function, wildly outlying redshifts, and bimodal fluxes. None of these problems is mysterious; they just take time. The satellites treatment is free of any such errors in its current implementation, and since this population almost exclusively affects this particular DESCQA test, I had been advocating that we delay cleaning up the centrals treatment and go ahead and run cosmoDC2 with the satellites treatment. Since galaxies at such low mass are likely to have a satellite fraction of ~30-40%, the truth must lie somewhere between these two extremes. However, if there is an important reason for analyses of density-dependent blending to use the opposite extreme by choosing the centrals treatment instead, then we can take the additional time to solve those problems for cosmoDC2. I honestly don't know how to evaluate that, so input from @rmandelb, @rmjarvis, and whoever else is welcome.

evevkovacs commented 6 years ago

Here are some more details (note that the plots below were made from the baseline catalog, before Galacticus matching). For the baseline catalog made with the centrals option, there are still some problems: there are some weird high values for the redshifts (possibly interpolation failures), and the magnitude and stellar-mass distributions still have some bimodality. The test reports that halo_ids are not unique but galaxy_ids are OK.
[Plots: sm_cen, mag_cen, redshift_cen]

For the baseline catalog made with the satellites option, the redshift and stellar-mass distributions look OK. The test reports that halo_ids are unique but galaxy_ids are not.
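The uniqueness and range problems reported for both options can be caught with a few cheap checks before running the full DESCQA suite. A minimal sketch, assuming the catalog is available as a mapping from column name to a list of values and using the expected redshift range of 0 to 3; this is not the DESCQA readiness test itself.

```python
from collections import Counter


def readiness_checks(catalog, z_range=(0.0, 3.0)):
    """Return simple pass/fail checks for a catalog given as {column name: list of values}."""
    results = {}
    for col in ("galaxy_id", "halo_id"):
        counts = Counter(catalog[col])
        # Number of entries beyond the first occurrence of each repeated ID.
        n_dup = sum(c - 1 for c in counts.values() if c > 1)
        results[f"{col}_unique"] = n_dup == 0
        results[f"{col}_n_duplicates"] = n_dup
    z_lo, z_hi = z_range
    outliers = [z for z in catalog["redshift"] if not (z_lo <= z <= z_hi)]
    results["redshift_in_range"] = not outliers
    results["n_redshift_outliers"] = len(outliers)
    return results


if __name__ == "__main__":
    toy = {  # placeholder values, not cosmoDC2 data
        "galaxy_id": [1, 2, 2, 3],
        "halo_id": [10, 11, 12, 13],
        "redshift": [0.1, 0.5, 2.9, 7.4],
    }
    print(readiness_checks(toy))
```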

rmandelb commented 6 years ago

Yes, there are clearly some kinks to be worked out with the centrals option, but if you're not planning to use that for the cosmoDC2 version that feeds into image sims, I can imagine that fixing that is lower priority.

For the satellites option, what does "redshift and stellar mass distributions look ok" mean - just a lack of extreme outliers in the former and bimodality in the latter? Or is the redshift distribution relatively smooth / does it satisfy basic sanity checks, as well?

evevkovacs commented 6 years ago

"Look ok" means no bimodality and overall reasonably smooth. Here are the plots:
[Plots: redshift_sats, mag_sats, sm_sats]

The redshift distribution has some fluctuations for z > 1.5. I'm not sure whether to be worried about this. The plot is for just one healpixel (~54 sq. deg.), so could these be due to cosmic variance?
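One quick way to judge whether such fluctuations exceed pure shot noise is to compare each histogram bin against a smooth expectation (for example a fit to the histogram) in units of the Poisson error sqrt(mu). Cosmic variance from large-scale structure adds scatter on top of Poisson noise, so bins flagged at several sigma are more plausibly structure than shot noise. For scale, a HEALPix pixel at Nside = 8 covers about 53.7 deg², consistent with the ~54 sq. deg. quoted; whether this catalog used that resolution is an assumption. The helper names are hypothetical.

```python
import math

# Full sky in square degrees: 4 * pi steradians * (180 / pi)**2, about 41252.96 deg^2.
FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2


def healpix_pixel_area_deg2(nside):
    """Area of one HEALPix pixel: the sphere is split into 12 * nside**2 equal-area pixels."""
    return FULL_SKY_DEG2 / (12 * nside ** 2)


def poisson_outlier_bins(counts, expected, n_sigma=3.0):
    """Indices of bins deviating from a smooth expectation by more than n_sigma * sqrt(expected)."""
    return [i for i, (n, mu) in enumerate(zip(counts, expected))
            if mu > 0 and abs(n - mu) > n_sigma * math.sqrt(mu)]


if __name__ == "__main__":
    print(f"Nside=8 pixel area: {healpix_pixel_area_deg2(8):.2f} deg^2")
    # Placeholder histogram and smooth model, not values from the plots above.
    print(poisson_outlier_bins([100, 150, 98], [100.0, 100.0, 100.0]))
```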

rmandelb commented 6 years ago

Definitely much better than the satellite model. I'm not finding the redshift distribution fluctuations very worrisome. Thanks for the plots!

evevkovacs commented 6 years ago

@rmandelb Maybe you misspoke, but just to be clear, the plots in my last comment were for the satellite model NOT the central model.

rmandelb commented 6 years ago

Yes, I just mistyped, sorry.

evevkovacs commented 6 years ago

With the improved loop that ensures all the simulated synthetic central galaxies end up inside the healpixel, the bimodality in the stellar-mass and magnitude distributions has gone away. (The issue was that not enough centrals were being saved to "fill in" the expected distributions at low M*.) See the readiness test on the baseDC2 catalog. The same test also shows that, after the bugfix, the redshift distribution lies between 0 and 3, as expected.

LSSTDESC/cosmodc2, issue #14