flux and shape distributions in DR3

sbailey commented 7 years ago

- Characterize the flux and object-shape distributions for objects that pass the target selection cuts applied to DR3.
- Provide utility functions that can randomly sample those distributions.
- Update select_mock_targets to randomly assign fluxes and shapes to LRG, ELG, and QSO targets that come from mock inputs that don't provide them.

Do this in a way that can be trivially re-applied to DR4 and/or with future updates to the target selection cuts.

belaa commented 7 years ago

Here are a few plots of the magnitude and shape distributions for LRGs, ELGs, and QSOs (shown in that order) that passed the selection cuts in DR2. The cuts were applied only to a subset of the Tractor data and returned 3752 LRGs, 20808 ELGs, and 2105 QSOs. The columns 'dev_ell' and 'exp_ell' are the magnitudes of the ellipticity for the de Vaucouleurs and exponential models, respectively.

I've modified io.py to extract the shape variables e1 and e2 from the Tractor catalogs after the cuts are applied. If it would be useful, I can make a pull request to make this part of the standard code.
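For reference, the ellipticity magnitude used for 'dev_ell' and 'exp_ell' follows from the two components as |e| = sqrt(e1² + e2²). A minimal sketch, assuming NumPy arrays of e1 and e2; the catalog column names mentioned in the comment (shapedev_e1, shapeexp_e1, ...) are assumptions about the Tractor schema, not taken from the io.py changes described above:

```python
import numpy as np

def ellipticity_magnitude(e1, e2):
    """Return |e| = sqrt(e1**2 + e2**2), element-wise, for ellipticity components e1, e2."""
    # np.hypot avoids overflow/underflow issues compared with a naive sqrt of squares
    return np.hypot(np.asarray(e1, dtype=float), np.asarray(e2, dtype=float))

# Hypothetical usage with assumed Tractor column names:
# dev_ell = ellipticity_magnitude(cat["shapedev_e1"], cat["shapedev_e2"])
# exp_ell = ellipticity_magnitude(cat["shapeexp_e1"], cat["shapeexp_e2"])
```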

[Figure: LRG corner plot]

[Figure: ELG corner plot]

[Figure: QSO corner plot]

geordie666 commented 7 years ago

@belaa: Yes, please issue a pull request. If the code looks reasonable, I will merge it into io.py.

sbailey commented 7 years ago

I briefly explored incorporating the flux distributions from the real data into the mocks. Some notes:

- The galaxia mocks (MWS, stdstars) have SDSS ugriz, though the current desitarget.mocks.io.read_galaxia function only extracts the SDSS r band. It could be updated to read ugriz and then use the transformations in equations 4-6 of DESI-1788 to get DECam magnitudes:
  - $g_{\rm DECaLS} - g_{\rm SDSS} = 0.01684 - 0.11169\,(g - r)_{\rm SDSS}$
  - $r_{\rm DECaLS} - r_{\rm SDSS} = -0.03587 - 0.14144\,(r - i)_{\rm SDSS}$
  - $z_{\rm DECaLS} - z_{\rm SDSS} = -0.00756 - 0.07692\,(i - z)_{\rm SDSS}$
- The BGS Durham MXXL mocks have an SDSS r apparent magnitude and a rest-frame g-r color. @moustakas advised me that it is better to do nothing than to do something wrong, so I suggest that we make a small $r_{\rm SDSS} \rightarrow r_{\rm DECaLS}$ adjustment assuming a median BGS color, but otherwise not try to fake up the g and z bands.
- ELG, LRG, and QSO mocks are easy since they don't have any flux information: we can just draw ELGs/LRGs/QSOs randomly from the DR3 catalog and use their colors.
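The galaxia conversion can be sketched as below. This is an illustrative helper (the name sdss_to_decam is hypothetical, not an existing desitarget function) that simply applies the three DESI-1788 relations quoted above to array-valued SDSS magnitudes. It assumes g, r, i, and z are all available, i.e. that read_galaxia has already been extended to return them:

```python
import numpy as np

def sdss_to_decam(g, r, i, z):
    """Convert SDSS g, r, i, z magnitudes to DECaLS g, r, z (eqs. 4-6 of DESI-1788 as quoted)."""
    g, r, i, z = (np.asarray(m, dtype=float) for m in (g, r, i, z))
    # Each DECaLS band is the SDSS band plus a zero point and a color term
    g_decals = g + 0.01684 - 0.11169 * (g - r)
    r_decals = r - 0.03587 - 0.14144 * (r - i)
    z_decals = z - 0.00756 - 0.07692 * (i - z)
    return g_decals, r_decals, z_decals
```

For zero colors only the zero points survive, e.g. an object with g = r = i = z = 20 maps to roughly (20.01684, 19.96413, 19.99244).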

Suggested interface: add an option to select_mock_targets to point it to a real data target selection catalog and pull fluxes and shapes from there. If @belaa can convert these distributions into parameterizations, that could be used as a fallback (or primary?) method for filling in these quantities with approximately correct distributions, scatter, and correlations.
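Pulling fluxes and shapes from a real catalog amounts to resampling its rows. A minimal sketch, assuming the real fluxes (and, optionally, shapes) are already stacked into one 2-D array with one row per target; sample_real_rows is a hypothetical name, not an existing select_mock_targets option:

```python
import numpy as np

def sample_real_rows(real_table, n_mock, seed=None):
    """Draw n_mock rows, with replacement, from a (N, ncol) table of real-data fluxes/shapes."""
    rng = np.random.default_rng(seed)
    real_table = np.asarray(real_table)
    # Draw whole rows so correlations between bands (and shapes) are preserved
    idx = rng.integers(0, len(real_table), size=n_mock)
    return real_table[idx]
```

Drawing complete rows, rather than each column independently, keeps the joint distribution intact without needing a parameterization; the trade-off is that mock targets can only ever take values that already appear in the real catalog.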

moustakas commented 7 years ago

> Suggested interface: add an option to select_mock_targets to point it to a real data target selection catalog and pull fluxes and shapes from there. If @belaa can convert these distributions into parameterizations, that could be used as a fallback (or primary?) method for filling in these quantities with approximately correct distributions, scatter, and correlations.

I don't know if you already have something specific in mind, @belaa, but mixtures of Gaussians should work really well to characterize these correlations. Here's one example.
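As a sketch of the idea, assuming scikit-learn's GaussianMixture (not necessarily the implementation used in desitarget), a full-covariance GMM can be fit to (g, r, z) magnitudes and then sampled; the toy data below are correlated Gaussians, not DR3 targets, and fit_and_sample_gmm is a hypothetical helper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_and_sample_gmm(mags, n_components, n_samples, seed=0):
    """Fit a full-covariance GMM to an (N, 3) array of (g, r, z) and draw n_samples new rows."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed)
    gmm.fit(mags)
    samples, _ = gmm.sample(n_samples)
    # sample() returns draws grouped by component; shuffle so row order carries no information
    samples = np.random.default_rng(seed).permutation(samples)
    return gmm, samples

# Toy (g, r, z) magnitudes with strongly correlated bands (illustrative only)
rng = np.random.default_rng(42)
cov = 0.3 * np.array([[1.0, 0.9, 0.8], [0.9, 1.0, 0.9], [0.8, 0.9, 1.0]])
mags = rng.multivariate_normal([22.0, 21.5, 21.0], cov, size=5000)
gmm, fake = fit_and_sample_gmm(mags, n_components=2, n_samples=5000)
```

The full covariance matrices are what carry the band-to-band correlations into the sampled magnitudes.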

dkirkby commented 7 years ago

As a first step, let's implement a Gaussian mixture model (GMM) using only (g, r, z) for LRG, ELG, and QSO (and defer the shapes until later). The cuts create sharp edges in the distribution that would require additional components and would still not be modeled well (and the density is often highest near those edges), so I propose that we fit the GMM on a sample with the cuts relaxed (or even removed), and then apply the cuts after sampling from the GMM.
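The fit-relaxed-then-cut strategy can be sketched as rejection sampling: draw from a GMM fit to the relaxed sample and keep only the draws that pass the real cuts. This assumes scikit-learn's GaussianMixture; sample_passing_cuts is hypothetical, and the color cut in the usage lines is a toy stand-in, not a DESI selection:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def sample_passing_cuts(gmm, n_wanted, passes_cuts, batch=10000, max_batches=100, seed=None):
    """Draw from a fitted GMM in batches, keeping draws where passes_cuts(draws) is True."""
    rng = np.random.default_rng(seed)
    kept, n_kept = [], 0
    for _ in range(max_batches):
        draws, _ = gmm.sample(batch)
        draws = draws[passes_cuts(draws)]
        kept.append(draws)
        n_kept += len(draws)
        if n_kept >= n_wanted:
            # sample() groups draws by component, so shuffle before truncating
            return rng.permutation(np.vstack(kept))[:n_wanted]
    raise RuntimeError("too few draws pass the cuts; increase batch or max_batches")

# Fit on a cut-relaxed toy sample. A RandomState instance (not an int) is passed so
# repeated gmm.sample() calls give fresh draws rather than re-seeding each time.
rs = np.random.RandomState(0)
relaxed = rs.multivariate_normal([22.0, 21.5, 21.0], 0.25 * np.eye(3), size=5000)
gmm = GaussianMixture(n_components=3, random_state=rs).fit(relaxed)
toy_cut = lambda m: (m[:, 0] - m[:, 1]) > 0.3  # hypothetical g-r cut, illustration only
targets = sample_passing_cuts(gmm, 2000, toy_cut, seed=1)
```

Because the sharp edge is imposed by the cut rather than modeled by the GMM, a modest number of components can describe the smooth underlying distribution.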

dkirkby commented 7 years ago

To illustrate the problem with applying a GMM to a distribution after hard cuts (I was already thinking about this problem in another context):

[Figure: GMM fit to a Gaussian sample after a hard cut]

The histogram shows data sampled from a single Gaussian with a cut that removes the highest 40%. The model shows the "best fit" GMM (minimum BIC), which has 8 Gaussians and still doesn't look great.
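A rough way to reproduce this kind of experiment, assuming scikit-learn's GaussianMixture (not necessarily what was used for the plot): truncate a Gaussian sample at its 60th percentile, fit GMMs with 1 to 10 components, and keep the one with the lowest BIC. The number of components selected depends on the sample size and seed, so no particular value is claimed here; best_gmm_by_bic is a hypothetical helper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(x, max_components=10, seed=0):
    """Fit 1-D GMMs with 1..max_components components and return the minimum-BIC fit."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    fits = [GaussianMixture(n_components=k, random_state=seed).fit(x)
            for k in range(1, max_components + 1)]
    bics = [g.bic(x) for g in fits]
    return fits[int(np.argmin(bics))], bics

rng = np.random.default_rng(1)
data = rng.normal(size=20000)
truncated = data[data < np.quantile(data, 0.6)]  # hard cut removing the highest 40%
gmm, bics = best_gmm_by_bic(truncated)
print(gmm.n_components, np.round(bics, 1))
```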

sbailey commented 7 years ago

Fixed via #128 (Gaussian mixture model) and #127 (sampling the real-data target catalog) to get flux distributions into the mocks.

moustakas commented 7 years ago

It would be helpful for #136 if we could get the shapes incorporated into these GMMs, even if they're preliminary/imperfect. Is that relatively easy? (Thank you for the WISE fluxes, BTW!)

I wonder if we could deal with the sharp cutoffs by broadening our selection boundaries, for example by preselecting all targets that come within 1 sigma of the selection box given their individual uncertainties. Yes, this would let in some riffraff (and would require some new code), but the way I'm setting up the code, the targeting cuts are applied after each object in the GaussianRandomField mocks is assigned physical properties by these GMMs.
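For a single flux threshold, the "within 1 sigma of the selection box" idea can be sketched as below, assuming linear fluxes with inverse-variance errors as in Tractor catalogs. within_n_sigma_of_flux_cut is a hypothetical helper; a real color-color box would need each boundary handled separately:

```python
import numpy as np

def within_n_sigma_of_flux_cut(flux, flux_ivar, flux_min, n_sigma=1.0):
    """True where flux + n_sigma * sigma >= flux_min, with sigma = 1/sqrt(ivar)."""
    flux = np.asarray(flux, dtype=float)
    flux_ivar = np.asarray(flux_ivar, dtype=float)
    good = flux_ivar > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma = np.where(good, 1.0 / np.sqrt(flux_ivar), np.inf)
    # Objects without a valid inverse variance are excluded rather than let in
    return good & (flux + n_sigma * sigma >= flux_min)
```

For example, with flux_min = 10, an object with flux 9.5 and ivar 4 (sigma 0.5) is kept, while one with flux 8 and ivar 1 is not.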

desihub/desitarget issue #106: flux and shape distributions in DR3