desihub / desitarget

DESI Targeting
BSD 3-Clause "New" or "Revised" License

flux and shape distributions in DR3 #106

Closed sbailey closed 7 years ago

sbailey commented 7 years ago

Do this in a way that can be trivially re-applied to DR4 and/or with future updates to the target selection cuts.

belaa commented 7 years ago

A few plots of magnitude and shape distributions for LRGs, ELGs, and QSOs (shown in that order) that have passed selection cuts in DR2. The cuts were only applied to a subset of the tractor data and returned 3752 LRGs, 20808 ELGs, and 2105 QSOs. The columns 'dev_ell' and 'exp_ell' are the magnitudes of the ellipticity for the de Vaucouleurs and exponential models, respectively.

I've modified io.py to extract shape vars e1, e2 from tractor after cuts were applied. If it would be useful I can make a pull request to make it part of the standard code.

lrg_corner

elg_corner

qso_corner
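For reference, the ellipticity magnitude described above can be sketched as follows. This is only an illustration, not the code from the modified io.py: the helper name `ellipticity_magnitude` is hypothetical, and the column names `shapedev_e1` / `shapedev_e2` are assumed to follow the Tractor catalog convention.

```python
import numpy as np

def ellipticity_magnitude(e1, e2):
    """Return |e| = sqrt(e1^2 + e2^2) for ellipticity components e1, e2."""
    return np.hypot(np.asarray(e1, dtype=float), np.asarray(e2, dtype=float))

# Toy stand-in for a few Tractor rows; the column names are an assumption.
shapes = {
    "shapedev_e1": np.array([0.3, 0.0, -0.6]),
    "shapedev_e2": np.array([0.4, 0.0, 0.8]),
}

# Magnitude of the de Vaucouleurs-model ellipticity for each row.
dev_ell = ellipticity_magnitude(shapes["shapedev_e1"], shapes["shapedev_e2"])
```

The same helper would apply unchanged to the exponential-model components to produce 'exp_ell'.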

geordie666 commented 7 years ago

@belaa: Yes, please issue a pull request. If the code looks reasonable I will merge this into io.py.

sbailey commented 7 years ago

I explored a bit incorporating the flux distributions from the real data into the mocks. Some notes:

Suggested interface: add an option to select_mock_targets to point it to a real data target selection catalog and pull fluxes and shapes from there. If @belaa can convert these distributions into parameterizations, that could be used as a fallback (or primary?) method for filling in these quantities with approximately correct distributions, scatter, and correlations.
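A minimal sketch of the "pull fluxes and shapes from a real catalog" idea, assuming the catalog is available as a NumPy structured array. The function name `sample_from_catalog` and the column names `flux_g`, `flux_r`, `flux_z` are hypothetical and not part of select_mock_targets. Drawing whole rows with replacement keeps the correlations between columns intact without needing any parameterization.

```python
import numpy as np

def sample_from_catalog(catalog, nmock, rng):
    """Draw nmock whole rows (with replacement) from a real target catalog.

    Sampling complete rows keeps each object's fluxes (and shapes, if
    present) together, so correlations between columns are preserved.
    """
    idx = rng.integers(0, len(catalog), size=nmock)
    return catalog[idx]

# Toy stand-in for a real target-selection catalog (hypothetical columns).
catalog = np.zeros(4, dtype=[("flux_g", "f8"), ("flux_r", "f8"), ("flux_z", "f8")])
catalog["flux_g"] = [1.0, 2.0, 3.0, 4.0]
catalog["flux_r"] = 2.0 * catalog["flux_g"]
catalog["flux_z"] = 3.0 * catalog["flux_g"]

rng = np.random.default_rng(0)
mock = sample_from_catalog(catalog, 10, rng)
```

A parameterized model would then only be needed as a fallback when no real catalog is supplied.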

moustakas commented 7 years ago

> Suggested interface: add an option to select_mock_targets to point it to a real data target selection catalog and pull fluxes and shapes from there. If @belaa can convert these distributions into parameterizations, that could be used as a fallback (or primary?) method for filling in these quantities with approximately correct distributions, scatter, and correlations.

I don't know if you have something specific in mind already @belaa, but mixtures-of-Gaussians should work really well to characterize these correlations. Here's one example.

dkirkby commented 7 years ago

As a first step, let's implement a Gaussian mixture model using only (g, r, z) for LRG, ELG, QSO (and defer the shapes until later). The cuts create some sharp edges in the distribution that would require additional components and will still not be modeled well (where the density is often highest), so I propose that we fit the GMM on a sample with the cuts relaxed (or even removed), and then apply the cuts after sampling from the GMM.
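A rough sketch of that fit-then-cut workflow, assuming scikit-learn's `GaussianMixture`. The input sample is synthetic, the number of components is arbitrary, and `toy_cut` is a made-up selection used only for illustration; none of these reflect the actual DESI cuts or data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Synthetic stand-in for (g, r, z) magnitudes of a sample with NO cuts applied.
mean = np.array([23.0, 22.5, 21.5])
cov = np.array([[0.30, 0.20, 0.10],
                [0.20, 0.30, 0.15],
                [0.10, 0.15, 0.30]])
uncut = rng.multivariate_normal(mean, cov, size=5000)

# Step 1: fit the GMM on the uncut (or relaxed-cut) sample.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(uncut)

# Step 2: draw mock magnitudes from the fitted model.
samples, _ = gmm.sample(20000)

# Step 3: apply the (hypothetical) selection cuts to the sampled values.
def toy_cut(mags):
    g, r, z = mags.T
    return (r < 23.0) & (r - z > 0.8)

selected = samples[toy_cut(samples)]
```

Because the cuts act on the samples rather than on the training data, the sharp edges are reproduced exactly and the GMM only has to describe the smooth underlying distribution.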

dkirkby commented 7 years ago

To illustrate this problem with applying GMM to a distribution after hard cuts (and since I was thinking about this problem already in another context):

gmm_with_cuts

The histogram shows data sampled from a single Gaussian with a cut that removes 40% on the high side. The model shows the "best fit" GMM (minimum BIC) which has 8 Gaussians, and still doesn't look great.
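An experiment along these lines can be sketched as below, assuming scikit-learn's `GaussianMixture` and its `bic` method. The sample size, seed, and range of component counts are arbitrary choices, so this is not the code behind the plot and the preferred number of components will generally differ from the 8 quoted above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Draw from a single unit Gaussian, then remove the top 40% with a hard cut.
x = rng.normal(0.0, 1.0, size=20000)
cut = np.quantile(x, 0.6)
x_cut = x[x <= cut].reshape(-1, 1)  # scikit-learn expects a 2D array

# Fit GMMs with 1..8 components and record the BIC of each fit.
bics = {}
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(x_cut)
    bics[k] = gmm.bic(x_cut)

# The minimum-BIC model is the "best fit" in the sense used above.
best_k = min(bics, key=bics.get)
```

Even though the parent distribution is a single Gaussian, the truncated sample should prefer more than one component, which is the problem being illustrated.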

sbailey commented 7 years ago

Fixed via #128 (Gaussian mixture model) and #127 (sample real data target catalog) to get distributions of fluxes into the mocks.

moustakas commented 7 years ago

It would be helpful for #136 if we could get the shapes incorporated into these GMMs, even if they're preliminary / imperfect. Is that relatively easy? (Thank you for the WISE fluxes, BTW!)

I wonder if we could deal with the sharp cut-offs by broadening our selection boundaries, for example, by pre-selecting all targets that come within 1-sigma of the selection box given their individual uncertainties. Yes, this would let in some riffraff (and would require some new code), but the way I'm setting up the code, the targeting cuts are applied after each object in the GaussianRandomField mocks is assigned physical properties by these GMMs.
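To make the 1-sigma broadening concrete, here is a minimal one-dimensional sketch. The helper `within_nsigma_of_box`, the magnitude limits, and the toy values are all hypothetical; a real selection box would involve several bands and colors.

```python
import numpy as np

def within_nsigma_of_box(value, sigma, lo, hi, nsigma=1.0):
    """True where [value - nsigma*sigma, value + nsigma*sigma] overlaps [lo, hi]."""
    value = np.asarray(value, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return (value + nsigma * sigma >= lo) & (value - nsigma * sigma <= hi)

# Toy r-band magnitudes and their individual uncertainties.
r_mag = np.array([22.0, 23.1, 23.5, 19.9])
r_err = np.array([0.05, 0.20, 0.10, 0.05])

# Hard selection box versus the 1-sigma broadened pre-selection.
strict = (r_mag >= 20.0) & (r_mag <= 23.0)
relaxed = within_nsigma_of_box(r_mag, r_err, 20.0, 23.0)
```

Here the second object fails the hard cut but is kept by the broadened one because its uncertainty reaches back inside the box, while the last two objects stay excluded.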