Closed: sbailey closed this issue 7 years ago
A few plots of magnitude and shape distributions for LRGs, ELGs, and QSOs (shown in that order) that have passed selection cuts in DR2. The cuts were only applied to a subset of the tractor data and returned 3752 LRGs, 20808 ELGs, and 2105 QSOs. The columns 'dev_ell' and 'exp_ell' are the magnitudes of the ellipticity for the de Vaucouleurs and exponential models, respectively.
I've modified io.py to extract shape vars e1, e2 from tractor after cuts were applied. If it would be useful I can make a pull request to make it part of the standard code.
@belaa: Yes, please issue a pull request. If the code looks reasonable I will merge it into io.py.
I spent a bit of time exploring how to incorporate the flux distributions from the real data into the mocks. Some notes:
The desitarget.mocks.io.read_galaxia function only extracts the SDSS r band. It could be updated to get ugriz, and then use the transformations in equations 4-6 of DESI-1788 to get DECam mags.
Suggested interface: add an option to select_mock_targets to point it to a real data target selection catalog and pull fluxes and shapes from there. If @belaa can convert these distributions into parameterizations, that could be used as a fallback (or primary?) method for filling in these quantities with approximately correct distributions, scatter, and correlations.
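A minimal sketch of the "pull fluxes and shapes from a real catalog" idea, assuming the catalog is available as a NumPy structured array. The function name `assign_from_catalog` and the column names (`flux_g`, `flux_r`, `e1`) are hypothetical and not part of desitarget; the toy values are made up. Drawing whole rows, rather than each column separately, is what keeps the correlations between fluxes and shapes intact.

```python
import numpy as np


def assign_from_catalog(catalog, n_mock, columns, seed=None):
    """Draw n_mock rows (with replacement) from a real target catalog.

    Hypothetical helper: whole rows are drawn, so the correlations
    between the requested columns are preserved in the mock.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(catalog), size=n_mock)
    return {name: catalog[name][idx] for name in columns}


# Toy stand-in for a real target selection catalog (made-up values).
catalog = np.zeros(5, dtype=[("flux_g", "f8"), ("flux_r", "f8"), ("e1", "f8")])
catalog["flux_g"] = [1.0, 2.0, 3.0, 4.0, 5.0]
catalog["flux_r"] = 2.0 * catalog["flux_g"]   # perfectly correlated on purpose
catalog["e1"] = [0.1, -0.2, 0.0, 0.3, -0.1]

mock = assign_from_catalog(catalog, n_mock=1000,
                           columns=("flux_g", "flux_r", "e1"), seed=0)
```

Because rows are drawn jointly, the made-up relation `flux_r = 2 * flux_g` survives in the mock. A parameterized model would only be needed as a fallback when no real catalog is at hand.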
I don't know if you have something specific in mind already @belaa, but mixtures-of-Gaussians should work really well to characterize these correlations. Here's one example.
As a first step, let's implement a Gaussian mixture model using only (g, r, z) for LRG, ELG, QSO (and defer the shapes until later). The cuts create some sharp edges in the distribution that would require additional components and would still not be modeled well (often where the density is highest), so I propose that we fit the GMM on a sample with the cuts relaxed (or even removed), and then apply the cuts after sampling from the GMM.
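The "fit without cuts, sample, then cut" order can be sketched as follows, assuming scikit-learn's `GaussianMixture`. The toy (g, r, z) distribution, the number of components, and the `passes_cut` function are all invented for illustration. They are not the real DESI selection or the model that was eventually merged.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-in for uncut (g, r, z) magnitudes; the numbers are made up.
mean = np.array([23.0, 22.0, 21.0])
cov = np.array([[0.5, 0.4, 0.3],
                [0.4, 0.5, 0.4],
                [0.3, 0.4, 0.5]])
uncut = rng.multivariate_normal(mean, cov, size=5000)


def passes_cut(mags):
    """Hypothetical hard cut, standing in for the real target selection."""
    g, r, z = mags.T
    return (r < 22.5) & (r - z > 0.8)


# Fit on the sample before cuts, so the model never sees the sharp edges.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(uncut)

# Sample from the smooth model, then apply the cuts to the samples.
samples, _ = gmm.sample(20000)
mock = samples[passes_cut(samples)]

# Fraction passing the cut in the input data and in the GMM samples.
frac_data = passes_cut(uncut).mean()
frac_mock = passes_cut(samples).mean()
```

Comparing `frac_data` and `frac_mock` is a cheap sanity check that the model reproduces the density near the selection boundary.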
To illustrate this problem with applying GMM to a distribution after hard cuts (and since I was thinking about this problem already in another context):
The histogram shows data sampled from a single Gaussian with a cut that removes 40% on the high side. The model shows the "best fit" GMM (minimum BIC) which has 8 Gaussians, and still doesn't look great.
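A rough way to reproduce this kind of experiment, assuming scikit-learn's `GaussianMixture` and its `bic` method. The sample size, seed, and maximum number of components are arbitrary choices, and the helper name `best_n_components` is hypothetical. The selected number of components will depend on those choices, so it need not match the 8 quoted above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Draw from a single Gaussian, then remove the top 40% with a hard cut.
x_full = rng.normal(size=4000)
x_cut = x_full[x_full < np.quantile(x_full, 0.6)]


def best_n_components(data, max_components=6):
    """Return the minimum-BIC number of components and the BIC values."""
    data = np.asarray(data).reshape(-1, 1)   # sklearn expects a 2D array
    bics = []
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
        bics.append(gmm.bic(data))
    return int(np.argmin(bics)) + 1, bics


n_full, bic_full = best_n_components(x_full)
n_cut, bic_cut = best_n_components(x_cut)
print("min-BIC components without cut:", n_full)
print("min-BIC components with cut:   ", n_cut)
```

The point of the comparison is that the truncated sample needs extra components just to approximate the edge, even though the underlying distribution is a single Gaussian.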
Fixed via #128 (Gaussian mixture model) and #127 (sample real data target catalog) to get distributions of fluxes into the mocks.
It would be helpful for #136 if we could get the shapes incorporated into these GMMs, even if they're preliminary / imperfect. Is that relatively easy? (Thank you for the WISE fluxes, BTW!)
I wonder if we could deal with the sharp cut-offs by broadening our selection boundaries, for example, by pre-selecting all targets that come within 1-sigma of the selection box given their individual uncertainties. Yes, this would let in some riffraff (and would require some new code), but the way I'm setting up the code, the targeting cuts are applied after each object in the GaussianRandomField mocks is assigned physical properties by these GMMs.
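One possible reading of "within 1-sigma of the selection box", sketched with NumPy only. It assumes the selection can be written as an axis-aligned box in some set of quantities, and that each object has an independent uncertainty per quantity. The real cuts are more complicated, and the function names `in_box` and `near_box` are hypothetical.

```python
import numpy as np


def in_box(x, lo, hi):
    """Strict selection: every quantity lies inside (lo, hi)."""
    return np.all((x > lo) & (x < hi), axis=1)


def near_box(x, sigma, lo, hi, nsigma=1.0):
    """Broadened selection: the interval x +/- nsigma*sigma overlaps the box.

    Each dimension is treated independently, so correlated errors
    are ignored in this sketch.
    """
    return np.all((x + nsigma * sigma > lo) & (x - nsigma * sigma < hi), axis=1)


# Toy objects in two quantities, with a unit box as the "selection".
lo = np.array([0.0, 0.0])
hi = np.array([1.0, 1.0])
x = np.array([[0.50, 0.5],    # well inside the box
              [1.05, 0.5],    # just outside, but within 1 sigma
              [1.30, 0.5]])   # outside by 3 sigma
sigma = np.full_like(x, 0.1)

strict = in_box(x, lo, hi)
broad = near_box(x, sigma, lo, hi)
```

Here `strict` keeps only the first object, while `broad` also keeps the second one. A GMM fitted to the broadened sample sees a softer edge, and the strict cuts can still be applied after sampling.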
Do this in a way that can be trivially re-applied to DR4 and/or with future updates to the target selection cuts.