yymao opened this issue 6 years ago
I have a general comment regarding three tests: dN/dmag, color-mag, and the color distribution test. These are really testing different aspects of the same thing, namely the full distribution of magnitudes. That is, if you had a statistically indistinguishable distribution of magnitudes from reality, you would automatically pass all three. The first test looks at 1D histograms of mags, the second is mag vs. dmag, and the third is dmag vs. dmag correlations (where dmag is delta mag, e.g. u-g, i.e. color). My worry is two-fold:
by using disparate tests, e.g. HSC for something and SDSS for something else, we could force ourselves into a corner where it would be impossible to satisfy all of them at the same time (due to differing depths).
the focus is too much on getting the 1D and 2D projections right, when what you really want is for the general distribution to be correct enough.
So, my suggestion would be to merge them into a single test that perhaps has more than one criterion in it. A possible test would be: for the magnitude vector v = (u,g,r,i,z,y), calculate the median, the mean, and the covariance matrix. The possible validation criteria could then be (after accounting for depth):
I think these are nice overall tests that avoid going into the vagaries of color-color histograms, which will never look perfect coming from Galacticus but also don't really matter that much for our fundamental science. The mean and scatter of magnitudes are directly connected to the number of detected objects, their SNR, etc., so they are very relevant.
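A minimal sketch of the proposed summary statistics, using NumPy on a toy magnitude table (the array below is random placeholder data; in practice one would load the six LSST bands from the catalog reader):

```python
import numpy as np

# Toy stand-in for a catalog's (N_galaxies, 6) magnitude table in ugrizy;
# real magnitudes would come from the extragalactic catalog.
rng = np.random.default_rng(42)
mags = rng.normal(loc=24.0, scale=1.0, size=(10000, 6))

median = np.median(mags, axis=0)   # per-band median magnitude
mean = np.mean(mags, axis=0)       # per-band mean magnitude
cov = np.cov(mags, rowvar=False)   # 6x6 magnitude covariance matrix

# A possible validation criterion (after accounting for depth): each of
# these statistics must lie within some tolerance of the same statistic
# measured from the validation dataset.
```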
For photo-z and clusters, they might actually care about some of the color-color distributions though.
I agree with the basic idea that we don't want to paint ourselves into a corner by devising validation tests based on different datasets that turn out to be impossible to satisfy. My feeling had been that we may indeed need to check all these different things, but our validation criteria cannot be super tight, and that's how we avoid painting ourselves into a corner.
@yymao @evevkovacs - Just to collect some basic progress notes here: SL and SN confirmed they care more about the sprinkler. It's useful for the extragalactic catalogs to be at least vaguely reasonable but the existing validation tests are enough to ensure that. I will continue to work with the remaining analysis WGs.
In this comment I will collect the list of working group contacts for extragalactic catalog validation. Currently several are listed as TBD, so this is not very useful, but I will edit the comment as I get more information:
@morriscb is the other contact for PZ, we're planning on discussing tests on Friday and updating shortly after that.
@sschmidt23 pointed out this thread to me -- I want to concur with Rachel on this: for assessing photo-z performance we care much, much more about getting the range of galaxy SEDs correct than the overall luminosities of objects (which is what the magnitude-vector test is more sensitive to). The distribution of galaxy colors is our best way of assessing SEDs. I don't expect Galacticus to be perfect at this by any means; rather, the intention of our color tests is to be able to assess which simulations / parameter values improve things vs. make them worse.
@sschmidt23 @slosar @morriscb @j-dr @erykoff -
Thanks for agreeing to represent PZ, CL, and LSS on the extragalactic catalog validation. As a reminder, in the next 2 days we'd like to have the following:
a finalized list of what validation tests are needed for the DC2 extragalactic catalogs for your science; open new issues if anything is missing from the current list in this repository.
for each validation test, we need a clear validation criterion and validation dataset. Please comment in the issues about these. Remember that we want to make sure the catalogs can support our planned DC2 work, but without being so stringent that the tests become nearly impossible to pass given the current limitations of our mock catalog-making knowledge/methods.
ideally, you would have a volunteer from your working group who can implement the test. If you don't, please still post the issue and we can do one further advertisement for volunteers (but we cannot assume there is enough available effort within the CS working group to implement all requested tests themselves).
If you have any questions about defining tests / validation criteria / etc., please comment on here or the relevant issue. I am happy to answer questions, as are @yymao and @evevkovacs . Also, they have tried to make it easy to implement new tests without having to learn all the ins and outs of DESCQA -- see https://github.com/LSSTDESC/descqa/blob/master/README.md .
@jablazek @elikrause @timeifler @joezuntz -
Please comment in this thread with the name / GitHub ID of the person who will work on the extragalactic catalog validation for your working group (for TJP I believe one person was asked but may not have confirmed; I did not hear a name for WL yet). See the message above this one for what we are asking those people to do, and direct them to this thread.
@rmandelb For WLWG @msimet has volunteered to be the DESCQA liaison.
@rmandelb : @patricialarsen has volunteered for TJP. @timeifler, she has been doing WL-related work as well on proto-DC2 and is interested in coordinating with @msimet.
@rmandelb @jablazek @patricialarsen This is great to hear, Patricia has already reached out to Melanie and myself.
@rmandelb @yymao Is there a living list of current tests, or does the list of issues with the "validation test" label simply act as such? Can you elevate my rights so that I can add "validation:required" to some tests, like for example this galaxy bias/clustering test? (Or should I tell you which ones I think are required?)
@slosar - the list of issues w/ "validation test" label is the living list of tests. I would love to elevate your rights but I'm not fancy enough to do that (I have labeling privs but not "giving other people labeling privs" privs, apparently).
Perhaps @yymao or @evevkovacs can comment on the difference between the "validation test" and "validation test:required" labels; there are far more of the former than the latter, and I'm not sure how to interpret that. Are you wanting the analysis WGs to flag which ones are particularly important so they can be called "required"? I did not quite realize that so I hadn't requested that from anybody.
Yes, validation test:required is intended to flag tests which are required by the working groups and which the catalog must pass in order to be satisfactory. Other validation tests are those which have been suggested and may be nice to have but aren't as high priority to implement.
There is also Table 10 in the planning document, which now lives on GitHub: https://github.com/LSSTDESC/DC2_Repo/tree/master/Documents/DC2_Plan. That table provides a quick overview and has the same required/nice-to-have distinction. You can edit that table in principle. Yao seems to be the only one who has the power to help with the labels ...
@slosar: the original idea is that the WGs will report to @rmandelb and discuss here on the set of required validation tests, and then @rmandelb will add the labels on them.
However, if this workflow is not efficient, I'm happy to make necessary changes to make things easier!
@yymao @rmandelb Ok, so Rachel, could you add "validation test:required" to https://github.com/LSSTDESC/descqa/issues/10 (the bias/clustering test)? The other two that I count as required, issue 11 (N(z)) and issue 7 (dN/dmag), already have it. Others relevant to LSS in Table 11 would be nice to have, but I wouldn't quite count them as required. Perhaps for DC3.
Done. Thanks for thinking through which ones are more important than the others for LSS. And I believe the bias/clustering test is also required for PZ to achieve its goals with the clustering redshift analysis, as well.
@j-dr and @erykoff - can you please let us know the status of cluster-related extragalactic validation tests? See e.g. this comment: https://github.com/LSSTDESC/descqa/issues/50#issuecomment-357552766 earlier in this thread for info on what we are looking for.
I'm about to go offline for a day, but Yao, Eve, and others on this thread may be able to answer if you have questions about the process.
We had a discussion of validation tests within the LSS group, and two main issues arose:
@slosar Could you please clarify exactly what test(s)/check(s) you are proposing under your first bullet. The galaxy-shape modeling in the extra-galactic catalog is very simple. All we have are sizes for the disk and bulge and we assume n=1 Sersic profiles for disks and n=4 profiles for bulges to get half-light radii. The value of the magnification is given at the galaxy location. I think you are proposing a check that is better done on the image simulation result rather than the extra-galactic catalog, but I may have misunderstood.
What validation data set and criterion are you proposing to use for second bullet? Validating a 2d histogram is not as straightforward as validating a 1-pt distribution and I was wondering what you had in mind.
@slosar - thanks for the feedback from LSS. To answer your questions:
you are right that this is more of an "is it implemented correctly" issue, rather than one where we are testing the extragalactic catalogs against data. The catalogs have lensed magnitudes and sizes in them, so I guess the question is: do we need to explicitly test that the densities are changing in the correct way at fixed magnitude? Or do we just need to test that unlensed vs. lensed magnitudes and sizes don't have some bug? (Because if those are right, then the density trends being correct for a fixed flux cut seem to follow directly.)
I think you are correct that #7 and #11 could be replaced by a test of N(z,mag) for some single band, and this takes care of a number of issues with only testing N(z) and N(mag) in 1D. And, as you said, we can do this in just 1 band, because we also have tests of color distributions which take care of the other bands. I guess the question is what validation dataset would we have for this new 2D N(z,mag) test? Right now we're using DEEP2 for N(z) down to some fixed mag thresholds, but we're using HSC for N(mag) because it's a larger survey so we expect fewer issues with cosmic variance and can conveniently chop up the sample into smaller slices in magnitude without too much noise. If we do the full 2D test then I guess we would have to do some kind of parametric fits to DEEP2 in the 2D plane? Or perhaps use the 1D N(z) in mag bins, and use HSC to set the normalization of N(mag) integrated across z?
@evevkovacs @rmandelb Thanks for your quick responses:
Regarding magnification: I think we should be modulating the number densities by actually displacing the galaxies, even if we do it in the Born approximation (if you're doing proper ray-tracing, even better!). This means that for each galaxy, you generate the kappa field (integrated mass density), which you can then transform into gamma1, gamma2, and a displacement vector (delta ra, delta dec); this is a non-local and hence somewhat painful operation. Is this being done right now? Then the galaxy catalog would have ra, dec, z, etc. and also dra, ddec, kappa, gamma_1, gamma_2 (if you don't have dra and ddec, only part of the lensing effect is present). If we confirm that these latter quantities have the right correlations (kappa-kappa, kappa-gamma_t, displacement-gamma_r, etc.), that would be good enough for me. CCL can do this, and Alonso could be arm-twisted into doing it. This is in the same category as https://github.com/LSSTDESC/descqa/issues/8. While not absolutely crucial, it would be very useful to have proper magnification in DC2, because this could lead to some exciting WG projects.
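For concreteness, the non-local transform described above can be sketched on a flat-sky periodic grid with FFTs: solve the 2D Poisson equation (Laplacian of phi equals 2*kappa) for the lensing potential, then differentiate to get the shear components and the deflection. This is purely illustrative (Born approximation, periodic boundaries), not the actual DC2 ray-tracing code:

```python
import numpy as np

def kappa_to_shear_and_deflection(kappa, pixel_scale):
    """Given a convergence map `kappa` on a regular grid with spacing
    `pixel_scale` (radians), return (gamma1, gamma2, alpha_x, alpha_y):
    the two shear components and the deflection field."""
    ny, nx = kappa.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=pixel_scale)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=pixel_scale)
    KX, KY = np.meshgrid(kx, ky)
    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0  # avoid division by zero; zero mode zeroed below

    kappa_hat = np.fft.fft2(kappa)
    phi_hat = -2.0 * kappa_hat / k2  # from -k^2 phi_hat = 2 kappa_hat
    phi_hat[0, 0] = 0.0

    gamma1 = np.fft.ifft2(-0.5 * (KX**2 - KY**2) * phi_hat).real
    gamma2 = np.fft.ifft2(-KX * KY * phi_hat).real
    alpha_x = np.fft.ifft2(1j * KX * phi_hat).real
    alpha_y = np.fft.ifft2(1j * KY * phi_hat).real
    return gamma1, gamma2, alpha_x, alpha_y
```

Checking that the resulting kappa-kappa, kappa-gamma_t, and displacement correlations match theory (e.g. from CCL) would then be the validation step.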
Yes, I think 1D N(z) in mag bins, in absolute counting units (i.e. number per sq deg, not a relative probability), is the simplest thing to do. In each mag bin you can then use whatever data we have for that bin, up to the redshift at which you trust it. I think the criterion should be to be within 20% of the measurement (including counting noise).
If that is OK, I will write up both issues, and then Rachel probably needs to close 7 and 11. All the work that already went into them will of course keep on being very useful.
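The 20% criterion above could look roughly like the following sketch; `passes_nz_test` and its inputs are illustrative names, with the inputs holding counts per square degree per (mag bin, z bin) cell:

```python
import numpy as np

def passes_nz_test(n_mock, n_data, n_data_err, tolerance=0.2):
    """Return True if, in every (mag bin, z bin) cell, the mock's
    counts/deg^2 are within `tolerance` of the validation data,
    after allowing for the data's counting noise."""
    n_mock = np.asarray(n_mock, dtype=float)
    n_data = np.asarray(n_data, dtype=float)
    n_data_err = np.asarray(n_data_err, dtype=float)
    allowed = tolerance * n_data + n_data_err
    return bool(np.all(np.abs(n_mock - n_data) <= allowed))
```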
@slosar -
For (1): the catalog has both pre- and post-lensing positions (in addition to the pre- and post-lensing magnitudes and sizes that I mentioned earlier). I assume that there is an intent to use the post-lensing quantities for everything including positions when making the sims.
You are correct that we could use the statistical properties of pre- and post-lensing positions for a flux-limited sample to test these correlations.
For (2): this sounds reasonable to me. HSC can give the overall normalization of the number density across all z in the mag bins, and DEEP2 can give the dN/dz within the mag bins. I agree that it would be best to combine these into a single test rather than having separate dN/dmag and dN/dz tests. @evevkovacs - since you were asking about what Anze intended as well, are you comfortable with this suggestion given his clarification? @slosar - I agree about what needs to be done to the issues, but I want to give Eve a chance to comment on the way you've framed this test before we do that.
Patricia Larsen will comment on 1). For 2), we can change the N(z) tests to check the normalization. Can you point me to the datasets?
Looping in @duncandc since he has been working on #7. It sounds like the newly proposed test is closer to #11, just with a magnitude cut. Do we still want to pursue #7?
@slosar @rmandelb I disagree pretty strongly about combining dN/dz and dN/dmag into one test unless we still do a separate dN/dmag test. The reason is that dN/dmag is something we can measure directly empirically, whereas dN/dz (in mag bins) is a substantial extrapolation (by 2-3 mag) at LSST depth, and poorly constrained on the higher-z tail (z>1) even at the brighter magnitudes, primarily due to the sample/cosmic variance in the spectroscopic samples we have there.
We thus should worry much more if we don't match dN/dmag than we should if we don't match dN/dmag/dz , unless you're talking about i~20-21 or less where spectroscopic samples are more complete and wider samples can be used. However, that magnitude range is essentially irrelevant (by numbers) for LSST science.
@slosar
I can confirm that the galaxy position does shift due to lensing in our catalogues. These deflections are calculated within the ray-tracing algorithm at the same time as the other weak lensing observables. This should be in the catalogue as a pre-lensed and a post-lensed position. The magnitudes are also altered to take into account the lensing magnification (this may not have actually been implemented in previous versions but it certainly will be for DC2).
So yes, this is all currently implemented. We are certainly doing sanity checks on the deflection fields as we compute them and we have two independent codes to compute these fields (flat-sky and full-sky) so the implementation shouldn't have any major bugs.
If you want to implement a proper validation test to confirm this then we could do something as simple as plot the distribution of lensed-unlensed positions and make sure they're the right order of magnitude. Or if you want to check the statistics thoroughly you could do further validation tests (although note that the change in flux/luminosity/magnitude should be fine since this is computed via the shear and convergence which are well-tested).
@janewman-pitt-edu @rmandelb Jeff, you make a good point re dN/dmag being more directly observable. It is also more relevant for things like blending than N(z). So I take back my original suggestion. But at minimum, issue #7 should specify which magnitude this is for; right now I don't see this anywhere. I think as long as it is for one magnitude we're fine (and then the color tests will take care of the others). Similarly, issue #11 should be a simple test, in the same fiducial magnitude, that N(z, m<mag) is correct, with perhaps less stringent requirements. So, in other words, we keep the same as now, but instead of 2D, we do two projections, in mag and z. Fine, but the tests need to be specified better.
@patricialarsen Thanks. Ok, if you think you're doing the right thing then I trust that you do. In any case, I think the important thing is to have the input as well as the output catalogs available as part of DC2, with all this information in them.
@janewman-pitt-edu - I originally thought as you did - let's decouple dN/dmag and dN/dz because the former is directly observable and the latter is not.
However: it's important to note that protoDC2 has a z<1 cut. So it would actually be quite wrong to compare, say, HSC dN/dmag against protoDC2 dN/dmag without accounting for this. We may have other catalogs with z cuts. A combined test would allow us to account for this (which I admit we have to do using corrections based on the expected dN/dz, so it's not perfect) rather than doing an incorrect test of dN/dmag with no corrections.
So I actually want to push back on you a bit. As I said, I did originally agree with you; but the dN/dmag and other tests are wrong if defined in a way that doesn't account for redshift cuts.
Just my two cents: we should design the validation tests to be most useful for the cosmoDC2 catalog (reminder: that's the big catalog we want to use for DC2). protoDC2 is meant as a test bed to implement all the tests rather than a catalog that will fully fulfill all the requirements. Due to some changes in the cosmoDC2 production strategy we will have a slightly longer period for full cosmoDC2 validation and also opportunities to "tweak" that catalog more.
Agreed with @katrinheitmann's point.
On the other hand, isn't a redshift cut in mock catalogs kind of unavoidable, unless we stitch in even larger boxes for high-z bright galaxies?
I agree with Katrin. So I think it is easy to have a dN/dmag test and specify a redshift range. The validation datasets for different ranges can be different: for example, HSC data at high redshift and something else (e.g. SDSS) for lower redshifts.
Yes, but I thought the worry was that the redshift cut in protoDC2 was low at z~1 (cosmoDC2 will be higher). But maybe I misunderstood.
Maybe I am missing something, but I thought that the observational data had redshift information and therefore selection cuts could be made to match what is in the simulated catalog if need be.
@yymao - true, there is always a redshift cut in mocks. But the issue is that if you're going to i=25, then we expect a few % of objects above z=3, but ~40% of objects above z=1. So if we do a test with a tolerance of 20%, we don't care if the mock has zmax=3. We care very much if it has zmax=1. My concern is about whether the redshift cut is sufficiently low that we're expecting to lose of order 10% or more of the galaxies we'd see in the real survey, in which case the validation test is invalid.
@evevkovacs - we don't in general have redshift information for imaging surveys. We have photo-z, but they are not sufficiently good to use in validation tests. For some of the validation tests we're using here, the validation dataset is SDSS or DEEP2, which provides spectroscopic information. That's why those tests can be defined easily in z ranges. But if our validation dataset is from an imaging survey like HSC, then we can't make the validation test in z ranges.
@katrinheitmann - I guess I figured we want these tests to be generally useful, so we have to assume that some mocks will have strict z limitations. I'm OK with saying we should design the tests for the ideal case, but if we do that, then I would strongly advocate for the test to not be run at all if the mock catalog (like protoDC2) has some condition that makes the test invalid. For example, all tests that integrate implicitly across all z for a survey the depth of HSC or LSST should be disabled if the mock catalog has a zmax that is too low (say, below z=2). Otherwise the system will generate plots that people will use to draw completely wrong conclusions.
Is that possible to do?
@rmandelb thanks for the clarification! And to your technical question, yes, a test can decide not to generate any plot (or do whatever alternative things) if it finds the catalog's max z is, say, less than 2.
Is it fair to say we need the mocks to have max z > 2? We can probably check whether that's sufficient by running the test on Buzzard (max z ~ 2.1). And looking ahead to cosmoDC2, what redshift cut do we think it'll have, @katrinheitmann @evevkovacs?
Yes, certainly. The test writer is free to specify conditions as she/he sees fit. For example, it would be simple to set a requirement on the maximum redshift delivered by the catalog, and if that requirement is not satisfied, the catalog is skipped.
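A minimal sketch of such a guard (the names here are hypothetical, not the exact DESCQA API):

```python
ZMAX_REQUIRED = 2.0  # illustrative threshold, per the discussion above

def maybe_skip(catalog_zmax):
    """Return a skip record if the catalog's maximum redshift is too low
    for a magnitude-integrated test to be meaningful; otherwise None,
    meaning the real dN/dmag comparison should proceed."""
    if catalog_zmax < ZMAX_REQUIRED:
        return {"skipped": True,
                "summary": f"catalog zmax={catalog_zmax:.2f} < {ZMAX_REQUIRED}"}
    return None
```

So a protoDC2-like catalog with zmax ~ 1 would be skipped rather than producing misleading plots.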
We can probably check whether that's sufficient by running the test on Buzzard (max z ~ 2.1).
Sorry if I am missing something, but how can we check whether that's sufficient by running the test on Buzzard?
My proposal would be to take our best understanding of dN/dz for the faintest magnitude limit for which we test the dN/dmag, integrate that to find the max redshift for which we'd be missing more than, say, 5% of the galaxies, and set that as the max redshift for the dN/dmag test.
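That procedure is easy to sketch: tabulate the assumed dN/dz, build the cumulative fraction, and read off the redshift at which the retained fraction first reaches 95%. The parametric form below (z^2 * exp(-(z/z0)^1.5)) is just a common toy model, not a fit to any real data:

```python
import numpy as np

def zmax_for_completeness(z, dndz, max_missing=0.05):
    """Return the lowest tabulated redshift such that cutting the
    catalog there loses at most `max_missing` of the total counts."""
    cum = np.cumsum(dndz) / np.sum(dndz)
    idx = np.searchsorted(cum, 1.0 - max_missing)
    return z[idx]

# Toy dN/dz for the faintest magnitude limit under test:
z = np.linspace(0.01, 4.0, 400)
dndz = z**2 * np.exp(-(z / 0.5) ** 1.5)
z_cut = zmax_for_completeness(z, dndz)
```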
@rmandelb sorry, you're right. I was thinking that we can just check if Buzzard matches to HSC dN/dmag, but then, of course, even if it doesn't match, we still don't know whether it is due to insufficient max z or something else.
I think we'll be missing >5% of galaxies with a z>2 cut even by i~23 or so...
This may not be a complete list, but for the WLWG we will definitely need a power spectrum test (eg #35) and an ellipticity distribution test (#14). There was also a suggestion that #14 be done as a function of redshift--I'm not sure yet if that's required or desired, but I wanted to check if you'd consider that a separate validation test, or an implementation issue for the existing ellipticity distribution test.
It would not be a separate test. I am working on the ellipticity distribution. z ranges can be added in a configuration file. Do you have an idea of what z bins would be of interest? That would be helpful in configuring the plots etc.
@rmandelb : I've now gotten further back through my email and seen your pushback directly :) Yes, we certainly shouldn't compare dN/dm to redshift-incomplete samples. However, I don't see that as a reason to drop dN/dm entirely, but rather as a driver to disregard it where it is irrelevant.
We need to keep in mind though that dN/dmag/dz will only be at all well-constrained (and not that well given small survey areas) to r~23 and z~1-1.4. The situation is worse for delta-mag bins than for integrated dN/dz down to a given magnitude because the latter seems to fit a simple functional form but the former does not (we could differentiate to get a smooth prediction for number in a delta-mag bin, but I wouldn't have much confidence that that'd look all that realistic; summing up a broader range can erase a lot of issues).
@rmandelb @janewman-pitt-edu Ok, so it seems that HSC is deep enough but without redshifts and with larger cosmic variance, while on the other hand DEEP2 can give some information on redshifts but is incomplete. So I think there are two ways to generate tests:
I have a slight preference for the first option, as it has two advantages: (i) it naturally grows with growing catalogs (i.e., if the catalog doesn't go beyond z=1, fine, you don't compare there), and (ii) if there are internal tensions between the two datasets they become immediately obvious. My understanding is that Rachel supports this option too... However, I don't feel knowledgeable enough about this to judge whether it is doable.
HSC has much smaller cosmic variance than DEEP2...
I think once you break DEEP2 into differential magnitude bins you're already getting dodgy. I believe the N(<m, z) constraints much more.
One follow-up thought: we could implement this as N(<m, z) with a variety of limiting magnitudes rather than as N( m, z) in differential magnitude bins. I think that'd work better, and we could use the DEEP2 extrapolations (with a grain of salt) to do them.
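The N(<m, z) variant with several limiting magnitudes is straightforward to compute from a catalog; a sketch with illustrative names, using a sorted-array trick for the cumulative counts:

```python
import numpy as np

def cumulative_counts(mags, zs, mag_limits, z_grid):
    """N(<m, z): for each limiting magnitude in `mag_limits`, return the
    number of objects brighter than that limit with redshift below each
    value of `z_grid` (raw counts, not normalized)."""
    mags = np.asarray(mags)
    zs = np.asarray(zs)
    out = {}
    for mlim in mag_limits:
        zsel = np.sort(zs[mags < mlim])
        out[mlim] = np.searchsorted(zsel, z_grid)
    return out
```

Comparing these curves against extrapolated DEEP2 predictions (with the grain of salt noted above) would then be the test.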
Another pair of required tests from the WLWG: galaxy size distributions and galaxy position angle distributions (assuming our ellipticity distribution test is for |e|, not e1 and e2). We're working on validation criteria now.
There is a size-magnitude test under development; see https://github.com/LSSTDESC/descqa/issues/13. Galaxy position angles are randomly assigned; see the position-angle distribution in the readiness tests: https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-01-30_4&test=readiness_protoDC2
This epic issue serves as the general discussion thread for all validation tests on the extragalactic catalogs in the DC2 era.
Note: Please feel free to edit the tables in this particular comment of mine, since we will use them to keep track of the progress of the validation tests
➡️ Required tests that we have identified (for DC2):
➡️ Tests that are not currently required but good to have:
Analysis WGs are encouraged to join this discussion and to provide feedback on these validation tests. This epic issue is assigned to the Analysis Coordinator @rmandelb, and will be closed when the Coordinator deems that we have implemented a reasonable set of validation tests and corresponding criteria for DC2.
@yymao, @evevkovacs, and @katrinheitmann can provide support to the implementation of these validation tests in the DESCQA framework. In addition to GitHub issues, discussions can also take place on the #desc-qa channel on LSSTC Slack.
P.S. The corresponding issue in DC2_Repo is https://github.com/LSSTDESC/DC2_Repo/issues/30