LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

DC2 Public Release Preparation #401

Closed yymao closed 3 years ago

yymao commented 3 years ago

[Updated Dec 9 by @heather999 @yymao]

Catalogs to release

Based on the discussion in #desc-dc2-public-data-release, the DM/DA telecons, and consultation with various groups, we have decided that the initial release at the 2021 AAS meeting will include:

  1. DR6 WFD object catalog, with only the DPDD-like columns. Native quantities won't be included.
  2. "Truth-match" catalog, with coordinates, redshifts, and static fluxes for galaxies, stars, and SNe, and nearest neighbor match from the object catalog.

Both catalogs are partitioned by tracts (according to Run 2 skymap) to allow easy access and match.
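For readers unfamiliar with the term, the nearest-neighbor match in item 2 can be sketched roughly as below. This is only an illustrative brute-force version: the function names, the `(ra, dec)` tuple layout, and the match direction are assumptions, not the procedure actually used to build the truth-match catalog (which may apply a matching radius, magnitude cuts, or a spatial index).

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def nearest_neighbor_match(sources, candidates):
    """For each (ra, dec) in sources, return (index of nearest candidate, separation in arcsec).

    Hypothetical helper: O(N*M) brute force, fine for a demo but not for a full tract.
    """
    matches = []
    for s_ra, s_dec in sources:
        best_idx, best_sep = None, float("inf")
        for i, (c_ra, c_dec) in enumerate(candidates):
            sep = angular_sep_deg(s_ra, s_dec, c_ra, c_dec)
            if sep < best_sep:
                best_idx, best_sep = i, sep
        matches.append((best_idx, best_sep * 3600.0))
    return matches
```

Because both catalogs are partitioned by tract, a match like this would only need to compare entries within the same tract (plus, presumably, some care near tract boundaries).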

Further data products (e.g., photo-z) are planned to be released at a later time, after the initial release.

Preparation

We use this issue to track the progress of DC2 Public Release.

To-do for object catalog

To-do for truth-match catalog

To-do for access infrastructure

To-do for documentation and announcement

DESC note

heather999 commented 3 years ago

Thanks for setting up the checklist! We also need to make sure each catalog is accessible via Globus Sharing at NERSC so that public users can download files. There was also some talk about the need to mirror the data and perhaps stand up a backup website, especially in light of the recent NERSC outages.

yymao commented 3 years ago

Per the discussion in the Nov 18 DM/DA telecon, here's the "baseline" proposal of catalogs to be made public:

Note that in this "baseline" proposal, the public dataset does not include: photo-z or metacal add-on catalogs, native quantities in the original deepCoadd catalogs, transient information in the truth catalogs.

Comments on this "baseline" proposal are welcome.

egawiser commented 3 years ago

I've made this point on Slack but will add it here: users of this catalog will expect photo-z's, since that is a standard data product in similar surveys (e.g., HSC) and will be provided with LSST catalogs. If we do not provide them, they will calculate their own, in many cases poorly. DESC users of DC2 will want photo-z's as well, and it makes sense to provide the same photo-z catalog as part of the public release that will be made available internally to DESC. I think it's quite flexible how much detail those "photo-z"s provide - a single point estimate with uncertainty and something like BPZ ODDS, rather than a full PDF, would satisfy the vast majority of users both inside and outside DESC. Significantly more science will result from DC2 if we provide a photo-z catalog (both internally and externally).
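To make the "point estimate plus something like ODDS" suggestion concrete, here is a minimal sketch of reducing a gridded photo-z PDF to those two numbers. The function name, the evenly spaced grid, and the `half_width` value are assumptions for illustration; this is an ODDS-like statistic (fraction of probability near the peak), not BPZ's own implementation.

```python
def summarize_pdf(z_grid, pdf, half_width=0.06):
    """Return (z_peak, odds_like) for a PDF sampled on an evenly spaced redshift grid.

    odds_like is the fraction of the total probability within
    +/- half_width * (1 + z_peak) of the peak. half_width is an assumed value.
    """
    dz = z_grid[1] - z_grid[0]
    norm = sum(pdf) * dz                      # simple rectangle-rule normalisation
    i_peak = max(range(len(pdf)), key=pdf.__getitem__)
    z_peak = z_grid[i_peak]
    window = half_width * (1.0 + z_peak)
    inside = sum(p for z, p in zip(z_grid, pdf) if abs(z - z_peak) <= window) * dz
    return z_peak, inside / norm
```

A sharply peaked PDF gives a value near 1, while a broad or multi-modal PDF gives a lower value, which is what lets users cut on it without handling full PDFs.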

johannct commented 3 years ago

As mentioned on Slack as well, PZWG is investigating alternative ways to build a useful but simplified photo-z catalogue. As of now, I believe that the current internal photo-z catalogue is meant to stay internal. Since the alternative photo-z catalogue is not yet implemented, the current "baseline" referred to above justifiably sets the photo-z catalogue aside. "Baseline" does not mean "final".

aimalz commented 3 years ago

To bring this issue thread up to date: @sschmidt23, @johannct, and I have been discussing this further on Slack and are favoring the idea of releasing mock photo-z PDFs and/or point estimates (and/or an interface to the code that provides them) rather than the BPZ photo-z PDFs.

The mock photo-z point estimates/PDFs are generated from a forward model of p(z_true, z_est). That model can correspond to the Science Book requirements on bias, scatter, and outlier rate that are already baked into all our cosmology forecasting. Alternatively, it can correspond to the (z_phot vs. z_spec) plots from any given estimation code, with whatever assumptions (training set or templates/prior) you want, on whatever data set (such as Buzzard rather than DC2, meaning we could even use the existing data that became the PZ DC1 paper's scatterplots). The code for this is already public and validated (and very lightweight), but it's part of chippr, so it hasn't gotten much use outside of n(z) inference; by popular request, I'll be separating it into a standalone package during Sprint Week.

The advantage of the mock photo-z point estimates/PDFs is that they don't depend on the problematic photometry. On Slack, @egawiser gave examples of the use cases for the photo-z data products packaged with the public DC2 release, and it sounds like they don't actually need self-consistency with the photometry but do need realism with respect to general quality. The issue with the BPZ photo-z PDFs is that they're consistent with the photometry but not necessarily realistic in quality, and fiddling with the priors or templates won't fix that to make them appropriate for those non-DESC use cases.

So, the proposal is to provide a handful of instances of the p(z_true, z_est) forward model, along with the chippr-based code that generates photo-z PDFs/point estimates for each input z_true conditioned on a chosen p(z_true, z_est) model. As a default, that model could be based on the Science Book requirements. If we provide some p(z_true, z_est) models that are more realistic, @sschmidt23 suggested having separate models for centrals and satellites if that information is provided in the catalog that contains z_true.
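As a rough illustration of what drawing a mock point estimate from such a forward model could look like, here is a minimal sketch. It is not the chippr code or its API: the function name, the parameterisation (Gaussian core scaling with 1 + z_true plus uniform outliers), and the default numbers are all placeholder assumptions, and the defaults are not the Science Book values.

```python
import random

def mock_photoz(z_true, bias=0.0, scatter=0.05, outlier_rate=0.1, z_max=3.0, rng=None):
    """Draw one mock point estimate z_est conditioned on z_true.

    Assumed toy model of p(z_est | z_true):
      - with probability outlier_rate, a catastrophic outlier uniform in [0, z_max];
      - otherwise a Gaussian with mean z_true + bias * (1 + z_true)
        and standard deviation scatter * (1 + z_true), clipped at zero.
    """
    rng = rng or random
    if rng.random() < outlier_rate:
        return rng.uniform(0.0, z_max)
    z_est = z_true + (bias + scatter * rng.gauss(0.0, 1.0)) * (1.0 + z_true)
    return max(z_est, 0.0)

# Usage: one mock estimate per true redshift, with a seeded generator for reproducibility.
rng = random.Random(42)
z_true_values = [0.3, 0.8, 1.5]
z_est_values = [mock_photoz(z, rng=rng) for z in z_true_values]
```

Separate models for centrals and satellites, as suggested above, would then amount to calling this with different parameter sets depending on a flag in the catalog that holds z_true.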

Thoughts?

katrinheitmann commented 3 years ago

A skeleton for the DC2 Data Release Note is now available here: https://www.overleaf.com/4819233759jvvcmmqjzfcc

yymao commented 3 years ago

Truth-match catalog progress can be found in #403.

I also checked whether there are any remaining issues with the object catalog column definitions, and noticed https://github.com/LSSTDESC/gcr-catalogs/issues/397, which we should resolve before the release.

yymao commented 3 years ago

Updates to GCRCatalogs that should be implemented before the public release are tracked here: https://github.com/LSSTDESC/gcr-catalogs/milestone/11

yymao commented 3 years ago

:tada: Public Release is online: https://lsstdesc-portal.nersc.gov/