Run2.2i DR6-WFD release

heather999 commented 3 years ago

This issue is to capture the discussion concerning preparation and validation of DR6 and its precursor releases such as DR6c, DR6d.

To Do

[x] Identify a consistent set of object and metacal catalogs and pixel level data for release
[ ] Identify validation criteria
[x] Fix any bad coadd files; see https://github.com/LSSTDESC/ImageProcessingPipelines/issues/158
[x] Update desc-dc2-dm-data
[x] Create a new GCRCatalogs release that includes this data
[x] Prepare and update documentation
[x] Make an announcement

erykoff commented 3 years ago

As I promised on the Wednesday telecon, I have looked into the exposure time map to propose a region that should be the "DR6 WFD" footprint, avoiding the DDF fields, which will only complicate analysis. These deep coadds can be in a separate catalog to make it clear that this is a fundamentally different dataset.

Here is what the i-band exptime map looks like as of the rsynced files the other day:

The scale is cut at 4000 because otherwise we're looking at the upper right corner blowing everything out. Zooming in, we get:

Note that tracts 4848, 4849, 5062, 5063, 5064 are not in the map since they are not done processing. I propose first of all that we exclude those.

The trickier question is 4850. Exclude the whole tract or just the dense patches? (We have to be careful then.) These are the patches:

If we exclude patches, I propose a "clean" cut excluding 4850: 4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6 .

Meanwhile, I noticed that two tract parquet catalogs that are nominally in the coverage are not on nersc. These are 4428 and 2909. These are at the hairy edge and could probably be excluded, but we want to do so explicitly:

This leaves the following "full" tracts in the proposed DR6 footprint, with the question of what to do with 4850:

2723^2724^2725^2726^2727^2728^2729^2730^2731^2732^2733^2734^2735^2896^2897^2898^2899^2900^2901^2902^2903^2904^2905^2906^2907^2908^3074^3075^3076^3077^3078^3079^3080^3081^3082^3083^3084^3085^3086^3256^3257^3258^3259^3260^3261^3262^3263^3264^3265^3266^3267^3268^3441^3442^3443^3444^3445^3446^3447^3448^3449^3450^3451^3452^3453^3454^3631^3632^3633^3634^3635^3636^3637^3638^3639^3640^3641^3642^3643^3825^3826^3827^3828^3829^3830^3831^3832^3833^3834^3835^3836^3837^4022^4023^4024^4025^4026^4027^4028^4029^4030^4031^4032^4033^4034^4035^4224^4225^4226^4227^4228^4229^4230^4231^4232^4233^4234^4235^4236^4429^4430^4431^4432^4433^4434^4435^4436^4437^4438^4439^4440^4441^4636^4637^4638^4639^4640^4641^4642^4643^4644^4645^4646^4647^4648^4851^4852^4853^4854^4855^4856^4857^4858^4859^4860^5065^5066^5067^5068^5069^5070^5071^5072^5073^5074

erykoff commented 3 years ago

Oh, and one could also argue that for 4850 we should exclude all the patches `4,*` and `5,*` and `6,*`.

wmwv commented 3 years ago

> Meanwhile, I noticed that two tract parquet catalogs that are nominally in the coverage are not on nersc. These are 4428 and 2909. These are at the hairy edge and could probably be excluded, but we want to do so explicitly:

I agree. We should definitely exclude 2909 and ~4428~ 4488. (Which I understand is your proposal.)

erykoff commented 3 years ago

4428 not 4488, and yes, that is my proposal.

wmwv commented 3 years ago

> If we exclude patches, I propose a "clean" cut excluding 4850: 4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6 .

I agree that we should exclude patches from 4850, but otherwise include it. And I agree that this simple list of 9 patches to exclude is good.

heather999 commented 3 years ago

Sounds good - I'm just waiting on the u-band multiband data to finish coming over and then I can create a copy of the pixel data excluding the identified tracts: 2909, 4428 ~4488~, 4848, 4849, 5062, 5063, 5064
and for 4850 excluding: 4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6
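
For reference, the lists above use the `^`-separated form that the LSST command-line tasks accept in data ID selections. The following is a minimal Python sketch that parses the two strings and counts the entries; the helper names `parse_tracts` and `parse_patches` are hypothetical and not part of DC2-production.

```python
# Sketch only: parse_tracts / parse_patches are illustrative helpers,
# not functions from DC2-production or the LSST stack.
EXCLUDED_TRACTS = "2909^4428^4848^4849^5062^5063^5064"
EXCLUDED_PATCHES_4850 = "4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6"


def parse_tracts(selection):
    """Split a '^'-separated tract string into a sorted list of ints."""
    return sorted(int(t) for t in selection.split("^") if t.strip())


def parse_patches(selection):
    """Split a '^'-separated patch string into (x, y) integer tuples."""
    patches = []
    for item in selection.split("^"):
        x, y = item.strip().split(",")
        patches.append((int(x), int(y)))
    return patches


excluded_tracts = parse_tracts(EXCLUDED_TRACTS)
excluded_patches = parse_patches(EXCLUDED_PATCHES_4850)

print(len(excluded_tracts), "tracts excluded")
print(len(excluded_patches), "patches excluded from tract 4850")
```

The patch list is the 3 x 4 block with x in 4..6 and y in 3..6, so it contains 12 patches.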

heather999 commented 3 years ago

The transfer from CC is finished, and I have shut down the nightly transfer from CC. There is a new copy of the object and metacal catalogs labeled run2.2i-wfd-dr6d:

/global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d

I have not removed the extra tracts; this is the full set from CC as of today. I'll leave it to @wmwv to deal with these catalogs as he sees fit. I'm guessing that for the 4850 catalog we would regenerate it once the ~9~ 12 patches are excluded. I'll work on backing up the pixel-level data and then removing the tracts, and the patches from 4850, in preparation for a WFD DR6 release.

heather999 commented 3 years ago

I've made a copy of the pixel data directories within the same area, /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun, naming them run2.2i-coadd-wfd-dr6d, run2.2i-coadd-wfd-dr6d-u, and run2.2i-wfd-coadd-dr6d-grizy, and removed the suggested tracts and, for tract 4850, those 12 patches. I've put copies of the repositoryCfg.yaml files onto CSCRATCH so we can continue to trick the butler and avoid issues when running in batch at NERSC. I think this is ready to go.

heather999 commented 3 years ago

After discussing this at today's DESC DM meeting - I have done the following:

Made sure that the full tracts we are excluding are not included in the object or metacal catalog areas for DR6d.
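
A check like this one can be scripted. The sketch below assumes the per-tract catalogs follow the `object_tract_<tract>.parquet` / `metacal_tract_<tract>.parquet` naming that appears in the merge commands later in this comment; the function names and the directory argument are hypothetical.

```python
import re
from pathlib import Path

# Full tracts excluded from the DR6 WFD footprint (from the discussion above).
EXCLUDED_TRACTS = {2909, 4428, 4848, 4849, 5062, 5063, 5064}

# ASSUMPTION: merged per-tract files end in "_tract_<number>.parquet",
# e.g. object_tract_4850.parquet or metacal_tract_4850.parquet.
TRACT_FILE = re.compile(r"_tract_(\d+)\.parquet$")


def find_excluded_tract_files(filenames, excluded=EXCLUDED_TRACTS):
    """Return the file names whose tract number is in the excluded set."""
    offenders = []
    for name in filenames:
        match = TRACT_FILE.search(str(name))
        if match and int(match.group(1)) in excluded:
            offenders.append(str(name))
    return sorted(offenders)


def check_catalog_dir(catalog_dir):
    """Scan one catalog directory (hypothetical layout) for excluded tracts."""
    names = [p.name for p in Path(catalog_dir).glob("*.parquet")]
    return find_excluded_tract_files(names)
```

An empty list from `check_catalog_dir` would indicate that none of the seven excluded tracts has a merged file in that directory.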

For tract 4850: reran make_object_catalog using the DR6d rerun area, where those 12 patches in tract 4850 are excluded, then merged the parquet files to produce the DR6d 4850 object catalog:

```
python DC2-production-0.4.0/scripts/make_object_catalog.py /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun/run2.2i-coadd-wfd-dr6d 4850 -o $PWD/output
python DC2-production-0.4.0/scripts/merge_parquet_files.py $PWD/output/object_4850_*.parquet -o $PWD/output/object_tract_4850.parquet --assume-consistent-schema
```

For the metacal catalog: took the existing metacal tract/patch files for 4850, removed the files for those 12 patches, and merged the remaining parquet files to produce a new 4850 metacal parquet file:
```
python DC2-production-0.4.0/scripts/merge_parquet_files.py /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/metacal_table_summary/4850/metacal_4850*.parquet -o /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/metacal_table_summary/4850/metacal_tract_4850.parquet
```
The versions of the software used should be identical to what is in use at CC, and they are documented in the IPP repo. This includes LSSTDESC/DC2-production 0.4.0.
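
The metacal step above (dropping the per-patch files for the 12 excluded patches before merging) could be sketched as follows. The per-patch file naming is not given in this thread, so the `_<x>,<y>.parquet` suffix used here is purely an assumption, and `split_patch_files` is a hypothetical helper, not part of DC2-production.

```python
import re

# Patches cut from tract 4850: x in 4..6, y in 3..6 (12 patches).
EXCLUDED_PATCHES = {(x, y) for x in (4, 5, 6) for y in (3, 4, 5, 6)}

# ASSUMPTION: per-patch files end in "_<x>,<y>.parquet".
# Adjust this pattern to whatever naming the real files use.
PATCH_FILE = re.compile(r"_(\d+),(\d+)\.parquet$")


def split_patch_files(filenames, excluded=EXCLUDED_PATCHES):
    """Partition per-patch files into (keep, drop) lists by patch index."""
    keep, drop = [], []
    for name in filenames:
        match = PATCH_FILE.search(name)
        if match is None:
            # Refuse to guess rather than silently merge an unknown file.
            raise ValueError(f"cannot read a patch index from {name!r}")
        patch = (int(match.group(1)), int(match.group(2)))
        (drop if patch in excluded else keep).append(name)
    return keep, drop
```

The `keep` list is what would then be passed to merge_parquet_files.py in place of the shell glob.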

I'm going to copy these catalogs to the shared area at NERSC and start up a GitHub issue on GCRCatalogs to see about getting these included.

One missing piece is creating the dc2_object_run2.2i_dr6d parquet files, so I'll see about learning how to do that. This Slack message may offer a clue.

heather999 commented 3 years ago

Naming discussion is ongoing in https://github.com/LSSTDESC/gcr-catalogs/pull/483. Settling on dr6-wfd-v1, I've set aside the CC rsync rerun areas and renamed what were the dr6d rerun directories to:

run2.2i-coadd-wfd-dr6-v1
run2.2i-coadd-wfd-dr6-v1-u
run2.2i-coadd-wfd-dr6-v1-grizy

yymao commented 3 years ago

Sorry for coming late to the party but I feel this naming scheme is a bit confusing -- is run2.2i-coadd-wfd-dr6-v1-u a subset of run2.2i-coadd-wfd-dr6-v1 or an independent data set?

heather999 commented 3 years ago

Sorry, yes, it is confusing. run2.2i-coadd-wfd-dr6-v1 is the full dataset, composed of 2 subsets which are stored in two other directories (run2.2i-coadd-wfd-dr6-v1-u and run2.2i-coadd-wfd-dr6-v1-grizy) due to the need for different configuration parameters during the processing. Users would only need to be aware of run2.2i-coadd-wfd-dr6-v1, so we're really only talking about one dataset.

yymao commented 3 years ago

Thanks for the clarification. That's fine as long as it's documented. I still wonder if we should alias run2.2i-coadd-wfd-dr6 to run2.2i-coadd-wfd-dr6-v1 in GCRCatalogs

wmwv commented 3 years ago

@heather999 I've run write_gcr_to_parquet.py for dc2_object_run2.2i_dr6_wfd_v1 using

DC2-production commit 2882ade

gcr-catalogs commit c93c01b (the PR for issues/482, https://github.com/LSSTDESC/gcr-catalogs/pull/483)

And copied the results to

/global/cfs/projectdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/dc2_object_run2.2i_dr6d

wmwv commented 3 years ago

RA, Dec Coverage looks correct
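
A coverage check of this kind might look like the sketch below. It is an illustration only, not the check that was actually run: `coverage_summary` and `load_columns` are hypothetical helpers, and the sketch assumes the catalog exposes `ra`, `dec`, and `tract` columns through `GCRCatalogs.load_catalog` and `get_quantities`.

```python
# Full tracts that should be absent from the DR6 WFD catalog.
EXCLUDED_TRACTS = {2909, 4428, 4848, 4849, 5062, 5063, 5064}


def coverage_summary(ra, dec, tract):
    """Summarize RA/Dec extent and tract membership for a set of objects."""
    tracts = {int(t) for t in tract}
    return {
        "ra_range": (min(ra), max(ra)),
        "dec_range": (min(dec), max(dec)),
        "n_tracts": len(tracts),
        "unexpected_tracts": sorted(tracts & EXCLUDED_TRACTS),
    }


def load_columns(catalog_name="dc2_object_run2.2i_dr6_wfd_v1"):
    """Fetch ra, dec, tract through GCRCatalogs (needs the DESC environment)."""
    import GCRCatalogs  # imported lazily so coverage_summary stays standalone

    catalog = GCRCatalogs.load_catalog(catalog_name)
    return catalog.get_quantities(["ra", "dec", "tract"])
```

With the DESC environment available, `coverage_summary(**load_columns())` would report the extent and flag any excluded tract still present. On the full catalog a vectorized version (for example with numpy) would be much faster than this pure-Python loop.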

heather999 commented 3 years ago

@wmwv Just catching up on this. I think I should copy the contents of /global/cfs/projectdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/dc2_object_run2.2i_dr6d to /global/cfs/cdirs/lsst/shared/DC2-prod/Run2.2i/dpdd/Run2.2i-dr6-wfd-v1/dc2_object_run2.2i_dr6_wfd_v1 - just want to make sure. Actually, I'll go ahead and do that - I can remove it if that's an error.

I'd like to target Monday morning to officially announce DR6 WFD. What pieces are we missing @JoanneBogart @yymao ? I made a start at updating the Confluence page. We probably should include a reference about the DPDD translation of the object catalogs on Confluence as well to help capture the discussion on Slack?

yymao commented 3 years ago

We can include a link to the translation code here and make it clear that anyone using GCR or the "DPDD" parquet files (that Michael generated) is using the "translated" version.

We should certainly avoid reproducing the translation code on Confluence, since we'll certainly forget to update it in the future.

heather999 commented 3 years ago

We seem to be finished :)

LSSTDESC / DC2-production

Run2.2i DR6-WFD release #397