Closed heather999 closed 3 years ago
As I promised on the Wednesday telecon, I have looked into the exposure time map to propose a region that should be the "DR6 WFD" footprint, avoiding the DDF fields which will only complicate analysis; these deep coadds can be in a separate catalog to make it clear that this is a fundamentally different dataset.
Here is what the i-band exptime map looks like as of the rsynced files the other day:
The scale is cut at 4000 because otherwise we're looking at the upper right corner blowing everything out. Zooming in, we get:
Note that tracts 4848, 4849, 5062, 5063, 5064 are not in the map since they are not done processing. I propose first of all that we exclude those.
The trickier question is 4850. Do we exclude the whole tract or just the dense patches? (If just the patches, we have to be careful.) These are the patches:
If we exclude patches, I propose a "clean" cut excluding 4850: 4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6 .
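The caret-separated patch list above can be checked mechanically. Here is a minimal Python sketch, assuming each entry is an `x,y` patch index pair; the helper name `parse_patches` is hypothetical and not part of any DESC or LSST tool.

```python
def parse_patches(selector):
    """Split a caret-separated selector such as '4,3^5,3' into (x, y) tuples."""
    patches = []
    for item in selector.split("^"):
        x, y = item.strip().split(",")
        patches.append((int(x), int(y)))
    return patches


# Patch list proposed for exclusion from tract 4850 (copied from the comment above).
EXCLUDED_4850 = "4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6"

patches = parse_patches(EXCLUDED_4850)

# The list should form the full rectangle x in 4..6, y in 3..6.
expected = {(x, y) for x in range(4, 7) for y in range(3, 7)}
print(len(patches), set(patches) == expected)
```

The list covers a 3 by 4 block of patches, 12 in total.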
Meanwhile, I noticed that two tract parquet catalogs that are nominally in the coverage are not on nersc. These are 4428 and 2909. These are at the hairy edge and could probably be excluded, but we want to do so explicitly:
This leaves the following "full" tracts in the proposed DR6 footprint, with the question of what to do with 4850:
2723^2724^2725^2726^2727^2728^2729^2730^2731^2732^2733^2734^2735^2896^2897^2898^2899^2900^2901^2902^2903^2904^2905^2906^2907^2908^3074^3075^3076^3077^3078^3079^3080^3081^3082^3083^3084^3085^3086^3256^3257^3258^3259^3260^3261^3262^3263^3264^3265^3266^3267^3268^3441^3442^3443^3444^3445^3446^3447^3448^3449^3450^3451^3452^3453^3454^3631^3632^3633^3634^3635^3636^3637^3638^3639^3640^3641^3642^3643^3825^3826^3827^3828^3829^3830^3831^3832^3833^3834^3835^3836^3837^4022^4023^4024^4025^4026^4027^4028^4029^4030^4031^4032^4033^4034^4035^4224^4225^4226^4227^4228^4229^4230^4231^4232^4233^4234^4235^4236^4429^4430^4431^4432^4433^4434^4435^4436^4437^4438^4439^4440^4441^4636^4637^4638^4639^4640^4641^4642^4643^4644^4645^4646^4647^4648^4851^4852^4853^4854^4855^4856^4857^4858^4859^4860^5065^5066^5067^5068^5069^5070^5071^5072^5073^5074
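A list this long is easier to review as contiguous runs of tract IDs. Here is a minimal sketch, assuming tract IDs are integers separated by `^`; `tract_ranges` is a hypothetical helper, and the sample string is only a short excerpt of the list above.

```python
def tract_ranges(selector):
    """Collapse a caret-separated tract list into (first, last) runs."""
    tracts = sorted(int(t) for t in selector.split("^"))
    runs = []
    for t in tracts:
        if runs and t == runs[-1][1] + 1:
            # Extend the current run by one tract.
            runs[-1] = (runs[-1][0], t)
        else:
            # Start a new run.
            runs.append((t, t))
    return runs


# Short excerpt of the proposed tract list, for illustration only.
sample = "2723^2724^2725^2896^2897^4851"
print(tract_ranges(sample))
```

Running the same helper on the full list would make any gap or stray tract stand out as an unexpected run boundary.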
Oh, and one could also argue that for 4850 we should exclude all the patches 4,* and 5,* and 6,*.
Meanwhile, I noticed that two tract parquet catalogs that are nominally in the coverage are not on nersc. These are 4428 and 2909. These are at the hairy edge and could probably be excluded, but we want to do so explicitly:
I agree. We should definitely exclude 2909 and ~4428~ 4488. (Which I understand is your proposal.)
4428 not 4488, and yes, that is my proposal.
If we exclude patches, I propose a "clean" cut excluding 4850: 4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6 .
I agree that we should exclude patches from 4850, but otherwise include it. And I agree that this simple list of 12 patches to exclude is good.
Sounds good - I'm just waiting on the u-band multiband data to finish coming over and then I can create a copy of the pixel data excluding the identified tracts: 2909, 4428 ~4488~, 4848, 4849, 5062, 5063, 5064
and for 4850 excluding: 4,3^5,3^6,3^4,4^5,4^6,4^4,5^5,5^6,5^4,6^5,6^6,6
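The agreed exclusions can be written down as a small lookup. This is a minimal sketch, not the procedure actually used to prune the pixel data; the names `EXCLUDED_TRACTS`, `EXCLUDED_PATCHES`, and `in_wfd_footprint` are hypothetical, and the function only encodes the exclusions, so it does not check that a tract belongs to the full tract list.

```python
# Whole tracts dropped from the proposed DR6 WFD footprint.
EXCLUDED_TRACTS = {2909, 4428, 4848, 4849, 5062, 5063, 5064}

# Per-tract patch exclusions: for 4850, patches with x in 4..6 and y in 3..6.
EXCLUDED_PATCHES = {
    4850: {(x, y) for x in (4, 5, 6) for y in (3, 4, 5, 6)},
}


def in_wfd_footprint(tract, patch):
    """Return False if the (tract, patch) pair is excluded, True otherwise."""
    if tract in EXCLUDED_TRACTS:
        return False
    return patch not in EXCLUDED_PATCHES.get(tract, set())


print(in_wfd_footprint(4850, (4, 3)))  # excluded patch of 4850
print(in_wfd_footprint(4850, (0, 0)))  # kept patch of 4850
print(in_wfd_footprint(4428, (1, 1)))  # excluded tract
```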
The transfer from CC is finished. I have shut down the nightly transfer from CC. There is a new copy of the object and metacal catalogs labeled run2.2i-wfd-dr6d: /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d
I have not removed the extra tracts - this is the full set from CC as of today. I'll leave it to @wmwv to deal with these catalogs as he sees fit. I'm guessing we would regenerate the 4850 catalog once the 12 patches are excluded.
I'll work on backing up the pixel level data and then removing the tracts and patches from 4850 in preparation for a WFD DR6 release.
I've made a copy of the pixel data directories within the same area, /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun, naming them run2.2i-coadd-wfd-dr6d, run2.2i-coadd-wfd-dr6d-u, and run2.2i-wfd-coadd-dr6d-grizy, and removed the suggested tracts and, for tract 4850, those 12 patches. I've put copies of the repositoryCfg.yaml files onto CSCRATCH so we can continue to trick the butler and avoid issues if running in batch at NERSC. I think this is ready to go.
After discussing this at today's DESC DM meeting - I have done the following:
python DC2-production-0.4.0/scripts/make_object_catalog.py /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun/run2.2i-coadd-wfd-dr6d 4850 -o $PWD/output
python DC2-production-0.4.0/scripts/merge_parquet_files.py $PWD/output/object_4850_*.parquet -o $PWD/output/object_tract_4850.parquet --assume-consistent-schema
python DC2-production-0.4.0/scripts/merge_parquet_files.py /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/metacal_table_summary/4850/metacal_4850*.parquet -o /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/metacal_table_summary/4850/metacal_tract_4850.parquet
The versions of the software used should be identical to what is in use at CC and is documented in the IPP repo. This includes LSSTDESC/DC2-production 0.4.0.
I'm going to copy these catalogs to the shared area at NERSC and start up a GitHub issue on GCRCatalogs to see about getting these included.
One missing piece is creating the dc2_object_run2.2i_dr6d parquet files, so I'll see about learning how to do that. This Slack message may offer a clue.
The naming discussion is ongoing in https://github.com/LSSTDESC/gcr-catalogs/pull/483. Settling on dr6-wfd-v1, I've set aside the CC rsync rerun areas and renamed what were the dr6d rerun directories to:
run2.2i-coadd-wfd-dr6-v1
run2.2i-coadd-wfd-dr6-v1-u
run2.2i-coadd-wfd-dr6-v1-grizy
Sorry for coming late to the party, but I feel this naming scheme is a bit confusing -- is run2.2i-coadd-wfd-dr6-v1-u a subset of run2.2i-coadd-wfd-dr6-v1 or an independent data set?
Sorry, yes, it is confusing. run2.2i-coadd-wfd-dr6-v1 is the full dataset, composed of 2 subsets which are stored in two other directories (run2.2i-coadd-wfd-dr6-v1-u and run2.2i-coadd-wfd-dr6-v1-grizy) due to the need for different configuration parameters during the processing. Users would only need to be aware of run2.2i-coadd-wfd-dr6-v1, so we're really only talking about one dataset.
Thanks for the clarification. That's fine as long as it's documented. I still wonder if we should alias run2.2i-coadd-wfd-dr6 to run2.2i-coadd-wfd-dr6-v1 in GCRCatalogs.
@heather999 I've run write_gcr_to_parquet.py for dc2_object_run2.2i_dr6_wfd_v1 using DC2-production commit 2882ade and gcr-catalogs commit c93c01b (the PR for issues/482, https://github.com/LSSTDESC/gcr-catalogs/pull/483), and copied the results to
/global/cfs/projectdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/dc2_object_run2.2i_dr6d
RA, Dec Coverage looks correct
@wmwv Just catching up on this. I think I should copy the contents of /global/cfs/projectdirs/lsst/production/DC2_ImSim/Run2.2i/dpdd/run2.2i-wfd-dr6d/dc2_object_run2.2i_dr6d to /global/cfs/cdirs/lsst/shared/DC2-prod/Run2.2i/dpdd/Run2.2i-dr6-wfd-v1/dc2_object_run2.2i_dr6_wfd_v1 - just want to make sure. Actually, I'll go ahead and do that - I can remove it if that's an error.
I'd like to target Monday morning to officially announce DR6 WFD. What pieces are we missing @JoanneBogart @yymao ? I made a start at updating the Confluence page. We probably should include a reference about the DPDD translation of the object catalogs on Confluence as well to help capture the discussion on Slack?
We can include a link to the translate link here and make it clear that if one uses GCR or the "DPDD" parquet files (that Michael generated) then they are using the "translated" version.
We should certainly avoid reproducing the translation code on Confluence, since we will almost certainly forget to update it in the future.
We seem to be finished :)
This issue is to capture the discussion concerning preparation and validation of DR6 and its precursor releases such as DR6c, DR6d.
To Do