LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.

Ingest Run2.2i DR6 data at NERSC #410

Closed: heather999 closed this issue 3 years ago

heather999 commented 3 years ago

An ingest of all DR6 raw files has started at NERSC to provide a full DM butler repo, including the raw area. For Y4 & Y5, the list of "needed" visits was obtained from the list Jim created, as referenced in issue: https://github.com/LSSTDESC/DC2-production/issues/390

heather999 commented 3 years ago

The ingest is completed and the raw directory has been populated in /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/raw/ which has symlinks to the raw data at NERSC. The number of full visits matches what was generated at CC: 31488.

Checking the number of sensor visits by year, there is a mismatch in Y2 between what I have at NERSC and what was ingested at CC. The numbers below were obtained by doing a select in SQLite3 using the numbers from https://confluence.slac.stanford.edu/display/LSSTDESC/DC2+Run2.2i

Y1: 644938
select COUNT(id) from raw WHERE visit<=262622

Y2: 648745 (CC) vs 651930 (NERSC)
select count(id) from raw WHERE visit>262622 and visit<=497120

Y3: 755205
select COUNT(id) from raw WHERE visit>497120 and visit<=741024

Y4: 743603
select COUNT(id) from raw WHERE visit>741024 and visit<=991360

Y5: 860171
select COUNT(id) from raw WHERE visit>991360 and visit<=1235370
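For anyone who wants to repeat the per-year check against another registry, the five selects can be wrapped in a short Python script. This is a minimal sketch, not the script used for the numbers in this comment: the table name `raw`, the columns `id` and `visit`, and the visit bounds are taken from the queries quoted here, while the function name and the registry path in the usage comment are hypothetical.

```python
import sqlite3

# Upper visit bound for each year, copied from the queries in this comment.
# Each year covers (previous bound, this bound]; Y1 has no lower bound.
YEAR_UPPER_BOUNDS = [
    ("Y1", 262622),
    ("Y2", 497120),
    ("Y3", 741024),
    ("Y4", 991360),
    ("Y5", 1235370),
]


def count_sensor_visits_by_year(conn):
    """Return {year: number of rows in `raw`} for an open sqlite3 connection."""
    counts = {}
    lower = None  # no lower bound for the first year
    for year, upper in YEAR_UPPER_BOUNDS:
        if lower is None:
            sql = "SELECT COUNT(id) FROM raw WHERE visit <= ?"
            params = (upper,)
        else:
            sql = "SELECT COUNT(id) FROM raw WHERE visit > ? AND visit <= ?"
            params = (lower, upper)
        (n,) = conn.execute(sql, params).fetchone()
        counts[year] = n
        lower = upper
    return counts


# Usage (hypothetical path to a registry file):
#   conn = sqlite3.connect("registry.sqlite3")
#   for year, n in count_sensor_visits_by_year(conn).items():
#       print(year, n)
```

Running it once against the CC registry and once against the NERSC registry gives two dictionaries that can be compared year by year.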

Everything matches except Y2, and the only explanation that comes to mind is the "hole" described in https://github.com/LSSTDESC/DC2-production/issues/387, but those visits were simulated and added in later. I suspect what was ingested at NERSC is a superset of what was ingested at CC, but I'm not sure why there is this difference.

The repo at NERSC will continue to use the registry.sqlite3 file generated at CC, though we now also have a NERSC-generated registry.sqlite3, which will be saved on the side.

heather999 commented 3 years ago

There are 3185 sensor visits found at NERSC in Y2 that were not ingested at CC. Just for the record, I have a list of those sensor visits (visit, detector, filter); after spot checking at CC, they really do not seem to be in the ingest there. The list is available at NERSC: /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/sensorvisits_missing_from_cc_ingest
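The count of 3185 is the difference between the two Y2 totals (651930 at NERSC minus 648745 at CC). A list like the one above could be produced by taking the set difference of the two registries over the Y2 visit range. The following is a rough sketch under assumptions, not the procedure actually used: the column names `detector` and `filter` are assumed from the description of the list, and both function names are hypothetical.

```python
import sqlite3


def sensor_visits(conn, lo, hi):
    """Return the set of (visit, detector, filter) rows with lo < visit <= hi.

    Assumes the `raw` table has `detector` and `filter` columns.
    """
    rows = conn.execute(
        "SELECT visit, detector, filter FROM raw WHERE visit > ? AND visit <= ?",
        (lo, hi),
    )
    return set(rows)


def missing_from_cc(nersc_conn, cc_conn, lo=262622, hi=497120):
    """Sensor visits present in the NERSC registry but absent from the CC one.

    The default bounds are the Y2 visit range quoted earlier in this issue.
    """
    nersc = sensor_visits(nersc_conn, lo, hi)
    cc = sensor_visits(cc_conn, lo, hi)
    return sorted(nersc - cc)
```

If NERSC really is a superset of CC in Y2, calling `missing_from_cc` with the two connections swapped should return an empty list.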

johannct commented 3 years ago

They have been transferred to CC, so the ingestion stage at CC likely missed them for some reason.

heather999 commented 3 years ago

Now that the NERSC ingest is done, I'm ready to close this issue. Are we content to just ignore these missed visits, or do we plan to include them in some upcoming reprocessing?

johannct commented 3 years ago

I call them lost, as far as I am concerned.