heather999 closed this issue 3 years ago.
The ingest is complete, and the `raw` directory has been populated in /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/raw/ with symlinks to the raw data at NERSC. The number of full visits, 31488, matches what was generated at CC.
Checking the number of sensor visits by year, I found a mismatch in Y2 between what was ingested at NERSC and what was ingested at CC. The numbers below were obtained with SQLite3 queries on the `raw` table, using the per-year visit boundaries from https://confluence.slac.stanford.edu/display/LSSTDESC/DC2+Run2.2i
| Year | Sensor visits | Query |
|------|---------------|-------|
| Y1 | 644938 | `select count(id) from raw where visit<=262622` |
| Y2 | 648745 (CC) vs. 651930 (NERSC) | `select count(id) from raw where visit>262622 and visit<=497120` |
| Y3 | 755205 | `select count(id) from raw where visit>497120 and visit<=741024` |
| Y4 | 743603 | `select count(id) from raw where visit>741024 and visit<=991360` |
| Y5 | 860171 | `select count(id) from raw where visit>991360 and visit<=1235370` |
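The per-year queries in the table can be run in one pass and compared between two registries. This is a sketch, assuming the same `raw` table with `id` and `visit` columns; the function names are hypothetical, and the year boundaries are the ones used in the queries above.

```python
import sqlite3

# Inclusive upper visit number of each year, as used in the queries above.
YEAR_BOUNDS = [
    ("Y1", 262622),
    ("Y2", 497120),
    ("Y3", 741024),
    ("Y4", 991360),
    ("Y5", 1235370),
]


def sensor_visits_by_year(conn: sqlite3.Connection) -> dict:
    """Count rows of `raw` (one per sensor visit) in each year's visit range."""
    counts = {}
    lower = None  # Y1 has no lower bound, matching the Y1 query
    for year, upper in YEAR_BOUNDS:
        if lower is None:
            sql, args = "SELECT COUNT(id) FROM raw WHERE visit <= ?", (upper,)
        else:
            sql = "SELECT COUNT(id) FROM raw WHERE visit > ? AND visit <= ?"
            args = (lower, upper)
        counts[year] = conn.execute(sql, args).fetchone()[0]
        lower = upper
    return counts


def year_mismatches(a: dict, b: dict) -> dict:
    """Return {year: a[year] - b[year]} for the years whose counts differ."""
    return {year: a[year] - b[year] for year in a if a[year] != b[year]}
```

Fed the NERSC and CC counts from the table, `year_mismatches` would report only Y2, with a difference of 651930 − 648745 = 3185.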
Everything matches except Y2. The only explanation that comes to mind is the "hole" described in https://github.com/LSSTDESC/DC2-production/issues/387, but those visits were simulated and added later. I suspect the NERSC ingest is a superset of the CC ingest, but I'm not sure why the two differ.
The repo at NERSC will continue to use the `registry.sqlite3` file generated at CC, though we now also have a NERSC-generated `registry.sqlite3`, which will be kept on the side.
There are 3185 Y2 sensor visits found at NERSC that were not ingested at CC (651930 − 648745 = 3185). For the record, I have a list of these (visit, detector, filter); spot checks at CC confirm they really are not in the CC ingest. The list is available at NERSC: /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/sensorvisits_missing_from_cc_ingest
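One way such a list could be regenerated, and the superset hypothesis tested, is a set difference across the two registries. This is a sketch under assumptions: both registries have a `raw` table with `visit`, `detector` and `filter` columns (the fields recorded in the list above), and `sensor_visit_diff` is a hypothetical helper, not part of the DM stack.

```python
import sqlite3


def sensor_visit_diff(nersc_db: str, cc_db: str, lo: int, hi: int):
    """Return (nersc_only, cc_only): sensor visits with lo < visit <= hi that
    appear in one registry's `raw` table but not in the other's."""
    conn = sqlite3.connect(nersc_db)
    try:
        # Attach the CC registry so both can be queried in one statement.
        conn.execute("ATTACH DATABASE ? AS cc", (cc_db,))
        query = """
            SELECT visit, detector, "filter" FROM {a}.raw WHERE visit > ? AND visit <= ?
            EXCEPT
            SELECT visit, detector, "filter" FROM {b}.raw WHERE visit > ? AND visit <= ?
            ORDER BY visit, detector
        """
        args = (lo, hi, lo, hi)
        nersc_only = conn.execute(query.format(a="main", b="cc"), args).fetchall()
        cc_only = conn.execute(query.format(a="cc", b="main"), args).fetchall()
    finally:
        conn.close()
    return nersc_only, cc_only
```

For Y2 the bounds would be `lo=262622, hi=497120`. An empty `cc_only` means the NERSC ingest is a superset of the CC ingest for that range, and `nersc_only` is then the list of sensor visits missing from CC.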
They were transferred to CC, so the ingestion stage at CC likely missed them for some reason.
Now that the NERSC ingest is done, I'm ready to close this issue. Are we content to just ignore these missed visits, or do we plan to include them in an upcoming reprocessing?
As far as I am concerned, I call them lost.
An ingest of all DR6 raw files has started at NERSC to provide a full DM butler repo, including the `raw` area. For Y4 and Y5, the list of "needed" visits was obtained from the list Jim created, as referenced in issue https://github.com/LSSTDESC/DC2-production/issues/390