jchiang87 closed this issue 3 years ago.
Starting a to-do list and adding some questions.
To define the tracts/patches in the DDF region, can we re-use Eli's definition for Run2.2i DR6?
Can this down-selection occur before ingest and/or producing the calexps for Run3.1i?
Do we want to handle down-selection by implementing a filter as part of the processing workflow?
[x] Document visit down-selection process for Run3.1i DDF
[x] Ingest down-selected Run3.1i Y1 with DMstack v19
[x] Run processCcd for Run3.1i Y1 to produce calexps
[x] Ingest Run2.2i Y1 DR2 sensor-visits, skip existing raw files
[x] Reuse Run2.2i Y1 calexps by symlinking them to the new DR2 repo
[ ] Run sky correction to generate skyCorr data products for all data in the DR2 repo
[x] Symlink Run2.2i skyCorrection.py data products to the DR2 repo
[ ] Symlink the existing Run2.2i Y1 warps, avoiding any that include Run3.1i data
[x] Copy existing Run2.2i Y1 warps.
[x] Move aside Run2.2i Y1 warps that include DDF sensor-visits.
[ ] Make needed DR2 warps.
[ ] Process DR2 coadds/multiband
[ ] Utilize coadd/multiband processing outputs to produce DR2 dpdd catalogs
[ ] Ingest Run3.1i Y2 sensor-visits
[ ] processCcd for Run3.1i Y2 calexps
[ ] make Run3.1i Y2 warps
[ ] DR3 Run3.1i DDF region coadd/multiband and dpdd catalog processing
I don't think we need Eli's definitions. We just replace the Run2.2i raw files with the Run3.1i versions for the cases where they both exist. The Run3.1i sensor-visits were already selected to just cover the DDF.
Given that we're mixing Run2.2i and Run3.1i data, I think it would be easiest to make a new repo and ingest the Run3.1i raw files first; then, if there's an option to ingest the Run2.2i files into the same repo while skipping existing raw files, we'd get the dataset we want.
We'd have a single visit list and ingest the raw files for each visit. The processing pipeline wouldn't need to do any special filtering since it would just process the data in the repo.
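The precedence rule described above (a Run3.1i raw file wins wherever both runs have one) amounts to a keyed merge over sensor-visits. A toy sketch; the `(visit, detector)` keys and file paths below are made up for illustration and are not the actual repo contents:

```python
# Toy sketch of the planned raw-file selection: ingest Run3.1i first,
# then add Run2.2i files only where no Run3.1i version exists.  The
# (visit, detector) keys and file lists here are illustrative only.
run31i = {(188998, 92): "run3.1i/lsst_a_188998_R22_S02_y.fits"}
run22i = {(188998, 92): "run2.2i/lsst_a_188998_R22_S02_y.fits",
          (217577, 132): "run2.2i/lsst_a_217577_R31_S20_u.fits"}

selected = dict(run22i)      # start from the Run2.2i files...
selected.update(run31i)      # ...and let Run3.1i versions take precedence

print(sorted(selected.values()))
# ['run2.2i/lsst_a_217577_R31_S20_u.fits', 'run3.1i/lsst_a_188998_R22_S02_y.fits']
```

An ingest option that skips already-present raw files would implement the same precedence at the Butler level.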
OK, that helped. I've updated the above to-do list to more accurately capture the steps.
It turns out the visit selection is rather easy: the minion_1016 opsim db identifies visits associated with the WFD and DDF observations by propID. For the Y1 DC2 visits, here is a plot of the pointing directions for the WFD (propID==54) and DDF (propID==56) visits:

So we just need to select propID==54 from minion_1016 to identify the desired visits. I'll document this at the wiki.
Here are visit depth maps for DC2 Y1 and Y1+Y2 using propID==54 visits. The DDF is indicated in the upper right corner.
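The propID down-selection is a single query against the opsim db. A sketch using an in-memory stand-in for the minion_1016 `Summary` table (the real file would be opened by path, and the column names, which follow the OpSim v3 schema, should be checked against the actual db):

```python
import sqlite3

# In-memory stand-in for minion_1016_sqlite.db; the real Summary table
# has many more columns (fieldRA, fieldDec, filter, expMJD, ...).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Summary (obsHistID INTEGER, propID INTEGER)")
con.executemany("INSERT INTO Summary VALUES (?, ?)",
                [(188998, 54), (193861, 54), (217577, 56), (203610, 54)])

# Select the WFD visits (propID == 54), excluding the DDF (propID == 56).
wfd_visits = sorted(row[0] for row in con.execute(
    "SELECT DISTINCT obsHistID FROM Summary WHERE propID = 54"))
print(wfd_visits)  # [188998, 193861, 203610]
```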
Copying from Slack: https://lsstc.slack.com/archives/C978LTJGN/p1598896345006200 New DR2 repo containing the Y1 Run3.1i raw files for the visits with propID==54: /global/cscratch1/sd/descdm/DC2/DR2/repo
Let's postpone the Run3.1i Y2 items until after the DR2 coadd/multiband processing is done, so that we don't need any special handling of the data in the DDF for producing DR2.
I've run processCcd.py on the Y1 Run3.1i raw files using the shifter image lsstdesc/desc-drp-stack:v19-dc2-run2.2-v5. I compared the as-run configs against the ones for the CC-IN2P3 processing of the Run2.2i data, and they are identical.
Out of 3427 sensor-visits, there were 41 processing failures. 39 of those sensor-visits have no raw file counterpart in the Run2.2i data. Looking at one of them, it appears to have been generated without an initial checkpoint file, which is consistent with these sensor-visits not having been generated for Run2.2i. I think it's safe to ignore these 39.
The two remaining sensor-visits failed with these error messages:

lsst_a_193861_R14_S12_r.fits:
TaskError: Fit failed: median scatter on sky = 14.279 arcsec > 10.000 config.maxScatterArcsec

lsst_a_203610_R30_S01_i.fits:
RuntimeError: No matches to use for photocal

Since it's just these two out of 3388, I'm not inclined to follow up on them, so we should neglect them as well.
To compare the visit-level results for the Run3.1i and Run2.2i versions, I ran the single frame processing validation script in sims_ci_pipe on the DR2/Run3.1i outputs and on the same visits for Run2.2i. The results look consistent with each other for the photometric and astrometric accuracy and for the PSF size and m5 values. (I'll post a plot in the DR2 wiki entry.)
Based on those results, I think we're good to go for adding the Run2.2i visits to the DR2 data repo registry and sym-linking the CC-IN2P3-generated processCcd.py data products. We will just need to omit the data products for lsst_a_193861_R14_S12_r.fits and lsst_a_203610_R30_S01_i.fits.
I've added some items related to sky correction to the task list above.
Just out of curiosity - were these processCcd runs on Haswell or KNL and are there some average run times?
On Haswell. Most runtimes are between 2 and 3 minutes. The logs can be grepped:
grep ^real /global/cscratch1/sd/descdm/DC2/DR2/logging/processCcd*.log
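As a sketch, the average wall time can be pulled from those `real` lines with awk. The log format assumed here (`real<TAB>2m30.0s`, i.e. the bash `time` builtin style) and the /tmp toy file are assumptions for illustration; the real input is the grep above:

```shell
# Toy log mimicking the `time` builtin's output; the real data come from
# grep ^real /global/cscratch1/sd/descdm/DC2/DR2/logging/processCcd*.log
printf 'real\t2m30.0s\nreal\t2m50.0s\nreal\t3m10.0s\n' > /tmp/processCcd_times.txt

grep '^real' /tmp/processCcd_times.txt \
  | sed -E 's/real[[:space:]]+([0-9]+)m([0-9.]+)s/\1 \2/' \
  | awk '{t += $1*60 + $2; n++} END {printf "avg %.1f s over %d runs\n", t/n, n}'
```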
Concerning the Run2.2i Y1 warps, I have a list of all expected warps (though some were removed at CC). Working on transferring the existing ones over to NERSC and I hope that will be completed in the next day or so.
The Run2.2i processCcd.py outputs have been symlinked from /global/cscratch1/sd/descdm/DC2/Run2.2i-parsl/v19.0.0-v1/rerun/run2.2i-calexp-v1-copy into the DR2 repo.
I've run skyCorrection.py on a visit in each band that contains Run3.1i data and differenced the resulting images with the corresponding Run2.2i skyCorr images. They differ in pixel values by less than 0.2 ADU at most, with the mean and median pixel values of the differenced images typically much less than 0.02 ADU. Here are histograms showing the distribution of minimum, maximum, mean, and median values of the pixels in those per-CCD difference images:

Since the Run3.1i data just have a relatively small number of point sources added versus the Run2.2i versions, we'd expect the Run3.1i skyCorr images to be essentially the same as for Run2.2i. So for DR2, we should simply use the Run2.2i skyCorr data.
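The per-CCD comparison above amounts to differencing each pair of skyCorr images and summarizing the difference pixels. A minimal numpy sketch; the FITS reads (e.g. via astropy.io.fits) are replaced here with toy arrays, and the sensor shape and sky level are made up:

```python
import numpy as np

def diff_stats(img_a, img_b):
    """Summarize the pixel differences between two background images.

    In practice img_a/img_b would be the image arrays read from the
    Run3.1i and Run2.2i skyCorr FITS files for the same sensor-visit.
    """
    diff = np.asarray(img_a, dtype=float) - np.asarray(img_b, dtype=float)
    return {"min": diff.min(), "max": diff.max(),
            "mean": diff.mean(), "median": np.median(diff)}

# Toy sensor images differing by a small constant offset (in ADU).
rng = np.random.default_rng(42)
sky = rng.normal(800.0, 5.0, size=(2000, 509))
stats = diff_stats(sky + 0.01, sky)
print(stats)  # all four summary values ~0.01 ADU for this toy offset
```

Collecting these four numbers per CCD over all compared visits gives the histograms shown above.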
The skyMap has been symlinked from /global/cscratch1/sd/descdm/DC2/Run2.2i-parsl/v19.0.0-v1/rerun/run2.2i-calexp-v1-copy/deepCoadd to /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-calexp/deepCoadd.
I needed the skyMap to be able to run tract2visit_mapper.py to produce the tract2visit sqlite3 database, as discussed at this week's DESC DM meeting. The tract2visit_mapper is running now, and I'll add the sqlite db to the /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-calexp/ directory when it's finished.
Slack discussions concerning configuration parameters and handling the pre-existing warps in the workflow.
As discussed on Slack, there are 3 visits which are not in the simulated Run2.2i data and include just a couple of sensors in Run3.1i. These are:
find . -xtype l
./00191341-z
./00183810-g
./00207760-y
Removing these symlinks in /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-calexp/.
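`find -xtype l` matches dangling symlinks, so the cleanup can be done in one pass. A sketch in a throwaway /tmp directory (an assumption for illustration; the real target was the dr2-calexp rerun directory):

```shell
# Set up a throwaway directory with one good and one dangling symlink.
mkdir -p /tmp/symlink-demo && cd /tmp/symlink-demo
touch real_file
ln -sf real_file good_link
ln -sf /no/such/target 00191341-z   # dangling, like the three visits above

find . -xtype l            # lists only the broken link
find . -xtype l -delete    # remove it
find . -xtype l | wc -l    # now prints 0
```

The `good_link` symlink survives, since `-xtype l` only matches links whose targets are missing.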
Ran ImageProcessingPipelines/python/util/tract2visit_mapper.py from the dr2/run2.2 branch to produce the tracts_mapping.sqlite3 DB that will be used for the coadd processing. The input visit list was constructed using the list of visits in the dr2-calexp/calexp directory. The resulting sqlite3 DB contains 3018 visits, which matches the number of calexp visits in /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-calexp/.

The sqlite3 file has been copied to this directory as well as to /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-coadd, which is exactly what CC does for its processing.
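That 3018-visit cross-check is a one-line query against the mapping DB. This sketch uses an in-memory toy DB, and the table/column names (`overlaps`, `tract`, `patch`, `visit`) are assumptions rather than tract2visit_mapper.py's actual schema, which should be checked:

```python
import sqlite3

# Toy stand-in for tracts_mapping.sqlite3; the schema here is assumed,
# not taken from the mapper's actual output.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE overlaps (tract INTEGER, patch TEXT, visit INTEGER)")
con.executemany("INSERT INTO overlaps VALUES (?, ?, ?)",
                [(5063, "3,1", 188998), (5063, "4,1", 188998),
                 (5062, "0,2", 217577), (5064, "6,6", 196476)])

# Count distinct visits; for the real DB this should come out to 3018,
# matching the calexp visit count in the dr2-calexp rerun.
(n_visits,) = con.execute("SELECT COUNT(DISTINCT visit) FROM overlaps").fetchone()
print(n_visits)  # 3 distinct visits in this toy DB
```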
I moved the copy of CC's Run2.2i warps into /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-coadd/deepCoadd. Due to the magic of Globus, these files are owned by desc, but I set the ACLs to allow descdm full access to all the files/directories - just let me know if you see any problems.

There was already a /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-coadd/deepCoadd directory that I moved aside and renamed /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-coadd/deepCoadd-testing - not that we need it, but it did contain what looked like a test warp for a visit for a particular patch in tract 5064, i-band.

As I noted in Slack, we can just delete that warp data for tract 5064.
Initial conditions prior to beginning the Parsl-based DR2 processing:
| | |
|---|---|
| Run account | descdm |
| Butler (Gen2) repo | /global/cscratch1/sd/descdm/DC2/DR2/repo |
| /rerun naming | dr2-{calexp,coadd,multiband,metadata} |
| Workflow code | /global/cscratch1/sd/descdm/ParslRun/ImageProcessingPipelines |
| git repo | https://github.com/LSSTDESC/ImageProcessingPipelines/tree/dc2/run2.1 |
| Run directory | /global/cscratch1/sd/descdm/ParslRun/dr2 |

Within the Butler repo:

| | |
|---|---|
| # participating visits (Y1 WFD) | 3018 |
| # of pre-existing warp*.fits files | 448,966 |
| Space occupied by pre-existing warp files | 73 TB |
State of $SCRATCH space:

```
FILESYSTEM  SPACE_USED  SPACE_QUOTA  SPACE_PCT  INODE_USED  INODE_QUOTA  INODE_PCT
cscratch1   5.33TiB     250.00TiB    2.1%       7.74M       20.00M       38.7%
```
Saturday morning report.
An initial test of the Parsl DR2 workflow, processing tract 5063 (in the DDF region), started last night and is still running, but it has already revealed a number of failures. An example command:
makeCoaddTempExp.py /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-calexp --output /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-coadd --id tract=5063 patch=3,1 filter=y --selectId visit=188998 --configfile /opt/lsst/software/stack/obs_lsst/config//makeCoaddTempExp.py --calib /global/cscratch1/sd/descdm/DC2/DR2/repo/CALIB
which generated this error:
makeCoaddTempExp FATAL: Failed on dataId=DataId(initialdata={'tract': 5063, 'patch': '3,1', 'filter': 'y'}, tag=set()): NoResults: No locations for get: datasetType:skyCorr dataId:DataId(initialdata={'visit': 188998, 'filter': 'y', 'raftName': 'R22', 'detectorName': 'S02', 'detector': 92, 'tract': 5063}, tag=set())
Ref: /global/cscratch1/sd/descdm/ParslRun/dr2/runinfo/000/dm-logs/coadd_for_tract_5063_patch_4-1_filter_y-visit-188998.{stdout,stderr}
Error list for makeCoaddTempExp:
tract_5063_patch_3-1_filter_y-visit-188998
tract_5063_patch_3-2_filter_y-visit-188998
tract_5063_patch_4-1_filter_y-visit-188998
tract_5063_patch_4-2_filter_y-visit-188998
tract_5063_patch_6-0_filter_i-visit-196476
An example:
forcedPhotCoadd.py /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-multiband --output /global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-multiband --id tract=5063 patch=6,6 filter=u --configfile /opt/lsst/software/stack/obs_lsst/config//forcedPhotCoadd.py
which generated this error:
forcedPhotCoadd FATAL: Failed on dataId=DataId(initialdata={'tract': 5063, 'patch': '6,6', 'filter': 'u'}, tag=set()): NoResults: No locations for get: datasetType:deepCoadd_meas dataId:DataId(initialdata={'tract': 5063, 'patch': '6,6', 'filter': 'u'}, tag=set())
Ref: /global/cscratch1/sd/descdm/ParslRun/dr2/runinfo/000/dm-logs/multiband_for_tract_5063_patch_6-6-filter-u-forced_phot_coadd.{stdout,stderr}
Just looking at the first error message:
makeCoaddTempExp FATAL: Failed on dataId=DataId(initialdata={'tract': 5063, 'patch': '3,1', 'filter': 'y'}, tag=set()): NoResults: No locations for get: datasetType:skyCorr dataId:DataId(initialdata={'visit': 188998, 'filter': 'y', 'raftName': 'R22', 'detectorName': 'S02', 'detector': 92, 'tract': 5063}, tag=set())
Visit 188998 is one of the Run3.1i visits that was also in Run2.2i, but there is no skyCorr data for R22 S02:
/global/cscratch1/sd/descdm/DC2/DR2/repo/rerun/dr2-calexp/skyCorr> ls 00188998-y/R22/
skyCorr_00188998-y-R22-S00-det090.fits skyCorr_00188998-y-R22-S12-det095.fits
skyCorr_00188998-y-R22-S01-det091.fits skyCorr_00188998-y-R22-S20-det096.fits
skyCorr_00188998-y-R22-S10-det093.fits skyCorr_00188998-y-R22-S21-det097.fits
skyCorr_00188998-y-R22-S11-det094.fits skyCorr_00188998-y-R22-S22-det098.fits
so we may need to go back and check which visits now from Run3.1i have raft/sensor combinations that need specific skyCorr data generated.
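That check can be done as a set difference over the dataId portions of the calexp and skyCorr filenames. A sketch with toy filename lists; in practice the lists would come from walking the calexp/ and skyCorr/ rerun directories, and the exact filename conventions should be verified against the repo:

```python
def missing_skycorr(calexp_files, skycorr_files):
    """Return sensor-visit ids that have a calexp but no skyCorr file."""
    # Strip the dataset-type prefix and the .fits suffix to get comparable
    # ids, e.g. 'calexp_00188998-y-R22-S02-det092.fits'
    #        -> '00188998-y-R22-S02-det092'
    def ids(files, prefix):
        return {f[len(prefix):-len(".fits")] for f in files}
    return sorted(ids(calexp_files, "calexp_") - ids(skycorr_files, "skyCorr_"))

# Toy filename lists for one raft-visit.
calexps = ["calexp_00188998-y-R22-S01-det091.fits",
           "calexp_00188998-y-R22-S02-det092.fits"]
skycorrs = ["skyCorr_00188998-y-R22-S01-det091.fits"]
print(missing_skycorr(calexps, skycorrs))  # ['00188998-y-R22-S02-det092']
```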
Checking all of the Run3.1i calexps, that skyCorr file, i.e., the one for 00188998-y-R22-S02, is the only one that's missing. However, when I run skyCorrection.py for that sensor-visit, it fails with
skyCorr FATAL: Failed on dataId={'visit': 188998, 'raftName': 'R22', 'detectorName': 'S02', 'filter': 'y', 'detector': 92}: InvalidParameterError:
File "src/math/LeastSquares.cc", line 421, in void lsst::afw::math::LeastSquares::_factor(bool)
Number of columns of design matrix (1) must be smaller than number of data points (0) {0}
lsst::pex::exceptions::InvalidParameterError: 'Number of columns of design matrix (1) must be smaller than number of data points (0)'
Traceback (most recent call last):
File "/opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/pipe_base/19.0.0/python/lsst/pipe/base/cmdLineTask.py", line 388, in __call__
result = self.runTask(task, dataRef, kwargs)
File "/opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/pipe_base/19.0.0/python/lsst/pipe/base/cmdLineTask.py", line 447, in runTask
return task.runDataRef(dataRef, **kwargs)
File "/opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/pipe_drivers/19.0.0+2/python/lsst/pipe/drivers/skyCorrection.py", line 229, in runDataRef
scale = self.sky.solveScales(measScales)
File "/opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/pipe_drivers/19.0.0+2/python/lsst/pipe/drivers/background.py", line 346, in solveScales
return solve(mask)
File "/opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/pipe_drivers/19.0.0+2/python/lsst/pipe/drivers/background.py", line 334, in solve
afwMath.LeastSquares.DIRECT_SVD).getSolution()
lsst.pex.exceptions.wrappers.InvalidParameterError:
File "src/math/LeastSquares.cc", line 421, in void lsst::afw::math::LeastSquares::_factor(bool)
Number of columns of design matrix (1) must be smaller than number of data points (0) {0}
lsst::pex::exceptions::InvalidParameterError: 'Number of columns of design matrix (1) must be smaller than number of data points (0)'
which would be consistent with the skyCorr file being missing in the Run2.2i data. There are probably other missing skyCorr files like this in the Run2.2i data, so I think we'll have to deal with these missing skyCorr files in a similar fashion as the SRS pipeline does.
I was trying to see how the SRS pipeline deals with this. What I do see is this Slack conversation, and more recently this one, which indicates Johann's implementation. Here's the link to the code in IPP's setup_coaddDriver.
This has been discussed several times, in several channels, including in #desc-dc2-workflows : https://lsstc.slack.com/archives/CFL9N02MR/p1593503722274800
In finishing off the last patch that failed in the multiband processing for DR2, Tom encountered the following error:
measureCoaddSources.propagateFlags INFO: Propagating flags dict_keys(['calib_psf_candidate', 'calib_psf_used', 'calib_psf_reserved', 'calib_astrometry_used', 'calib_photometry_used', 'calib_photometry_reserved']) from inputs
measureCoaddSources FATAL: Failed on dataId=DataId(initialdata={'tract': 5062, 'patch': '0,2', 'filter': 'u'}, tag=set()):
NoResults: No locations for get: datasetType:src dataId:DataId(initialdata={'visit': 217577, 'detector': 132}, tag=set())
Tue Nov 17 15:53:40 PST 2020 wrap-shifter: executable finished with return code 1
Here the measureCoaddSources.py task is looking for flags from the src catalog outputs from the single frame processing (sfp) of visit 217577, detector 132 (R31_S20), but that file is not present in the DR2 repo. This sensor-visit is in one of the rafts that straddle the DDF boundary in that visit, so the sfp output folders for that raft would contain data products for both Run3.1i and Run2.2i sensor-visits. The Run3.1i data were ingested into the DR2 repo directly, sfp was run on those data, and so the sfp outputs have physical locations in the DR2 repo. For the Run2.2i data, rather than ingesting everything from scratch and re-doing all the processing, we planned to symlink the existing sfp data products into the desired folders. This script was run to make those symlinks.
Unfortunately, after looking at other raft-visits that straddle the DDF boundary, I found a number of other cases where there are missing Run2.2i sensors. Here is a plot of all of the missing sensor-visits (shown in red) for raft-visits that overlap with the DDF:
The two missing sensor-visits within the DDF boundary are the ones I noted here and so are expected to be missing.
The reason that those other missing sensor-visits didn't trigger the same error as the one noted above is that the warps for those patches were based on the data in the DR2 repo, so the warps (and associated coadds) didn't expect to find those sensor-visits. The coadd that triggered the error included a pre-existing warp file that was copied over from CC-IN2P3, where that sensor-visit was present.
The next step is to make those missing symlinks, regenerate the warps for the affected patches, and then redo the coadd and multiband processing for those patches.
I've attached a file with the list of the 137 affected patches.
We have plans to produce a Data Release 2 (DR2) covering the entire 300 square degrees of DC2 using the first year (Y1) of data. I think the nominal plan is to use the Run2.2i data, and process all patches as-is, including those in the DDF at full depth.
After discussing some of the needs of the strong lensing (SL) studies with @jiwoncpark, the option arose of combining the Run2.2i and Run3.1i data for this release. In the DDF, we'd use the Run3.1i sensor-visits instead of the Run2.2i versions, and furthermore, we'd down-select the DC2 visits that overlap with the DDF so that we obtain a uniform WFD-like cadence across the entire DC2 region. The SL group could then use the coadd/multiband results in the DDF for machine learning studies to find strongly lensed systems in WFD regions. I don't think this change would adversely affect the usefulness of DR2 for other Science Working Groups, but they should comment here if this poses problems.
In addition, since SL would like at least 2-year depth at a WFD cadence, I propose that we do a DR3 just for the Run3.1i DDF data, i.e., make warps for Y2 Run3.1i data and combine them with the Y1 data to have two year depth coadds and multiband results in the DDF.
We can itemize the various to-do steps in this issue, but I'd like to document the work (e.g., visit down-selection) in the DC2-production wiki.