Closed wmwv closed 5 years ago
I agree that we need to do this at some point but I would like to have DM's opinion (maybe @TallJimbo) on the status of the code. When I tried it on CFHT ~1 year ago it was not yet really usable.
Trying forcedPhotCcd.py . --rerun coadd-v4:forced-test --id visit=179996 filter=u^g^r^i^z^y
based on the 1.2p setup (w39 etc...) I get crashes with the following message :
forcedPhotCcd FATAL: Failed on dataId=DataId(initialdata={'visit': 179996, 'filter': 'u', 'raftName': 'R01', 'detectorName': 'S10', 'detector': 3, 'tract': 5065}, tag=set()): FatalAlgorithmError: CModel forced measurement currently requires the measurement image to have the same Wcs as the reference catalog (this is a temporary limitation).
Traceback (most recent call last):
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/pipe_base/16.0-12-g726f8f3+6/python/lsst/pipe/base/cmdLineTask.py", line 388, in __call__
result = self.runTask(task, dataRef, kwargs)
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/pipe_base/16.0-12-g726f8f3+6/python/lsst/pipe/base/cmdLineTask.py", line 447, in runTask
return task.runDataRef(dataRef, **kwargs)
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/meas_base/16.0-13-gd9b1b71+8/python/lsst/meas/base/forcedPhotImage.py", line 151, in runDataRef
forcedPhotResult = self.run(measCat, exposure, refCat, refWcs, exposureId=exposureId)
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/meas_base/16.0-13-gd9b1b71+8/python/lsst/meas/base/forcedPhotImage.py", line 168, in run
self.measurement.run(measCat, exposure, refCat, refWcs, exposureId=exposureId)
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/meas_base/16.0-13-gd9b1b71+8/python/lsst/meas/base/forcedMeasurement.py", line 355, in run
beginOrder=beginOrder, endOrder=endOrder)
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/meas_base/16.0-13-gd9b1b71+8/python/lsst/meas/base/baseMeasurement.py", line 283, in callMeasure
self.doMeasurement(plugin, measRecord, *args, **kwds)
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/meas_base/16.0-13-gd9b1b71+8/python/lsst/meas/base/baseMeasurement.py", line 302, in doMeasurement
plugin.measure(measRecord, *args, **kwds)
File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/meas_modelfit/16.0-13-g4c33ca5+8/python/lsst/meas/modelfit/cmodel/cmodelContinued.py", line 105, in measure
"CModel forced measurement currently requires the measurement image to have the same"
lsst.pex.exceptions.wrappers.FatalAlgorithmError: CModel forced measurement currently requires the measurement image to have the same Wcs as the reference catalog (this is a temporary limitation).
forcedPhotCcd WARN: Could not persist metadata for dataId=DataId(initialdata={'visit': 179996, 'filter': 'u', 'raftName': 'R01', 'detectorName': 'S10', 'detector': 3, 'tract': 5065}, tag=set()): Template is not defined for the forcedPhotCcd_metadata dataset type, it must be set before it can be used.
For the final WARN comment, I can see
forcedPhotCcd_metadata:
persistable: PropertySet
storage: YamlStorage
python: lsst.daf.base.PropertySet
tables: raw
template: ''
in /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2018_39/stack/miniconda3-4.5.4-fcd27eb/Linux64/obs_base/16.0-14-g71e547a+4/policy/datasets.yaml
This is what I was worried about in my previous comment. Apparently we cannot run CModel in forcedPhotCcd yet.
Here is a comment extracted from cmodelContinued.py:
The CModel algorithm currently cannot be run in forced mode when the measurement WCS is different
from the reference WCS (as is the case in CCD forced photometry). This is a temporary limitation
that will be addressed on DM-5405.
@jchiang87 Jim, this is also something you wanted to inquire about I reckon.
We can skip running CModel for the forced photometry. The DPDD only calls for
objectId
ccdVisitId
psFlux
psFluxErr
psDiffFlux
psDiffFluxErr
flags
As a minor side point,
forcedPhotCcd.py . --rerun coadd-v4:forced-test --id visit=179996 filter=u^g^r^i^z^y
can just be
forcedPhotCcd.py . --rerun coadd-v4:forced-test --id visit=179996 filter=u
Because visit 179996 is a u-band visit. And, you really don't have to specify filter at all.
--id visit=179996
is sufficient.
But indeed to help my own thinking I do find it useful to specify both visit and filter in test scripts to remind myself what filter I'm looking at.
Indeed, filter can be left out entirely. On the other hand, before the crash I see a lot of
forcedPhotCcd WARN: Skipping reference 22276114168708587 (child of 22276114168678429) with bad Footprint
Is it time to post this to the #dm-lsstCam channel to get some additional information from DM? Jim Bosch is out on paternity leave, so we need to depend on others to offer details.
forcedPhotCcd WARN: Skipping reference 22276114168708587 (child of 22276114168678429) with bad Footprint
That's likely telling you that the coadd had an object that is on the edge of the detector at the per-visit level. The footprint is bad because it overlaps an edge (or otherwise has too many masked pixels).
The obs_base setting of:
forcedPhotCcd_metadata:
persistable: PropertySet
storage: YamlStorage
python: lsst.daf.base.PropertySet
tables: raw
template: ''
means the template value needs to be defined in an obs_* package before it can be used. E.g., in obs_subaru, for HSC it's defined here:
https://github.com/lsst/obs_subaru/blob/master/policy/HscMapper.yaml#L458
forcedPhotCcd_metadata:
template: '%(pointing)05d/%(filter)s/tract%(tract)d/forcedPhotCcd_metadata/%(visit)07d-%(ccd)03d.yaml'
We need to define the same kind of template for obs_lsst.
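For anyone writing that template, here is a minimal sketch of how the `%(key)` placeholders expand, using plain Python `%` formatting. The HSC data ID values are invented for illustration, and the helper names `expand` and `missing_keys` are hypothetical. The butler does its own lookup for keys that are not in the data ID, so this only shows the string formatting and which keys a template expects.

```python
import re

# Template copied from the HscMapper.yaml entry quoted above.
HSC_TEMPLATE = (
    "%(pointing)05d/%(filter)s/tract%(tract)d/"
    "forcedPhotCcd_metadata/%(visit)07d-%(ccd)03d.yaml"
)


def expand(template, data_id):
    """Expand a mapper-style template with plain %-formatting."""
    return template % data_id


def missing_keys(template, data_id):
    """Return the template keys that the data ID does not provide."""
    keys = re.findall(r"%\((\w+)\)", template)
    return [key for key in keys if key not in data_id]


# Hypothetical HSC-like data ID (values invented for illustration).
hsc_id = {"pointing": 817, "filter": "HSC-I", "tract": 9813,
          "visit": 1228, "ccd": 49}
hsc_path = expand(HSC_TEMPLATE, hsc_id)

# Data ID keys as they appear in the obs_lsst log messages in this thread.
lsst_id = {"visit": 179996, "filter": "u", "raftName": "R01",
           "detectorName": "S10", "detector": 3, "tract": 5065}
lsst_missing = missing_keys(HSC_TEMPLATE, lsst_id)
```

Checking a candidate template against the keys seen in the log data IDs is a quick way to spot a placeholder that the camera's data IDs do not carry.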
@wmwv that would make sense for this particular visit, but it is quite noisy
@wmwv that would make sense for this particular visit, but it is quite noisy
I would suggest filtering them out, then. You can parse the log files with some greps to take out common lines that you don't care about.
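If grep is not convenient, the same filtering can be sketched in a few lines of Python. This is only an illustration: the pattern list and the `filter_log` name are assumptions, and the single pattern is written against the warning text quoted in this thread.

```python
import re

# Patterns for log lines treated as noise. The one entry here matches the
# "bad Footprint" warning quoted in this thread; add others as needed.
NOISE_PATTERNS = [
    re.compile(r"WARN: Skipping reference \d+ \(child of \d+\) with bad Footprint"),
]


def filter_log(lines, patterns=NOISE_PATTERNS):
    """Yield only the lines that match none of the noise patterns."""
    for line in lines:
        if not any(pattern.search(line) for pattern in patterns):
            yield line


# Two sample lines: one noisy warning and one (shortened) fatal message.
sample = [
    "forcedPhotCcd WARN: Skipping reference 22276114168708587 "
    "(child of 22276114168678429) with bad Footprint",
    "forcedPhotCcd FATAL: Failed on dataId=...",
]
kept = list(filter_log(sample))
```

With a real log, `filter_log(open(path))` would stream the file line by line, where `path` is whatever log file the job wrote.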
To record on this thread a suggestion I made to @johannct privately
Remove the line that loads the cModel profile in the following config:
https://github.com/lsst/obs_lsst/blob/master/config/forcedPhotCcd.py
config.load(os.path.join(getPackageDir("obs_lsst"), "config", "cmodel.py"))
A few quick thoughts:
I probably won't be able to follow up on that, whatever it is, but some combination of @RobertLuptonTheGood, @yalsayyad, and @laurenam probably could.
ok switching off cModel in the config and adding a template for forcedPhotCcd_metadata in lsstCamMapper.yaml seems to do the job.
https://jira.lsstcorp.org/browse/DM-18185 for the chatty level
argh, no, one more serious warning I think:
forcedPhotCcd WARN: Could not persist metadata for dataId=DataId(initialdata={'visit': 179996, 'filter': 'u', 'raftName': 'R01', 'detectorName': 'S01', 'detector': 1, 'tract': 5066}, tag=set()): no such column: ccd
somewhere the code needs a renaming of the sensor naming
What's your template string?
I can see fits files being persisted nevertheless, but I presume that this needs fixing. It may already have been downstream of w39... but the incorrect example is still there in https://github.com/lsst/meas_base/blob/8acd1882360c4f83412889cb90121961e36c024a/python/lsst/meas/base/forcedPhotCcd.py#L387
Michael got it right again, I had a bad string formatting with ccd in the template, and got sidetracked by the example in the code
http://srs.slac.stanford.edu/Pipeline-II/exp/LSST-DESC/task.jsp?refreshRate=60&task=52615118&refreshIsOn=true&refreshCount=95 Run1.2p: stream 0 of task DC2DM_4_FORCEDCCD version 0.1. 1 job per visit, using -j 8 as this is not a pipe_driver task. Not sure what the result will be in terms of parallelism.
Task completed: 21 failed jobs with a fatal msg, see below. Total wall time: 58h22m with 125 jobs using 8 cores each (option -j as forcedPhotCcd is not a pipe_driver task; need to study this config); total 1000 cores available. Global summary:
[tanugi@cca009 ~]$ qacct -A forcedCcd
Total System Usage
WALLCLOCK UTIME STIME CPU MEMORY IO IOW
================================================================================================================
24590329 134260071.856 1095555.924 1490966006.525 444121207.284 488336.806 45837.740
@airnandez files to transfer under /sps/lsst/dataproducts/desc/DC2/Run1.2p/w_2018_39/rerun/coadd-v4/forced/
(when the 21 failures are understood)
@wmwv I would suggest if at all possible to script the DPDD part for these files separately from the coadd catalogue part, so that I can run it after each visit is completed.
The FATAL have an error message of this kind :
forcedPhotCcd FATAL: Failed on dataId=DataId(initialdata={'visit': 193901, 'filter': 'r', 'raftName': 'R23', 'detectorName': 'S22', 'detector': 107, 'tract': 5065}, tag=set()): TaskError: Reference {'tract': 5065, 'patch': '3,6'} doesn't exist
A total of 6 tracts and 9 patches are concerned:
5065 3,6
5066 3,6 2,5
4636 1,6
5064 4,6 6,6
4429 4,0 5,1
5062 1,6
These are all patches outside of the footprint. These tract/patches do exist under rerun/coadd-v4/deepCoadd-results/merged/. I looked for several of those and they are not in my visit to tract/patch mapping DB, so I am afraid that we have some sort of mapping inconsistency...
Note that the stack code goes on and completes the other detectors, and clearly the forcedPhotCcd results on these failed patches are likely useless. So for the sake of moving ahead with testing the DPDD creation from these I would contend that transfer is OK. @wmwv ?
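For cross-checking against a visit to tract/patch mapping, the missing references can be pulled out of the logs with a small script. This is a rough sketch: the regular expression is written against the FATAL message quoted above, and `missing_patches` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches the tail of the TaskError message quoted in this thread.
MISSING_REF = re.compile(
    r"Reference \{'tract': (\d+), 'patch': '(\d+,\d+)'\} doesn't exist"
)


def missing_patches(lines):
    """Map tract number -> set of patch strings named in the errors."""
    found = defaultdict(set)
    for line in lines:
        match = MISSING_REF.search(line)
        if match:
            found[int(match.group(1))].add(match.group(2))
    return dict(found)


# The FATAL line quoted above, used as sample input.
log_lines = [
    "forcedPhotCcd FATAL: Failed on dataId=DataId(initialdata={'visit': 193901, "
    "'filter': 'r', 'raftName': 'R23', 'detectorName': 'S22', 'detector': 107, "
    "'tract': 5065}, tag=set()): TaskError: Reference {'tract': 5065, "
    "'patch': '3,6'} doesn't exist",
]
result = missing_patches(log_lines)
```

The resulting dictionary can then be compared entry by entry with whatever the mapping DB returns for the same visits.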
@rearmstr Bob do you have any suggestion as to what could go bad?
@airnandez files to transfer under /sps/lsst/dataproducts/desc/DC2/Run1.2p/w_2018_39/rerun/coadd-v4/forced/ (when the 21 failures are understood)
Please let me know when you are ready for us to transfer the data to NERSC.
@airnandez Yes, please transfer to NERSC.
@johannct Yes, the DPDD extraction will be run after the coadds. It will probably be in two steps.
@airnandez Bob just gave me a hint as to how to roll the failed stream back for clean living, so I will let you know as soon as they are done: hint is to add -c references.skipMissing=True to the command line. This fixed the FATAL occurrences in the 21 visits. Transfer occurring asap.
Preparing to run the same code on 1.2i. I want to use this opportunity to test two things:
-j ${NSLOTS} to make sure that it is usable identically as the --cores ${NSLOTS} of the pipe_driver tasks. @rearmstr said he would inquire on his side as well.
This is the summary of a transfer campaign of data products of Run1.2i from IN2P3 to NERSC, as requested by @johannct above:
@airnandez files to transfer under
/sps/lsst/dataproducts/desc/DC2/Run1.2p/w_2018_39/rerun/coadd-v4/forced/
(when the 21 failures are understood)
Description | Value |
---|---|
Location at CC-IN2P3 (sources) | /sps/lsst/dataproducts/desc/DC2/Run1.2p/w_2018_39/rerun/coadd-v4 |
Location at NERSC (destination) | /global/projecta/projectdirs/lsst/global/in2p3/Run1.2i/w_2018_39/rerun/coadd-v4 |
Number of files transferred | 248,303 |
Data volume transferred | 338 GB |
The list of transferred files is located at NERSC at:
/global/projecta/projectdirs/lsst/global/in2p3/Run1.2i/logs/2019-03-11-in2p3-to-nersc.txt
@airnandez The location above at NERSC is the incorrect location. That was the Run1.2i directory tree.
Please put these Run1.2p forced files in
/global/projecta/projectdirs/lsst/global/in2p3/Run1.2p/w_2018_39/rerun/coadd-v4/
A move on the NERSC filesystem rather than re-transfer is likely the most efficient thing to do:
mv /global/projecta/projectdirs/lsst/global/in2p3/Run1.2i/w_2018_39/rerun/coadd-v4/forced /global/projecta/projectdirs/lsst/global/in2p3/Run1.2p/w_2018_39/rerun/coadd-v4/
I tried but do not have permission.
@wmwv and @airnandez Let's wait a moment before making this move.
We already have /global/projecta/projectdirs/lsst/global/in2p3/Run1.2p/w_2018_39/rerun/coadd-v4/ which is the last Run1.2p reprocessing completed back in January. Admittedly this transfer should just entail adding the subdirectory forced, but do we want to maintain some sense that this is an addition to the "old" data or are we content to just add the forced photometry data?
Going forward I'd probably want to have some indication that this is an addition and update the version of the DRP output to be coadd-v5, but obviously we want to match IN2P3 and leaving this as coadd-v4 is probably ok. Next time, I'd probably use a different rerun, chained to coadd-v4, when the data is produced so it is clearer that this is a follow-on processing.
I can complete this move - assuming we are all in agreement and understand what we're doing.
forced has been run with the same codebase. There is no DRP version as it is a separate task that has been run. It was already the case for calexp-v4 and coadd-v4. So I am not sure that it is critical to track this info here.
@heather999 I agree with the spirit of your concern. But in this particular case, I think just putting it in coadd-v4 is the right thing to do.
@wmwv @johannct @airnandez
I just moved the forced directory to /global/projecta/projectdirs/lsst/global/in2p3/Run1.2p/w_2018_39/rerun/coadd-v4/forced
I'll also update the copy of the data on $CSCRATCH/../desc/DC2/data/Run1.2p
thanks Heather, I am now waiting for some feedback before I launch the 1.2i version (assuming we want it)
I think that we decided to move forward : test forcedPhotCcd on top of coadd and move on with 2.1i, so I am closing this.
I suggest that we run one more step at the end of the coadd pipeline:
Right now we run the forced photometry on the per-filter coadds -- this provides the measurements for the Object table. The next step is to run the forced photometry on each individual visit image based on the defined objects from the coadds - these measurements are what will fill the DPDD ForcedSource table. I suggest that this is worth doing on the current latest Run 1.2p and Run 1.2i. It could be done at either IN2P3 or NERSC.