Closed sbailey closed 1 year ago
@sbailey this may not be ultimately relevant to the resolution of this issue but, for the record, the first three arc failures listed above used pre-#1818 dark models from DESI_SPECTRO_CALIB:
20220913 00142170
daily -- /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/ccd/dark-sm8-z7-20211124.fits.gz
p1 -- /global/cfs/cdirs/desi/spectro/desi_spectro_dark/v2209/dark_frames/dark-sm8-z7-20220913.fits.gz
20220922 00143900
daily -- /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/ccd/dark-sm8-z7-20211124.fits.gz
p1 -- /global/cfs/cdirs/desi/spectro/desi_spectro_dark/v2209/dark_frames/dark-sm8-z7-20220913.fits.gz
20220924 00144161
daily -- /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/ccd/dark-sm8-z7-20211124.fits.gz
p1 -- /global/cfs/cdirs/desi/spectro/desi_spectro_dark/v2209/dark_frames/dark-sm8-z7-20220913.fits.gz
since the changes in #1818 were merged on Oct 3 2022.
@sbailey @julienguy I ran desi_psf_fit for camera z7 of the arc exposures listed in the first comment above, using the main desi environment on Perlmutter, with the input files preproc-z7-EXPID.fits.gz and shifted-input-psf-z7-EXPID.fits each taken from either daily or p1. This gives a total of 4 runs of desi_psf_fit for each exposure.
For example, when using the preproc from daily and the shifted-input-psf from p1, for exposure 153520, the commands run were:
source /global/common/software/desi/desi_environment.sh main
desi_psf_fit \
-a /dvs_ro/cfs/cdirs/desi/spectro/redux/daily/preproc/20221115/00153520/preproc-z7-00153520.fits.gz \
--in-psf /dvs_ro/cfs/cdirs/desi/spectro/redux/p1/exposures/20221115/00153520/shifted-input-psf-z7-00153520.fits \
--out-psf /pscratch/sd/m/malvarez/pre_daily-psf_p1-fit-psf-z7-00153520_19.fits \
--lamp-lines /global/common/software/desi/cori/desiconda/20211217-2.0.0/code/specex/main/py/specex/data/specex_linelist_desi.txt \
--first-bundle 19 \
--last-bundle 19 \
--first-fiber 475 \
--last-fiber 499 \
--legendre-deg-wave 3 --fit-continuum --broken-fibers 20,87,134,198,199,256,257,304,319,320,414
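For reference, the four preproc/input-PSF combinations for one exposure can be enumerated with a small loop. This is only a dry-run sketch: it prints the commands rather than running them, the function name print_psf_fit_matrix is made up for illustration, the daily location of the shifted-input-psf file is assumed to mirror the p1 layout shown above, and the remaining options (--lamp-lines, bundle and fiber ranges, --broken-fibers) still have to be appended as in the full command.

```shell
#!/bin/sh
# Dry run: print the four desi_psf_fit commands (preproc x input PSF, each taken
# from daily or p1) for one z7 arc exposure. Nothing is executed here.
REDUX=/dvs_ro/cfs/cdirs/desi/spectro/redux

print_psf_fit_matrix() {
    night=$1     # e.g. 20221115
    expid=$2     # zero-padded to 8 digits, e.g. 00153520
    outdir=$3    # where the fit PSFs would be written
    for pre in daily p1; do
        for psf in daily p1; do
            # Assumption: daily and p1 share the same directory layout.
            echo "desi_psf_fit" \
                "-a $REDUX/${pre}/preproc/$night/$expid/preproc-z7-$expid.fits.gz" \
                "--in-psf $REDUX/${psf}/exposures/$night/$expid/shifted-input-psf-z7-$expid.fits" \
                "--out-psf $outdir/pre_${pre}-psf_${psf}-fit-psf-z7-${expid}_19.fits"
        done
    done
}

print_psf_fit_matrix 20221115 00153520 /tmp/psf-test
```

Printing first makes it easy to check the paths before submitting the real runs.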
Here are the results:
night expid preproc inputpsf error status
======== ======== ======== ======== ======== ========
20220913 00142170 daily daily - 0
20220913 00142170 p1 daily - 0
20220913 00142170 daily p1 - 0
20220913 00142170 p1 p1 brent 1
======== ======== ======== ======== ======== ========
20220922 00143900 daily daily - 0
20220922 00143900 p1 daily - 0
20220922 00143900 daily p1 cholesky 1
20220922 00143900 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20220924 00144161 daily daily - 0
20220924 00144161 p1 daily - 0
20220924 00144161 daily p1 cholesky 1
20220924 00144161 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221004 00145881 daily daily - 0
20221004 00145881 p1 daily - 0
20221004 00145881 daily p1 cholesky 1
20221004 00145881 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221013 00148067 daily daily - 0
20221013 00148067 p1 daily - 0
20221013 00148067 daily p1 cholesky 1
20221013 00148067 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221016 00148718 daily daily - 0
20221016 00148718 p1 daily - 0
20221016 00148718 daily p1 cholesky 1
20221016 00148718 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221020 00149258 daily daily - 0
20221020 00149258 p1 daily - 0
20221020 00149258 daily p1 - 0
20221020 00149258 p1 p1 brent 1
======== ======== ======== ======== ======== ========
20221021 00149387 daily daily - 0
20221021 00149387 p1 daily - 0
20221021 00149387 daily p1 - 0
20221021 00149387 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221021 00149389 daily daily - 0
20221021 00149389 p1 daily - 0
20221021 00149389 daily p1 - 0
20221021 00149389 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221031 00150967 daily daily - 0
20221031 00150967 p1 daily - 0
20221031 00150967 daily p1 cholesky 1
20221031 00150967 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221031 00150969 daily daily - 0
20221031 00150969 p1 daily - 0
20221031 00150969 daily p1 - 0
20221031 00150969 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221105 00152007 daily daily - 0
20221105 00152007 p1 daily - 0
20221105 00152007 daily p1 cholesky 1
20221105 00152007 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221110 00152780 daily daily - 0
20221110 00152780 p1 daily - 0
20221110 00152780 daily p1 - 0
20221110 00152780 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221111 00152933 daily daily - 0
20221111 00152933 p1 daily - 0
20221111 00152933 daily p1 - 0
20221111 00152933 p1 p1 cholesky 1
======== ======== ======== ======== ======== ========
20221115 00153520 daily daily - 0
20221115 00153520 p1 daily - 0
20221115 00153520 daily p1 brent 1
20221115 00153520 p1 p1 cholesky 1
The stdout and stderr corresponding to each row above are in
$CFS/desi/users/malvarez/testscripts/pre_[daily,p1]-psf_[daily,p1]-fit-psf-z7-[expid].log
Note that specex never crashes when the input PSF from daily is used. This indicates that the differences causing the specex failures in p1 but not in daily, for the arc exposures listed in the first comment above, are very likely upstream of specex, in the processing that generated its input PSF, and not in specex itself.
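The pattern in the table can be tallied mechanically. The following is a rough sketch, assuming the rows are available as plain text in the six-column layout above; the function name count_failures_by_inputpsf is hypothetical, and the sample rows are the first two exposures from the table.

```shell
#!/bin/sh
# Count failed fits (status != 0) per input-PSF source.
# Expected row layout: night expid preproc inputpsf error status
count_failures_by_inputpsf() {
    awk '$1 ~ /^[0-9]+$/ && NF == 6 {
             total[$4]++
             if ($6 != 0) failed[$4]++
         }
         END {
             printf "daily %d/%d\n", failed["daily"], total["daily"]
             printf "p1 %d/%d\n", failed["p1"], total["p1"]
         }'
}

# Separator lines are skipped because their first field is not numeric.
count_failures_by_inputpsf <<'EOF'
20220913 00142170 daily daily - 0
20220913 00142170 p1 daily - 0
20220913 00142170 daily p1 - 0
20220913 00142170 p1 p1 brent 1
======== ======== ======== ======== ======== ========
20220922 00143900 daily daily - 0
20220922 00143900 p1 daily - 0
20220922 00143900 daily p1 cholesky 1
20220922 00143900 p1 p1 cholesky 1
EOF
# prints:
# daily 0/4
# p1 3/4
```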
One can see easily with plot_fiber_traces -i /global/cfs/cdirs/desi/spectro/redux/p1/exposures/20220913/00142170/shifted-input-psf-z7-00142170.fits
that fiber 3987 (#487) has a bad input trace so all PSF fits that use this input trace will fail. This happens because this fiber has a very low transmission. It is masked out as a bad fiber. The code knows about this and does not try to fit it. The issue is that this incorrect fiber trace overlaps with others and that's a problem.
The config from DESI_SPECTRO_CALIB is version V20220907 only valid from 20220907 to 20221120. Its input PSF is $DESI_SPECTRO_CALIB/spec/sm8/psf-sm8-z7-20220907-20220925.fits and has the incorrect fiber trace.
The next calibration config, V20221121, has an updated PSF where this has been fixed: $DESI_SPECTRO_CALIB/spec/sm8/psf-z7-20220907-average.fits . I suggest we simply update the default PSF in the config file for V20220907, because the optics are the same as for V20221121.
I made this change and updated DESI_SPECTRO_CALIB at NERSC. Can you please verify that this solves the problem?
@julienguy thanks for finding the solution. It would have taken me quite some time on my own to (re-)discover the usefulness of plot_fiber_traces for diagnosing this problem.
I have verified that desi_psf_fit succeeds when using the shifted-input-psf file output by desi_compute_trace_shifts when psf-z7-20220907-average.fits is provided as the input to desi_compute_trace_shifts (config=current in the table below), but fails when psf-sm8-z7-20220907-20220925.fits is provided instead (config=pre-Jul7 in the table below).
night expid preproc config error status
=================================================================
20220913 00142170 p1 pre-Jul7 brent 1
20220913 00142170 p1 current - 0
-----------------------------------------------------------------
20220922 00143900 p1 pre-Jul7 cholesky 1
20220922 00143900 p1 current - 0
-----------------------------------------------------------------
20220924 00144161 p1 pre-Jul7 cholesky 1
20220924 00144161 p1 current - 0
-----------------------------------------------------------------
20221004 00145881 p1 pre-Jul7 cholesky 1
20221004 00145881 p1 current - 0
-----------------------------------------------------------------
20221013 00148067 p1 pre-Jul7 cholesky 1
20221013 00148067 p1 current - 0
-----------------------------------------------------------------
20221016 00148718 p1 pre-Jul7 cholesky 1
20221016 00148718 p1 current - 0
-----------------------------------------------------------------
20221020 00149258 p1 pre-Jul7 brent 1
20221020 00149258 p1 current - 0
-----------------------------------------------------------------
20221021 00149387 p1 pre-Jul7 cholesky 1
20221021 00149387 p1 current - 0
-----------------------------------------------------------------
20221021 00149389 p1 pre-Jul7 cholesky 1
20221021 00149389 p1 current - 0
-----------------------------------------------------------------
20221031 00150967 p1 pre-Jul7 cholesky 1
20221031 00150967 p1 current - 0
-----------------------------------------------------------------
20221031 00150969 p1 pre-Jul7 cholesky 1
20221031 00150969 p1 current - 0
-----------------------------------------------------------------
20221105 00152007 p1 pre-Jul7 cholesky 1
20221105 00152007 p1 current - 0
-----------------------------------------------------------------
20221110 00152780 p1 pre-Jul7 cholesky 1
20221110 00152780 p1 current - 0
-----------------------------------------------------------------
20221111 00152933 p1 pre-Jul7 cholesky 1
20221111 00152933 p1 current - 0
-----------------------------------------------------------------
20221115 00153520 p1 pre-Jul7 cholesky 1
20221115 00153520 p1 current - 0
-----------------------------------------------------------------
For example, first running
source /global/common/software/desi/desi_environment.sh main
desi_compute_trace_shifts \
-i /dvs_ro/cfs/cdirs/desi/spectro/redux/p1/preproc/20221115/00153520/preproc-z7-00153520.fits.gz \
--psf /dvs_ro/cfs/cdirs/desi/spectro/desi_spectro_calib/0.4.0/spec/sm8/psf-z7-20220907-average.fits \
--outpsf $SCRATCH/shifted-input-psf.fits \
--degxx 0 \
--degxy 0 \
--degyx 0 \
--degyy 0 \
--arc-lamps
and then
desi_psf_fit -a /dvs_ro/cfs/cdirs/desi/spectro/redux/p1/preproc/20221115/00153520/preproc-z7-00153520.fits.gz \
--in-psf $SCRATCH/shifted-input-psf.fits \
--out-psf $SCRATCH/fit-psf.fits \
--lamp-lines /global/common/software/desi/perlmutter/desiconda/20230111-2.1.0/code/specex/0.8.5/lib/python3.10/site-packages/specex/data/specex_linelist_desi.txt \
--first-bundle 19 \
--last-bundle 19 \
--first-fiber 475 \
--last-fiber 499 \
--legendre-deg-wave 3 --fit-continuum --broken-fibers 20,87,134,198,199,256,257,304,319,320,414
results in desi_psf_fit completing successfully and returning status=0, which can be verified by running echo $? immediately after desi_psf_fit. This corresponds to the row with preproc=p1 and config=current in the table above. However, first running
source /global/common/software/desi/desi_environment.sh main
desi_compute_trace_shifts \
-i /dvs_ro/cfs/cdirs/desi/spectro/redux/p1/preproc/20221115/00153520/preproc-z7-00153520.fits.gz \
--psf /dvs_ro/cfs/cdirs/desi/spectro/desi_spectro_calib/0.4.0/spec/sm8/psf-sm8-z7-20220907-20220925.fits \
--outpsf $SCRATCH/shifted-input-psf.fits \
--degxx 0 \
--degxy 0 \
--degyx 0 \
--degyy 0 \
--arc-lamps
followed by the same desi_psf_fit command, results in desi_psf_fit reporting a FATAL ERROR and returning status=1, which can again be verified by running echo $? immediately after desi_psf_fit. This corresponds to the row with preproc=p1 and config=pre-Jul7 in the table above.
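Because $? is overwritten by every subsequent command, it helps to capture it right away when scripting these checks. A minimal sketch follows, with true and false standing in for a succeeding and a failing desi_psf_fit run; the wrapper name run_and_report is made up for illustration.

```shell
#!/bin/sh
# Run a command, capture its exit status immediately, and report it.
run_and_report() {
    status=0
    "$@" || status=$?      # $? must be read before any other command runs
    echo "status=$status"
    return "$status"
}

run_and_report true                               # prints status=0
run_and_report false || echo "fit flagged as failed"
```

In a real check, the stand-in would be replaced by the full desi_psf_fit command line shown above.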
@sbailey currently, for software environment 23.6, DESI_SPECTRO_CALIB=/global/cfs/cdirs/desi/spectro/desi_spectro_calib/0.5.0, and thus if 23.6 continues to be used as-is for p1, this issue would be resolved by updating the configuration for z7 over the time interval from 20220907 to 20221120 in /global/cfs/cdirs/desi/spectro/desi_spectro_calib/0.5.0 in the same way that @julienguy has already done for /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk, as described in the comment preceding this one.
More generally, for these PSF fits to succeed in any future processing of p1, the update to the configuration for z7 over the time interval from 20220907 to 20221120 made in /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk would have to be reflected in whichever location DESI_SPECTRO_CALIB points to at the time of processing.
In case it helps, this is why these arcs currently fail for p1 using 23.6 with desi_spectro_calib/0.5.0, while they would succeed (as of the changes @julienguy made earlier today) with desi_spectro_calib/trunk instead:
% diff /global/cfs/cdirs/desi/spectro/desi_spectro_calib/0.5.0/spec/sm8/sm8-z.yaml \
/global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm8/sm8-z.yaml
140c140
< PSF: spec/sm8/psf-sm8-z7-20220907-20220925.fits
---
> PSF: spec/sm8/psf-z7-20220907-average.fits
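A quick way to tell whether a given calibration directory still carries the old default is to grep its sm8-z.yaml for the old PSF entry. This is a sketch only: the helper name uses_old_psf is made up, and it assumes that the old file name appears in the YAML only as the PSF entry shown in the diff above.

```shell
#!/bin/sh
# Succeeds (exit 0) if the calib directory still points z7 at the old PSF.
uses_old_psf() {
    calib_dir=$1
    grep -q "PSF: spec/sm8/psf-sm8-z7-20220907-20220925.fits" \
        "$calib_dir/spec/sm8/sm8-z.yaml" 2>/dev/null
}

# Fall back to the 0.5.0 path from this thread if the variable is unset.
calib=${DESI_SPECTRO_CALIB:-/global/cfs/cdirs/desi/spectro/desi_spectro_calib/0.5.0}
if uses_old_psf "$calib"; then
    echo "$calib still references the old z7 PSF"
else
    echo "$calib does not reference the old z7 PSF (or sm8-z.yaml is not readable)"
fi
```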
One can see easily with
plot_fiber_traces -i /global/cfs/cdirs/desi/spectro/redux/p1/exposures/20220913/00142170/shifted-input-psf-z7-00142170.fits
that fiber 3987 (#487) has a bad input trace so all PSF fits that use this input trace will fail. This happens because this fiber has a very low transmission. It is masked out as a bad fiber. The code knows about this and does not try to fit it. The issue is that this incorrect fiber trace overlaps with others and that's a problem.
I don't see fiber 3987 listed in any of the entries in
/global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm8/sm8-z.yaml
For V20220907
(valid from 20220907 to 20221120), the fibers listed are:
BROKENFIBERS: 3520,3587,3634,3914
BADCOLUMNFIBERS: 3698,3699,3756,3757,3804,3819,3820
and these correspond to those passed to specex in the p1 production, e.g.
% grep desi_psf /global/cfs/cdirs/desi/spectro/redux/p1/run/scripts/night/20220922/arc-20220922-00143901-a0123456789-10557679.log | grep psf\-z7 | grep "last-bundle 19" | awk '{print $(NF-1)" "$NF}'
--broken-fibers 20,87,134,198,199,256,257,304,319,320,414
which does not contain 487 in the list of broken fibers.
@sbailey @julienguy is there a way other than the --broken-fibers option for specex to know that fiber 3987 (#487) is a bad fiber, or is this information external to specex?
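The correspondence between the YAML fiber numbers and the --broken-fibers list can be checked with a few lines of shell. This sketch assumes 500 fibers per spectrograph, so that the local index is the global fiber number modulo 500 (consistent with 3987 being #487 above); the helper names are made up for illustration.

```shell
#!/bin/sh
# Convert global fiber numbers to per-spectrograph (local) fiber indices.
# Assumption: 500 fibers per spectrograph, local = global mod 500.
to_local_fiber() {
    echo $(( $1 % 500 ))
}

# Comma-separated global fibers in, sorted comma-separated local fibers out.
to_local_list() {
    echo "$1" | tr ',' '\n' | while read -r f; do
        to_local_fiber "$f"
    done | sort -n | paste -sd, -
}

# BROKENFIBERS followed by BADCOLUMNFIBERS for V20220907:
to_local_list "3520,3587,3634,3914,3698,3699,3756,3757,3804,3819,3820"
# prints 20,87,134,198,199,256,257,304,319,320,414

to_local_fiber 3987
# prints 487
```

The first result reproduces the --broken-fibers argument from the p1 log, and 487 is indeed absent from it.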
Looking at the daily fiberflat for z7, several fibers lost transmission during the 2021 summer shutdown. See for instance:
ds9 $DESI_SPECTRO_REDUX/iron/calibnight/{20210709,20210917}/fiberflatnight-z7-*.fits
(You can see the same variation in b and r, so it's a fiber issue, not a CCD issue.) Fiber 487 is the worst, with a transmission of ~0.1 after the shutdown, and should be added to the list of 'broken' fibers even though it is not broken.
I also see fibers with a sharp reduction of transmission in other cameras!
Actually this had been studied and reported at the time: see https://desi.lbl.gov/DocDB/cgi-bin/private/ShowDocument?docid=6460 . I am adding fiber 3987 to the list of 'broken' fibers because of its low transmission (0.1 of the average).
I believe this ticket has been addressed through a combination of updated calibration files and PR #70 . The originally reported failures have been regenerated. Closing.
The following PSFs failed the p1 test production (software environment 23.6 = specex/0.8.5 + desispec/0.59.1), but succeeded when they were initially run on daily:
It is suspicious that they are all z7. Although we expect some level of arc fitting failure, we don't expect daily successes to become failures in later productions.
From the logs:
That output file /global/cfs/cdirs/desi/spectro/redux/p1/exposures/20220913/00142170/fit-psf-z7-00142170_19.fits exists, but rerunning that command with --out-psf $SCRATCH/fit-psf-z7-00142170_19.fits instead writes the file but crashes with
Possibly related: z7 is one of the CCDs that sometimes has wavelength solutions that extend slightly off the edge of the CCD (which led to EFFTIME_SPEC NaN in desihub/desispec#1978). If that's the case here, a slight consistency prior could help.
Grepping the logs for "using DARK" and "Using MASK", it appears that the same calibrations were used in both cases (daily vs. p1), so this might be a Cori vs. Perlmutter edge case rather than a change in the underlying calibrations (e.g. something no longer being masked).
@marcelo-alvarez please debug why these are newly failing and whether we can be more robust to these cases. Thanks.
Heads up @julienguy @akremin .