Closed sbailey closed 4 months ago
Other similar cases also noted by @abrodze
Tile 23551 (main-bright) exposure 20220403/128581 went from QA_EFFTIME 76 -> 0 https://data.desi.lbl.gov/desi/spectro/redux/iron/tiles/cumulative/23551/20220404/tile-qa-23551-thru20220404.png https://data.desi.lbl.gov/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/tile-qa-23551-thru20220404.png
tile 42621 (main-backup) exposure 20211218/114628 went from QA_EFFTIME 41 -> 0 https://data.desi.lbl.gov/desi/spectro/redux/iron/tiles/cumulative/42621/20220114/tile-qa-42621-thru20220114.png https://data.desi.lbl.gov/desi/spectro/redux/jura/tiles/cumulative/42621/20220114/tile-qa-42621-thru20220114.png
from a quick debug on 23551:
I think it s a combination of two things, which ends up setting efftime=0 for expid=128581 (instead of ~76s):
(1) when the tile-qa has been run, the exposure-qa-00128581.fits
(from 20220403) was missing:
-rw-r----- 1 desi desi 630K May 9 12:16 /global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/tile-qa-23551-thru20220404.fits
-rw-r----- 1 desi desi 445K May 13 11:09 /global/cfs/cdirs/desi/spectro/redux/jura/exposures/20220403/00128581/exposure-qa-00128581.fits
so the code tries to get the efftime for that exposure: https://github.com/desihub/desispec/blob/0ea2729b0bd376f641b47be17bd4097b83ff71a9/py/desispec/tile_qa.py#L123-L127
(2) then I think there s a bug in the code: in this line, it should be exposure_night
instead of night
:
https://github.com/desihub/desispec/blob/0ea2729b0bd376f641b47be17bd4097b83ff71a9/py/desispec/tile_qa.py#L127
because of that, the code looks for expid=128581 files in the 20220404 night, instead of 20220403, hence fails:
INFO:tile_qa.py:126:compute_tile_qa: running missing exposure qa
WARNING:exposure_qa.py:60:compute_exposure_qa: no /global/cfs/cdirs/desi/spectro/redux/jura/preproc/20220404/00128581/fibermap-00128581.fits
WARNING:tile_qa.py:132:compute_tile_qa: failed to compute exposure qa
please let me know if that s enough debugging.
Thanks for the debugging; that makes sense. I opened ticket #2264 with a diagnosis of how this situation happened and what we could do to avoid it in the future. For the purposes of Jura I will fix these tiles by hand, then close this ticket.
Belt-and-suspenders-and-duct-tape: zproc knows that tile-qa needs the coadd and redrock files as input:
INFO:util.py:128:runcmd: RUNNING: desispec.scripts.tileqa.main(['-g', 'cumulative', '-n', '20220404', '-t', '23551'])
Inputs
/global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/coadd-0-23551-thru20220404.fits
/global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/redrock-0-23551-thru20220404.fits
...
but ideally it should also know about needing the exposure-qa files so that it would stop with an informative error messages about what inputs are missing before even trying. The process of tile-qa generating the exposure-qa was primarily useful for daily when it needed to "catch up" on old exposures before exposure-qa was automatically generated. But by now, exposure-qa should really be generated by the pipeline and missing it should be an error condition.
I suggest that we still fix the night vs. exposure_night bug and leave the auto-generation in place, but not rely upon it for normal operations.
+1 on that last point. It appears that there was an oversight in committing exposure-qa from the list of inputs. We absolutely should have it there, which should resolve issues such as those encountered in Jura.
Although thinking about this more -- that will make it impossible for tile-qa to create the exposure-qa
, since daily
using this same code. So it might not have been an oversight but rather an explicit choice to allow daily
to run effectively... So including exposures-qa
in the inputs may not be as clear of a "win" as I originally thought.
Fixed in Jura for all 3 tiles mentioned here; they are now back to being QA=VALID. For the record:
TILEID=23551; NIGHT=20220404
TILEID=82265; NIGHT=20211006
TILEID=42621; NIGHT=20220114
cd $CFS/desi/spectro/redux/jura/tiles/cumulative/$TILEID/$NIGHT/
mkdir attic && mv tile-qa*.* attic
mv logs/tile-qa-$TILEID-thru$NIGHT.log logs/tile-qa-$TILEID-thru$NIGHT.log.1
desi_tile_qa -g cumulative -n $NIGHT -t $TILEID |& tee logs/tile-qa-$TILEID-thru$NIGHT.log
Closing this ticket as fixed-by-hand for Jura; let's keep other discussion in #2264 for the long-term fix.
@araichoor can you debug why tile 82265 (special-bright) in Jura has much lower QA_EFFTIME than it did in Iron:
https://data.desi.lbl.gov/desi/spectro/redux/iron/tiles/cumulative/82265/20211006/tile-qa-82265-thru20211006.png https://data.desi.lbl.gov/desi/spectro/redux/jura/tiles/cumulative/82265/20211006/tile-qa-82265-thru20211006.png
Night 20211003 expid 102826 is the only exposure with a decent EFFTIME_SPEC=191 but the QA_EFFTIME when from 195 to 0. Why?
Thanks to @abrodze for identifying this in Jura QA.