desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
36 stars 24 forks source link

Jura tile 82265 much lower qa_efftime than Iron #2263

Closed sbailey closed 4 months ago

sbailey commented 4 months ago

@araichoor can you debug why tile 82265 (special-bright) in Jura has much lower QA_EFFTIME than it did in Iron:

https://data.desi.lbl.gov/desi/spectro/redux/iron/tiles/cumulative/82265/20211006/tile-qa-82265-thru20211006.png https://data.desi.lbl.gov/desi/spectro/redux/jura/tiles/cumulative/82265/20211006/tile-qa-82265-thru20211006.png

Night 20211003 expid 102826 is the only exposure with a decent EFFTIME_SPEC=191 but the QA_EFFTIME when from 195 to 0. Why?

Thanks to @abrodze for identifying this in Jura QA.

sbailey commented 4 months ago

Other similar cases also noted by @abrodze

Tile 23551 (main-bright) exposure 20220403/128581 went from QA_EFFTIME 76 -> 0 https://data.desi.lbl.gov/desi/spectro/redux/iron/tiles/cumulative/23551/20220404/tile-qa-23551-thru20220404.png https://data.desi.lbl.gov/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/tile-qa-23551-thru20220404.png

tile 42621 (main-backup) exposure 20211218/114628 went from QA_EFFTIME 41 -> 0 https://data.desi.lbl.gov/desi/spectro/redux/iron/tiles/cumulative/42621/20220114/tile-qa-42621-thru20220114.png https://data.desi.lbl.gov/desi/spectro/redux/jura/tiles/cumulative/42621/20220114/tile-qa-42621-thru20220114.png

araichoor commented 4 months ago

from a quick debug on 23551:

I think it s a combination of two things, which ends up setting efftime=0 for expid=128581 (instead of ~76s):

(1) when the tile-qa has been run, the exposure-qa-00128581.fits (from 20220403) was missing:

-rw-r----- 1 desi desi 630K May  9 12:16 /global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/tile-qa-23551-thru20220404.fits
-rw-r----- 1 desi desi 445K May 13 11:09 /global/cfs/cdirs/desi/spectro/redux/jura/exposures/20220403/00128581/exposure-qa-00128581.fits

so the code tries to get the efftime for that exposure: https://github.com/desihub/desispec/blob/0ea2729b0bd376f641b47be17bd4097b83ff71a9/py/desispec/tile_qa.py#L123-L127

(2) then I think there s a bug in the code: in this line, it should be exposure_night instead of night: https://github.com/desihub/desispec/blob/0ea2729b0bd376f641b47be17bd4097b83ff71a9/py/desispec/tile_qa.py#L127 because of that, the code looks for expid=128581 files in the 20220404 night, instead of 20220403, hence fails:

INFO:tile_qa.py:126:compute_tile_qa: running missing exposure qa
WARNING:exposure_qa.py:60:compute_exposure_qa: no /global/cfs/cdirs/desi/spectro/redux/jura/preproc/20220404/00128581/fibermap-00128581.fits
WARNING:tile_qa.py:132:compute_tile_qa: failed to compute exposure qa

please let me know if that s enough debugging.

sbailey commented 4 months ago

Thanks for the debugging; that makes sense. I opened ticket #2264 with a diagnosis of how this situation happened and what we could do to avoid it in the future. For the purposes of Jura I will fix these tiles by hand, then close this ticket.

sbailey commented 4 months ago

Belt-and-suspenders-and-duct-tape: zproc knows that tile-qa needs the coadd and redrock files as input:

INFO:util.py:128:runcmd: RUNNING: desispec.scripts.tileqa.main(['-g', 'cumulative', '-n', '20220404', '-t', '23551'])
  Inputs
    /global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/coadd-0-23551-thru20220404.fits
    /global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/redrock-0-23551-thru20220404.fits
...

but ideally it should also know about needing the exposure-qa files so that it would stop with an informative error messages about what inputs are missing before even trying. The process of tile-qa generating the exposure-qa was primarily useful for daily when it needed to "catch up" on old exposures before exposure-qa was automatically generated. But by now, exposure-qa should really be generated by the pipeline and missing it should be an error condition.

I suggest that we still fix the night vs. exposure_night bug and leave the auto-generation in place, but not rely upon it for normal operations.

akremin commented 4 months ago

+1 on that last point. It appears that there was an oversight in committing exposure-qa from the list of inputs. We absolutely should have it there, which should resolve issues such as those encountered in Jura.

Although thinking about this more -- that will make it impossible for tile-qa to create the exposure-qa, since daily using this same code. So it might not have been an oversight but rather an explicit choice to allow daily to run effectively... So including exposures-qa in the inputs may not be as clear of a "win" as I originally thought.

sbailey commented 4 months ago

Fixed in Jura for all 3 tiles mentioned here; they are now back to being QA=VALID. For the record:

TILEID=23551; NIGHT=20220404
TILEID=82265; NIGHT=20211006
TILEID=42621; NIGHT=20220114

cd $CFS/desi/spectro/redux/jura/tiles/cumulative/$TILEID/$NIGHT/
mkdir attic && mv tile-qa*.* attic
mv logs/tile-qa-$TILEID-thru$NIGHT.log logs/tile-qa-$TILEID-thru$NIGHT.log.1
desi_tile_qa -g cumulative -n $NIGHT -t $TILEID |& tee logs/tile-qa-$TILEID-thru$NIGHT.log

Closing this ticket as fixed-by-hand for Jura; let's keep other discussion in #2264 for the long-term fix.