desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

cross-night ztile -> tilenight dependency tracking #2264

Closed sbailey closed 1 month ago

sbailey commented 4 months ago

Followup to #2263:

ztile jobs require the cframe files from all nights/expids to exist to make the combined spectra files, and also the exposure-qa files from all nights/expids to make the tile-qa file. We do not track cross-night dependencies for ztile -> tilenight jobs; instead we just let them run, crash if they don't find the cframes they need, and then resubmit them. However, this procedure doesn't work if the cframes exist but the exposure-qa does not. An example is the Jura processing of tile 23551, which was observed on 20220403 and 20220404:

Action items:

For the purposes of Jura, I'm going to remove the tile-qa files and rerun tile-qa for the impacted tiles so that they will pick up the previously-missing-but-now-existing exposure-qa files.

Adding this to the Kilimanjaro dashboard.

sbailey commented 4 months ago

I meant to post this here instead of #2263; reposting here to keep the comment with the ticket that will be open until we fix the underlying problem.

Belt-and-suspenders-and-duct-tape: zproc knows that tile-qa needs the coadd and redrock files as input:

    INFO:util.py:128:runcmd: RUNNING: desispec.scripts.tileqa.main(['-g', 'cumulative', '-n', '20220404', '-t', '23551'])
    Inputs
    /global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/coadd-0-23551-thru20220404.fits
    /global/cfs/cdirs/desi/spectro/redux/jura/tiles/cumulative/23551/20220404/redrock-0-23551-thru20220404.fits

... but ideally it should also know about needing the exposure-qa files so that it would stop with an informative error message about which inputs are missing before even trying. Having tile-qa generate the exposure-qa was primarily useful for daily, when it needed to "catch up" on old exposures from before exposure-qa was automatically generated. But by now, exposure-qa should really be generated by the pipeline, and missing it should be an error condition.
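The "stop with an informative error before even trying" behavior could look roughly like the sketch below. This is only an illustration using the standard library: `find_missing_inputs` and `check_inputs` are hypothetical names, not existing desispec functions, and the caller is assumed to already know the full list of expected input paths (coadd, redrock, and exposure-qa files).

```python
import os


def find_missing_inputs(paths):
    """Return the paths that do not exist on disk, in the order given."""
    return [p for p in paths if not os.path.isfile(p)]


def check_inputs(paths, jobdesc="job"):
    """Raise FileNotFoundError naming every missing input.

    Hypothetical helper for illustration; not part of desispec.
    Does nothing if all inputs exist.
    """
    missing = find_missing_inputs(paths)
    if missing:
        msg = "{} is missing {} of {} inputs:\n  {}".format(
            jobdesc, len(missing), len(paths), "\n  ".join(missing))
        raise FileNotFoundError(msg)
```

Calling `check_inputs(all_expected_files, jobdesc="tile-qa")` before the real work would report every missing file in one message, rather than failing on the first one partway through the job.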

I suggest that we still fix the night vs. exposure_night bug and leave the auto-generation in place, but not rely upon it for normal operations.

To which @akremin replied:

+1 on that last point. It appears that there was an oversight in omitting exposure-qa from the list of inputs. We absolutely should have it there, which should resolve issues such as those encountered in Jura.

Although thinking about this more -- that will make it impossible for tile-qa to create the exposure-qa, since daily is using this same code. So it might not have been an oversight but rather an explicit choice to allow daily to run effectively... So including exposure-qa in the inputs may not be as clear a "win" as I originally thought.
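One way to reconcile the two needs is to make the strictness a caller's choice. The sketch below is purely illustrative and does not describe how desispec handles it: `plan_exposure_qa` and its `allow_generate` flag are hypothetical names, with the assumption that daily would pass `allow_generate=True` while a production run such as Jura would leave it at the strict default.

```python
import os


def plan_exposure_qa(qa_files, allow_generate=False):
    """Split expected exposure-qa files into (existing, missing).

    Hypothetical helper for illustration; not part of desispec.
    allow_generate=False: any missing file raises FileNotFoundError.
    allow_generate=True: missing files are returned so that the
    caller can generate them (the "catch up" case).
    """
    existing = [f for f in qa_files if os.path.isfile(f)]
    missing = [f for f in qa_files if not os.path.isfile(f)]
    if missing and not allow_generate:
        raise FileNotFoundError(
            "missing exposure-qa files:\n  " + "\n  ".join(missing))
    return existing, missing
```

With this shape the auto-generation path stays available, but normal operations do not rely on it unless they opt in explicitly.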

sbailey commented 1 month ago

Fixed in PR #2306; closing.