desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
35 stars 24 forks source link

Introduce desi_proc_night to unify and simplify processing scripts #2187

Closed akremin closed 5 months ago

akremin commented 6 months ago

Overview

The main purpose of this code is to reduce redundancy and improve maintainability by having one script that can submit data as it comes in (like desi_daily_proc_manager) and data already acquired (like desi_run_night). This resolves issue #2175. Because those are functioning scripts and useful for testing against the new code, they have been retained in this branch and a new script has been added that can replace both of them. After vetting and with time, those other scripts can be removed in a future update to desispec.

The new script is called desi_proc_night (meant to intentionally parallel desi_proc_tilenight and desi_proc). The script has many of the same options as the old scripts, but with some removed that I no longer thought were necessary and a few new ones added. The major new flag to the script is --daily. That is a meta-flag in that when set it sets several other values to the expected default for daily processing. The code is smart enough to know whether it is actually running during the night in question or not. If it is, it will keep updating the exposure tables, won't run unless cte flats exist, and won't submit the last science tile observed. Once morning has been reached, it is run in daily mode after the night has ended, or it isn't in daily mode it will process the last science tile and process whatever calibrations it can find.

This also adds the machinery to create and submit batch a script for desi_link_calibnight introduced in PR #2165 . This resolved issue #2174 . It also creates a new "JOBDESC" in the processing tables called "linkcal" that indicates the linking job for tracking and dependencies. The information is stored in per-night files parallel to the exposure_table_NIGHT.csv files in SVN. The naming scheme is override-NIGHT.yaml. If the linking information isn't defined in an override file and there aren't sufficient flats for calibrations, then the code will raise an error.

Here is a minimal example of the override file I have been using for testing (since I don't provide an example in desispec). Note that this format is likely to change and evolve as the files are used for more things:

## DESI override file
calibration:
    linkcal:
        refnight: 20231010
        #include: []
        #exclude: []
        #camword  # not implemented
    #autocalib_fiberflat: # not implemented
        #solve_gradient: True 

#science: # not implemented
    #bkgsub-for-science: # not implemented
        #exposures: # not implemented

There is also a new script to update the exposure table as a standalone script since the function was necessary for the refactor and a script wrapper seemed useful. To create the function and script I had to rename an existing set of code that reformatted the exposure_tables to add/remove columns. That old code is now called reformat_exptables / desi_reformat_exposure_tables.

This also renames desispec.workflow.procfuncs to desispec.workflow.processing for more clarity and adds exposure_table_NIGHT.csv, processing_table_SPECPROD-NIGHT.csv, and unprocessed_table_SPECPROD-NIGHT.csv to the desispec.io.meta.findfile().

Tests

1) Using dry_run_level=3 showed it can run with --daily flag set for a complete night. 1) Using dry_run_level=3 showed it can run without --daily flag set for a complete night given an exposure_table is available. 1) Using dry_run_level=1 showed that it can run and produce scripts for a nominal reprocessing night. 1) Using dry_run_level=1 showed that it can run and produce scripts using --daily. 1) Modified exposure table to be missing flats and using dry_run_level=1 showed that it will submit up to arcs but then raise an error because there are no flats and not in --daily mode. 1) Using dry_run_level=1 and --daily mode showed that it will submit what is available before exiting without submitting an incomplete set of flats. 1) Using dry_run_level=1, a modified version of --daily mode after the actual date, and a modified exposure table missing some flats and showed that it will submit what is available before raising an error 1) Modified exposure table to be missing flats, added an override file to link to a different night's calibrations, and using dry_run_level=1 showed that it will link to the previous night and not submit any other calibrations for processing. 1) Modified exposure table to be missing flats, added an override file to link to a different night's calibrations, and allowed the linking job to be submitted and run. The new script successfully ran and produced the desired links. 1) Using dry_run_level=1 showed that it can run and produce scripts using --daily.

There are still more tests to be performed, primarily to run the code without dry_run_level to ensure that outputs are equivalent to those in daily or iron. I will update this PR as those tests are run.

sbailey commented 6 months ago

Thanks! I have run workflow.calibration_selection.find_best_arc_flat_sets on all nights and have started sorting them into various cases. I just added unit tests for two non-standard cases that I think we should try to support. If they are tricky to support right now, we could also flag tests as expected failures and address them in a future PR.

UPDATE: these are fixed in the logic of commit #21f5641 if we further increase the allowable arcflattimediff (current default 20 minutes). TBD.

sbailey commented 5 months ago

We don't have to support every non-standard case in this PR, but FWIW 20211207 is an interesting case. Something went wrong during the desi-calib-03 flats, but the observers were able to retake just that station immediately afterward with minor delay:

EXPID  OBSTYPE SEQNUM SEQTOT EXPTIME  LASTSTEP BADCAMWORD            PROGRAM                    DATETIME       
------ ------- ------ ------ -------- -------- ---------- ----------------------------- -----------------------
112753    flat      1      3 120.0274      all            calib desi-calib-00 leds only 2021-12-07 22:53:27.566
112754    flat      2      3 120.0253      all            calib desi-calib-00 leds only 2021-12-07 22:56:35.603
112755    flat      3      3 120.0282      all            calib desi-calib-00 leds only 2021-12-07 22:59:43.737
112758    flat      1      3 120.0284      all            calib desi-calib-01 leds only 2021-12-07 23:03:18.145
112759    flat      2      3 120.0279      all            calib desi-calib-01 leds only 2021-12-07 23:06:26.175
112760    flat      3      3 120.0292      all            calib desi-calib-01 leds only 2021-12-07 23:09:33.804
112763    flat      1      3 120.0272      all            calib desi-calib-02 leds only 2021-12-07 23:13:09.235
112764    flat      2      3 120.0297      all            calib desi-calib-02 leds only 2021-12-07 23:16:17.282
112765    flat      3      3 120.0293      all            calib desi-calib-02 leds only 2021-12-07 23:19:25.341
112768    flat      1      3 120.0298   ignore            calib desi-calib-03 leds only 2021-12-07 23:23:01.245
112769    flat      2      3 120.0295   ignore            calib desi-calib-03 leds only 2021-12-07 23:26:09.327
112770    flat      3      3 120.0267   ignore            calib desi-calib-03 leds only 2021-12-07 23:29:16.974
112771    flat      1      1    1.002   ignore                 led03 flat for cte check 2021-12-07 23:32:24.615
112777    flat      1      3 120.0296      all            calib desi-calib-03 leds only 2021-12-07 23:48:45.527
112778    flat      2      3 120.0297      all            calib desi-calib-03 leds only 2021-12-07 23:51:53.601
112779    flat      3      3 120.0278      all            calib desi-calib-03 leds only 2021-12-07 23:55:02.063
112780    flat      1      1    1.002      all                 led03 flat for cte check 2021-12-07 23:58:10.332

If the parsing started by removing LASTSTEP='ignore' cases, it probably would have identified this as a normal set of flats. Instead it only includes the EXPID= 112780 one second flat. But perhaps in other case it is helpful for it to distinguish "exposure exists but should be ignored" from "exposure doesn't exist".

UPDATE: this is fixed in the next commit "Re-implement the flat selection..."

sbailey commented 5 months ago

Summary of non-standard calibration cases

Time gaps and/or reversed arc/flat order

Recoverable by adjusting find_best_arc_flat_sets(..., arcflattimediff=20.) threshold, but need to pick what that should be.

UPDATE: arcflattimediff updated to 80 minutes, recovering all of these.

No cals, need to link everything in calibnight

UPDATE: entire night is flagged as LASTSTEP=ignore, so we don't need to process it at all.

Cals partially missing

NIGHT ZERO DARK ARC FLAT Comment
20211128 no no ok ok A: link bias,badcol but not psf,fiberflat
20220923 ok ok no no B: link psf,fiberflat, but not bias,badcol
20221222 ok ok no no B: link psf,fiberflat, but not bias,badcol
20221228 ok ok no no B: link psf,fiberflat, but not bias,badcol
20231216 ok ok no no B: link psf,fiberflat, but not bias,badcol
20210130 ok ok ok no C: link fiberflat, but not bias,badcol,psf
20210218 ok ok ok no C: link fiberflat, but not bias,badcol,psf
20211009 ok ok ok no C: link fiberflat, but not bias,badcol,psf
20221120 ok ok ok no C: link fiberflat, but not bias,badcol,psf
20231018 ok ok ok no C: link fiberflat, but not bias,badcol,psf

IIUC, override include_prefixes and exclude_prefixes are not yet supported. I was surprised to find that these "partial cals" were the dominant case, not "link everything".

Other special cases

Caveat: 20211212 is a case with some cameras flagged as bad butdetermine_calibrations_to_proc still returned a complete set of exposures, which is valid for b and z but not r. There might be other cases with missing cameras that I haven't caught yet.

Already fixed by this PR

Summary

It's ok to punt some of these to future PRs as long as the current PR design doesn't make that needlessly hard. Anything that we do punt to future PRs should be added as new focused tickets when we close this one.

sbailey commented 5 months ago

Time gaps between arcs and flats: the core issue is that changes in the humidity and/or temperature between the arcs and the flats can cause trace shifts that invalidate the ability to extract the flats with psfs made from the arcs. Shifts in y cannot be directly measured because the flats don't have sharp features to calibrate to, thus the need to take arcs and flats back to back before anything changes.

I compared the temperature and humidity differences between the last arc and the first flat for ~80 nights (blue), and compared those to the dT and dH for the 4 nights with larger than normal time differences (orange). The humidity difference is a little on the large side, but not too unusual and it is fine to accept these as normal arc/flat pairs. I'll update the arcflattimediff parameter to 80 minutes to capture all of these cases as normal.

image

sbailey commented 5 months ago

Looks good! It is especially helpful to have good unit tests for this workflow logic. Anthony also ran a night processing daily data with real job submissions, and other non-unittest dry run tests also pass. Merging now.