desihub / desispec

DESI spectral pipeline

20221209, tile = 83202, ztile slurm file not being created/submitted #2102

Open abhi0395 opened 10 months ago

abhi0395 commented 10 months ago

I am trying to reprocess tile 83202 (night 20221209, EXPIDs 157138-157139), but there appears to be an issue creating and submitting the ztile-83202-thru20221209.slurm file for this tile.

I purged the tile using desi_purge_tilenight and then submitted it for reprocessing using desi_run_night. However, only the tilenight-20221209-83202.slurm file is created and submitted; desi_run_night never creates or submits ztile-83202-thru20221209.slurm, and I don't understand why.

Below are the commands that I use:

To purge the tile: desi_purge_tilenight -n 20221209 -t 83202 --not-dry-run

To re-process the tile: desi_run_night -n 20221209 --tiles="83202" --z-submit-types=cumulative --laststeps="all,skysub,fluxcal" --all-tiles --append-to-proc-table -q realtime &>>${HOME}/daily/logfiles/daily-20221209_4.log &

I also tried desi_run_night with the --dry-run-level=1 flag, but it likewise does not create the ztile-83202-thru20221209.slurm file. Below is the output:

>> desi_run_night -n 20221209 --tiles="83202" --z-submit-types=cumulative --laststeps="all,skysub,fluxcal" --all-tiles --append-to-proc-table -q realtime --dry-run-level=1

Copying only the last few lines for clarity:

INFO:procfuncs.py:1204:submit_tilenight:  
INFO:procfuncs.py:1205:submit_tilenight: Running tilenight.

INFO:procfuncs.py:395:create_batch_script: Creating tilenight script for tile 83202
INFO:batch.py:74:default_system: Guessing default batch system perlmutter-gpu
Wrote /global/cfs/cdirs/desi/spectro/redux/daily/run/scripts/night/20221209/tilenight-20221209-83202.slurm
logfile will be /global/cfs/cdirs/desi/spectro/redux/daily/run/scripts/night/20221209/tilenight-20221209-83202-JOBID.log

INFO:procfuncs.py:415:create_batch_script: Outfile is: /global/cfs/cdirs/desi/spectro/redux/daily/run/scripts/night/20221209/tilenight-20221209-83202.slurm
INFO:procfuncs.py:528:submit_batch_script: ['sbatch', '--parsable', '/global/cfs/cdirs/desi/spectro/redux/daily/run/scripts/night/20221209/tilenight-20221209-83202.slurm']
INFO:procfuncs.py:529:submit_batch_script: Submitted /global/cfs/cdirs/desi/spectro/redux/daily/run/scripts/night/20221209/tilenight-20221209-83202.slurm with dependencies  and reservation=None. Returned qid: 92900550
INFO:procfuncs.py:1124:submit_redshifts:  
INFO:procfuncs.py:1125:submit_redshifts: Running redshifts.

INFO:queue.py:255:update_from_queue: qtable not provided, querying Slurm using ptable's LATEST_QID set
INFO:queue.py:185:queue_info_from_qids: Dry run, would have otherwise queried Slurm with the following: sacct -X --parsable2 --delimiter=, --format=jobid,jobname,partition,submit,eligible,start,end,elapsed,state,exitcode -j 3974489,3974490,3974494,3974495,3974496,3974497,3974498,3974499,3974502,3974503,3974504,3974505,3974506,3974507,3974508,3974509,3974510,3974511,3974512,3974514,3974515,3974519,3974520,3974521,3974522,3974523,3974524,3974525,3974526,3974527,3974535,3974538,3974539,3974540,3974541,3974542,3974543,3974544,3974549,3974550,3974551,3974552,3974553,3974554,3974555,3974557,3974560,3974561,3974562,3974563,3974564,3974565,3974568,3974569,14314381,14314387,92900550
INFO:queue.py:261:update_from_queue: Slurm returned information on 5 jobs out of 57 jobs in the ptable. Updating those now.
INFO:queue.py:268:update_from_queue: Will be verifying that the file names are consistent
Completed submission of exposures for night 20221209.
akremin commented 10 months ago

Hi @abhi0395, this is one of several known issues with using our current reprocessing script on the daily prod. I am working to mitigate them in my refactor of the pipeline scripts.

You need to add --all-cumulatives to tell it to process cumulative redshifts even though later nights have available data. By default we only run cumulative redshifts for the last night of data in a production run, rather than for every night with data; for daily we do want every night, which is why the flag exists. A corrected command is sketched below.
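For concreteness, here is a sketch of the corrected reprocessing command: it is the invocation from earlier in this thread with only the --all-cumulatives flag added (log redirection omitted; the other flags are copied verbatim and may need adjusting for other tiles):

desi_run_night -n 20221209 --tiles="83202" --z-submit-types=cumulative --all-cumulatives --laststeps="all,skysub,fluxcal" --all-tiles --append-to-proc-table -q realtime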

I will update the online documentation at https://desi.lbl.gov/trac/wiki/Pipeline/DailyOps to include that flag.

Please also note that because later nights have cumulative redshifts that include this tile, you will need to rerun the redshifts for those later nights as well. The desi_purge_tilenight script deletes those too, since they include the data you are removing. You will need to rerun them after reprocessing the impacted exposures, as sketched below.
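A hedged sketch of that follow-up step, using only flags that appear in this thread and a placeholder <LATER_NIGHT> for each affected later night (whether this exact invocation is the right way to regenerate only the ztile jobs should be confirmed against the DailyOps wiki):

desi_run_night -n <LATER_NIGHT> --tiles="83202" --z-submit-types=cumulative --all-cumulatives --append-to-proc-table -q realtime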

abhi0395 commented 10 months ago

Hi @akremin, thank you so much for your detailed explanation. I will work on this again with these instructions!