desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

tilenight jobs not writing their timing file #2330

Open sbailey opened 3 weeks ago

sbailey commented 3 weeks ago

In the k1 test prod, tilenight jobs appear not to be writing their timing files. For example, k1/run/scripts/night/20240204/tilenight-20240204-25777-29505446.log says

Running srun -N 1 -n 64 -c 2 --cpu-bind=cores desi_mps_wrapper desi_proc_tilenight -n 20240204 -t 25777 --mpi --cameras a0123456789 --mpistdstars --laststeps=all --timingfile /global/cfs/cdirs/desi/spectro/redux/k1/run/scripts/night/20240204/tilenight-20240204-25777-timing-29505446.json

The job exited cleanly, but /global/cfs/cdirs/desi/spectro/redux/k1/run/scripts/night/20240204/tilenight-20240204-25777-timing-29505446.json doesn't exist, nor do any other tilenight*.json files, though other timing files do exist.

akremin commented 3 weeks ago

I took a look into this. tilenight jobs have never had timing files, as the tilenight implementation never included them. It accepts the --timingfile argument because it shares an argument parser with desi_proc, but doesn't use it. Under the hood, it makes function calls to desispec.scripts.proc and desispec.scripts.proc_joint_fit, which have their own timing file generation code, but they aren't given a --timingfile argument and therefore don't write out JSON files. Even if they were passed the input timing file, I think we'd have issues, because each function call would overwrite that one JSON file with its own output.

To produce a true tilenight timing file, we'd need to either write out all of these individual JSON files, or add an argument that makes desi_proc and desi_proc_joint_fit return the JSON-like dictionary and then combine all of the outputs. Alternatively, we could have tilenight define what each of the individual timing files for the prestdstar, poststdstar, and stdstar jobs should be and provide the appropriate arguments so that the individual files get written out. That would require less new code: it would only mean defining the file names and passing them on to each proc() and proc_joint_fit() call inside the proc_tilenight.py script.
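As a rough illustration of the two options above, here is a minimal sketch in plain Python. It does not use any desispec code. The helper names (`substep_timingfile`, `combine_timing`, `write_timing`), the file-name pattern, and the layout of the timing dictionaries are all hypothetical, chosen only to show the idea.

```python
import json
import os


def substep_timingfile(tilenight_timingfile, step, expid=None):
    """Derive a per-step timing file name from the tilenight one.

    Hypothetical naming scheme: append the step label (and, if given,
    a zero-padded exposure id) before the file extension, so that no
    two steps write to the same JSON file.
    """
    root, ext = os.path.splitext(tilenight_timingfile)
    suffix = step if expid is None else f"{step}-{expid:08d}"
    return f"{root}-{suffix}{ext}"


def combine_timing(step_timings):
    """Combine per-step timing dictionaries into one dictionary.

    Each step's dictionary is stored under its own label, so entries
    from different steps cannot overwrite each other.
    """
    return {label: dict(timing) for label, timing in step_timings.items()}


def write_timing(filename, timing):
    """Write a timing dictionary to a JSON file in a single step."""
    tmpfile = filename + ".tmp"
    with open(tmpfile, "w") as fh:
        json.dump(timing, fh, indent=2)
    # Rename only after the write succeeds, so readers never see a partial file
    os.replace(tmpfile, filename)


if __name__ == "__main__":
    base = "tilenight-20240204-25777-timing-29505446.json"
    # Option 1: one file per step, names derived from the tilenight file
    for step in ("prestdstar", "stdstar", "poststdstar"):
        print(substep_timingfile(base, step))
```

With the first option, each derived name would be passed on as the timing file of the corresponding proc() or proc_joint_fit() call. With the second, the dictionaries returned by those calls would go through something like `combine_timing` and be written once to the file named by --timingfile. Since prestdstar and poststdstar processing may run more than once per tile, the optional `expid` argument shows one way to keep those file names unique.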