Closed sbailey closed 1 month ago
I also looked at flat job runtimes since those sometimes time out. Our current limit of 20 minutes is already pretty far out on the tail, so I think it is better to let them occasionally time out rather than extend the limit further and waste even more compute time when a job hangs.
This PR includes 3 updates to make running Jura just a bit easier:
write_traces_in_psf uses an intermediate temporary filename.
queue_info_from_qids updated to work in batches of 100 qids at a time when calling sacct. I don't know what the upper limit is, but empirically 8 calls of 100 qids work while a single call with 800 doesn't.

Details:
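To illustrate the batching idea for the sacct calls, here is a minimal sketch. It is not the desispec implementation: the helper names `batched` and `sacct_rows`, the injectable `run` argument, and the chosen `--format` columns are assumptions made for this example. Only the batch size of 100 comes from the description above.

```python
import subprocess


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sacct_rows(qids, batch_size=100, run=subprocess.check_output):
    """Query sacct in batches of `batch_size` job IDs (hypothetical helper).

    `run` is injectable so the function can be exercised without Slurm;
    by default it calls subprocess.check_output.
    Returns one list of fields per job allocation.
    """
    rows = []
    for chunk in batched(list(qids), batch_size):
        cmd = [
            "sacct", "-X", "--parsable2", "--noheader",
            "--format=jobid,jobname,state,elapsed",  # illustrative columns
            "-j", ",".join(str(q) for q in chunk),
        ]
        out = run(cmd, text=True)
        # --parsable2 separates fields with '|' and omits a trailing delimiter
        rows.extend(line.split("|") for line in out.splitlines() if line)
    return rows
```

With 800 qids this issues 8 separate sacct calls of 100 job IDs each instead of one call with 800, which matches the empirical behavior noted above.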
write_traces_in_psf: tested with a case that also normally worked before, but it checks that I don't have typos.
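For context on the temporary-filename change, a common pattern is to write to a temporary path and rename it to the final name only after the write succeeds, so readers never see a partially written file. The sketch below shows that generic pattern. The function name `write_via_tempfile`, the `.tmp` suffix, and the `write_func` callback are assumptions for illustration, and the actual write_traces_in_psf code may differ.

```python
import os


def write_via_tempfile(filename, write_func):
    """Write to `filename + '.tmp'`, then rename into place (hypothetical helper).

    `write_func(path)` does the actual writing. If it raises, the final
    file is left untouched and the temporary file is cleaned up.
    """
    tmpfile = filename + ".tmp"  # assumed suffix, for illustration only
    try:
        write_func(tmpfile)
        # os.replace is atomic when both paths are on the same filesystem
        os.replace(tmpfile, filename)
    finally:
        # Only present here if the write or the rename failed
        if os.path.exists(tmpfile):
            os.remove(tmpfile)
```

Keeping the temporary file in the same directory as the target keeps the rename on one filesystem, which is what makes the final step atomic.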
tilenight job runtimes from jura so far:

The orange line gives the current job runtime limit, which is why the dots don't exceed that line. I wanted to give a little more time while still keeping the nexp=1 case under the 30-minute debug queue limit (now 26 minutes).
queue_info_from_qids: also tested with real-life usage parsing jura jobs, e.g. $CFS/desi/users/sjbailey/dev/jura/ccdcalib_runtime.py

Speaking of ccdcalib runtimes, I also considered increasing those job runtimes since they sometimes time out. However, the current limit of 15 minutes is already pretty far into the tail of the regular distribution, so I left it as is.
@akremin after review, I suggest that we merge this, create a new incremental tag, and continue with Jura with this version.