desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
33 stars 24 forks source link

desi_night_qa should use temporary output filenames #2268

Closed sbailey closed 6 days ago

sbailey commented 1 month ago

desi_night_qa should use temporary output filenames when initially writing files, and then rename them to the final versions only after successfully writing the entire file. When generating nightqa files for Jura, we had problems with jobs timing out while writing the final html file, but because it was writing directly to the final filename, job resubmissions didn't recognize that the file needed to be regenerated unless we used --recompute, which started all over from the beginning.

e.g. in desispec.night_qa.write_night_qa_html, it should do something like

from desispec.io.util import get_tempfilename
tmpfile = get_tempfilename(outfns["html"])
html = open(tmpfile, "w")    # instead of open(outfns["html"], "w")
html.write("<html><body>\n")
...
html.close()
# rename to final output filename only upon successful write
os.rename(tmpfile, outfns["html"])

Please also check+update all similar cases for writing png/pdf/jpeg files.

araichoor commented 1 month ago

sorry for that!

few questions I can think of, for how to code that up:

sbailey commented 1 month ago

Everywhere else in the pipeline we do the tempfile + rename immediately at the location of the write, not by outer level scripts, so I suggest that we keep that same pattern here. i.e. inside each desispec.night_qa function that does a write, and not at the outer desi_night_qa script level.

You are correct that a crash leaves a tmp file behind in the final folder which requires hand cleanup. That's a minor pain, but easily identifiable with find . | grep tmp on a prod. The advantage of this approach is that the tempfile is written to the same filesystem as the final files, so the final os.rename is a nearly instantaneous low-risk operation. Using tempfile.mkdtemp() generates a temporary directory on a different file system such that the final move into place takes a non-trivial amount of time and itself risks leaving a partially written "final" file in place.

araichoor commented 1 month ago

ok perfect, I ll proceed with those guidelines, thanks!

sbailey commented 6 days ago

Fixed by PR #2273; closing.