AlecThomson / arrakis

BSD 3-Clause "New" or "Revised" License

Lock threads for cutouts #78

Closed AlecThomson closed 1 month ago

AlecThomson commented 1 month ago

We have repeatedly hit errors like the following:

  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/arrakis/cutout.py", line 360, in worker
    image_update = cutout_image(
  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/arrakis/cutout.py", line 204, in cutout_image
    fits.writeto(
  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/astropy/io/fits/convenience.py", line 464, in writeto
    hdu.writeto(
  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/astropy/io/fits/hdu/base.py", line 412, in writeto
    hdulist.writeto(name, output_verify, overwrite=overwrite, checksum=checksum)
  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/astropy/io/fits/hdu/hdulist.py", line 1031, in writeto
    fileobj = _File(fileobj, mode=mode, overwrite=overwrite)
  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/astropy/io/fits/file.py", line 218, in __init__
    self._open_filename(fileobj, mode, overwrite)
  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/astropy/io/fits/file.py", line 640, in _open_filename
    self._overwrite_existing(overwrite, None, True)
  File "/datasets/work/sa-mhongoose/work/mambaforge/envs/arrakis310/lib/python3.10/site-packages/astropy/io/fits/file.py", line 521, in _overwrite_existing
    os.remove(self.name)
FileNotFoundError: [Errno 2] No such file or directory: '/scratch3/projects/spiceracs/processing/56614_arrakis/cutouts/RACS_1837-09_3951/RACS_1837-09_3951.cutout.image.restored.q.RACS_1837-09.contcube.beam14.conv.fits'

This happens even though an ls shows that the file does exist, and also in cases where the file did not exist before the run started. Whether this is a quirk of the Lustre filesystem, a lack of thread-safety in astropy.io, both, or something else is not clear.

What has solved the issue, though, is to hold a thread lock around the fits.writeto calls. Great success.
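
A minimal sketch of the locking idea, not the actual arrakis change. The names WRITE_LOCK, locked_write, overwrite_file and run_demo are hypothetical, and overwrite_file is a plain-file stand-in for an overwriting FITS write so the example runs without astropy. It imitates the remove-then-write step seen at the bottom of the traceback; whether concurrent writes to the same path are the real cause is an assumption.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level lock shared by every worker thread.
WRITE_LOCK = threading.Lock()


def locked_write(write_func, *args, **kwargs):
    """Call write_func while holding WRITE_LOCK, so writes never overlap."""
    with WRITE_LOCK:
        return write_func(*args, **kwargs)


def overwrite_file(path, payload):
    """Stand-in for an overwriting write: remove the old file, then rewrite it."""
    if os.path.exists(path):
        # Without a lock, another thread could remove the file between the
        # check above and this call, raising FileNotFoundError.
        os.remove(path)
    with open(path, "w") as handle:
        handle.write(payload)
    return path


def run_demo(n_workers=8):
    """Have several threads overwrite the same file through the lock."""
    out_dir = tempfile.mkdtemp()
    target = os.path.join(out_dir, "cutout.txt")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(locked_write, overwrite_file, target, f"worker {i}")
            for i in range(n_workers)
        ]
        # future.result() re-raises any exception from the worker thread.
        results = [future.result() for future in futures]
    return target, results
```

In the cutout code the equivalent call would look something like locked_write(fits.writeto, filename, data, header=header, overwrite=True). Note that a threading.Lock only serialises threads within a single process; it does not coordinate writes from separate worker processes.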