LS4 is going to use a lot of disk and HPSS space. We really need to compress our images to mitigate this.
By default, we can use fpack unless we find something better soon. It seems to work well, but there are a few caveats.
(1) We want to do lossless compression on flags images. They should compress very well. It is possible to do lossless compression with fpack.
(2) When reading a weight image from an fpacked file, we should do a pass of setting all values < 0 to 0. Reason: it's possible that the lossy compression will turn 0 into -1e-16 or something stupid like that. Since weight is 1/σ², negative values don't make sense. (Neither does 0, strictly speaking, but 0 weight means infinite noise, means masked pixel, effectively.) (In previous pipelines, I've had a "tinyweight" parameter, and set every positive value smaller than that to 0, to catch 0 being turned into +1e-16. A reasonable tinyweight is one over (the saturation level times sqrt(the e-/ADU gain)), as that's the maximum reasonable variance value we'll ever get. Then divide that value by 2 or 3 for an even safer tinyweight.)
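A minimal sketch of that clamping pass, assuming astropy; the function name, filenames, and the saturation/gain values are illustrative, not existing code:

```python
import numpy as np
from astropy.io import fits

def clean_weight(weight, saturation, gain):
    """Zero out weight values that lossy compression may have perturbed."""
    # tinyweight = 1 / (saturation * sqrt(gain)), divided by 3 for extra
    # safety margin (see above); anything below it -- including
    # compression-induced negatives -- is treated as a masked pixel.
    tinyweight = 1.0 / (saturation * np.sqrt(gain)) / 3.0
    weight = weight.copy()
    weight[weight < tinyweight] = 0.0
    return weight

# Usage: clamp right after reading the fpacked weight image.
with fits.open('image.weight.fits.fz') as hdul:
    weight = clean_weight(hdul[1].data, saturation=65535.0, gain=1.6)
```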
We might want to think about fpack's parameters to see if we can avoid screwing up our weight images. In the past, I've found that the defaults usually do better than my attempts at fine-tuning, so I don't muck with them. But we should do some serious testing on both image and weight FITS files to make sure things don't get screwed up.
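For that testing, a starting matrix of invocations might look like the following. This is a sketch: -q and the default Rice algorithm are documented fpack options, but my reading of which settings are lossless for floating-point images should itself be verified against the fpack user's guide as part of the testing.

```python
import subprocess

# Alternatives to compare on copies of the same files, not back-to-back.

# Default Rice compression: lossless for integer images (e.g. flags),
# lossy quantization (default q = 4) for floating-point images.
subprocess.run(['fpack', 'image.flags.fits'], check=True)

# Finer quantization of floats: larger -q keeps more precision at the
# cost of compression ratio.
subprocess.run(['fpack', '-q', '16', 'image.weight.fits'], check=True)

# If we can't tolerate any loss in the weights: my reading of the fpack
# docs is that -q 0 disables quantization entirely (lossless), which we
# should confirm during testing.
subprocess.run(['fpack', '-q', '0', 'image.weight.fits'], check=True)
```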
Here's what I envision:
When we save an image, we use subprocess.run with fpack to turn the .fits file into a .fits.fz file. (I don't think there's a library version of fpack, which is why we resort to subprocess.run.) The saving routine should verify that the .fits.fz file exists, and then delete the .fits file.
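A minimal sketch of that save path, assuming fpack is on the PATH (the function name is illustrative):

```python
import os
import subprocess

def compress_and_replace(fitspath):
    """Run fpack on a freshly written FITS file, verify the compressed
    output exists, then delete the uncompressed original."""
    fzpath = fitspath + '.fz'   # fpack names its output <input>.fz
    subprocess.run(['fpack', fitspath], check=True)
    if not os.path.isfile(fzpath):
        raise RuntimeError(f'fpack did not produce {fzpath}')
    os.remove(fitspath)
    return fzpath
```

With check=True, a failed fpack run raises before we ever reach the delete, so we never lose the original without a verified compressed copy on disk.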
All filenames we save in the database should end in .fits.fz. This will probably require a lot of fiddly work, because we probably have .fits all throughout the code base. (Consider a refactor to make a routine for recognizing fits extensions. We could then make whether we save images as '.fits' or '.fits.fz' a configurable option.)
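That refactor might center on something like this (hypothetical names, not existing code):

```python
# Recognized FITS extensions, longest first so .fits.fz matches
# before .fits.
FITS_EXTENSIONS = ('.fits.fz', '.fits')

def fits_extension(filename):
    """Return the recognized FITS extension of filename, or None."""
    for ext in FITS_EXTENSIONS:
        if filename.endswith(ext):
            return ext
    return None

def saved_extension(compress_images):
    """Extension to use when saving, driven by one config switch."""
    return '.fits.fz' if compress_images else '.fits'
```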
Regardless of how we're configured to save images, we need to be able to read both .fits and .fits.fz files. If we get a file that's not fpacked, it's a bit nutty to have to compress it before we can read it. Note that when you read a .fits.fz file with a single extension, HDU 0 has the fpack header and HDU 1 has what would have been the single HDU 0 if it were just a .fits file. The reader routine should handle this; it should be transparent to the rest of the code (which shouldn't need to know about compression).
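A sketch of a compression-agnostic reader: astropy's fits.open decompresses tile-compressed HDUs transparently, so the only real work is finding the HDU that actually holds the image (the function name is illustrative):

```python
from astropy.io import fits

def read_image(path):
    """Read a single image from a .fits or .fits.fz file.

    In an fpacked file the primary HDU is a dataless stub and the image
    lives in HDU 1; in a plain FITS file it's HDU 0.  Returning the
    first HDU with data handles both without the caller knowing which.
    """
    with fits.open(path) as hdul:
        for hdu in hdul:
            if hdu.data is not None:
                return hdu.header.copy(), hdu.data.copy()
    raise ValueError(f'no image data found in {path}')
```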
The archive should be able to handle this transparently, as files are just files to it. Whatever we save to disk gets pushed to the archive (with the filename and extensions in the database matching the files pushed to the archive).