CMB-S4 / spt3g_software

Public Subset of Analysis Software Developed by the SPT-3G Collaboration
https://cmb-s4.github.io/spt3g_software/
BSD 2-Clause "Simplified" License
11 stars 11 forks source link

GZIP_2 compression of a flat-sky map seems lossy in some cases #135

Closed Wei-Q closed 9 months ago

Wei-Q commented 9 months ago

When we save a flat-sky map in a FITS file via maps.fitsio.save_skymap_fits, the default option for the argument compress is 'GZIP_2'. This compression scheme is supposed to be lossless, but that does not seem to be the case when I tested the compression scheme on an SPT-3G winter field coadd.

Here is an example:

In[4]: frame = core.G3File(path_to_a_coadd).next()

# Default save

In [5]: maps.fitsio.save_skymap_fits(output_path, frame["T"], Q=frame["Q"], U=frame["U"])

In [6]: frame_debug = maps.fitsio.load_skymap_fits(output_path)

In [7]: numpy.allclose(frame["T"], frame_debug["T"])
Out[7]: False

# No-compression save

In [18]: maps.fitsio.save_skymap_fits(output_path, frame["T"], Q=frame["Q"], U=frame["U"], compress=False, overwrite=True)

In [19]: frame_debug = maps.fitsio.load_skymap_fits(output_path)

In [20]: numpy.allclose(frame["T"], frame_debug["T"])
Out[20]: True
arahlin commented 9 months ago

I think this isn't surprising, since floating-point data are difficult to compress without loss. See for example, the astropy fits documentation:

Floating point FITS images (which have BITPIX = -32 or -64) usually contain too much ‘noise’ in the least significant bits of the mantissa of the pixel values to be effectively compressed with any lossless algorithm. Consequently, floating point images are first quantized into scaled integer pixel values (and thus throwing away much of the noise) before being compressed with the specified algorithm (either GZIP, RICE, or HCOMPRESS). This technique produces much higher compression factors than simply using the GZIP utility to externally compress the whole FITS file, but it also means that the original floating point value pixel values are not exactly preserved. When done properly, this integer scaling technique will only discard the insignificant noise while still preserving all the real information in the image. The amount of precision that is retained in the pixel values is controlled by the quantize_level parameter. Larger values will result in compressed images whose pixels more closely match the floating point pixel values, but at the same time the amount of compression that is achieved will be reduced. Users should experiment with different values for this parameter to determine the optimal value that preserves all the useful information in the image, without needlessly preserving all the ‘noise’ which will hurt the compression efficiency.

One solution would be to expose the quantize_level compression option to allow the user to manually adjust the compression algorithm.

Wei-Q commented 9 months ago

Thank you, Sasha, for looking into this. Adding new options sounds good to me, or perhaps we can just edit some of the things said in the docstring unless most people want to save compressed maps. For example, the docstring says FlatSkyMaps are optionally compressed, but the default value of compress isn't False. I think it may be good to change the default to False unless most people want to save compressed maps. The docstring also says GZIP_1 and GZIP_2 are lossless, but it sounds like that's actually not true, then (unless a FlatSkyMap to be saved has only integer values, I guess).

arahlin commented 9 months ago

Thank you, Sasha, for looking into this. Adding new options sounds good to me, or perhaps we can just edit some of the things said in the docstring unless most people want to save compressed maps. For example, the docstring says FlatSkyMaps are optionally compressed, but the default value of compress isn't False. I think it may be good to change the default to False unless most people want to save compressed maps. The docstring also says GZIP_1 and GZIP_2 are lossless, but it sounds like that's actually not true, then (unless a FlatSkyMap to be saved has only integer values, I guess).

Yeah that's a good point, definitely should update the documentation. @menanteau I believe you changed the default compression for save_skymap_fits initially. Would you be ok with setting the default compression to False, given the above discussion?

menanteau commented 9 months ago

Thanks @arahlin for keeping me in the loop. I do not remembering changing the default compression in save_skymap_fits. But in case it's fine to set the default compression to False. All of my calls to save_skymap_fits actually have set compress explicitly.