aperiosoftware / astropy

Repository for the Astropy core package
www.astropy.org
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Implement FITS Tile Compression Library #17

Closed Cadair closed 1 year ago

Cadair commented 1 year ago

This pull request contains phase one of the work proposed in the cycle 3 funding call.

Background

The FITS file format provides a way of compressing multi-dimensional images to reduce their size on disk. This functionality works by splitting the image up into "tiles" compressing them using one of a selection of compression algorithms and then saving the compressed bytes into a cell in a FITS binary table.

In Astropy this functionality is exposed by CompImageHDU. When loading the data associated with a compressed image, the whole HDU is passed through to a function implemented in C called decompress_hdu (the same happens on compress when writing). This HDU object is then used to pass the whole binary table (and it's heap) to the cfitsio library which returns a complete decompressed array of the image. (Again the same happens for compress but a binary table is returned).

This functionality is the only reason astropy uses cfitsio, yes all 168,643 lines of it. Implementing this functionality this way also means that it is impossible to load and decompress only part of the image, the whole image is always fully loaded.

Description of this PR

This pull request is a big step towards implementing all the FITS tiled image compression in Python. All the functionality of decompress_hdu and compress_hdu currently provided in compressionmodule.c is replaced by equivalent Python code. The compression and decompression of the individual tiles are implemented in the style of numcodecs Codec classes (more on this below). The implementation of the compression and decompression algorithms for RICE_1, PLIO_1 and HCompress are still used from the cfitsio code, but only a small number of files from the cfitsio library are now needed.

While in this PR we do not expose any functionality for (de)compressing individual tiles, the primitives for this work now exist, as it is possible to use these Codec classes to (de)compress the bytes of an individual tile, and the C wrappers for the cfitsio functions only handle (de)compressing a single tile at a time.

Testing

To validate the implementation in this PR we needed to be able to test against FITS files not (de)compressed with Astropy. We approached this in two ways.

1) Some canonical test files downloaded from the FITS compression pages have been added to the repo. These are small, but provide a useful "known good" file for each compression algorithm. 2) A new heavily parametrised test_fitsio.py file has been added. This file uses the fitsio package to write or read files read or written by Astropy and then the data is compared. Running this test file (obviously) needs the fitsio package to be installed, which requires a system install of cfitsio to compile the sdist against. This test file is automatically skipped if fitsio can not be imported, but a tox factor has been added to force it to be run and to require fitsio to ensure it is always run on the CI. 2a) For extra credit and to hunt segfaults :gun:, all these tests have been run with valgrind and no memory leaks in our code (just fitsio) were found.

New Functionality or Bug Fixes in this PR

This PR aimed to change nothing as far as users of the library are concerned. Unless someone was importing directly from astropy.io.fits.compression (which I really hope they weren't) then this change should go unnoticed. However, in the process of rewriting the compression functionality we found some things which were missing from Astropy or the implementations in cfitsio these things are:

Next Steps

This PR could have removed all of cextern/cfitsio but we decided that we shouldn't for the initial review (the diff is obnoxious). We can choose to either do this as the last thing before merging this PR, or in a follow up PR. We did have to already remove some code from imcompress.c here in order to get things to work.

Related Issues

fixe astropy/astropy#11212 fixe astropy/astropy#9715 Will fix astropy/astropy#541 and astropy/astropy#9625 when cfitsio removed This pull request addresses the first checklist item in astropy/astropy#3895 and opens the door to some (or all) of the others.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

github-actions[bot] commented 1 year ago

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?