Jash-Shah / pyGnuastro

A package with module functions for performing astronomical calculations, data manipulation and analysis.
https://jash-shah.github.io/pyGnuastro/index.html
GNU General Public License v3.0
9 stars 2 forks source link

Preserving the WCS #3

Open Jash-Shah opened 1 year ago

Jash-Shah commented 1 year ago

Each FITS extension/HDU contains a raw dataset which can either be a table or an image along with some header keywords. The keywords can be used to store meta-data about the actual dataset.

Currently the pygnuastro.fits.img_write() functions only writes the raw pixel data to the output image. This means that important keywords (for eg: the wcs headers like CRPIX, CRVAL, PC, CDELT, etc) are not written. Gnuastro uses custom data structures for storing these keywords(FITS header keywords) and then passes it as one of the arguments to the gal_fits_img_write function.

In pyGnuastro we can try to emulate the same by defining our own custom Extension type

mohammad-akhlaghi commented 1 year ago

I just changed the title of this to be "Preserve WCS". In summary, we don't want all keywords! Except the WCS keywords, we don't need to keep any of those keyword values. This is also done in the Gnuastro programs, and from the Modularity principle of the Unix philosophy ("Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features"). All the extra headers will just slow down the job and are rarely necessary! If a user wants to preserve certain set of them in their output, they can call the respective function to add them manually.

In the particular case of WCS, we don't want to keep the low-level raw keywords! We want the wcsprm structure of WCSLIB. Upon reading a file, we can also read its WCS (using gal_wcs_read of Gnuastro wcs.h and preserve the resulting wcsprm in the structure that we pass back to the Python environment.

@Jash-Shah: Does Numpy's C structure have any extra pointers that are left untouched for high-level users?

If not, we will need to define a higher-level structure that has a Numpy structure as a sub-element. Within that higher-level data structure, we can keep the WCS and any other thing we may need. Python users can also access Numpy's structure with something like below (calling Numpy's sum operator):

img=pygnuastro.fits.img_read(filename="input.fits", hdu="1")
img.np.sum()

But the good thing with Gnuastro's special structure is that it can also keep the WCS of the input, and put it in the output, or use/modify it in operators like Warping or Cropping and etc, behind the scenes (at a low-level), so a normal user won't be bothered by having to worry about it :wink:!

What do you think?

Jash-Shah commented 1 year ago

I agree with going the route of having a custom data structure, which will of course use Numpy to store the pixel data, and also have something similar to a wcsprm.

The Numpy structure (PyArrayObejct) only has attributes that have meta-data about the array itself.

So thinking along the route of a custom high-level data type, we should try to emulate a general case data-type like gal_data_t which can start with just the NumPy array and the wcsprm-esque structure. We can then work on a converter between this and gal_data_t so that in cases where a gal_data_t type output is expected from a function, we don't restrict ourselves to the attributes of a NumPy array.

mohammad-akhlaghi commented 1 year ago

Very nice summary :+1:! Once that translator is ready, we can simply add it at the start of every wrapper function and simply use gal_data_t for calling Gnuastro's library functions (many functions can benefit, and offer extra features by knowing the WCS :smiley:).

pedramardakani commented 1 year ago

So in a nutshell, numpy only carries the raw array of the image. We must add the wcsprm (and potentially other properties in future) ourselves right?

mohammad-akhlaghi commented 1 year ago

Exactly!