astropy / astropy

Astronomy and astrophysics core library
https://www.astropy.org
BSD 3-Clause "New" or "Revised" License
4.37k stars 1.75k forks source link

BSCALE disappears when copying ImageHDUs #15255

Closed weaverba137 closed 8 months ago

weaverba137 commented 1 year ago

Description

ImageHDUs that contain BZERO and BSCALE are written out with BSCALE and possibly BZERO missing. Internally it appears that ImageHDU.copy() does not write the BZERO and BSCALE headers to the copy.

This happens even if the HDUList is immediately written out, without even making a copy.

Expected behavior

ImageHDU.copy() should make an exact copy. This is preventing files opened with astropy.io.fits.open() from being written out as an exact copy.

How to Reproduce

I'm using public data from DESI here, but basically this is a very simple ImageHDU that contains BZERO and BSCALE.

>>> from astropy.io import fits
>>> f1 = 'https://data.desi.lbl.gov/public/edr/spectro/redux/fuji/healpix/sv3/dark/99/9994/coadd-sv3-dark-9994.fits'
>>> h1 = fits.open(f1, mode='readonly', memmap=False, cache=True, lazy_load_hdus=False, uint=False, disable_image_compression=True, do_not_scale_image_data=True, character_as_bytes=True, scale_back=True)
>>> h1['Z_MASK'].header
XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                   32 / array data type                                
NAXIS   =                    2 / number of array dimensions                     
NAXIS1  =                 2881                                                  
NAXIS2  =                 2058                                                  
PCOUNT  =                    0 / number of parameters                           
GCOUNT  =                    1 / number of groups                               
BSCALE  =                    1                                                  
BZERO   =           2147483648                                                  
EXTNAME = 'Z_MASK  '           / extension name                                 
CHECKSUM= 'f3SAf2S5f2SAf2S5'   / HDU checksum updated 2022-02-15T23:09:33       
DATASUM = '2995782 '           / data unit checksum updated 2022-02-15T23:09:33 
>>> z_mask = h1['Z_MASK'].copy()
>>> z_mask.header
XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                   32 / array data type                                
NAXIS   =                    2 / number of array dimensions                     
NAXIS1  =                 2881                                                  
NAXIS2  =                 2058                                                  
PCOUNT  =                    0 / number of parameters                           
GCOUNT  =                    1 / number of groups                               
EXTNAME = 'Z_MASK  '           / extension name                                 
CHECKSUM= 'f3SAf2S5f2SAf2S5'   / HDU checksum updated 2022-02-15T23:09:33       
DATASUM = '2995782 '           / data unit checksum updated 2022-02-15T23:09:33 
>>> 

In this case, when z_mask is written out, the data payload is unchanged from the original (DATASUM is still valid), but because BZERO and BSCALE are missing, the HDU checksum doesn't match.

Another example, showing an attempt to just write out a HDUList to a different file.

>>> from astropy.io import fits
>>> f1 = 'https://data.desi.lbl.gov/public/edr/spectro/redux/fuji/healpix/sv3/dark/99/9994/coadd-sv3-dark-9994.fits'
>>> h1 = fits.open(f1, mode='readonly', memmap=False, cache=True, lazy_load_hdus=False, uint=False, disable_image_compression=True, do_not_scale_image_data=True, character_as_bytes=True, scale_back=True)
>>> h1.writeto('./coadd-sv3-dark-9994.fits', output_verify='warn', overwrite=True, checksum=False)

In the second case, BSCALE is missing from the output file, and the data payload is modified.

Versions

I also checked astropy 5.3.1 with the same result.

>>> import platform; print(platform.platform())
macOS-13.5.1-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.10.12 (main, Jun 10 2023, 10:51:02) [Clang 14.0.3 (clang-1403.0.22.14.1)]
>>> import astropy; print("astropy", astropy.__version__)
astropy 5.2.1
>>> import numpy; print("Numpy", numpy.__version__)
Numpy 1.22.4
>>> import erfa; print("pyerfa", erfa.__version__)
pyerfa 2.0.0.3
>>> import scipy; print("Scipy", scipy.__version__)
Scipy 1.8.1
>>> import matplotlib; print("Matplotlib", matplotlib.__version__)
Matplotlib 3.6.2
>>> 
weaverba137 commented 1 year ago

Even if I manually add BZERO and BSCALE back into the copied ImageHDU header, it is still written out without those keywords.

jdavies-st commented 1 year ago

The fits.open(f, uint=False, do_not_scale_image_data=True) has the effect of ignoring the BZERO and BSCALE on read, which means you get the wrong datatype in h1['Z_MASK'].data. Instead of np.uint32 which is the correct datatype, you get np.int32.

In [13]: hdulist = fits.open("coadd-sv3-dark-9994.fits", memmap=False, lazy_load_hdus=False)

In [19]: hdulist["Z_MASK"].data.dtype.name
Out[19]: 'uint32'

but

In [22]: h1 = fits.open("coadd-sv3-dark-9994.fits", memmap=False, lazy_load_hdus=False, uint=False, do_not_scale_image_data=True)

In [23]: h1['Z_MASK'].data.dtype.name
Out[23]: 'int32'

At this point, all bets are off.

Is there a reason to be ignoring BZERO and BSCALE on read, especially when some of the data HDUs are clearly unsigned integers?

$ fitsinfo coadd-sv3-dark-9994.fits | grep Z_MASK
 16  Z_MASK        1 ImageHDU        12   (2881, 2058)   int32 (rescales to uint32)
weaverba137 commented 1 year ago

The point is to not allow fits.open() to do any "helpful" transformations of the data. I want the low-level bytes, not any interpretation of what the bytes actually mean.

weaverba137 commented 1 year ago

After further testing, I now realize I may have over-thought the options to fits.open(), and that fewer options may actually do what I need. A bit more testing is needed, because #15256 seems to be a genuine bug, and I need to isolate that from other tests.

weaverba137 commented 8 months ago

I'm closing this. I think I over-thought the options to fits.open() and when I use fewer options and take care with HDUList.copy() the problem goes away.