DES-SL / Y6_Bulk_Coadd_Cutouts

Codebase for large-scale cutout production from DES co-added images on the FNAL cluster
MIT License

Use TableHDUs #24

Open kadrlica opened 3 years ago

kadrlica commented 3 years ago

Summarizing our conversations on Slack in preparation for an eventual PR...

Currently we store metadata in several (N,M) arrays, saved as separate IMAGE_HDUs:

  4      IMAGE_HDU       IMG_MIN
  5      IMAGE_HDU       IMG_SCALE
  6      IMAGE_HDU       PSF_MIN
  7      IMAGE_HDU       PSF_SCALE

In addition, we have one TableHDU:

  1      BINARY_TBL      CUTOUT_ID

There was a proposal to combine these into a single TableHDU (called INFO or similar) containing (N,M) columns for IMG_MIN, IMG_SCALE, PSF_MIN, and PSF_SCALE. This would be similar in format to how MOF/SOF band-by-band information is stored. I argue that it would be more natural for the user to find this metadata in a single TableHDU rather than in separate ImageHDUs. I've written a converter script, but it would be better to integrate this natively.

#!/usr/bin/env python
__author__ = "Alex Drlica-Wagner"

import numpy as np
import fitsio

# Pick one input file (the second assignment would silently override the first)
#filename = 'gladders_cutouts_20210114/cutouts.fits'
filename = 'schechter_cutouts_20210114/cutouts.fits'

f = fitsio.FITS(filename)
nrows  = f['CUTOUT_ID'].get_nrows()
nbands = 4

# Create and fill the table
dtypes = [('CUTOUT_ID', 'S14'),
          ('IMG_MIN', (float, nbands)),
          ('IMG_SCALE', (float, nbands)),
          ('PSF_MIN', (float, nbands)),
          ('PSF_SCALE', (float, nbands)),
]
table = np.zeros(nrows, dtype=dtypes)
for name, _ in dtypes:
    table[name] = f[name].read()

# Write a new FITS file
outfile = 'new_cutouts.fits'
out = fitsio.FITS(outfile,'rw',clobber=True)

print("Writing HDU: %s"%"INFO")
out.write(table,extname="INFO")

for extname in ['IMAGE','PSF','MASK']:
    print("Writing HDU: %s "%extname)
    out.write(f[extname].read(),header=f[extname].read_header(),extname=extname)

out.close()

out_table = fitsio.read(outfile,ext='INFO')
print(out_table['IMG_SCALE'][-1,:])
print(out_table['IMG_SCALE'][-1,2])

rmorgan10 commented 3 years ago

We can certainly integrate this. Another aspect of this issue is updating the Redmine documentation to reflect the slightly adjusted format.

rmorgan10 commented 3 years ago

@kadrlica Quick question before I start implementing this. Could you provide the lines of code that a user would use to access the IMG_SCALE and IMG_MIN information if they wanted to rescale the pixel values of all the images?

Currently, they would do:

img_arr = hdu_list[2].data
img_min = hdu_list[4].data
img_scale = hdu_list[5].data
recovered_arr = img_arr / 65535 * img_scale[:,:,np.newaxis,np.newaxis] + img_min[:,:,np.newaxis,np.newaxis]

How would this process work with the new (N,M) format you're proposing?

kadrlica commented 3 years ago

Should be the same. My pyfits is a bit rusty, but I think it is this:

img_arr = f['IMAGE'].data
img_min = f['INFO'].data['IMG_MIN']
img_scale = f['INFO'].data['IMG_SCALE']
recovered_arr = img_arr / 65535 * img_scale[:,:,np.newaxis,np.newaxis] + img_min[:,:,np.newaxis,np.newaxis]

I'll send you a link to a script on the DES cluster.
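For concreteness, the forward compression and the recovery formula quoted above can be sketched end-to-end with synthetic arrays. This is an illustrative round-trip check, not code from the repo; array shapes, the (N,M) per-band min/scale layout, and the 0-65535 range are taken from the thread, while the variable names and random data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obj, n_band, size = 3, 4, 45  # toy stand-ins for (N, M, 45, 45)

# "True" float images, shape (N, M, size, size)
truth = rng.normal(loc=10.0, scale=5.0, size=(n_obj, n_band, size, size))

# Per-object, per-band minimum and scale, each shape (N, M)
img_min = truth.min(axis=(2, 3))
img_scale = truth.max(axis=(2, 3)) - img_min

# Forward: compress each cutout to uint16 in [0, 65535]
img_arr = np.round(
    (truth - img_min[:, :, np.newaxis, np.newaxis])
    / img_scale[:, :, np.newaxis, np.newaxis] * 65535
).astype(np.uint16)

# Inverse: the recovery formula from the thread, broadcasting the
# (N, M) metadata against the (N, M, size, size) pixel array
recovered = (img_arr / 65535 * img_scale[:, :, np.newaxis, np.newaxis]
             + img_min[:, :, np.newaxis, np.newaxis])

print(np.abs(recovered - truth).max())  # only uint16 quantization error remains
```

The same two lines of broadcasting work whether the min/scale arrays come from the old IMAGE_HDUs or from (N,M) columns of the proposed INFO table, which is why the access pattern is unchanged.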

kadrlica commented 3 years ago

Another suggestion that I mentioned on Slack was using the conventional BZERO, BSCALE instead of IMG_MIN and IMG_SCALE definitions, since the conversion would be more intuitive to the long-time FITS user.

Here is the conversion function that is currently in stack2image.py

def make_bvalues(amin, ascale):
    """Convert stack scale and min values to FITS BSCALE and BZERO.

    Parameters
    ----------
    amin   : array minimum
    ascale : array scale value

    Returns
    -------
    bscale, bzero : FITS header keyword values
    """
    bscale0 = 1.0
    bzero0  = 32768
    amax0   = 65535
    bscale = ascale/amax0 * bscale0
    bzero  = bzero0 * (ascale/amax0) + amin
    return bscale, bzero
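As a quick sanity check (a sketch, not part of the repo), the BSCALE/BZERO values from this function reproduce the same physical values as the IMG_MIN/IMG_SCALE recovery formula, given the FITS convention physical = BZERO + BSCALE * stored, with the stored int16 value offset from the unsigned pixel by 32768. The min/scale numbers here are arbitrary test values:

```python
import numpy as np

def make_bvalues(amin, ascale):
    """Convert stack scale and min values to FITS BSCALE and BZERO."""
    bscale0 = 1.0
    bzero0 = 32768
    amax0 = 65535
    bscale = ascale / amax0 * bscale0
    bzero = bzero0 * (ascale / amax0) + amin
    return bscale, bzero

amin, ascale = -3.2, 120.0  # arbitrary example values
bscale, bzero = make_bvalues(amin, ascale)

# Unsigned pixel values in [0, 65535] as stored in the stacks
pix = np.array([0, 32768, 65535], dtype=np.uint16)

# Recovery via IMG_MIN / IMG_SCALE
via_minscale = pix / 65535 * ascale + amin

# Recovery via BSCALE / BZERO on the int16 representation
# (stored int16 value = unsigned pixel - 32768)
signed = pix.astype(np.int64) - 32768
via_bvalues = bzero + bscale * signed

print(np.allclose(via_minscale, via_bvalues))  # True
```

Algebraically the two agree exactly: bzero + bscale*(pix - 32768) expands to amin + ascale*pix/65535, which is the min/scale formula.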

rmorgan10 commented 3 years ago

@kadrlica Regarding the new TableHDU format (I haven't made it to the BZERO BSCALE stuff yet), how is this looking to you?

(des20a) [rmorgan@des81 full_tile_test]$ python
Python 3.7.7 (default, Mar 26 2020, 15:48:22) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy.io import fits
>>> hdu = fits.open('/data/des81.b/data/stronglens/Y6_CUTOUT_IMAGES/DES2228+0001.fits')
>>> hdu.info()
Filename: /data/des81.b/data/stronglens/Y6_CUTOUT_IMAGES/DES2228+0001.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       4   ()      
  1  IMAGE         1 ImageHDU        12   (45, 45, 4, 24845)   int16 (rescales to uint16)   
  2  PSF           1 ImageHDU        16   (25, 25, 4, 24845)   int16 (rescales to uint16)   
  3  INFO          1 BinTableHDU     19   24845R x 5C   [14A, 4D, 4D, 4D, 4D]   
>>> hdu['INFO'].columns
ColDefs(
    name = 'ID'; format = '14A'
    name = 'IMG_MIN'; format = '4D'
    name = 'IMG_SCALE'; format = '4D'
    name = 'PSF_MIN'; format = '4D'
    name = 'PSF_SCALE'; format = '4D'
)
>>> 
rmorgan10 commented 3 years ago

Actually I think I answered my own question by looking in the directory you sent me and comparing to the files you produced. Looks like a match!

rmorgan10 commented 3 years ago

Regarding switching to signed integers for the pixel values in the IMG and PSF HDUs and using BSCALE and BZERO for the recovery information (now that I've had a chance to look into it), my personal preference is to keep things as they are. We certainly could make the change, but I lean towards keeping the 0-65535 pixel values rather than negative pixel values, since 0-65535 is what deep learning analyses will expect. I'm happy to discuss the benefits and drawbacks in more depth, though!

kadrlica commented 3 years ago

I think the only advantage is that the unwashed masses will "intuitively" understand what BSCALE and BZERO mean. Since these stacks are designed for ML, I'm fine keeping it that way.

rmorgan10 commented 3 years ago

That was my understanding too. I think using different keywords like IMG_SCALE and IMG_MIN will make people aware that the scaling is done differently here. I could write up an example for the Redmine page showing how to use the stacks from a BZERO / BSCALE approach (I'll be updating it anyway, so it's no trouble).