desihub / redrock

Redshift fitting for spectroperfectionism
BSD 3-Clause "New" or "Revised" License
21 stars 13 forks source link

rrdesi option to output best-fit models #283

Closed abhi0395 closed 2 months ago

abhi0395 commented 4 months ago

This pull request is for https://github.com/desihub/redrock/issues/270

The PR also adds support for redshift scan for coadded spectra (already coadded across the cameras) in the archetype mode. I also added support to get bands from target objects. It's very useful in case we want to match wavehashes with band names and save the data using band keywords, particularly in model files.

Sanity checks:

Archetype mode (main branch):

srun -n 64 -c 2 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/coadd-6-80605-thru20210205.fits -o main_test.fits -d main_test.h5 --archetypes /global/homes/a/abhijeet/software/desisoft/new-archetypes/rrarchetype-galaxy.fits

on this branch:

srun -n 64 -c 2 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/coadd-6-80605-thru20210205.fits -o branch_test.fits -d branch_test.h5 --archetypes /global/homes/a/abhijeet/software/desisoft/new-archetypes/rrarchetype-galaxy.fits

comparison

tt_main = Table.read('main_test.fits', hdu=1)
tt_branch = Table.read('branch_test.fits', hdu=1)
np.count_nonzero(tt_main['Z']-tt_branch['Z'])
Out[1]: 0
np.count_nonzero(tt_main['CHI2']-tt_branch['CHI2'])
Out[2]: 0

Along with this, we can now run rrdesi --model for both coadded and uncoadded spectra

rrdesi --model option

I have added rrdesi --model option in the current redrock. If run with this option, the code now also returns the best-fit models for each target in the input coadd file and stores them in a separate file. The data format is similar to the coadd file (meaning if the coadd file has B, R, Z_FLUX, then the model file also has the B, R, Z_MODEL and corresponding wavelengths. It also saves the redrock file's fits header in the model file, which includes information such TARGETID, Z, ZWARN, SPECTYPE,... and COEFFTYPE (which stores if the model is based on PCA, NMF, or ARCHETYPE). The resolution matrix is applied while estimating the best-fit model.

If coadded (across cameras) spectrum is provided (including resolution matrix) then also code should work.

The code adds another ~1s in no archetype mode and ~4s in archetype mode (I am sure can be optimized) to estimate/save the final models.

Test runs (uncoadded spectra):

Without archetypes:

srun -n 64 -c 2 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/coadd-6-80605-thru20210205.fits -o test.fits -d test.h5 --model test_model.fits

With archetypes:

srun -n 64 -c 2 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/coadd-6-80605-thru20210205.fits -o test.fits -d test.h5 --model test_model.fits --archetypes $ARCHETYPES

The headers of model file look like :

TTYPE9  = 'SUBTYPE '                                                            
TFORM9  = '20A     '                                                            
TTYPE10 = 'NCOEFF  '                                                            
TFORM10 = 'K       '                                                            
TTYPE11 = 'DELTACHI2'                                                           
TFORM11 = 'D       '                                                            
TTYPE12 = 'COEFFTYPE'                                                           
TFORM12 = '3A      '                                                            
EXTNAME = 'REDSHIFTS'                                                           

# HDU 2 in test_model.fits:
XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                  -64 / array data type                                
NAXIS   =                    1 / number of array dimensions                     
NAXIS1  =                 2751                                                  
PCOUNT  =                    0 / number of parameters                           
GCOUNT  =                    1 / number of groups                               
EXTNAME = 'B_WAVELENGTH'       / extension name                                 
BUNIT   = 'Angstrom'                                                            

# HDU 3 in test_model.fits:
XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                  -64 / array data type                                
NAXIS   =                    2 / number of array dimensions                     
NAXIS1  =                 2751                                                  
NAXIS2  =                   62                                                  
PCOUNT  =                    0 / number of parameters                           
GCOUNT  =                    1 / number of groups                               
EXTNAME = 'B_MODEL '           / extension name                                 

# HDU 4 in test_model.fits:
XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                  -64 / array data type                                
NAXIS   =                    1 / number of array dimensions                     
NAXIS1  =                 2326                                                  
PCOUNT  =                    0 / number of parameters                           
GCOUNT  =                    1 / number of groups                               
EXTNAME = 'R_WAVELENGTH'       / extension name         

for coadded (across cameras) headers of the model file look like:

TFORM8  = '6A      '                                                            
TTYPE9  = 'SUBTYPE '                                                            
TFORM9  = '20A     '                                                            
TTYPE10 = 'NCOEFF  '                                                            
TFORM10 = 'K       '                                                            
TTYPE11 = 'DELTACHI2'                                                           
TFORM11 = 'D       '                                                            
TTYPE12 = 'COEFFTYPE'                                                           
TFORM12 = '3A      '                                                            
EXTNAME = 'REDSHIFTS'                                                           

# HDU 2 in pca_coadd_model.fits:
XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                  -64 / array data type                                
NAXIS   =                    1 / number of array dimensions                     
NAXIS1  =                 7781                                                  
PCOUNT  =                    0 / number of parameters                           
GCOUNT  =                    1 / number of groups                               
EXTNAME = 'BRZ_WAVELENGTH'     / extension name                                 
BUNIT   = 'Angstrom'                                                            

# HDU 3 in pca_coadd_model.fits:
XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                  -64 / array data type                                
NAXIS   =                    2 / number of array dimensions                     
NAXIS1  =                 7781                                                  
NAXIS2  =                  500                                                  
PCOUNT  =                    0 / number of parameters                           
GCOUNT  =                    1 / number of groups                               
EXTNAME = 'BRZ_MODEL'          / extension name       

Example run and model spectra:

from desispec.io import read_spectra, write_spectra
spec = read_spectra('/global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/coadd-6-80605-thru20210205.fits')

modelfile = '/global/homes/a/abhijeet/redrock_archetype_project/pca_coadd_model.fits'

tt = fits.open(modelfile)
z = tt["REDSHIFTS"].data['Z']
np.testing.assert_array_equal(tt['REDSHIFTS'].data['TARGETID'], spectra.fibermap["TARGETID"])
model = {}
model['Z'] = z.copy()
for band in spectra.bands:
    model[band] = tt[f'{band.upper()}_MODEL'].data

def plot_flux_and_model(model, spec, tid=None):
    if tid is None:
        tid = np.random.choice(spec.fibermap["TARGETID"])
    ii = np.where(spec.fibermap["TARGETID"]==tid)[0]
    z = model['Z'][ii]
    fig = plt.figure(figsize=(12, 4))
    plt.grid(True)
    for band in spec.bands:
        plt.plot(spec.wave[band], spec.flux[band][ii][0], 'k')
        plt.plot(spec.wave[band], model[band][ii][0], label=band+'-model')
    plt.title('TID = %s, z = %.4f'%(spec.fibermap["TARGETID"][ii].data, z))
    plt.legend()
    plt.xlabel('Obs wave [ang]')
    plt.show()

plot_flux_and_model(model, spec, tid=None)
Screenshot 2024-02-22 at 3 41 59 PM
coveralls commented 4 months ago

Coverage Status

coverage: 38.787% (-0.7%) from 39.528% when pulling 4c6bd9829724b2fe902afac95a2932f084fc2e97 on best_fit_model into 04653df87a123c920446bbf4380f68bfdf190457 on main.

sbailey commented 2 months ago

There are a few merge conflicts to get this branch back in sync with other changes from main. I'm going to try to resolve them, but will want to do some more testing with fresh eyes before merging so heads up -- this branch might become temporarily broken.

sbailey commented 2 months ago

I reran with and without archetypes, and with multiprocessing and MPI, and verified that the output models look good when plotting on top of the data, while also being different between archetypes vs. PCA templates as expected. Looks good.