desihub / redrock

Redshift fitting for spectroperfectionism
BSD 3-Clause "New" or "Revised" License
22 stars 14 forks source link

--archetype_nnearest crashing on main branch #272

Closed abhi0395 closed 8 months ago

abhi0395 commented 8 months ago

When running rrdesi in archetype mode with --archetype-nnearest argument, it fails citing array mismatch error:

srun -n 32 -c 4 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/daily/tiles/cumulative/10752/20240117/coadd-2-10752-thru20240117.fits -o archetype-test-2-10752-thru20240117_1.fits -d details.h5 --archetypes /global/homes/a/abhijeet/software/desisoft/new-archetypes/rrarchetype-galaxy.fits --archetype-nnearest 3

Error:

Proc 3:   File "/global/homes/a/abhijeet/software/desisoft/redrock/py/redrock/zscan.py", line 311, in per_camera_coeff_with_least_square_batch
    zzchi2[j], zzcoeff[j] = calc_zchi2_batch(spectra, tdata2, weights, flux, wflux, 1, nbasis, solve_matrices_algorithm=method.upper(), solver_args=solver_args, prior=prior, use_gpu=use_gpu)
Proc 3:   File "/global/homes/a/abhijeet/software/desisoft/redrock/py/redrock/zscan.py", line 739, in calc_zchi2_batch
    M += prior
Proc 3: ValueError: operands could not be broadcast together with shapes (7,7) (9,9) (7,7) 

Explanation and solution

The key point is that the archetype method first solves the matrix for each archetype in the loop and selects the nearest (N-best) neighbors based on the final chi2 array and then solves again, including those nearest archetypes. The error comes because the prior matrix is defined based on the --archetype-nearest argument at the start of the algorithm, which is larger than expected when it solves the linear equation for the single archetype.

I have now corrected the logic in case --archetype-nnearest is ever used in archetype mode in archetypes.get_best_archetype. I also made necessary changes in archetypes.nearest_neighbour_model and zscan.per_camera_coeff_with_least_square_batch to correct this issue. Additionally, I have added a print statement in desi.py to print the statement for clarity in case --archetype-nnearest is used.

Test run

Running rrdesi from new branch abhijeet_archetypes_nnearest succeeds without any error:

srun -n 32 -c 4 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/daily/tiles/cumulative/10752/20240117/coadd-2-10752-thru20240117.fits -o /global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/archetype_nearest_nbh-test-2-10752-thru20240117.fits -d details.h5 --archetypes /global/homes/a/abhijeet/software/desisoft/new-archetypes/rrarchetype-galaxy.fits --archetype-nnearest 3

Sanity checks

On new branch I ran with and without archetype (NO --archetype-nnearest was used) and checked the outputs from main branch.

1) abhijeet_archetypes_nnearest branch::

Without archetype: /global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/no_archetype-test-2-10752-thru20240117.fits

With archetype (no nearest neighbours): /global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/archetype_no_nearest_nbh-test-2-10752-thru20240117.fits

2) main branch::

Without archetype: /global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/no_archetype-test-2-10752-thru20240117.fits

With archetype (no nearest neighbours): /global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/archetype-test-2-10752-thru20240117.fits

I compared these output files with each other, and the answers did not change.

Checks:

ref_file1 = '/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/no_archetype-test-2-10752-thru20240117.fits'
ref_file2='/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/no_archetype-test-2-10752-thru20240117.fits'
tt1 = Table.read(ref_file1, hdu=1)
tt2 = Table.read(ref_file2, hdu=1)
np.allclose(tt1['Z'], tt2['Z'])
np.count_nonzero(tt1['Z']-tt2['Z']) 
ref_file1 = '/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/archetype-test-2-10752-thru20240117.fits'
ref_file2='/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/archetype_no_nearest_nbh-test-2-10752-thru20240117.fits'
tt1 = Table.read(ref_file1, hdu=1)
tt2 = Table.read(ref_file2, hdu=1)
np.allclose(tt1['Z'], tt2['Z'])
np.count_nonzero(tt1['Z']-tt2['Z']) 
coveralls commented 8 months ago

Coverage Status

coverage: 33.114% (-0.08%) from 33.193% when pulling d20a481e2844ba841392a0961c641b01eec85d87 on abhijeet_archetypes_nneareast into f1a1d06d5199b482382b9cecf67be7c9c3dde761 on main.