Proc 3: File "/global/homes/a/abhijeet/software/desisoft/redrock/py/redrock/zscan.py", line 311, in per_camera_coeff_with_least_square_batch
zzchi2[j], zzcoeff[j] = calc_zchi2_batch(spectra, tdata2, weights, flux, wflux, 1, nbasis, solve_matrices_algorithm=method.upper(), solver_args=solver_args, prior=prior, use_gpu=use_gpu)
Proc 3: File "/global/homes/a/abhijeet/software/desisoft/redrock/py/redrock/zscan.py", line 739, in calc_zchi2_batch
M += prior
Proc 3: ValueError: operands could not be broadcast together with shapes (7,7) (9,9) (7,7)
Explanation and solution
The key point is that the archetype method first solves the matrix for each archetype in the loop and selects the nearest (N-best) neighbors based on the final chi2 array and then solves again, including those nearest archetypes. The error comes because the prior matrix is defined based on the --archetype-nearest argument at the start of the algorithm, which is larger than expected when it solves the linear equation for the single archetype.
I have now corrected the logic in case --archetype-nnearest is ever used in archetype mode in archetypes.get_best_archetype. I also made necessary changes in archetypes.nearest_neighbour_model and zscan.per_camera_coeff_with_least_square_batch to correct this issue. Additionally, I have added a print statement in desi.py to print the statement for clarity in case --archetype-nnearest is used.
Test run
Running rrdesi from new branch abhijeet_archetypes_nnearest succeeds without any error:
On new branch I ran with and without archetype (NO --archetype-nnearest was used) and checked the outputs from main branch.
1) abhijeet_archetypes_nnearest branch::
Without archetype:
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/no_archetype-test-2-10752-thru20240117.fits
With archetype (no nearest neighbours):
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/archetype_no_nearest_nbh-test-2-10752-thru20240117.fits
2) main branch::
Without archetype:
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/no_archetype-test-2-10752-thru20240117.fits
With archetype (no nearest neighbours):
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/archetype-test-2-10752-thru20240117.fits
I compared these output files with each other, and the answers did not change.
coverage: 33.114% (-0.08%) from 33.193%
when pulling d20a481e2844ba841392a0961c641b01eec85d87 on abhijeet_archetypes_nneareast
into f1a1d06d5199b482382b9cecf67be7c9c3dde761 on main.
When running
rrdesi
in archetype mode with--archetype-nnearest
argument, it fails citing array mismatch error:srun -n 32 -c 4 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/daily/tiles/cumulative/10752/20240117/coadd-2-10752-thru20240117.fits -o archetype-test-2-10752-thru20240117_1.fits -d details.h5 --archetypes /global/homes/a/abhijeet/software/desisoft/new-archetypes/rrarchetype-galaxy.fits --archetype-nnearest 3
Error:
Explanation and solution
The key point is that the archetype method first solves the matrix for each archetype in the loop and selects the nearest (N-best) neighbors based on the final chi2 array and then solves again, including those nearest archetypes. The error comes because the prior matrix is defined based on the
--archetype-nearest
argument at the start of the algorithm, which is larger than expected when it solves the linear equation for the single archetype.I have now corrected the logic in case
--archetype-nnearest
is ever used in archetype mode inarchetypes.get_best_archetype
. I also made necessary changes inarchetypes.nearest_neighbour_model
andzscan.per_camera_coeff_with_least_square_batch
to correct this issue. Additionally, I have added a print statement indesi.py
to print the statement for clarity in case--archetype-nnearest
is used.Test run
Running rrdesi from new branch
abhijeet_archetypes_nnearest
succeeds without any error:srun -n 32 -c 4 rrdesi_mpi -i /global/cfs/cdirs/desi/spectro/redux/daily/tiles/cumulative/10752/20240117/coadd-2-10752-thru20240117.fits -o /global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/archetype_nearest_nbh-test-2-10752-thru20240117.fits -d details.h5 --archetypes /global/homes/a/abhijeet/software/desisoft/new-archetypes/rrarchetype-galaxy.fits --archetype-nnearest 3
Sanity checks
On new branch I ran with and without archetype (NO
--archetype-nnearest
was used) and checked the outputs frommain
branch.1)
abhijeet_archetypes_nnearest
branch::Without archetype:
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/no_archetype-test-2-10752-thru20240117.fits
With archetype (no nearest neighbours):
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/archetype_nnearest/archetype_no_nearest_nbh-test-2-10752-thru20240117.fits
2)
main
branch::Without archetype:
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/no_archetype-test-2-10752-thru20240117.fits
With archetype (no nearest neighbours):
/global/cfs/cdirs/desi/users/abhijeet/test_archetype_runs/main/archetype-test-2-10752-thru20240117.fits
I compared these output files with each other, and the answers did not change.
Checks: