desihub / redrock

Redshift fitting for spectroperfectionism
BSD 3-Clause "New" or "Revised" License
21 stars 13 forks source link

Legendre nnls merge with main #307

Open craigwarner-ufastro opened 1 month ago

craigwarner-ufastro commented 1 month ago

Created this branch to sort out the differences between main and legendre_nnls. Many changed had been made to main since legendre_nnls was branched off and some conflicts had to be manually resolved, particularly with bands in archetypes.py.

coveralls commented 1 month ago

Coverage Status

coverage: 38.513% (-0.5%) from 39.022% when pulling 425d3fc9a373a4c9dd00053cb2d73eec45e85f07 on legendre_nnls_merge_with_main into ca7838158ab7da085ae14b6da909a32bbdcf9dba on main.

craigwarner-ufastro commented 1 month ago

@sbailey Filling in more context, this branch was created to rectify conflicts between legendre_nnls and main that had popped up due to changes to main after branching off legendre_nnls. The changes to the legendre_nnls branch included refactoring code such that everything related to the trick of using negative legendre terms and NNLS instead of BVLS was in per_camera_coeff_with_least_square_batch in zscan.py. Additionally, the output was changed to remove the 0 coefficients in the solution - when we create negative copies of the legendre terms and fit 2*nleg + narch terms, half of the coefficients in the fit are 0 - either the positive or negative version of each legendre term. If the negative legendre term has a non-zero coefficent, we multiply that coefficient by -1. Thus our output is exactly the same as if BVLS had been used.

When this was initially pushed out, there were conflicts, which I created a new branch to resolve.

  1. The addition by Abhijeet of bands (list): list of wavelength bands

instead of ncam as a parameter to per_camera_coeff_with_least_square_batch.

  1. Moving the prior calculation out of fitz.py and into zscan.py which is where we want it. The best way to proceed moving the legendre-specific manipulations out of archetypes is what I sent you last Monday, that instead of passing tdata to per_camera_coeff_with_least_square_batch, we pass binned to per_camera_coeff_with_least_square_batch and assemble the tdata array there. This consolidates everything related to the BVLS/NNLS trick within that single method per_camera_coeff_with_least_square_batch.

  2. The output COEFF should be collapsed to be the same size as if we had not added the negative legendre terms by removing the zero terms and if a negative term was nonzero, multiply that by -1.E.g. if we have a row [r_i, l1_c1_i, l2_c1_i, -l1_c1_i, -l2_c1_i, l1_c2_i, l2_c2_i, -l1_c2_i, -l2_c2_i, l1_c3_1, l2_c3_i, -l1_c3_i, -l2_c3_i] Where r_i is the rebinned data for that wavelength and ln_cm is the nth legendre term for the mth camera for wavelength index i and let's say it equals [1, 2, 0, 0, 3, 0, 0, 4, 5, 6, 7, 0, 0] then we want that row in the output to be collapsed to [1, 2, -3, -4, -5, 6, 7]

After attempting to resolve conflicts, and realizing I needed to update to a newer desi environment, everything ran happily but the results did not match those from legendre_nnls previously so I created this new branch to handle the merge.

After a lot of playing around with various branches in various desi environments, I have finally satisfied myself that everything is ok with my branch to merge into main. There were major output differences between running in environment 23.1 vs 24.4 AND major differences between my legendre_nnls and legendre_nnls_merge_with_main branches running under 24.4. I have finally concluded all of these are due to template changes and code changes unrelated to anything in my updates.If I run from the main branch without archetypes, I now get an identical output to legendre_nnls_merge_with_main. If I run with archetypes, I get an np.allclose agreement between them:

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/abhijeet-nnls-merge-244.fits /pscratch/sd/c/cdwarner/abhijeet-main-244.fits
col='Z'  ndiff=0  std=0.0
col='ZERR'  ndiff=0  std=0.0
col='CHI2'  ndiff=416  std=1.7036060042842667e-10

FYI here are the differences in my legendre_nnls branch with exact same code and environment 23.1 vs 24.4:

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/redrock-nnls-231.fits /pscratch/sd/c/cdwarner/redrock-nnls-244.fits 
col='Z'  ndiff=421  std=2.0492627134013167e-05
col='ZERR'  ndiff=446  std=4.637423724708316e-07
col='CHI2'  ndiff=412  std=42.88757255960495

And even bigger the difference between legendre_nnls and legendre_nnls_merge_with_main

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/redrock-nnls-244.fits /pscratch/sd/c/cdwarner/redrock-nnls-merge-244.fits 
col='Z'  ndiff=355  std=0.2950015931780382
col='ZERR'  ndiff=411  std=1.406718431201316e-05
col='CHI2'  ndiff=351  std=5.980265673243861e+83

Keep in mind that this is without archetypes! - so not touching anything I recently did. Clearly one file had a 9e99 flag set in one code version vs the other.With --archetypes,

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/abhijeet-nnls-244.fits /pscratch/sd/c/cdwarner/abhijeet-nnls-merge-244.fits 
col='Z'  ndiff=311  std=0.6753668041725044
col='ZERR'  ndiff=362  std=0.00024064833100371678
col='CHI2'  ndiff=447  std=3206.6975053943816

But again all of these differences seem to come from the templates and code independent of anything I did and the main branch seems to agree with my legendre_nnls_merge_with_main branch.

I did notice that there were changes in the templates between the various DESI environments and between the main/merged code versus legendre_nnls in the print statements on reading the templates.

Finally there is still a significant slowdown in the new version in the fine redshift scan that I am investigating that is independent of archetypes and any code I had touched in the legendre_nnls branch that I attribute to a change in templates as a null hypothesis.