Open craigwarner-ufastro opened 1 month ago
@sbailey
Filling in more context, this branch was created to rectify conflicts between legendre_nnls
and main
that had popped up due to changes to main
after branching off legendre_nnls
. The changes to the legendre_nnls
branch included refactoring code such that everything related to the trick of using negative legendre terms and NNLS instead of BVLS was in per_camera_coeff_with_least_square_batch
in zscan.py. Additionally, the output was changed to remove the 0 coefficients in the solution - when we create negative copies of the legendre terms and fit 2*nleg + narch terms, half of the coefficients in the fit are 0 - either the positive or negative version of each legendre term. If the negative legendre term has a non-zero coefficent, we multiply that coefficient by -1. Thus our output is exactly the same as if BVLS had been used.
When this was initially pushed out, there were conflicts, which I created a new branch to resolve.
bands (list): list of wavelength bands
instead of ncam
as a parameter to per_camera_coeff_with_least_square_batch
.
Moving the prior calculation out of fitz.py and into zscan.py which is where we want it. The best way to proceed moving the legendre-specific manipulations out of archetypes is what I sent you last Monday, that instead of passing tdata to per_camera_coeff_with_least_square_batch, we pass binned to per_camera_coeff_with_least_square_batch and assemble the tdata array there. This consolidates everything related to the BVLS/NNLS trick within that single method per_camera_coeff_with_least_square_batch.
The output COEFF should be collapsed to be the same size as if we had not added the negative legendre terms by removing the zero terms and if a negative term was nonzero, multiply that by -1.E.g. if we have a row [r_i, l1_c1_i, l2_c1_i, -l1_c1_i, -l2_c1_i, l1_c2_i, l2_c2_i, -l1_c2_i, -l2_c2_i, l1_c3_1, l2_c3_i, -l1_c3_i, -l2_c3_i]
Where r_i is the rebinned data for that wavelength and ln_cm is the nth legendre term for the mth camera for wavelength index i and let's say it equals
[1, 2, 0, 0, 3, 0, 0, 4, 5, 6, 7, 0, 0]
then we want that row in the output to be collapsed to
[1, 2, -3, -4, -5, 6, 7]
After attempting to resolve conflicts, and realizing I needed to update to a newer desi environment, everything ran happily but the results did not match those from legendre_nnls
previously so I created this new branch to handle the merge.
After a lot of playing around with various branches in various desi environments, I have finally satisfied myself that everything is ok with my branch to merge into main. There were major output differences between running in environment 23.1 vs 24.4 AND major differences between my legendre_nnls and legendre_nnls_merge_with_main branches running under 24.4. I have finally concluded all of these are due to template changes and code changes unrelated to anything in my updates.If I run from the main branch without archetypes, I now get an identical output to legendre_nnls_merge_with_main. If I run with archetypes, I get an np.allclose agreement between them:
cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/abhijeet-nnls-merge-244.fits /pscratch/sd/c/cdwarner/abhijeet-main-244.fits
col='Z' ndiff=0 std=0.0
col='ZERR' ndiff=0 std=0.0
col='CHI2' ndiff=416 std=1.7036060042842667e-10
FYI here are the differences in my legendre_nnls branch with exact same code and environment 23.1 vs 24.4:
cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/redrock-nnls-231.fits /pscratch/sd/c/cdwarner/redrock-nnls-244.fits
col='Z' ndiff=421 std=2.0492627134013167e-05
col='ZERR' ndiff=446 std=4.637423724708316e-07
col='CHI2' ndiff=412 std=42.88757255960495
And even bigger the difference between legendre_nnls and legendre_nnls_merge_with_main
cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/redrock-nnls-244.fits /pscratch/sd/c/cdwarner/redrock-nnls-merge-244.fits
col='Z' ndiff=355 std=0.2950015931780382
col='ZERR' ndiff=411 std=1.406718431201316e-05
col='CHI2' ndiff=351 std=5.980265673243861e+83
Keep in mind that this is without archetypes! - so not touching anything I recently did. Clearly one file had a 9e99 flag set in one code version vs the other.With --archetypes,
cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/abhijeet-nnls-244.fits /pscratch/sd/c/cdwarner/abhijeet-nnls-merge-244.fits
col='Z' ndiff=311 std=0.6753668041725044
col='ZERR' ndiff=362 std=0.00024064833100371678
col='CHI2' ndiff=447 std=3206.6975053943816
But again all of these differences seem to come from the templates and code independent of anything I did and the main branch seems to agree with my legendre_nnls_merge_with_main branch.
I did notice that there were changes in the templates between the various DESI environments and between the main/merged code versus legendre_nnls
in the print statements on reading the templates.
Finally there is still a significant slowdown in the new version in the fine redshift scan that I am investigating that is independent of archetypes and any code I had touched in the legendre_nnls
branch that I attribute to a change in templates as a null hypothesis.
Created this branch to sort out the differences between main and legendre_nnls. Many changed had been made to main since legendre_nnls was branched off and some conflicts had to be manually resolved, particularly with bands in archetypes.py.