desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License
2 stars 3 forks source link

return xflux in ex2d_patch and use xflux for chi2pix #76

Closed dmargala closed 1 year ago

dmargala commented 1 year ago

Here is a PR to use the unconvolved extracted flux (xflux) instead of the reconvolved flux (flux = R * xflux) in the calculation of chi2pix (see #75 for details).

Summary of changes:

cc: @sbailey @marcelo-alvarez

sbailey commented 1 year ago

@dmargala thanks. The changes look good (though I haven't tried actually running them yet). The test failures are due to latest numpy dropping "asscalar" which is still used by astropy 5.0. See https://github.com/desihub/desispec/commit/19630cbdc766a602d1fd957d44cc6dc9424d8123 for an example test config update to pin the numpy version.

The code test I would (will?) do is run a GPU extraction with main and this branch and verify that in the output file, and verify that the FLUX, IVAR, WAVELENGTH, RESOLUTION, FIBERMAP HDUs are bitwise identical, that CHI2PIX did change, and spot check the chi2pix-related MASK. I'm guessing that changed masking would also change SCORES but I'd have to check more closely to know exactly what pieces.

If you have already done that, please point us to before/after files at NERSC.

dmargala commented 1 year ago

I generated the following two files with gpu_specter from main desi environment and this branch:

/global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-main.fits.gz
/global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-xflux-chi2pix.fits.gz

here is code test I ran (desi_extract_spectra command based on the one @marcelo-alvarez shared with me in slack):


# checkout this branch of gpu_specter
git clone git@github.com:desihub/gpu_specter.git
cd gpu_specter
git checkout dmargala/use-xflux-for-chi2pix

# start an interactive session on a perlmutter gpu node
salloc -A nstaff -C gpu -N 1 -q interactive -t 30

# extract frame using gpu_specter from desi environment
source /global/common/software/desi/desi_environment.sh main
srun -n 20 -c 2 --cpu-bind=cores -G 4 desi_mps_wrapper desi_extract_spectra -w 7520.0,9824.0,0.8 -i /global/cfs/cdirs/desi/users/malvarez/spectro/redux/perlval-perlmutter-gpu/preproc/20220404/00128680/preproc-z5-00128680.fits.gz -p /global/cfs/cdirs/desi/users/malvarez/spectro/redux/perlval-perlmutter-gpu/exposures/20220404/00128680/psf-z5-00128680.fits -o /global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-main.fits.gz --psferr 0.1 --gpu-specter --nsubbundles 5 --mpi --use-gpu --barycentric-correction

# extract frame using this branch of gpu_specter
module unload gpu_specter
export PYTHONPATH=$(pwd)/py:$PYTHONPATH
export PATH=$(pwd)/bin:$PATH
srun -n 20 -c 2 --cpu-bind=cores -G 4 desi_mps_wrapper desi_extract_spectra -w 7520.0,9824.0,0.8 -i /global/cfs/cdirs/desi/users/malvarez/spectro/redux/perlval-perlmutter-gpu/preproc/20220404/00128680/preproc-z5-00128680.fits.gz -p /global/cfs/cdirs/desi/users/malvarez/spectro/redux/perlval-perlmutter-gpu/exposures/20220404/00128680/psf-z5-00128680.fits -o /global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-xflux-chi2pix.fits.gz --psferr 0.1 --gpu-specter --nsubbundles 5 --mpi --use-gpu --barycentric-correction

Here is the fitsdiff output:

dmargala@nid001640:~/source/desihub/gpu_specter> fitsdiff /global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-xflux-chi2pix.fits.gz /global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-main.fits.gz

 fitsdiff: 5.0
 a: /global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-xflux-chi2pix.fits.gz
 b: /global/cfs/cdirs/desi/users/dmargala/frame-z5-00128680-main.fits.gz
 Maximum number of different data values to be reported: 10
 Relative tolerance: 0.0, Absolute tolerance: 0.0

Primary HDU:

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> HRB3JQ92HQA2HQ72
        b> IQD2KQ92IQA2IQ92
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:55
         ?                                     ^ ^^
        b> HDU checksum updated 2022-09-19T16:45:36
         ?                                     ^ ^^
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:55
         ?                                           ^ ^^
        b> data unit checksum updated 2022-09-19T16:45:36
         ?                                           ^ ^^

Extension HDU 1 (IVAR, 1):

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> CAp6E4m6C9m6C9m6
        b> DAq6F4n6D9n6D9n6
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:56
         ?                                     ^ ^
        b> HDU checksum updated 2022-09-19T16:45:36
         ?                                     ^ ^
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:56
         ?                                           ^ ^
        b> data unit checksum updated 2022-09-19T16:45:36
         ?                                           ^ ^

Extension HDU 2 (MASK, 1):

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> 6EiH7BZF6BfF6BZF
        b> 6Dhm8AZl6Afl6AZl
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:56
         ?                                     ^ ^
        b> HDU checksum updated 2022-09-19T16:45:36
         ?                                     ^ ^
     Keyword DATASUM  has different values:
        a> 755007
        b> 779839
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:56
         ?                                           ^ ^
        b> data unit checksum updated 2022-09-19T16:45:36
         ?                                           ^ ^

   Data contains differences:
     Data differs at [1429, 11]:
        a> 0
        b> 128
     Data differs at [1838, 35]:
        a> 0
        b> 128
     Data differs at [175, 91]:
        a> 0
        b> 128
     Data differs at [176, 91]:
        a> 1
        b> 129
     Data differs at [177, 91]:
        a> 1
        b> 129
     Data differs at [178, 91]:
        a> 1
        b> 129
     Data differs at [1145, 96]:
        a> 129
        b> 1
     Data differs at [1146, 96]:
        a> 129
        b> 1
     Data differs at [2167, 98]:
        a> 1
        b> 129
     Data differs at [2171, 98]:
        a> 0
        b> 128
     ...
     200 different pixels found (0.01% different).

Extension HDU 3 (WAVELENGTH, 1):

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> hbaThaXRhaaRhaWR
        b> ibaTiaZRiaaRiaYR
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:56
         ?                                     ^ ^
        b> HDU checksum updated 2022-09-19T16:45:36
         ?                                     ^ ^
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:56
         ?                                           ^ ^
        b> data unit checksum updated 2022-09-19T16:45:36
         ?                                           ^ ^

Extension HDU 4 (RESOLUTION, 1):

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> QODJTMD9QMDGQMD9
        b> RNEIUME9RMEGRME9
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:56
         ?                                      ---
        b> HDU checksum updated 2022-09-19T16:45:37
         ?                                     +++
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:56
         ?                                            ---
        b> data unit checksum updated 2022-09-19T16:45:37
         ?                                           +++

Extension HDU 5 (FIBERMAP, 1):

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> 0a2a2a2W0a2a0a2W
        b> 1a3a3a3W1a3a1a3W
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:59
         ?                                     ^ ^
        b> HDU checksum updated 2022-09-19T16:45:39
         ?                                     ^ ^
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:59
         ?                                           ^ ^
        b> data unit checksum updated 2022-09-19T16:45:39
         ?                                           ^ ^

Extension HDU 6 (CHI2PIX, 1):

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> 3Aed61ec38ec38ec
        b> BWkAEVh7BVhABVh7
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:59
         ?                                     ^ ^
        b> HDU checksum updated 2022-09-19T16:45:39
         ?                                     ^ ^
     Keyword DATASUM  has different values:
        a> 4254562950
        b> 2207126867
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:59
         ?                                           ^ ^
        b> data unit checksum updated 2022-09-19T16:45:39
         ?                                           ^ ^

   Data contains differences:
     Data differs at [1, 1]:
        a> 1.5359675
        b> 2.1983817
     Data differs at [2, 1]:
        a> 1.6271236
        b> 1.8243557
     Data differs at [3, 1]:
        a> 1.4719399
        b> 1.7377713
     Data differs at [4, 1]:
        a> 1.097299
        b> 1.4232846
     Data differs at [5, 1]:
        a> 0.98081577
        b> 1.3819599
     Data differs at [6, 1]:
        a> 0.7840894
        b> 1.6679285
     Data differs at [7, 1]:
        a> 0.586903
        b> 2.0428832
     Data differs at [8, 1]:
        a> 0.42576614
        b> 3.7747188
     Data differs at [9, 1]:
        a> 0.28300866
        b> 5.3522816
     Data differs at [10, 1]:
        a> 0.31326902
        b> 4.3887677
     ...
     1440500 different pixels found (100.00% different).

Extension HDU 7 (SCORES, 1):

   Headers contain differences:
     Keyword CHECKSUM has different values:
        a> Eh8LFe5KEe5KEe5K
        b> Fh9LGe6KFe6KFe6K
     Keyword CHECKSUM has different comments:
        a> HDU checksum updated 2022-09-19T16:47:59
         ?                                     ^ ^
        b> HDU checksum updated 2022-09-19T16:45:39
         ?                                     ^ ^
     Keyword DATASUM  has different comments:
        a> data unit checksum updated 2022-09-19T16:47:59
         ?                                           ^ ^
        b> data unit checksum updated 2022-09-19T16:45:39
         ?                                           ^ ^
marcelo-alvarez commented 1 year ago

@dmargala thanks for this. I have compared both of the frame files from your test ("gp76-main" and "gp76-xflux-chi2pix"), as well as the original run I did on perlmutter GPUs with main for the same exposure ("pgpu"), to the same frame processed with CPUs on cori ("cori", which is nearly identical to that with CPUs on perlmutter).

As expected, "pgpu" and "gp76-main" agree, showing consistency between our use of gpu_specter with the main branch. Your new changes result in a chi2pix that is indeed different, but there is still at least one bin for which chi2pix>100 on cori, but not with gpu_specter.

bit7-20220404-00128680-z5-fiber-181gpu_v_cori bit7-20220404-00128680-z5-fiber-181gs76-main_v_cori bit7-20220404-00128680-z5-fiber-181gs76-xflux-chi2pix_v_cori

marcelo-alvarez commented 1 year ago

Given this passed the tests @sbailey suggested, should we go ahead and merge?

sbailey commented 1 year ago

Thanks, yes, let's merge this now as a step in the right direction and then continue debugging why gpu_specter and specter don't have closer agreement in chi2pix given that they do have close agreement in reconvolved flux and ivar.