desihub / specex

DESI spectrograph PSF fitting
BSD 3-Clause "New" or "Revised" License

floating point exception while merging bundles of PSFs #29

Closed: sbailey closed this issue 5 years ago

sbailey commented 5 years ago

Using specex master as installed at /global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/specex/master:

time srun -n 20 -c 2 desi_compute_psf_mpi \
     --input-image /global/project/projectdirs/desi/spectro/redux/sjbailey/preproc/20191024/00020662/preproc-b3-00020662.fits \
     --input-psf /global/project/projectdirs/desi/spectro/desi_spectro_calib/trunk/spec/sp3/psf-sm4-b-science-slit-20191015.fits \
     --output-psf /global/project/projectdirs/desi/spectro/redux/sjbailey/exposures/20191024/00020662/psf-b3-00020662.fits
...
INFO wrote PSF in /global/project/projectdirs/desi/spectro/redux/sjbailey/exposures/20191024/00020662/psf-b3-00020662_11.fits
INFO:specex.py:260:merge_psf: Will merge 20 PSFs in /global/project/projectdirs/desi/spectro/redux/sjbailey/exposures/20191024/00020662/psf-b3-00020662.fits
INFO:specex.py:265:merge_psf: merging /global/project/projectdirs/desi/spectro/redux/sjbailey/exposures/20191024/00020662/psf-b3-00020662_01.fits into /global/project/projectdirs/desi/spectro/redux/sjbailey/exposures/20191024/00020662/psf-b3-00020662_00.fits
srun: error: nid00009: task 0: Floating point exception
srun: Terminating job step 25399083.1
slurmstepd: error: *** STEP 25399083.1 ON nid00009 CANCELLED AT 2019-10-24T20:52:04 ***
srun: error: nid00009: tasks 1-19: Terminated
srun: Force Terminated job step 25399083.1

That specex was compiled with:

source /project/projectdirs/desi/software/desi_environment.sh master
cd /global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/specex/master
SPECEX_PREFIX=$(pwd) make -j 16 install

julienguy commented 5 years ago

I hit this floating point exception on another prod as well. However, I got no error when the same Python merge routine was called from a different script on the same data...

julienguy commented 5 years ago

This is not a specex issue but a desispec issue related to the MPI run. I added a simple script, desi_merge_psf, to desispec, and it works fine on the example above, so I am closing this issue in specex and opening a similar one in desispec.

desi_merge_psf -i /global/project/projectdirs/desi/spectro/redux/sjbailey/exposures/20191024/00020662/psf-b3-00020662_*.fits -o $SCRATCH/psf.fits
...
INFO:specex.py:311:merge_psf: Wrote PSF /global/cscratch1/sd/jguy/psf.fits
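Because the `_*.fits` glob hands desi_merge_psf whatever per-bundle files happen to exist, it may be worth confirming that every bundle was written before running the standalone merge. The following is a minimal bash sketch of such a check. `check_bundle_psfs` is a hypothetical helper, not part of desispec or specex, and it assumes the bundle files follow the `<prefix>_NN.fits` naming seen in the log above (two-digit, zero-padded indices 00 through n-1).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of desispec/specex): verify that every
# per-bundle PSF file exists before merging. Assumes files are named
# <prefix>_NN.fits with a zero-padded two-digit index from 00 to n-1.
check_bundle_psfs() {
    local prefix="$1"    # path prefix, e.g. .../psf-b3-00020662
    local nbundles="$2"  # expected number of bundle files, e.g. 20
    local missing=0
    local i f
    for ((i = 0; i < nbundles; i++)); do
        f=$(printf '%s_%02d.fits' "$prefix" "$i")
        if [[ ! -f "$f" ]]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only if no bundle file is missing
    [[ $missing -eq 0 ]]
}
```

For the exposure in this issue, one would pass the prefix /global/project/projectdirs/desi/spectro/redux/sjbailey/exposures/20191024/00020662/psf-b3-00020662 and a count of 20 (matching "Will merge 20 PSFs"), and chain the desi_merge_psf command with && so that the merge only runs when no bundle file is missing.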