fermi-lat / Likelihood


Floating point exception in gtsrcmaps #83

Closed. ldivenere closed this issue 3 years ago

ldivenere commented 4 years ago

I have a "Floating point exception" in gtsrcmaps. After some debugging I noticed that the error is raised when calculating the source map for a point source that falls in a pixel with null exposure (I am running the analysis in small time intervals, so there may be regions of the sky with null exposure).

I reproduced the error with the files in this tar archive: https://istnazfisnucl-my.sharepoint.com/:u:/g/personal/ldivenere_infn_it/EfjUVcirg4hNimnYSqv1FtQB3qTvWQguiqygT9G8oFq7Qw?e=juPBSh

This is the output I get:

gtsrcmaps scfile=ft2_merged_month_141.fits sctable="SC_DATA" expcube=ltcube_00.fits \
cmap=ccube_00.fits srcmdl=srcmdl1_00.xml  bexpmap=bexpmap_00.fits wmap=none \
outfile=srcmap1_00.fits irfs="P8R3_SOURCE_V2" evtype=3 convol=yes resample=yes rfactor=2 \
minbinsz=0.05 ptsrc=yes psfcorr=no emapbnds=no edisp_bins=-1 copyall=no chatter=4 \
clobber=yes debug=yes gui=no mode="ql"

This is gtsrcmaps version HEAD
Using evtype=3 (i.e., FRONT/BACK irfs)
ResponseFunctions::load: IRF used: P8R3_SOURCE_V2
  event_types:  0  1
Creating source named 4FGL J1253.3-5816
Creating source named 4FGL J1250.9-4943
Creating source named isodiff
Creating source named galdiff
Generating SourceMap for 4FGL J1250.9-4943 37....................!
Floating point exception

The error seems to come from the point-like source "4FGL J1253.3-5816" (which should be processed after 4FGL J1250.9-4943). The "bexpmap_roi_00.fits" file shows that the exposure in the pixel of this source is 0. Any hint on how to solve this issue?

dagorym commented 4 years ago

I can confirm that this is still happening in the current version of the tools (although I had to remove the isodiff source, as its source map file is not a standard one and wasn't included in the tar file).

That said, I'm not sure what the fix should be. If there is no exposure in a pixel being used, then the source map cannot be generated, as there is no way to compute the value of the flux in that pixel. Therefore failing is the correct thing to do. It could probably fail a bit more gracefully, but I don't believe it should continue and try to create an unusable product. Maybe Eric, who is much more familiar with the workings of the tools and the science, can comment more.

ldivenere commented 4 years ago

A null exposure for a point source should result in zero model counts, in principle without affecting the final result of the analysis. This could be reported with a warning, or with a more detailed exception that helps the user identify the source causing the error.
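
Schematically (ignoring energy dispersion and the exact conventions used in the tools), the model counts for a point source at direction s in pixel p and energy bin k are something like

N_{pk} \approx \int_{E_k}^{E_{k+1}} dE \; \epsilon(s, E) \, \frac{dN}{dE}(E) \, P(p \mid s, E)

so if the exposure \epsilon(s, E) is zero over the whole interval, the predicted counts vanish in every pixel and energy bin regardless of the spectral parameters, and the source simply contributes nothing to the likelihood.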

dagorym commented 3 years ago

Posted the following to the C&A group for input:

I'm throwing this out to the C&A group for input, as the fix to this bug (https://github.com/fermi-lat/Likelihood/issues/83) involves the science analysis and not just how the software is coded.

Currently, when running gtsrcmaps, if the program encounters a pixel in the source map with an exposure value of zero, it crashes with a floating point exception. I haven't dug through the code yet, but I suspect the crash is caused by a divide-by-zero error. The question is, what is the proper way to handle this from a science perspective?

As far as I can tell, there are two main options. First, as is happening now, stop the analysis and don't continue. It should probably be more graceful than a simple crash, but there may be performance reasons for that option (I'll be checking). The other option seems to be to simply set the flux value at that point to zero and let the analysis continue. I guess a third option would be to take an average of the surrounding pixels, but that only really works if it's a single pixel. If there is an extended region of zero exposure, that isn't going to work (and the analysis should probably exit).
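
As a rough illustration of the second and third options (a sketch only, with made-up names; this is not code from the Likelihood package):

#include <stdexcept>
#include <vector>

// Hypothetical handling of a zero-exposure pixel; `expmap` is assumed to be
// a row-major nx-by-ny array of per-pixel exposure values.
double fillZeroExposurePixel(const std::vector<double>& expmap,
                             int nx, int ny, int ix, int iy,
                             bool useNeighborAverage) {
    if (!useNeighborAverage) {
        return 0.0;  // second option: just set the value to zero
    }
    // third option: average the non-zero nearest neighbors
    double sum = 0.0;
    int count = 0;
    for (int dx = -1; dx <= 1; ++dx) {
        for (int dy = -1; dy <= 1; ++dy) {
            if (dx == 0 && dy == 0) continue;
            int jx = ix + dx, jy = iy + dy;
            if (jx < 0 || jx >= nx || jy < 0 || jy >= ny) continue;
            double v = expmap[jy * nx + jx];
            if (v > 0.0) {
                sum += v;
                ++count;
            }
        }
    }
    if (count == 0) {
        // An extended region of zero exposure: averaging is meaningless,
        // so this is the case where the tool should probably still exit.
        throw std::runtime_error("zero exposure over an extended region");
    }
    return sum / count;
}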

Given the amount of data, this only happens when analyzing short time intervals, but it does come up. What does the C&A group recommend as the solution?

dagorym commented 3 years ago

Okay, I've tracked this down and fixed the code so that gtsrcmaps no longer crashes when it encounters a pixel with zero exposure. A test version is available on the development channel (add -c fermi/label/dev to your conda command) as version 2.0.4.

The problem was actually only occurring when the entire PSF map had zero values. This was causing problems in two places. In the first case, the code would try to normalize the PSF by integrating the values and dividing by the integral, which was zero. In another case (revealed when I fixed the first one), it would normalize by the peak pixel value (also zero). Both cases produced a divide by zero. This was only a problem when the entire PSF was zero, since in any other case the integral and the peak value would be non-zero.
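
For anyone following along, here is a minimal, self-contained sketch of that failure mode (not the actual Likelihood code; the names are made up). It assumes floating point exceptions are trapped, which is presumably the case in the tools, since otherwise the division would quietly produce NaNs instead of crashing:

#include <cfenv>    // feenableexcept (glibc extension)
#include <numeric>  // std::accumulate
#include <vector>

int main() {
    // Trap divide-by-zero / invalid operations so they deliver SIGFPE,
    // which is what prints "Floating point exception" at the shell.
    feenableexcept(FE_DIVBYZERO | FE_INVALID);

    std::vector<double> psf(100, 0.0);  // a PSF map that is identically zero
    double integral = std::accumulate(psf.begin(), psf.end(), 0.0);
    for (double& v : psf) {
        v /= integral;  // 0.0 / 0.0 raises FE_INVALID here -> SIGFPE
    }
    return 0;
}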

After reviewing the responses from the C&A group, it seemed reasonable to allow the analysis to continue with the zero-valued PSF, so I added guards to the code that check for a zero value in the peak pixel or in the integral and skip the normalization step in that case. Since the values were already zero, leaving them as such doesn't actually change anything.
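
In outline the guard amounts to something like the following (a sketch only, with a made-up function name; the same zero check protects the peak-pixel normalization, and the real change is in the commit linked below):

#include <numeric>
#include <vector>

// If the PSF map is identically zero, its integral (and its peak) are zero,
// so the normalization is skipped and the map is left as all zeros.
void normalizePsfMap(std::vector<double>& psf) {
    double integral = std::accumulate(psf.begin(), psf.end(), 0.0);
    if (integral <= 0.0) {
        return;  // nothing to normalize: zero exposure for this source
    }
    for (double& v : psf) {
        v /= integral;
    }
}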

I have run several tests, including the unit tests and the binned likelihood tutorial from the analysis threads, and verified that the output source map is the same after the fix as before. In addition, the example provided with this issue now runs instead of crashing.

The commit that adds these changes is https://github.com/fermi-lat/Likelihood/commit/2c7e1694ad1265cf28fb0e3c6ccfe295710d2a66