Closed: me-manu closed this issue 3 years ago.
The first thing you can try is to update to the latest version of the tools we released this week, 1.2.23. Unfortunately, I don't think any of the changes in that release would have fixed this type of issue, but it's worth a shot.
We have another open issue with a core dump from gtsrcmaps that we never resolved. I wonder if they have the same cause. Please point us to the location of the files so we can test with builds that have extra debug information turned on.
Thanks for the answer. I just tried with 1.2.23 without success. The input ft1 file is located here:
/nfs/slac/g/ki/ki20/cta/mmeyer/projects/VarFSRQs/Output/Crab/Crab-Sync-All/100MeV-1GeV/weekly/t001/ft1_00.fits
The spacecraft file is here:
/nfs/slac/g/ki/ki20/cta/mmeyer/projects/VarFSRQs/Output/Crab/ft2/30s.fits
And the xml file I'm using resides here:
/nfs/slac/g/ki/ki20/cta/mmeyer/projects/VarFSRQs/Output/Crab/Crab-Sync-All/100MeV-1GeV/weekly/t001/avgspec_00.xml
Everything else you need should be in the par files I posted earlier. If not, please let me know!
I grabbed your files and ran gtsrcmaps on the command line like you did above. It ran to completion on my mac and a local RHEL6 system I tried. Which OS are you using on the batch farm, CentOS7?
Sorry for my slow response. I log into the ki-ls machines, which are RHEL6, I think, and I submit my batch jobs to RHEL6 machines only. Did you use the same time window as in my par file? For most light-curve bins the analysis works; it fails only for some.
Hi, I have run into this issue many times as well. When computing, e.g., a few hundred light curves, about 1% of the time bins crash, and most of those failures are related to this srcmap issue. I made several tests. I run on the ki-ls machines, both on the command line and by submitting jobs to other hosts. I also tried, e.g., rhel6-64f on the command line and still get "Floating point exception".
I carefully checked one such time bin. The ft1 file looks good, and the exposure map I computed (gtexpmap) also looks good. I then ran the same analysis with the unbinned method, and it worked, completing the likelihood fit successfully. So valuable scientific information is being missed. This is really problematic.
@sarabuson Do you have a set of test files I could use to recreate this issue?
This may be the same problem as this issue: https://github.com/fermi-lat/Likelihood/issues/83, but I would like to verify. The files referenced by @me-manu are no longer present on the SLAC computers for me to grab.
But given the error me-manu posted from fermipy, it may be that the source map file is not being created properly, as that FITS error is very specific.
I'm going to mark this one as resolved unless it comes up again with the new version of gtsrcmaps and someone can provide sample files that reproduce the issue.
Hi Tom, this is not solved. It comes up any time I run light curves. I have been very busy with other work but will provide you with the files and more info shortly (I have many of them).
Okay, once you send me some files, I'll track this one down. Thanks.
I've provided Tom with a (non-)working example and am posting a few additional useful notes here for future reference.
What I found after five full days of investigation:
Okay, the issue was a divide-by-zero (as expected) in a part of the code that generates a scaling factor by dividing by the integral over the PSF. For sources with no exposure, this PSF integral is zero, hence the error. I added a guard that checks for a zero-valued integral and sets the scaling factor to zero in that case; since zero times zero is still zero, this doesn't affect the analysis but prevents the crash.
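The guard described above can be sketched as follows. This is only an illustration in Python; the actual fix lives in the C++ Likelihood code, and the function name here is hypothetical:

```python
def psf_scale_factor(numerator, psf_integral):
    """Hypothetical sketch of the fixed scaling-factor computation.

    For zero-exposure sources the PSF integral is zero, and dividing by it
    raised the floating point exception. The guard returns zero instead,
    which is harmless because the resulting source map is zero anyway.
    """
    if psf_integral == 0.0:
        return 0.0  # zero exposure: source map values will be zero regardless
    return numerator / psf_integral
```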
Testing between the current version and the fixed version shows that they produce identical source maps on "good" data with no zero-exposure sources, while the fixed version handles zero-exposure objects gracefully, simply producing zero-valued source maps.
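A minimal sketch of such a regression check, assuming the source-map pixel arrays from the two builds have already been loaded into arrays (e.g. with astropy.io.fits); `maps_match` is a hypothetical helper, not part of the Fermitools:

```python
import numpy as np

def maps_match(map_a, map_b):
    """Return True if two source-map pixel arrays are exactly identical.

    map_a and map_b are the pixel arrays of the corresponding HDUs from
    the two gtsrcmaps output files (loading them is assumed done elsewhere).
    """
    return np.array_equal(np.asarray(map_a), np.asarray(map_b))
```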
Built and tagged as version 2.0.5. You can install with the following command for testing:
conda create -n fermitest -c conda-forge -c fermi -c fermi/label/dev fermitools=2.0.5
These newest versions seem to have resolved the issue, and Sara has reported that it is now working.
I'm trying to generate a weekly light curve for the Crab Nebula, so I'm running a standard analysis for each week separately. For 4 time bins I get a strange error when the srcmap is computed. Within fermipy I get the following (I already handle the exception when the setup fails, and I check the number of events in the ft1 file; it's small but should still be OK):
So then, I tried to run gtsrcmaps from the command line. I get the following:
I'm working on the SLAC batch farm with the developer's version of the Fermitools, 1.2.2.
I also checked the mission timeline, https://fermi.gsfc.nasa.gov/ssc/observations/timeline/posting/, but did not find anything unusual in mission week 520, which my time bin corresponds to.
I'm also attaching my par files for the analysis so that you can try to reproduce the error. The necessary ft1 and xml files are all at SLAC; I can point you to the directories if you like. par_files.zip
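The per-bin exception handling mentioned in the description above can be sketched like this. Everything here is a hypothetical stand-in: `run_bin` represents the fermipy setup and fit of one time bin, and the bin tuples carry the event count checked against the ft1 file:

```python
# Hypothetical sketch: loop over light-curve time bins, skipping bins with
# too few events and catching per-bin failures (such as the gtsrcmaps crash)
# so that one bad bin doesn't kill the whole light curve.
def run_light_curve(bins, run_bin, min_events=10):
    results = {}
    for tmin, tmax, nevents in bins:
        if nevents < min_events:
            results[(tmin, tmax)] = None  # too few events in the ft1 file
            continue
        try:
            results[(tmin, tmax)] = run_bin(tmin, tmax)
        except Exception:
            results[(tmin, tmax)] = None  # setup/srcmap failed for this bin
    return results
```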