Fermipy multiprocessing memory consumption

jeget commented 6 years ago

I create a GTAnalysis instance with the following config:

data:
  evfile : ft1.txt
  scfile : ft2.txt

fileio:
  logfile : fermipy.log

binning:
  roiwidth : 20.0
  binsz : 0.2
  binsperdec : 10

selection:
  emin : 100
  emax : 100000
  zmax : 90
  evclass : 128
  evtype : 3
  target : '3FGL J0721.9+7120'

gtlike:
  edisp : True
  irfs : 'P8R2_SOURCE_V6'
  edisp_disable : ['isodiff','galdiff']

lightcurve:
  free_params : ['norm', 'alpha']
  shape_ts_threshold : 1e6

model:
  src_roiwidth : 30.0
  galdiff : '$FERMI_DIFFUSE_DIR/gll_iem_v06.fits'
  isodiff : '$FERMI_DIFFUSE_DIR/iso_P8R2_SOURCE_V6_v06.txt'
  catalogs :
    - '3FGL'

I then optimize the ROI, after which I delete sources with low npred; 15 point sources plus galdiff and isodiff remain. Then I run lightcurve with weekly time bins and multithread=True. I have 5.2 GB of memory available, and I can compute the light curve with at most 2 threads. With more threads, I run out of memory. Using gt_apps and the multiprocessing package with the same analysis region and model, I can run 4 threads. Is there some specific memory consumption in Fermipy, or am I doing something wrong?
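For reference, the kind of call described here might look roughly like the sketch below. `weekly_bins` and `run_lightcurve` are hypothetical helpers written for illustration, and `multithread`, `nthread`, and `time_bins` are the light curve options as I understand them from the Fermipy documentation, so check them against your installed version. The step that deletes low-npred sources is left out.

```python
WEEK = 7 * 86400  # seconds in one week


def weekly_bins(tmin, tmax, width=WEEK):
    """Return MET bin edges from tmin to tmax; the last bin may be shorter."""
    edges = list(range(int(tmin), int(tmax), int(width)))
    edges.append(int(tmax))
    return edges


def run_lightcurve(config, target, tmin, tmax, nthread=2):
    """Hypothetical wrapper: baseline setup, then a weekly light curve."""
    # Imported here so the helper above can be used without Fermipy installed.
    from fermipy.gtanalysis import GTAnalysis

    gta = GTAnalysis(config)
    gta.setup()
    gta.optimize()
    # Each worker analyses its own time bin, so peak memory grows
    # roughly with the number of threads (assumption, see the replies).
    return gta.lightcurve(target,
                          time_bins=weekly_bins(tmin, tmax),
                          multithread=True,
                          nthread=nthread)
```

Keeping `nthread` explicit makes it easy to lower the number of concurrent time bins when memory is tight.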

dimauromattia commented 6 years ago

Hi @jeget, the multithread option splits the analysis of the time bins across multiple cores. So you probably run out of memory when you use more threads because, with your memory, you can use at most two cores. The main memory consumption comes from creating the source map, and during the light curve process a source map is generated for each time step. I suggest using the option use_scaled_srcmap, which generates an approximate source map for each time bin by scaling the source map of the baseline analysis by the relative exposure.
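To illustrate the idea of scaling by relative exposure, here is a conceptual sketch. It is not Fermipy's implementation: `scale_srcmap` is a hypothetical function, and the assumption that the ratio is applied per energy plane is mine.

```python
def scale_srcmap(baseline_map, baseline_exposure, bin_exposure):
    """Scale a baseline source map by the exposure ratio of one time bin.

    baseline_map: nested lists indexed as [energy][y][x] (model counts).
    baseline_exposure, bin_exposure: one exposure value per energy plane.
    """
    scaled = []
    for plane, exp_full, exp_bin in zip(baseline_map,
                                        baseline_exposure,
                                        bin_exposure):
        ratio = exp_bin / exp_full  # relative exposure of this time bin
        scaled.append([[value * ratio for value in row] for row in plane])
    return scaled
```

The point is that no new source map has to be computed per time bin; the baseline map is reused and only rescaled, which is why the option reduces memory and run time at the cost of being approximate.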

jeget commented 6 years ago

Thank you, @dimauromattia. I tried use_scaled_srcmap and it works well, but sometimes I get an error:

Analysis failed in time range 436722283 437327083
<type 'exceptions.OSError'>
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('/home/bulat/work/fermi/tools/APML/500GeV/S2_0109+22/fpy3/run3.py', wdir='/home/bulat/work/fermi/tools/APML/500GeV/S2_0109+22/fpy3')
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
    builtins.execfile(filename, *where)
  File "/home/bulat/work/fermi/tools/APML/500GeV/S2_0109+22/fpy3/run3.py", line 16, in <module>
    gta.lightcurve('3FGL J0112.1+2245', time_bins=bins, use_scaled_srcmap=True, free_sources=['3FGL J0048.0+2236', 'galdiff', 'isodiff'])
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/fermipy/lightcurve.py", line 264, in lightcurve
    o = self._make_lc(name, **config)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/fermipy/lightcurve.py", line 409, in _make_lc
    mapo = map(wrap, itimes)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/fermipy/lightcurve.py", line 158, in _process_lc_bin
    gta.setup()
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/fermipy/gtanalysis.py", line 1039, in setup
    c.setup(overwrite=overwrite)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/fermipy/gtanalysis.py", line 5017, in setup
    self._bin_data(overwrite=overwrite, **kwargs)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/fermipy/gtanalysis.py", line 5136, in _bin_data
    run_gtapp('gtbin', self.logger, kw, loglevel=loglevel)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/site-packages/fermipy/gtanalysis.py", line 198, in run_gtapp
    stdin, stdout = gtapp.runWithOutput(print_command=False)
  File "/home/bulat/ScienceTools-v10r0p5/x86_64-unknown-linux-gnu-libc2.19-0/lib/python/GtApp.py", line 90, in runWithOutput
    close_fds=True)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/home/bulat/anaconda3/envs/python2/lib/python2.7/subprocess.py", line 916, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I logged the free memory in parallel, writing out the output of free -h every second, and I see that at the time the error occurred there were 2 GB of available memory. What causes the error then?
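A slightly more detailed log than free -h might help here. The sketch below is Linux-only and reads /proc directly; `parse_kb_fields` and `memory_snapshot` are hypothetical helpers. Logging the size of the Python process next to the kernel's commit accounting is motivated by one possible explanation, which is an assumption and not confirmed in this thread: os.fork() has to account for a copy of the parent's address space, so it may be refused even while free still reports available memory.

```python
def parse_kb_fields(text, wanted):
    """Extract 'Name:   12345 kB' style fields into a dict of ints (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            parts = rest.split()
            if parts:
                values[name] = int(parts[0])
    return values


def memory_snapshot(pid="self"):
    """Return (system, process) memory figures in kB, read from /proc."""
    with open("/proc/meminfo") as f:
        system = parse_kb_fields(
            f.read(), {"MemAvailable", "CommitLimit", "Committed_AS"})
    with open("/proc/%s/status" % pid) as f:
        process = parse_kb_fields(f.read(), {"VmSize", "VmRSS"})
    return system, process
```

Calling `memory_snapshot()` from inside the analysis script between time bins (or with the PID of the running script from another shell) shows whether the Python process itself keeps growing.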

dimauromattia commented 6 years ago

Where are you running the analysis? Is it on your own computer? How many CPUs and how much memory do you have? The error is probably related to this. With the information I have, it is hard to understand exactly what causes it.

jeget commented 6 years ago

I run the analysis on my own computer with 64-bit Ubuntu 16.04. I have 8 cores (4x2) and 8 GB of RAM in total. Before running the analysis, I have 5 GB available. The error I reported also occurs when multithreading is off. In fact, it does not seem to be critical: I can try to split the whole analysis (which is quite long) into several parts and will probably get through successfully. I just thought this error was a bit strange and should not have occurred under the conditions I mentioned above.

jeget commented 6 years ago

It does work. Splitting the whole time range into several parts and running the analysis part by part avoids the error. But it still seems rather strange to me... The output lightcurve file is very small, but the memory fills up as if Python had stored lots of data (I don't know what data) without freeing memory when a given time bin is completed. Again, it's probably not critical, but kind of inconvenient...
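The part-by-part workaround can be sketched as below. `split_bins` and `run_parts` are hypothetical helpers, and `worker` stands for a user-supplied function that builds the analysis and calls lightcurve with time_bins set to the edges it receives. Running each part in its own Python process is an assumption on my side, not something tested in this thread; the idea is that whatever memory accumulates is returned to the system when the process exits.

```python
import multiprocessing


def split_bins(edges, bins_per_part):
    """Split bin edges into consecutive parts that share a boundary edge."""
    if bins_per_part < 1:
        raise ValueError("bins_per_part must be at least 1")
    parts = []
    for start in range(0, len(edges) - 1, bins_per_part):
        parts.append(edges[start:start + bins_per_part + 1])
    return parts


def run_parts(edges, bins_per_part, worker):
    """Run worker(part_edges) for each part, one fresh process at a time."""
    for part in split_bins(edges, bins_per_part):
        proc = multiprocessing.Process(target=worker, args=(part,))
        proc.start()
        proc.join()  # wait, so only one part is in memory at once
```

Each part then writes its own light curve output, and the per-part files have to be merged afterwards.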

fermiPy / fermipy, issue #247