Changes look good, though I find that it helps with speed but not memory. If that is surprising to you, please double-check before merging. The example script is below, run on a Mac laptop (16 GB physical memory) with /usr/bin/time -lp blat.py [--camera-output] to get the memory usage ("maximum resident set size").
number of spectra | camera_output? | Create time [sec] | Simulate time [sec] | Memory [GB] |
---|---|---|---|---|
1000 | yes | 16.3 | 26.7 | 10.7 |
1000 | no | 14.0 | 8.5 | 10.6 |
5000 | yes | 71.0 | 266.1 | 12.5 |
5000 | no | 63.1 | 101.3 | 12.6 |
#!/usr/bin/env python
"""
Benchmark specsim Simulator creation and simulation time, with and without
per-camera output tables.
"""
import argparse
import time

import numpy as np

from specsim.simulator import Simulator

parser = argparse.ArgumentParser(usage="%(prog)s [options]")
parser.add_argument("--camera-output", action="store_true",
                    help="enable per-camera output tables")
args = parser.parse_args()

# Number of simulated spectra (fibers); edited between 1000 and 5000 for the table above.
nspec = 5000

# Time the Simulator construction.
t0 = time.time()
desi = Simulator('desi', num_fibers=nspec, camera_output=args.camera_output)
time_create = time.time() - t0

# Build a sparse test flux array in the simulator's input flux units.
fluxunits = desi.source.flux_in.unit
num_wlen = len(desi.simulated['wavelength'])
flux = np.zeros((desi.num_fibers, num_wlen)) * fluxunits
flux[:, 0::1000] += 1 * fluxunits

# Time the simulation itself.
t1 = time.time()
desi.simulate()
time_simulate = time.time() - t1

print('Create {:.2f} sec'.format(time_create))
print('Simulate {:.2f} sec'.format(time_simulate))
Hmm, the memory usage looks fishy. Let me investigate...
I added some memory instrumentation to the Simulator:
>>> desi = Simulator('desi', num_fibers=5000, verbose=True, camera_output=True)
...
Allocated 47994.4Mb of table data.
>>> desi = Simulator('desi', num_fibers=5000, verbose=True, camera_output=False)
...
Allocated 43259.8Mb of table data.
My conclusion from this is that the new option is indeed allocating less table memory, although the fractional savings isn't as big as I thought (because ndownsample > ncameras). For 5K fibers, the memory savings is 4.7 GB. However, it looks like the table memory is being swapped very effectively, resulting in a max resident size much smaller than the tables themselves.
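For reference, the "Allocated ... of table data" numbers above can be thought of as summing the bytes behind each table column. Here is a minimal sketch of that bookkeeping, assuming the internal tables are astropy Tables; the helper name and the desi.simulated example are illustrative, not the actual specsim instrumentation:

def table_size_mb(tables):
    # Total in-memory size (MB) of the data columns backing a list of astropy Tables.
    return sum(t[name].nbytes for t in tables for name in t.colnames) / 2 ** 20

# For example, assuming desi.simulated is an astropy Table (as the indexing above suggests):
# print('Allocated {:.1f}Mb of table data.'.format(table_size_mb([desi.simulated])))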
We might need to convert some columns to float32 and optionally drop some columns entirely to reduce the memory footprint further, but that's beyond the scope of the PR.
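To make the float32 / column-dropping idea concrete, a rough sketch of what that could look like on a generic astropy Table is below; the helper and the column name in the usage comment are hypothetical, not existing specsim API:

import numpy as np

def shrink_table(t, drop=()):
    # Drop unneeded columns and downcast float64 columns to float32, in place,
    # assuming t is an astropy Table.
    for name in drop:
        if name in t.colnames:
            t.remove_column(name)
    for name in t.colnames:
        if t[name].dtype == np.float64:
            t[name] = t[name].astype(np.float32)

# Hypothetical usage; the column name here is made up:
# shrink_table(desi.simulated, drop=['some_unused_column'])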
Yeowch, ~48 GB is a lot of memory allocation. This also explains why I can only fit one simulator per Edison node: although our laptops with <48 GB of memory are surprisingly effective at swapping memory out, when multiple processes try to allocate that much memory at the same time they run into each other faster than the OS can get it out of the way, and they die with memory errors.
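For context, the parallel usage pattern that hits this is roughly the following (a sketch only, not the actual pipeline code; the pool size and per-worker settings are made up):

import multiprocessing as mp

from specsim.simulator import Simulator

def run_one(nspec):
    # Each worker builds its own Simulator, so its tables are allocated per process.
    sim = Simulator('desi', num_fibers=nspec, camera_output=False)
    sim.simulate()

if __name__ == '__main__':
    # With several workers per node the allocations all land at once,
    # instead of one process quietly swapping as in the single-process runs above.
    with mp.Pool(processes=4) as pool:
        pool.map(run_one, [1000] * 4)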
So saving memory will effectively be saving time when running these in parallel, but I agree that is beyond the scope of this PR. Let's merge now.
This new mode is enabled via a new arg to the Simulator ctor:
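For example (mirroring the calls in the snippets above):

desi = Simulator('desi', num_fibers=5000, camera_output=False)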
Fixes #66.