Changes look good, though I find that it helps with speed but not memory. If that is surprising to you, please double-check before merging. The example script is below, run on a Mac laptop (16 GB physical memory) with /usr/bin/time -lp blat.py [--camera-output] to get the memory usage ("maximum resident set size").
number of spectra | camera_output? | Create time [sec] | Simulate time [sec] | Memory [GB] |
---|---|---|---|---|
1000 | yes | 16.3 | 26.7 | 10.7 |
1000 | no | 14.0 | 8.5 | 10.6 |
5000 | yes | 71.0 | 266.1 | 12.5 |
5000 | no | 63.1 | 101.3 | 12.6 |
#!/usr/bin/env python
"""
Benchmark specsim Simulator creation and simulation time, with and without
per-camera output tables.
"""
import argparse
import time

import numpy as np

from specsim.simulator import Simulator

parser = argparse.ArgumentParser(usage="%(prog)s [options]")
parser.add_argument("--camera-output", action="store_true",
                    help="enable per-camera output tables")
args = parser.parse_args()

# Number of simulated spectra (fibers); edited between 1000 and 5000 for the table above.
nspec = 5000

# Time the Simulator construction.
t0 = time.time()
desi = Simulator('desi', num_fibers=nspec, camera_output=args.camera_output)
time_create = time.time() - t0

# Build a sparse test flux array in the simulator's input flux units.
fluxunits = desi.source.flux_in.unit
num_wlen = len(desi.simulated['wavelength'])
flux = np.zeros((desi.num_fibers, num_wlen)) * fluxunits
flux[:, 0::1000] += 1 * fluxunits

# Time the simulation itself.
t1 = time.time()
desi.simulate()
time_simulate = time.time() - t1

print('Create {:.2f} sec'.format(time_create))
print('Simulate {:.2f} sec'.format(time_simulate))
Hmm, the memory usage looks fishy. Let me investigate...
I added some memory instrumentation to the Simulator:
>>> desi = Simulator('desi', num_fibers=5000, verbose=True, camera_output=True)
...
Allocated 47994.4Mb of table data.
>>> desi = Simulator('desi', num_fibers=5000, verbose=True, camera_output=False)
...
Allocated 43259.8Mb of table data.
My conclusion from this is that the new option is indeed allocating less table memory, although the fractional savings isn't as big as I thought (because ndownsample > ncameras). For 5K fibers, the memory savings is 4.7 GB. However, it looks like the table memory is being swapped very effectively, resulting in a max resident size much smaller than the tables themselves.
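For reference, the "Allocated ... of table data" numbers above can be thought of as summing the bytes behind each table column. Here is a minimal sketch of that bookkeeping, assuming the internal tables are astropy Tables; the helper name and the desi.simulated example are illustrative, not the actual specsim instrumentation:

def table_size_mb(tables):
    # Total in-memory size (MB) of the data columns backing a list of astropy Tables.
    return sum(t[name].nbytes for t in tables for name in t.colnames) / 2 ** 20

# For example, assuming desi.simulated is an astropy Table (as the indexing above suggests):
# print('Allocated {:.1f}Mb of table data.'.format(table_size_mb([desi.simulated])))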
We might need to convert some columns to float32 and optionally drop some columns entirely to reduce the memory footprint further, but that's beyond the scope of the PR.
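To make the float32 / column-dropping idea concrete, a rough sketch of what that could look like on a generic astropy Table is below; the helper and the column name in the usage comment are hypothetical, not existing specsim API:

import numpy as np

def shrink_table(t, drop=()):
    # Drop unneeded columns and downcast float64 columns to float32, in place,
    # assuming t is an astropy Table.
    for name in drop:
        if name in t.colnames:
            t.remove_column(name)
    for name in t.colnames:
        if t[name].dtype == np.float64:
            t[name] = t[name].astype(np.float32)

# Hypothetical usage; the column name here is made up:
# shrink_table(desi.simulated, drop=['some_unused_column'])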
Yeowch, ~48 GB is a lot of memory allocation. This also explains why I can only fit one simulator per Edison node: although our laptops with <48 GB of memory are surprisingly effective at swapping memory out, when multiple processes try to allocate that much memory at the same time they run into each other faster than the OS can get it out of the way, and they die with memory errors.
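For context, the parallel usage pattern that hits this is roughly the following (a sketch only, not the actual pipeline code; the pool size and per-worker settings are made up):

import multiprocessing as mp

from specsim.simulator import Simulator

def run_one(nspec):
    # Each worker builds its own Simulator, so its tables are allocated per process.
    sim = Simulator('desi', num_fibers=nspec, camera_output=False)
    sim.simulate()

if __name__ == '__main__':
    # With several workers per node the allocations all land at once,
    # instead of one process quietly swapping as in the single-process runs above.
    with mp.Pool(processes=4) as pool:
        pool.map(run_one, [1000] * 4)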
So saving memory will effectively be saving time when running these in parallel, but I agree that is beyond the scope of this PR. Let's merge now.
This new mode is enabled via a new arg to the Simulator ctor:
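For example (mirroring the calls in the snippets above):

desi = Simulator('desi', num_fibers=5000, camera_output=False)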
Fixes #66.