When simulating all 5000 DESI fibers at an input resolution of 0.2 Angstroms, we can only fit one simulator per Edison node due to memory usage. PR #67 found that the simulator allocates >40 GB of memory even when the per-camera resolution-convolved outputs are turned off. This issue documents some ideas for saving memory, which would allow us to run multiple simulators per node and thus effectively increase processing throughput at scale.
Changes to specsim itself
Use float32 instead of float64 in the Simulator.simulated table. This could be done for all columns except wavelength (the consistency of delta-wavelength from bin to bin becomes problematic at single precision).
Split the num_source_electrons_{camera} etc. columns into separate Tables that only track the wavelengths that are actually covered by each camera. Currently ~2/3 of the entries for these columns are zeros.
Optionally drop some of the columns.
Make num_dark_electrons_{camera} and read_noise_electrons_{camera} 1D instead of 2D. They currently have separate but identical entries for each fiber.
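The ideas above can be illustrated with plain NumPy arrays. This is only a sketch and does not touch specsim: the array sizes, the (wavelength, fiber) layout, the variable names, and the "middle third" camera coverage are all assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sizes only (assumed), far smaller than a real 5000-fiber run.
n_wlen, n_fibers = 1000, 50

# 1. float32 halves the memory of a per-fiber column.
flux64 = np.zeros((n_wlen, n_fibers), dtype=np.float64)
flux32 = flux64.astype(np.float32)
print("float64 bytes:", flux64.nbytes, "float32 bytes:", flux32.nbytes)

# 2. Why wavelength should stay float64: the bin-to-bin spacing of a
#    0.2 Angstrom grid is no longer uniform once rounded to float32.
wlen64 = 3500.0 + 0.2 * np.arange(n_wlen)
wlen32 = wlen64.astype(np.float32)
diff64 = np.diff(wlen64)
diff32 = np.diff(wlen32).astype(np.float64)
spread64 = diff64.max() - diff64.min()
spread32 = diff32.max() - diff32.min()
print("spacing spread float64:", spread64, "float32:", spread32)

# 3. Keep only the wavelengths a camera covers (assumed: the middle third).
lo, hi = n_wlen // 3, 2 * n_wlen // 3
full_column = np.zeros((n_wlen, n_fibers), dtype=np.float32)
full_column[lo:hi] = 1.0
trimmed_column = full_column[lo:hi].copy()  # would be stored along with lo

# 4. A quantity that is identical for every fiber can be stored once (1D)
#    and broadcast against 2D columns when needed.
dark_1d = np.linspace(1.0, 2.0, n_wlen).astype(np.float32)
dark_2d = np.tile(dark_1d[:, np.newaxis], (1, n_fibers))
total_from_2d = full_column + dark_2d
total_from_1d = full_column + dark_1d[:, np.newaxis]
```

The broadcast in step 4 gives the same result as the tiled 2D array while storing only one value per wavelength rather than one per wavelength per fiber.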
Usage of specsim
Simulate only 500 fibers at a time, building up the subset of the necessary information for 5000 fibers as separate arrays external to specsim.
Simulate each camera with independent Simulator objects to avoid the many zeros from wavelengths outside the range for each camera in the Simulator.simulated table.
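A minimal sketch of the chunked approach, assuming the caller only needs one output array per run. The function `simulate_chunk` is a hypothetical stand-in for configuring and running a specsim Simulator on a subset of fibers; it is not part of the specsim API, and the sizes below are illustrative.

```python
import numpy as np


def simulate_chunk(fiber_ids, n_wlen):
    """Hypothetical stand-in for one simulator run over a subset of fibers.

    Returns a float64 array of shape (n_wlen, len(fiber_ids)). Here each
    column is simply filled with its fiber index so the result is checkable.
    """
    return np.outer(np.ones(n_wlen), fiber_ids).astype(np.float64)


def simulate_in_chunks(n_fibers, n_wlen, chunk_size):
    """Run the simulation chunk by chunk, keeping only the needed output."""
    # The external result array is allocated once, at single precision.
    out = np.empty((n_wlen, n_fibers), dtype=np.float32)
    for start in range(0, n_fibers, chunk_size):
        stop = min(start + chunk_size, n_fibers)
        fiber_ids = np.arange(start, stop)
        # Only one chunk's worth of simulator output is alive at a time.
        out[:, start:stop] = simulate_chunk(fiber_ids, n_wlen)
    return out


result = simulate_in_chunks(n_fibers=50, n_wlen=20, chunk_size=8)
print(result.shape, result.dtype)
```

With this pattern the peak memory is roughly one chunk of full simulator state plus the accumulated outputs, rather than the full state for every fiber at once.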