Tighter (and Less Arbitrary) Bundling

This version of the code substantially changes the bundling algorithm. Instead of relying on fairly arbitrary choices of "number of instances per node" and "number of threads per node", it packs work based on a combination of imSim's performance characteristics and the machine's constraints. Specifically, it uses the following parameters:

- mem_per_thread = the peak amount of memory an imSim thread uses.
- mem_per_instance = the amount of shared memory an imSim instance takes.
- mem_per_node = the total amount of memory available on a node, WITH SOME BUFFER SUBTRACTED.
- threads_per_node = the total number of hardware threads available for processing jobs.

With this set-up in place, the bundler now dynamically tracks the remaining memory and the remaining allowed threads on each bundle so as to pack work more tightly. Note that this does not account for any sort of multiprocessing-driven refilling, which would require a considerably different architecture from this bundler (and would likely be the simpler case in which each node gets one visit with all of its sensors).
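
The packing idea can be illustrated with a minimal sketch. This is not the code in this PR: the `Node` class, the `bundle` and `max_threads_that_fit` functions, the greedy fill order, and all numbers are hypothetical and chosen only for illustration. It assumes each imSim instance costs `mem_per_instance` once plus `mem_per_thread` for every thread it runs, and that a visit's sensors may be split across nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical bookkeeping for one node's remaining resources."""
    mem_left: float
    threads_left: int
    instances: list = field(default_factory=list)  # (visit_id, n_threads) pairs


def max_threads_that_fit(node, mem_per_thread, mem_per_instance):
    """Threads a NEW instance could run on this node (0 if it does not fit)."""
    mem_for_threads = node.mem_left - mem_per_instance
    if mem_for_threads < mem_per_thread:
        return 0
    return min(node.threads_left, int(mem_for_threads // mem_per_thread))


def bundle(visits, mem_per_thread, mem_per_instance, mem_per_node, threads_per_node):
    """Greedily pack {visit_id: n_sensors} onto nodes, one thread per sensor."""
    nodes = []
    for visit_id, n_sensors in visits.items():
        remaining = n_sensors
        while remaining > 0:
            node = nodes[-1] if nodes else None
            fit = max_threads_that_fit(node, mem_per_thread, mem_per_instance) if node else 0
            if fit == 0:
                # Current node is full (or none exists yet): open a fresh one.
                node = Node(mem_left=mem_per_node, threads_left=threads_per_node)
                fit = max_threads_that_fit(node, mem_per_thread, mem_per_instance)
                if fit == 0:
                    raise ValueError("a single thread does not fit on an empty node")
                nodes.append(node)
            n = min(remaining, fit)
            node.instances.append((visit_id, n))
            node.mem_left -= mem_per_instance + n * mem_per_thread
            node.threads_left -= n
            remaining -= n
    return nodes


# Illustrative numbers only (memory in GB), not measured imSim values.
nodes = bundle({"v1": 30, "v2": 20}, mem_per_thread=2, mem_per_instance=10,
               mem_per_node=96, threads_per_node=64)
for i, node in enumerate(nodes):
    print(i, node.instances, node.mem_left, node.threads_left)
```

In this example the first node runs out of memory before it runs out of threads, so the second visit is split across two nodes. Whichever of the two limits is hit first determines the bundle size, which is what replaces the fixed per-node instance and thread counts.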

LSSTDESC / DESC_DC2_imSim_Workflow

Tighter (and Less Arbitrary) Bundling #26