When multiple GPUs are available, we should take advantage of them. The existing framework can very nearly support this already: we support multi-threading on the CPU via OpenMP, and each thread requires a unique State object. So users can initialize the various GPUs in state objects (each with an associated software thread) and control them with as many CPU threads as they want by setting OpenMP limits.
Changes required:
Remove the which_gpu field from CudaExpectedImprovementEvaluator, along with the GPU initialization code.
Add a which_gpu field to CudaExpectedImprovementState, along with the GPU initialization code.
Note that GPU initialization is expensive, so we should limit how often we create/destroy these objects (this already happens naturally in the optimization code path).