When multiple GPUs are available, we should take advantage of them. The existing framework can very nearly support this already: we support multi-threading on the CPU via OpenMP, and each thread requires a unique State object. So users can initialize the various GPUs in state objects (each with an associated software thread) and control them with as many CPU threads as they want by setting OpenMP limits.
Changes required:
Remove the which_gpu field from CudaExpectedImprovementEvaluator, along with the GPU initialization code.
Add a which_gpu field to CudaExpectedImprovementState, along with the GPU initialization code.
Note that GPU initialization is expensive, so we should limit how often we create/destroy these objects (this already happens naturally in the optimization code path).