Restrict threads in GPU workers

mrocklin commented 3 years ago

Dask benefits from extra configuration when using GPUs. We currently don't do any of this, but we should.

There are a few things involved in getting GPUs working right. These aren't universally desired, which makes things complex.

One worker per GPU (probably not an issue for us right now, but may be in the future)
One thread per GPU (we can fix this now, easily)
Advanced networking with UCX on clouds that offer InfiniBand (probably not worth doing yet, at least not on AWS, which has no InfiniBand)
Use the RAPIDS Memory Manager and configure rapids/xgboost/whatever else uses it
Set up a device-memory/host-memory/disk memory hierarchy
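
To make the first two items concrete, here is a minimal sketch of "one worker per GPU, one thread per worker". The function name `gpu_worker_specs` and the spec layout are hypothetical and are not part of Dask, dask_cuda, or Coiled. Pinning each worker with a single index in `CUDA_VISIBLE_DEVICES` is a simplifying assumption, and dask_cuda may arrange device visibility differently.

```python
def gpu_worker_specs(n_gpus):
    """Return one hypothetical worker spec per GPU, each limited to one thread."""
    if n_gpus < 0:
        raise ValueError("n_gpus must be non-negative")
    return [
        {
            "name": f"gpu-worker-{i}",
            # One thread per GPU worker, so tasks do not contend for the device.
            "nthreads": 1,
            # Assumption: pin worker i to GPU i through the environment.
            "env": {"CUDA_VISIBLE_DEVICES": str(i)},
        }
        for i in range(n_gpus)
    ]


for spec in gpu_worker_specs(2):
    print(spec)
```

Each spec could then be translated into whatever launches the worker process; that step depends on the deployment and is not shown here.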

Some of these things are done for us if we use the dask_cuda.CUDAWorker class rather than the dask.distributed.Worker class. Short term, I propose that if the software environment includes dask_cuda, we make this change by default.
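
A rough sketch of what that default could look like, assuming the choice is made by checking whether dask_cuda is importable in the current environment. The helper `choose_worker_class` and its `has_dask_cuda` override are hypothetical, and how the resulting class path is handed to the cluster is left open.

```python
import importlib.util


def choose_worker_class(has_dask_cuda=None):
    """Pick a worker class path, preferring CUDAWorker when dask_cuda is installed.

    `has_dask_cuda` is a hypothetical override for testing; when left as None,
    the current environment is inspected without importing dask_cuda itself.
    """
    if has_dask_cuda is None:
        has_dask_cuda = importlib.util.find_spec("dask_cuda") is not None
    if has_dask_cuda:
        return "dask_cuda.CUDAWorker"
    return "dask.distributed.Worker"


print(choose_worker_class())
```

Using `find_spec` avoids importing dask_cuda just to detect it, which keeps the check cheap on machines without a GPU.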

necaris commented 3 years ago

Matthew Rocklin writes:

> Some of these things are done for us if we use the dask_cuda.CUDAWorker class rather than the dask.distributed.Worker class. Short term, I propose that if the software environment includes dask_cuda, we make this change by default.

@selshowk can you add this as a ticket to Gitlab, as part of the scoping that you're doing around GPU support?

selshowk commented 3 years ago

Sure @necaris, I'll add it. I recall we used to suggest using the CUDAWorker class, but at some point there was an incompatibility (it may have lagged behind a distributed release), and someone may have suggested it was no longer required, so we stopped recommending it in the docs. If it's best practice to use it, then we should definitely go back to that. (Note: all of this is basically just documentation now, since things like the worker class are specified by the user in the cluster config rather than being triggered automatically by the GPU flag.)

ntabris commented 2 years ago

I've recently made some tweaks (not yet deployed) to get CUDAWorker working with v2 clusters. After this is deployed, Ben Zaitlen is planning to play around with some multi-GPU instances, and we'll revise our GPU doc as appropriate.

coiled / feedback

Restrict threads in GPU workers #154