coiled / feedback

A place to provide Coiled feedback
Multiple workers on a single GPU instance (aka fractional workers) #193

Open phobson opened 2 years ago

phobson commented 2 years ago

For certain workloads, the optimal cluster will have multiple workers on a single GPU. This currently isn't possible in Coiled.

ntabris commented 2 years ago

One big part of this (I think) is that Coiled doesn't support multiple workers on a single VM/instance. See https://github.com/coiled/product/issues/7 for some discussion.

Maybe the GPU use-case bumps up the priority of multi-worker instances (or maybe not).

ntabris commented 2 years ago

RAPIDS docs now have some info about partitioning GPUs: https://docs.rapids.ai/deployment/nightly/mig.html

This seems doable, but I'd like more signal that it would be worth the effort (i.e., that there's non-trivial demand for it).
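
For anyone wondering what "multiple workers on one GPU" could look like with the MIG partitioning mentioned above, here is a rough sketch, not anything Coiled does today. It assumes the GPU has already been split into MIG slices and that each worker process is pinned to one slice through the `CUDA_VISIBLE_DEVICES` environment variable. The `worker_envs` helper and the UUID strings are hypothetical placeholders; real identifiers would come from `nvidia-smi -L`.

```python
import os


def worker_envs(mig_uuids, base_env=None):
    """Build one environment dict per worker, each pinned to a single MIG slice.

    Hypothetical helper for illustration only; it is not part of Coiled,
    Dask, or RAPIDS.
    """
    # Start from the current process environment unless a base is supplied.
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for uuid in mig_uuids:
        env = dict(base)  # copy so workers do not share one dict
        # Restrict this worker's CUDA runtime to a single MIG slice.
        env["CUDA_VISIBLE_DEVICES"] = uuid
        envs.append(env)
    return envs


# Placeholder identifiers (assumption); substitute the output of `nvidia-smi -L`.
example_uuids = ["MIG-aaaa", "MIG-bbbb"]
for env in worker_envs(example_uuids, base_env={}):
    print(env["CUDA_VISIBLE_DEVICES"])
```

Each resulting dict could be passed as the environment of a separately launched worker process, so that two workers share one physical GPU without seeing each other's slice.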