Open mrocklin opened 1 year ago
I'm also curious, @jacobtomlinson: if you were to do this, would you set up your own cloud account (presumably GCP) and give people (locked-down) access to it (much like I just did with the dask account and you), or would you point them to connect Coiled to their own cloud accounts?
I would probably point them to connect Coiled to their own accounts.
You can use a GPU on the scheduler, but the UI isn't friendly because I didn't know whether this made sense for users. This is some signal that it does, so I'll make the UI friendlier.
For AWS, you'd just use a GPU instance type.
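A minimal sketch of the AWS case, assuming scheduler_vm_types and worker_vm_types accept AWS instance type names in the same way the GCP example uses them. g4dn.xlarge is just one GPU instance type picked for illustration, and aws_gpu_cluster_kwargs is a hypothetical helper, not part of Coiled. It only builds the keyword arguments, so it runs without a Coiled account:

```python
def aws_gpu_cluster_kwargs(software, n_workers=1, instance_type="g4dn.xlarge"):
    """Hypothetical helper: build keyword arguments for coiled.Cluster on AWS.

    Assumption: choosing a GPU instance type for both the scheduler and
    the workers is all that is needed to get GPUs on AWS.
    """
    return {
        "software": software,
        "n_workers": n_workers,
        "scheduler_vm_types": [instance_type],  # GPU instance for the scheduler
        "worker_vm_types": [instance_type],     # same GPU instance for workers
    }


kwargs = aws_gpu_cluster_kwargs("rapids-nightly-jupyter")

# With Coiled installed and an AWS-backed account configured:
# import coiled
# cluster = coiled.Cluster(**kwargs)
```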
For GCP, there's an undocumented backend option you can use like this:
import coiled

coiled.create_software_environment(
    name="rapids-nightly-jupyter",
    container="rapidsai/rapidsai-core-nightly:22.08-cuda11.5-base-ubuntu20.04-py3.9",
)
cluster = coiled.Cluster(
    software="rapids-nightly-jupyter",
    n_workers=1,
    scheduler_vm_types=["n1-standard-4"],  # n1 family so you can add GPU
    worker_gpu=1,
    backend_options={
        **coiled.utils.GCP_SCHEDULER_GPU,  # this enables scheduler GPU
    },
)
I'll add a scheduler_gpu=True kwarg to coiled.Cluster.
We typically advise GPU clusters to have GPU schedulers anyway in case GPU things get deserialised on the scheduler. Even if they are empty objects they may try and invoke CUDA in some way.
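To illustrate why deserialising alone can matter, here is a rough sketch in plain Python with no real CUDA involved. GPUArray, fake_cuda_init and _rebuild_gpu_array are hypothetical stand-ins, not RAPIDS or Dask APIs. The point is only that unpickling can run arbitrary reconstruction code, so even an empty object may touch the device on whichever machine loads it:

```python
import pickle

cuda_calls = []  # records every time the fake "CUDA" hook runs


def fake_cuda_init():
    # Hypothetical stand-in for creating a CUDA context.
    # On a machine without a GPU, the real thing would fail here.
    cuda_calls.append("init")


def _rebuild_gpu_array():
    # Called by pickle at load time, i.e. on the receiving machine.
    fake_cuda_init()
    return GPUArray()


class GPUArray:
    """Empty stand-in for a GPU-backed object."""

    def __reduce__(self):
        # Tell pickle to reconstruct the object by calling _rebuild_gpu_array().
        return (_rebuild_gpu_array, ())


payload = pickle.dumps(GPUArray())  # e.g. on the client; no device access yet
calls_after_dump = list(cuda_calls)

obj = pickle.loads(payload)         # e.g. on the scheduler
print(calls_after_dump)  # []
print(cuda_calls)        # ['init']
```

Dask uses its own serialisation machinery rather than bare pickle for many types, so treat this as an analogy for the failure mode rather than a description of what the scheduler actually executes.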
"We typically advise GPU clusters to have GPU schedulers anyway in case GPU things get deserialised on the scheduler"
Is there a doc that says this? I'd love to be able to link to such a doc.
Not today but there will be soon. I'll let you know when we have it.
I was just chatting with @jacobtomlinson. He expressed that he really likes the great UX of Coiled, and the friendly way in which package sync works. He also expressed curiosity about other ways in which it could be used.
One pain point he has is getting people up and running with RAPIDS easily. For this he wants a GPU-powered notebook. Currently his go-to solution is SageMaker Studio, but this requires a lot of unfriendly infrastructure setup. He'd be fine with the Dask scheduler running Jupyter in a Coiled cluster if ...
If these existed then it's more likely that he would point people towards Coiled for this use case, which is common for him.
This has come up before, but I thought I'd raise this again given that I heard it again.