coiled / feedback

A place to provide Coiled feedback

GPU Notebooks #218

Open mrocklin opened 1 year ago

mrocklin commented 1 year ago

I was just chatting with @jacobtomlinson. He expressed that he really likes the great UX of Coiled, and the friendly way in which package sync works. He also expressed curiosity about other ways in which it could be used.

One pain point he has is getting people up and running with RAPIDS easily. For this he wants a GPU-powered notebook. Currently his go-to solution is SageMaker Studio, but that requires a lot of unfriendly startup infrastructure work. He'd be fine with the Dask scheduler running Jupyter in a Coiled cluster if:

  1. We could turn on GPUs for the scheduler (I think that this is not supported currently)
  2. It were more secure (he didn't actually say this, but I prompted it)

If these existed then it's more likely that he would point people towards Coiled for this use case, which is common for him.

This has come up before, but I thought I'd raise this again given that I heard it again.

mrocklin commented 1 year ago

I'm also curious, @jacobtomlinson: if you were to do this, would you set up your own cloud account (presumably GCP) and give people (locked-down) access to it (much like I just did with the dask account and you), or would you point them to connect Coiled to their own cloud accounts?

jacobtomlinson commented 1 year ago

I would probably point them to connect Coiled to their own accounts.

ntabris commented 1 year ago

You can use a GPU on the scheduler, but the UI isn't friendly because I didn't know whether this was something that made sense for users. This is some signal that it does, so I'll make the UI friendlier.

For AWS, you'd just use a GPU instance type.
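As a rough sketch of what that might look like: the helper below only builds keyword arguments, so it runs without a Coiled account. `g4dn.xlarge` is used as an example AWS GPU instance type, and `worker_vm_types` is assumed to be accepted by `coiled.Cluster` alongside the `scheduler_vm_types` argument shown in this thread. The helper names themselves are made up for illustration.

```python
def aws_gpu_cluster_kwargs(instance_type="g4dn.xlarge", n_workers=1):
    """Build keyword arguments for a cluster whose scheduler and workers
    all run on the same AWS GPU instance type.

    Hypothetical helper; the instance type is only an example.
    """
    return {
        "n_workers": n_workers,
        # Putting the scheduler on a GPU instance type gives it a GPU too.
        "scheduler_vm_types": [instance_type],
        # Assumption: coiled.Cluster also accepts worker_vm_types.
        "worker_vm_types": [instance_type],
    }


def create_aws_gpu_cluster(**overrides):
    """Create the cluster. Needs the coiled package and a configured account."""
    import coiled  # imported lazily so the helper above works without coiled

    kwargs = {**aws_gpu_cluster_kwargs(), **overrides}
    return coiled.Cluster(**kwargs)
```

Calling `create_aws_gpu_cluster(software="rapids-nightly-jupyter")` would then pass the software environment through as an override.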

For GCP, there's an undocumented backend option you can use like this:

import coiled

coiled.create_software_environment(
  name="rapids-nightly-jupyter",
  container="rapidsai/rapidsai-core-nightly:22.08-cuda11.5-base-ubuntu20.04-py3.9",
)

cluster = coiled.Cluster(
  software='rapids-nightly-jupyter',
  n_workers=1,
  scheduler_vm_types=["n1-standard-4"],  # n1 family so you can add GPU
  worker_gpu=1,
  backend_options={
    **coiled.utils.GCP_SCHEDULER_GPU  # this enables scheduler GPU
  }
)

I'll add a scheduler_gpu=True kwarg to coiled.Cluster.

jacobtomlinson commented 1 year ago

We typically advise GPU clusters to have GPU schedulers anyway in case GPU things get deserialised on the scheduler. Even if they are empty objects they may try and invoke CUDA in some way.
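To illustrate the point, here is a toy simulation in plain Python, with no real CUDA involved. `FakeGPUArray`, `_init_device`, and the `GPU_AVAILABLE` flag are all hypothetical stand-ins. It only shows that unpickling runs the object's reconstruction code on whichever machine loads it, so even an empty object can fail on a machine without a GPU.

```python
import pickle

# Stand-in for "is a CUDA device present on this machine?" (hypothetical flag).
GPU_AVAILABLE = True


def _init_device():
    """Pretend to create a CUDA context; fails on a machine without a GPU."""
    if not GPU_AVAILABLE:
        raise RuntimeError("no CUDA device available")


def _rebuild_gpu_array(nbytes):
    # pickle calls this during loads(), so it runs on whichever machine
    # deserialises the object, for example the scheduler.
    _init_device()
    return FakeGPUArray(nbytes)


class FakeGPUArray:
    """Hypothetical GPU-backed container; holds no real device memory."""

    def __init__(self, nbytes=0):
        self.nbytes = nbytes

    def __reduce__(self):
        # Tell pickle how to rebuild this object on the receiving side.
        return (_rebuild_gpu_array, (self.nbytes,))


def load_on_machine(payload, has_gpu):
    """Deserialise payload while simulating a machine with or without a GPU."""
    global GPU_AVAILABLE
    previous = GPU_AVAILABLE
    GPU_AVAILABLE = has_gpu
    try:
        return pickle.loads(payload)
    finally:
        GPU_AVAILABLE = previous


# Serialising an "empty" object does not touch the simulated device.
payload = pickle.dumps(FakeGPUArray(0))
```

Here `load_on_machine(payload, has_gpu=True)` returns the object, while `load_on_machine(payload, has_gpu=False)` raises `RuntimeError`, which is the simulated equivalent of a CPU-only scheduler receiving a GPU object.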

ntabris commented 1 year ago

> We typically advise GPU clusters to have GPU schedulers anyway in case GPU things get deserialised on the scheduler

Is there a doc that says this? I'd love to be able to link to such a doc.

jacobtomlinson commented 1 year ago

Not today but there will be soon. I'll let you know when we have it.