coiled / feedback

A place to provide Coiled feedback

Prevent GPU tasks from interfering with each other #183

Closed phobson closed 2 years ago

phobson commented 2 years ago

A user wrote to Nat with the following Dask question:

I am able to perform our processing using GPUs on the Coiled cluster. I wanted to know if there is a way to schedule jobs for the workers so that they do not run out of GPU memory.

What's happening occasionally is that the same worker gets assigned two GPU-heavy tasks and then it runs out of memory. I thought Dask would take care of this while scheduling jobs, but if it does not, what would be the recommended way?

I'll update this issue with more context as it comes in.

phobson commented 2 years ago

@gjoseph92 notes:

If the tasks are annotated to each require GPU: 1 resource, and the worker is configured to have 1 GPU resource, then they shouldn’t be able to run simultaneously. If they are, that’s a bug. We’ll want to see the code they’re using to create the cluster and to create the tasks.

That would look like this:

with dask.annotate(resources={'GPU': 1}):
    ...

Doc links:

  1. https://docs.dask.org/en/stable/api.html
  2. https://distributed.dask.org/en/stable/resources.html#example
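
Putting those two pieces together, here's a minimal sketch (the cluster name, worker count, and gpu_task body are placeholders; the worker_options form mirrors the suggestion above, and optimize_graph=False is passed to help keep graph optimization from dropping the annotations):

import coiled
import dask
from dask.distributed import Client

# Worker side: each worker advertises a single abstract "GPU" resource.
cluster = coiled.Cluster(
    name="gpu-example",  # placeholder name
    n_workers=2,
    worker_options=dict(resources=dict(GPU=1)),
)
client = Client(cluster)

def gpu_task(x):
    return x * 2  # stand-in for the real GPU-heavy computation

# Task side: each annotated task consumes the worker's one GPU slot, so two
# annotated tasks can never run on the same worker at the same time.
with dask.annotate(resources={"GPU": 1}):
    tasks = [dask.delayed(gpu_task)(i) for i in range(4)]

results = dask.compute(*tasks, optimize_graph=False)
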
PranjalSahu commented 2 years ago

@phobson Thanks, I will try this while creating the Dask computation graph.

gjoseph92 commented 2 years ago

@PranjalSahu were you not already using dask.annotate in your code? Were you setting coiled.Cluster(..., worker_options=dict(resources=dict(GPU=1)))?

Dask does not schedule based on memory usage whatsoever (GPU or otherwise). It will happily do things at the same time which, in total, will use too much memory. If you have tasks that you know cannot run at the same time as each other, then you need to use resources (or other mechanisms, like a dask.utils.SerializableLock) to prevent them from running concurrently.

Another option, depending on what your tasks look like, would be to just give the workers 1 thread. Then only one task of any type can ever run at once. This could cause underutilization though, depending on what you're doing.
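
For the lock route, a rough sketch (gpu_task is a stand-in; copies of a SerializableLock that share a token reuse one underlying lock per worker process, so two GPU tasks on the same worker run one at a time while separate workers stay independent):

from dask.distributed import Client
from dask.utils import SerializableLock

# All copies of this lock deserialize to the same per-process lock for the
# "gpu" token, serializing GPU work within each worker.
gpu_lock = SerializableLock("gpu")

def gpu_task(x):
    with gpu_lock:
        return x * 2  # stand-in for the real GPU-heavy computation

client = Client()  # or a client connected to a coiled.Cluster
futures = client.map(gpu_task, range(8))
results = client.gather(futures)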

PranjalSahu commented 2 years ago

I have not used dask.annotate yet. I will use it now for GPU tasks and reply here.

phobson commented 2 years ago

@PranjalSahu how are things going? Let me know if you'd like to discuss further.

PranjalSahu commented 2 years ago

I have been running into GPU memory issues. It looks like the memory does not get freed once the task finishes execution, so eventually the task dies after working fine for a few patients. I have explicitly cleared GPU memory before and after execution using torch.cuda.empty_cache().

Now I am adding the worker_class='dask_cuda.CUDAWorker' option.

Recently the cluster has not been able to start due to a "Process never phoned home" message, so I could not test the CUDAWorker option yet.

FabioRosado commented 2 years ago

Hello @PranjalSahu, I'm a Coiled software engineer and just looked at this issue ("Process never phoned home"). Could you let me know how you are trying to create the cluster? Are you specifying the worker_gpu=1 argument?

The reason I am asking is that the logs seem to show some driver errors; I can see nvml error: driver not loaded.

Thank you, and apologies for any issues this may have caused you.

PranjalSahu commented 2 years ago

I am creating the cluster like this:

cluster = coiled.Cluster(
    name='gpucluster15',
    scheduler_vm_types=['t3.large'],
    #worker_vm_types=["g4dn.xlarge", "g4dn.2xlarge", "g4ad.xlarge", "p3.2xlarge", "p2.xlarge", "g5.2xlarge"],
    worker_vm_types=["p3.2xlarge", "p2.xlarge", "g5.2xlarge"],
    n_workers=2,
    software="pranjal-sahu/gpu-test11",
    worker_options=dict(resources=dict(GPU=1)),
    worker_class='dask_cuda.CUDAWorker',
    shutdown_on_close=True,
)

PranjalSahu commented 2 years ago

In my current pipeline I am deleting the model after inference and calling gc.collect(). I also had to remove the worker_class='dask_cuda.CUDAWorker' and dict(resources=dict(GPU=1)) options.

https://stackoverflow.com/questions/70051984/how-to-clear-gpu-memory-after-using-model

I have not run into GPU memory issues since this change. My expectation was that the memory would get cleaned up once the task finishes, i.e., once it goes out of scope.
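
For reference, a rough sketch of that cleanup pattern inside a single task, assuming PyTorch (load_model is a placeholder for however the real model is constructed):

import gc
import torch

def load_model():
    # placeholder: the real pipeline would load the trained network here
    return torch.nn.Linear(16, 1)

def infer_one_patient(image):
    model = load_model().cuda().eval()
    with torch.no_grad():
        prediction = model(image.cuda()).cpu()

    # Explicitly drop the model and release cached GPU memory before returning,
    # so the next task scheduled on this worker starts with a clean GPU.
    del model
    gc.collect()
    torch.cuda.empty_cache()
    return prediction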

ntabris commented 2 years ago

Hi, @PranjalSahu. It sounds like

  1. you got things working, but
  2. this took more trial-and-error than would be ideal.

Is that right? (Let us know if you're still having issues or need help with this!)

PranjalSahu commented 2 years ago

@ntabris Yes, it is working. I have tested it multiple times and with a larger number of patients. Deleting the model object after inference solves the GPU memory problem.

The Dask tutorials on GPU usage focus on examples that perform batch processing, so the GPU memory remains allocated for that duration. But in our use case the GPU computation is heavy and done on only one sample at a time, so the GPU memory needs to be freed for the next task.

phobson commented 2 years ago

@PranjalSahu thanks for following up. I'm going to mark this issue as closed. Feel free to reopen if you think otherwise.