AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0
1.39k stars, 308 forks

how to share multiple gpus? #174

Closed mengwanguc closed 2 years ago

mengwanguc commented 2 years ago

Let's say we have jobs A and B, and two GPUs with 16 GB of memory each.

Is it possible to let job A use 5 GB on each of the two GPUs, and job B use 10 GB on each?

I think we can specify "aliyun.com/gpu-mem: 5" in the configuration file, but how do we specify the number of GPUs?

Thanks, Meng

mengwanguc commented 2 years ago

Never mind, it should be aliyun.com/gpu-count.

mknnj commented 2 years ago

Hello,

Did you manage to make it work with aliyun.com/gpu-count? Because from #10 it seems not possible. I tried to do the same thing but I can't make it work. Do you have any advice?

Thank you very much,

Michele

mengwanguc commented 2 years ago

> Hello,
>
> Did you manage to make it work with aliyun.com/gpu-count? Because from #10 it seems not possible. I tried to do the same thing but I can't make it work. Do you have any advice?
>
> Thank you very much,
>
> Michele

I don't think it's possible with their tool.

Their allocation is based on memory value only: https://github.com/mengwanguc/gpushare-scheduler-extender/blob/878b6fde505d6e11fd15ef4fdcc67091b6d6323a/pkg/cache/nodeinfo.go#L147

To make it possible, we would have to modify their code ourselves.
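To illustrate what "based on memory value only" means for the scenario in the original question, here is a minimal Go sketch. It is not the project's code: the `device` type and `pickDevice` function are hypothetical names, and the tightest-fit policy (choose the single GPU with the least free memory that still fits the request) is an assumption made for the example. The point is only that each job is placed on exactly one GPU, chosen by memory, so a job never spans two GPUs.

```go
package main

import "fmt"

// device is a hypothetical stand-in for one GPU on a node (memory in GB).
type device struct {
	id       int
	totalMem int
	usedMem  int
}

// free returns the memory still unallocated on the device.
func (d device) free() int { return d.totalMem - d.usedMem }

// pickDevice returns the index of the single device that fits reqMem with the
// least free memory left (an assumed tightest-fit policy), or -1 if none fits.
// Only memory is considered; there is no notion of a GPU count.
func pickDevice(devs []device, reqMem int) int {
	best := -1
	for i, d := range devs {
		if d.free() < reqMem {
			continue // this GPU cannot hold the request
		}
		if best == -1 || d.free() < devs[best].free() {
			best = i
		}
	}
	return best
}

func main() {
	// Two GPUs with 16 GB each, as in the question.
	devs := []device{{id: 0, totalMem: 16}, {id: 1, totalMem: 16}}
	jobs := []struct {
		name string
		mem  int
	}{{"A", 5}, {"B", 10}}

	for _, job := range jobs {
		i := pickDevice(devs, job.mem)
		if i < 0 {
			fmt.Printf("job %s (%d GB): no single GPU has enough free memory\n", job.name, job.mem)
			continue
		}
		devs[i].usedMem += job.mem
		fmt.Printf("job %s (%d GB) -> GPU %d\n", job.name, job.mem, devs[i].id)
	}
}
```

Under these assumptions both jobs end up on GPU 0 (5 GB + 10 GB out of 16 GB) and GPU 1 stays idle, which is the opposite of the "each job on both GPUs" layout asked about above.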