Project-HAMi / volcano-vgpu-device-plugin

Device plugin for Volcano vGPU that supports hard resource isolation
Apache License 2.0

Vgpu plugin does not restrict memory in container #1

Open kunal642 opened 6 months ago

kunal642 commented 6 months ago

Hi @archlitchi, I'm creating this issue as a continuation of the conversation we were having on volcano issue #3384.

kunal642 commented 6 months ago

@archlitchi Is the plugin version 1.9.0 compatible with volcano 1.8.2?

kunal642 commented 5 months ago

hey @archlitchi,

We got hard isolation working by explicitly mounting "/tmp/gpu" and "/tmp/gpulock" into the container.

Can you explain why we are not able to assign more than 4 vGPUs to a single pod? (We have 4 GPU cards on a single node.)
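
For readers who want to reproduce the explicit mounts described above, here is a minimal sketch that adds hostPath volumes for "/tmp/gpu" and "/tmp/gpulock" to a pod manifest held as a Python dict. The thread does not say where the directories were mounted inside the container, so mounting each one at the same path as on the host is an assumption, and the helper name `with_host_mounts` and the volume names are hypothetical.

```python
import copy


def with_host_mounts(pod, paths=("/tmp/gpu", "/tmp/gpulock")):
    """Return a copy of `pod` with each host path mounted into every container.

    Assumption: each directory is mounted at the same path inside the
    container as on the host. The thread does not confirm the mount path.
    """
    out = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = out.setdefault("spec", {})
    volumes = spec.setdefault("volumes", [])
    for index, path in enumerate(paths):
        name = f"vgpu-host-{index}"  # hypothetical volume name
        volumes.append({"name": name, "hostPath": {"path": path}})
        for container in spec.get("containers", []):
            container.setdefault("volumeMounts", []).append(
                {"name": name, "mountPath": path}
            )
    return out
```

The resulting dict can be serialized with `json.dumps` and passed to `kubectl apply -f`, which accepts JSON as well as YAML.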

archlitchi commented 5 months ago

> @archlitchi Is the plugin version 1.9.0 compatible with volcano 1.8.2?

I recommend using 1.9.0.

archlitchi commented 5 months ago

> hey @archlitchi,
>
> We got hard isolation working by explicitly mounting "/tmp/gpu" and "/tmp/gpulock" into the container.
>
> Can you explain why we are not able to assign more than 4 vGPUs to a single pod? (We have 4 GPU cards on a single node.)

Yes. There are only 4 GPU devices in the /dev folder, so a pod can use at most 4 GPUs. We can't mount a nonexistent GPU device into a container and have the NVIDIA driver recognize it.
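
The limit described above can be sketched as a small check: count the per-GPU device nodes on the node and cap the requested vGPU number at that count. This is only an illustration of the reasoning, not the plugin's actual logic. It assumes the usual NVIDIA naming where each physical GPU appears as `/dev/nvidia0`, `/dev/nvidia1`, and so on; the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches per-GPU nodes such as nvidia0, but not nvidiactl or nvidia-uvm.
_GPU_NODE = re.compile(r"^nvidia\d+$")


def count_gpu_device_nodes(dev_dir="/dev"):
    """Count per-GPU NVIDIA device nodes found directly under `dev_dir`."""
    root = Path(dev_dir)
    if not root.is_dir():
        return 0
    return sum(1 for entry in root.iterdir() if _GPU_NODE.match(entry.name))


def max_vgpus_per_pod(requested, dev_dir="/dev"):
    """Cap a requested vGPU count at the number of physical GPU nodes."""
    return min(requested, count_gpu_device_nodes(dev_dir))
```

On a node with 4 cards, `max_vgpus_per_pod(6)` would return 4, which matches the behavior reported in this thread.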

kunal642 commented 5 months ago

Does this mean that the device plugin only restricts memory and not compute resources?

If not, how can a pod use the full GPU with the vGPU config?

archlitchi commented 5 months ago

> Does this mean that the device plugin only restricts memory and not compute resources?
>
> If not, how can a pod use the full GPU with the vGPU config?

It can restrict compute resources if you specify volcano.sh/vgpu-cores. If you want to use the full GPU, specify only volcano.sh/vgpu-number in the task.
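
As a rough illustration of the two cases above, the sketch below builds the `resources.limits` mapping for a container. The resource names `volcano.sh/vgpu-number` and `volcano.sh/vgpu-cores` come from this thread. Treating the cores value as a percentage from 1 to 100 is an assumption based on the "50% cores" wording used in this issue, and the helper name `vgpu_limits` is hypothetical.

```python
def vgpu_limits(number, cores=None):
    """Build the resources.limits mapping for one container.

    number: how many vGPUs the container requests.
    cores:  optional compute share per vGPU; assumed to be a percentage.
            Leave it as None to request the full GPU, as described above.
    """
    if number < 1:
        raise ValueError("number must be at least 1")
    limits = {"volcano.sh/vgpu-number": number}
    if cores is not None:
        if not 0 < cores <= 100:
            raise ValueError("cores is assumed to be a percentage in (0, 100]")
        limits["volcano.sh/vgpu-cores"] = cores
    return limits
```

For example, `vgpu_limits(1)` requests one full GPU, while `vgpu_limits(1, cores=50)` asks for the compute share to be restricted. The mapping goes under `spec.containers[].resources.limits` in the pod or task template.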

kunal642 commented 5 months ago

Got it. Is there a way to check how many cores are allocated in the container? If we configure 50% cores, we want to make sure that only 50% is allocated.