awslabs / aws-virtual-gpu-device-plugin

The AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads.
https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/
Apache License 2.0
203 stars, 31 forks

Image tag latest no longer supported #15

Open pen-pal opened 3 years ago

pen-pal commented 3 years ago
[Screenshot taken 2021-02-06 at 10:22:31 PM]

As can be seen from the image, NVIDIA no longer publishes the nvidia/cuda image with the tag latest, which causes a problem when trying to run the init container specified here:

    initContainers:
    - name: set-compute-mode
      image: nvidia/cuda:latest
      command: ['nvidia-smi', '-c', 'EXCLUSIVE_PROCESS']
      securityContext:
        capabilities:
          add: ["SYS_ADMIN"]

What is the solution for this? Is it a good idea to use an image matching the CUDA version installed on the worker node, or is there another approach?

PS: This is more of a bug that requires a fix. Also, this is my first time opening an issue, so please correct me on the tags.

Jeffwan commented 3 years ago

Thanks for reporting the issue.

> to use image based on installed cuda version in your worker node

Yes, a CUDA container image matching that version is good enough. Feel free to file a PR.

sirajahmed981 commented 3 years ago

@M-A-N-I-S-H-K I used "nvidia/cuda:11.2.2-devel-centos8" instead of "nvidia/cuda:latest" to resolve this issue.
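For reference, here is a minimal sketch of the init container from the issue with the image pinned to an explicit tag instead of latest. It assumes the rest of the manifest stays unchanged. The tag shown is the one reported in this thread; per the maintainer's suggestion, you would pick a CUDA version that the driver on your worker nodes supports.

```yaml
initContainers:
- name: set-compute-mode
  # Pinned tag instead of "latest", which nvidia/cuda no longer publishes.
  # 11.2.2-devel-centos8 is the tag reported in this thread; choose a CUDA
  # version that matches what your worker node's driver supports.
  image: nvidia/cuda:11.2.2-devel-centos8
  # Set the GPU compute mode to EXCLUSIVE_PROCESS before the main container starts.
  command: ['nvidia-smi', '-c', 'EXCLUSIVE_PROCESS']
  securityContext:
    capabilities:
      add: ["SYS_ADMIN"]
```

Pinning an explicit tag also makes the manifest reproducible, because the image no longer changes underneath it when upstream retags or removes images.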