I recently started diving into Kubernetes. While requesting a GPU for a pod to run some deep learning tasks, I got confused about how to set up and schedule GPUs.
I used MicroK8s to create the cluster and pods. MicroK8s makes it very easy to enable add-ons such as kubeflow, gpu, etc.
If I enable the gpu add-on in MicroK8s, do I still need to install NVIDIA's k8s-device-plugin manually?
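To be concrete, by "enable gpu" I mean the standard add-on command (this is the documented MicroK8s way to turn on GPU support, which deploys the NVIDIA GPU operator):

```
$ microk8s enable gpu
```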
This node has a single GPU. Following the instructions on the official Kubernetes website, I tried to create a test pod that requests the GPU, but I ran into the issue below.
MicroK8s version: 1.21/beta, installed with `snap install microk8s --channel=1.21/beta --classic`.
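The `gpu_test.yaml` I used was along these lines (a sketch reconstructed from the `describe` output below; the image and resource values match that output, the rest is the usual shape of the GPU example from the Kubernetes docs):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:11.2.2-devel-ubuntu18.04
      resources:
        limits:
          memory: 1G
          nvidia.com/gpu: 1   # request one GPU via the extended resource
```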
```
$ microk8s.kubectl create -f gpu_test.yaml
pod/gpu-pod created
$ microk8s.kubectl get pods
NAME                                                         READY   STATUS    RESTARTS   AGE
gpu-operator-node-feature-discovery-master-dcf999dc8-p7s64   1/1     Running   0          58m
gpu-operator-node-feature-discovery-worker-mlcpt             1/1     Running   0          58m
gpu-operator-64df558567-xx6sx                                1/1     Running   0          58m
gpu-pod                                                      0/1     Pending   0          2m21s
```
```
$ microk8s.kubectl describe pods gpu-pod
Name:         gpu-pod
Namespace:    default
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  cuda:
    Image:      nvcr.io/nvidia/cuda:11.2.2-devel-ubuntu18.04
    Port:       <none>
    Host Port:  <none>
    Limits:
      memory:          1G
      nvidia.com/gpu:  1
    Requests:
      memory:          1G
      nvidia.com/gpu:  1
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lrhb7 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-lrhb7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m46s  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
  Warning  FailedScheduling  3m45s  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
```
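As I understand it, `Insufficient nvidia.com/gpu` means the node either advertises zero allocatable `nvidia.com/gpu` or the GPU is already allocated; in other words, the device plugin may never have registered the GPU with the kubelet. One way to check is to look at what the node actually advertises and whether the NVIDIA plugin pods are healthy (standard `kubectl` commands; I have not pasted my actual output here):

```
$ microk8s.kubectl get node -o jsonpath='{.items[*].status.allocatable}'
$ microk8s.kubectl get pods -A | grep -i nvidia
```

If `nvidia.com/gpu` does not appear under `allocatable`, the problem would be in the driver/device-plugin setup rather than in scheduling itself.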
I tested many kinds of yaml, but they all hit the same issue. Hence, I am wondering: does the Kubernetes GPU device plugin have trouble with a node that has only a single GPU, or is the GPU somehow not fully available for scheduling? GPU scheduling matters a lot to me because I want to run TensorRT and other deep learning workloads inside pods.
I am happy to provide more detailed information if anything is unclear.
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.