gschwim opened this issue 2 years ago
@gschwim Looks like the PodSecurityPolicy admission controller is enabled. You can install with --set psp.enabled=true so that we create and use the appropriate PSPs with the required permissions.
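For reference, a minimal sketch of that install, assuming the chart comes from NVIDIA's public Helm repository and is deployed into a gpu-operator namespace (repository setup, namespace, and release name here are illustrative, not taken from this thread):

```sh
# Add NVIDIA's Helm repository (skip if already configured)
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install the GPU Operator with PSP support enabled, as suggested above
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set psp.enabled=true
```

With psp.enabled=true the chart is expected to create the PSPs and the bindings that let the operator's service accounts use them, per the comment above.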
Hi @shivamerla - thanks for the reply. I did try --set psp.enabled=true in several of my testing iterations, but it didn't appear to make any difference. Is there something that needs to be done in addition to this to take advantage of it?
@gschwim Can you run kubectl get psp and confirm that the PSPs are created by the GPU Operator? The nvidia-driver ServiceAccount is bound to the gpu-operator-privileged PSP, which should allow this. Can you copy the error again with PSP enabled?
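A rough sketch of how that check could look, assuming the operator runs in a gpu-operator namespace and the ServiceAccount is literally named nvidia-driver as mentioned above (both names may differ on a given install or chart version):

```sh
# List the PSPs; gpu-operator-privileged should appear if the chart created it
kubectl get psp

# Find the role bindings the chart created for its service accounts
kubectl get clusterrolebinding -o wide | grep -i gpu-operator
kubectl get rolebinding -A -o wide | grep -i gpu-operator

# Ask the API server whether the nvidia-driver ServiceAccount may "use" the PSP
# (namespace and ServiceAccount name are assumptions; adjust to your cluster)
kubectl auth can-i use podsecuritypolicy/gpu-operator-privileged \
  --as=system:serviceaccount:gpu-operator:nvidia-driver
```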
1. Quick Debug Checklist

[ ] Are i2c_core and ipmi_msghandler loaded on the nodes?
[ ] Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)?
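A quick way to answer those two checklist items, assuming shell access to the GPU node for the module check (commands are illustrative):

```sh
# On the GPU node: confirm the kernel modules named in the checklist are loaded
lsmod | grep -E 'i2c_core|ipmi_msghandler'

# From a machine with kubectl access: confirm the ClusterPolicy CRD was applied
kubectl describe clusterpolicies --all-namespaces
```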
1. Issue or feature description
Following the documented install procedure for gpu-operator on a fresh charmed-kubernetes install, I get the following error on the gpu-operator running on the node:
This results in no GPU resources becoming available to the cluster.
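One way to confirm the missing GPU resources from the cluster side; nvidia.com/gpu is the resource name the GPU Operator's device plugin normally advertises, and <node-name> is a placeholder:

```sh
# Check whether any node advertises the nvidia.com/gpu resource
kubectl get nodes -o json | grep -i 'nvidia.com/gpu'

# Or inspect a specific node's Capacity/Allocatable sections
kubectl describe node <node-name> | grep -A 10 -i 'allocatable'
```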
2. Steps to reproduce the issue
kubectl -n gpu-operator logs <gpu-operator>
to view logs confirming the incomplete operation

3. Information to attach (optional if deemed irrelevant)
[ ] kubernetes pods status:
kubectl get pods --all-namespaces
[ ] kubernetes daemonset status:
kubectl get ds --all-namespaces
[ ] If a pod/ds is in an error or pending state:
kubectl describe pod -n NAMESPACE POD_NAME
Pod cannot get a GPU resource. This works if I use system drivers.
[ ] If a pod/ds is in an error or pending state:
kubectl logs -n NAMESPACE POD_NAME
[ ] Output of running a container on the GPU machine:
docker run -it alpine echo foo
[ ] Docker configuration file:
cat /etc/docker/daemon.json
[ ] Docker runtime configuration:
docker info | grep runtime
[x] NVIDIA shared directory:
ls -la /run/nvidia
Does not exist
[x] NVIDIA packages directory:
ls -la /usr/local/nvidia/toolkit
Does not exist
[x] NVIDIA driver directory:
ls -la /run/nvidia/driver
Does not exist
[ ] kubelet logs:
journalctl -u kubelet > kubelet.logs
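Since /run/nvidia, /usr/local/nvidia/toolkit, and /run/nvidia/driver are created by the driver and toolkit containers, a sketch of checking whether those pods ever started; the gpu-operator-resources namespace is an assumption based on default GPU Operator deployments and may differ by chart version:

```sh
# Were the driver / toolkit / device-plugin pods ever scheduled?
# (namespace name is an assumption; adjust to your install)
kubectl get pods -n gpu-operator-resources -o wide
kubectl get ds -n gpu-operator-resources

# Recent events often show PSP admission or scheduling failures directly
kubectl get events -n gpu-operator-resources --sort-by=.lastTimestamp | tail -n 20
```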