gschwim opened this issue 2 years ago
@gschwim Looks like the PodSecurityPolicy admission controller is enabled. You can install with --set psp.enabled=true so that we create and use the appropriate PSPs with the required permissions.
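For reference, a minimal sketch of that install, assuming the chart comes from NVIDIA's public Helm repository and is deployed into a gpu-operator namespace (repository setup, namespace, and release name here are illustrative, not taken from this thread):

```sh
# Add NVIDIA's Helm repository (skip if already configured)
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install the GPU Operator with PSP support enabled, as suggested above
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set psp.enabled=true
```

With psp.enabled=true the chart is expected to create the PSPs and the bindings that let the operator's service accounts use them, per the comment above.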
Hi @shivamerla - thanks for the reply. I did try --set psp.enabled=true in several of my testing iterations, but it didn't appear to make any difference. Is there something that needs to be done in addition to this to take advantage of it?
@gschwim Can you run kubectl get psp and confirm that the PSPs are created by the GPU Operator? The nvidia-driver ServiceAccount is bound to the gpu-operator-privileged PSP, which should allow this. Can you copy the error again with PSP enabled?
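A rough sketch of how that check could look, assuming the operator runs in a gpu-operator namespace and the ServiceAccount is literally named nvidia-driver as mentioned above (both names may differ on a given install or chart version):

```sh
# List the PSPs; gpu-operator-privileged should appear if the chart created it
kubectl get psp

# Find the role bindings the chart created for its service accounts
kubectl get clusterrolebinding -o wide | grep -i gpu-operator
kubectl get rolebinding -A -o wide | grep -i gpu-operator

# Ask the API server whether the nvidia-driver ServiceAccount may "use" the PSP
# (namespace and ServiceAccount name are assumptions; adjust to your cluster)
kubectl auth can-i use podsecuritypolicy/gpu-operator-privileged \
  --as=system:serviceaccount:gpu-operator:nvidia-driver
```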
1. Quick Debug Checklist

[ ] Are i2c_core and ipmi_msghandler loaded on the nodes?
[ ] Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)?
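A quick way to answer those two checklist items, assuming shell access to the GPU node for the module check (commands are illustrative):

```sh
# On the GPU node: confirm the kernel modules named in the checklist are loaded
lsmod | grep -E 'i2c_core|ipmi_msghandler'

# From a machine with kubectl access: confirm the ClusterPolicy CRD was applied
kubectl describe clusterpolicies --all-namespaces
```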
1. Issue or feature description
Following the documented install procedure for gpu-operator on a fresh charmed-kubernetes install, I get the following error on the gpu-operator running on the node:
This results in no GPU resources becoming available to the cluster.
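One way to confirm the missing GPU resources from the cluster side; nvidia.com/gpu is the resource name the GPU Operator's device plugin normally advertises, and <node-name> is a placeholder:

```sh
# Check whether any node advertises the nvidia.com/gpu resource
kubectl get nodes -o json | grep -i 'nvidia.com/gpu'

# Or inspect a specific node's Capacity/Allocatable sections
kubectl describe node <node-name> | grep -A 10 -i 'allocatable'
```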
2. Steps to reproduce the issue
kubectl -n gpu-operator logs <gpu-operator>
to view logs confirming the incomplete operation

3. Information to attach (optional if deemed irrelevant)
[ ] kubernetes pods status:
kubectl get pods --all-namespaces
[ ] kubernetes daemonset status:
kubectl get ds --all-namespaces
[ ] If a pod/ds is in an error or pending state:
kubectl describe pod -n NAMESPACE POD_NAME
Pod cannot get a GPU resource. This works if I use system drivers.
[ ] If a pod/ds is in an error or pending state:
kubectl logs -n NAMESPACE POD_NAME
[ ] Output of running a container on the GPU machine:
docker run -it alpine echo foo
[ ] Docker configuration file:
cat /etc/docker/daemon.json
[ ] Docker runtime configuration:
docker info | grep runtime
[x] NVIDIA shared directory:
ls -la /run/nvidia
Does not exist
[x] NVIDIA packages directory:
ls -la /usr/local/nvidia/toolkit
Does not exist
[x] NVIDIA driver directory:
ls -la /run/nvidia/driver
Does not exist
[ ] kubelet logs:
journalctl -u kubelet > kubelet.logs
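Since /run/nvidia, /usr/local/nvidia/toolkit, and /run/nvidia/driver are created by the driver and toolkit containers, a sketch of checking whether those pods ever started; the gpu-operator-resources namespace is an assumption based on default GPU Operator deployments and may differ by chart version:

```sh
# Were the driver / toolkit / device-plugin pods ever scheduled?
# (namespace name is an assumption; adjust to your install)
kubectl get pods -n gpu-operator-resources -o wide
kubectl get ds -n gpu-operator-resources

# Recent events often show PSP admission or scheduling failures directly
kubectl get events -n gpu-operator-resources --sort-by=.lastTimestamp | tail -n 20
```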