gecube closed this issue 3 years ago
This is failing because the YAML you're using (master) doesn't match the operator image (1.6.2), so you would need to:
- delete the existing CRD: oc --ignore-not-found=true delete crd clusterpolicies.nvidia.com
- deploy from the 1.6.2 tag

I got bitten by this error a few times already :)
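The recovery described above can be sketched as a small script. This is only an illustration: it assumes the current directory is a clone of the operator repository and that a git tag named after the image version (1.6.2, as mentioned in this thread) exists. The `run` and `realign_to_tag` helpers and the `DRY_RUN` variable are hypothetical names, not part of the operator tooling. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: realign the manifests with the operator image version.
# Assumption: the working directory is a clone of the operator repository.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = run them

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: drop the mismatched CRD, then switch to the given tag.
realign_to_tag() {
  tag="$1"
  run oc --ignore-not-found=true delete crd clusterpolicies.nvidia.com
  run git checkout "$tag"
}

realign_to_tag 1.6.2
```

Once the printed commands look right, rerunning with `DRY_RUN=0` would execute them, after which the chart from that tag can be installed as usual.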
@kpouget Thanks. I decided to check the tags and found a 1.6.2 tag of this repo with a completely different Helm chart. Uf... And it helped to deploy. I think it is a good idea for developers to follow the 'always green master branch' principle.
> I think it is a good idea for developers to follow the 'always green master branch' principle.

I fully agree. I'm actually facing a similar issue right now when trying to deploy the operator as a bundle, as this image doesn't exist yet:
@kpouget We have discussed this internally a few times, but have reached no clear conclusion yet. Ideally we can maintain an image with the tag latest, which always represents the changes from master (i.e., it gets updated with every merge). Hopefully we will add this soon, so we can use it with the Helm charts, CSV files, etc. in the master branch.
@shivamerla Yes, an image tagged latest or anything similar would be best. That's easy to automate with tools like Quay.io; I've started using it in the CI for nightly testing.
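One way to picture the floating-tag idea discussed here is a CI step that maps the git ref being built to an image tag. The sketch below is an assumption about how such a step could look, not the project's actual pipeline: `IMAGE` is a placeholder repository path, and `image_tag_for_ref` and `publish_commands` are hypothetical helpers. It prints the `docker` commands instead of running them.

```shell
#!/bin/sh
# Sketch: choose the image tag to publish for a given git ref.
# IMAGE is a placeholder, not a real repository path.
IMAGE="${IMAGE:-registry.example.com/example/gpu-operator}"

# master maps to the floating tag; any other ref maps to itself.
image_tag_for_ref() {
  case "$1" in
    master) echo "latest" ;;   # floating tag, moves with every merge
    *)      echo "$1" ;;       # release tags such as 1.6.2 keep their name
  esac
}

# Print (rather than run) the publish commands for a ref.
publish_commands() {
  tag="$(image_tag_for_ref "$1")"
  echo "docker build -t $IMAGE:$tag ."
  echo "docker push $IMAGE:$tag"
}

publish_commands master
```

With this mapping, manifests on master could reference the latest tag while tagged releases keep pointing at their own fixed image.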
1. Quick Debug Checklist
- Do you have i2c_core and ipmi_msghandler loaded on the nodes?
- Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)?

What is going on.
cd deployments/gpu-operator
helm install --wait --generate-name \
  . \
  --set operator.defaultRuntime=containerd \
  --set toolkit.env[0].name=CONTAINERD_CONFIG \
  --set toolkit.env[0].value=/etc/containerd/config.toml \
  --set toolkit.env[1].name=CONTAINERD_SOCKET \
  --set toolkit.env[1].value=/run/containerd/containerd.sock \
  --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
  --set toolkit.env[2].value=nvidia \
  --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
  --set toolkit.env[3].value=true
kubectl get pods
NAME                                                         READY   STATUS              RESTARTS   AGE
gpu-operator-56bfbfd666-2g27g                                0/1     CrashLoopBackOff    10         31m
gpu-operator-node-feature-discovery-master-dcf999dc8-rzkv6   0/1     ErrImagePull        0          31m
gpu-operator-node-feature-discovery-worker-69jw9             0/1     ContainerCreating   0          31m
kubectl logs pod/gpu-operator-56bfbfd666-2g27g
unknown flag: --leader-elect
Usage of gpu-operator:
      --zap-devel                          Enable zap development mode (changes defaults to console encoder, debug log level, disables sampling and stacktrace from 'warning' level)
      --zap-encoder encoder                Zap log encoding ('json' or 'console')
      --zap-level level                    Zap log level (one of 'debug', 'info', 'error' or any integer value > 0) (default info)
      --zap-sample sample                  Enable zap log sampling. Sampling will be disabled for integer log levels > 1
      --zap-stacktrace-level level         Set the minimum log level that triggers stacktrace generation (default error)
      --zap-time-encoding timeEncoding     Sets the zap time format ('epoch', 'millis', 'nano', or 'iso8601') (default )
unknown flag: --leader-elect
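The crash signature in the log above (the binary rejecting a flag that the deployed manifest passes to it) can be picked out mechanically, which may help when checking for the chart/image version mismatch discussed in the replies. A minimal sketch, assuming the log text arrives on stdin; `rejected_flags` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list the flags an operator binary rejected, one per line.
# Reads log text on stdin and keeps only the "unknown flag: ..." lines.
rejected_flags() {
  sed -n 's/^unknown flag: //p' | sort -u
}

# Example using the log lines quoted in this issue. With a live cluster,
# the output of `kubectl logs` for the operator pod could be piped in instead.
printf '%s\n' \
  'unknown flag: --leader-elect' \
  'Usage of gpu-operator:' \
  'unknown flag: --leader-elect' | rejected_flags
```

Any output here suggests that the manifests were written for a different operator version than the image being run.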