owainow opened this issue 3 years ago (status: Open)
@owainow Could you collect a Cilium sysdump for that cluster? It's hard to help otherwise, as `cilium status` doesn't report in-depth information.
Sure, let me attach it. I'm new to Cilium, so let me know if I've missed any information. cilium-sysdump-20210809-164914.zip
There seems to be an issue with pulling the image for one of the operator pods:
"state": {
"waiting": {
"message": "Back-off pulling image \"quay.io/cilium/operator-generic:v1.10.2@sha256:a88b04cb5895610620da6e90d362af9e512d2baa51a0a0d77ab34186dfb20c68\"",
"reason": "ImagePullBackOff"
}
}
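One way to pinpoint a pull failure like this is to dump the container state and read the `waiting.reason` field. A minimal sketch, using a saved copy of the state (in a live cluster you would produce `status.json` with something like `kubectl -n <ns> get pod <operator-pod> -o jsonpath='{.status.containerStatuses[0].state}'`; the sample below stands in for that output):

```shell
# Sample container state, standing in for a real jsonpath dump.
cat > status.json <<'EOF'
{"waiting": {"message": "Back-off pulling image \"quay.io/cilium/operator-generic:v1.10.2\"", "reason": "ImagePullBackOff"}}
EOF

# Extract the machine-readable reason for the pod not starting.
python3 -c 'import json; print(json.load(open("status.json"))["waiting"]["reason"])'
# prints: ImagePullBackOff
```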
There are also a couple of errors in the agents:
2021-08-09T14:42:44.922025191Z level=error msg="ListenAndServe failed for service health server, since the user might be running with kube-proxy. Please ensure that '--enable-health-check-nodeport' option is set to false if '--kube-proxy-replacement' is set to 'partial'" error="listen tcp :32313: bind: address already in use" serviceName=router-default serviceNamespace=openshift-ingress subsys=service-healthserver svcHealthCheckNodePort=32313
2021-08-09T14:42:44.922093099Z level=error msg="ListenAndServe failed for service health server" error="listen tcp :32313: bind: address already in use" serviceName=router-default serviceNamespace=openshift-ingress subsys=service-healthserver svcHealthCheckNodePort=32313
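The second log line suggests its own fix: with `--kube-proxy-replacement` set to `partial`, the service health-check listener collides with kube-proxy's, and the agent asks for `--enable-health-check-nodeport` to be disabled. A hedged sketch of what that could look like in the `cilium-config` ConfigMap (the key names here mirror the agent flags from the log; the `cilium` namespace and exact keys are assumptions, so verify them against your version's reference before applying):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: cilium            # assumption: OLM installs into "cilium"
data:
  kube-proxy-replacement: "partial"
  # Avoid binding the health-check nodeport that kube-proxy already owns.
  enable-health-check-nodeport: "false"
```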
I don't expect those issues would cause the errors you are seeing, however, and I didn't find anything else in the sysdump. Were the errors still visible in `cilium status` after you retrieved the Cilium sysdump?
Yes, even after getting the sysdump, running `cilium status` shows a daemon error again. I have tried again on a different cluster but the problem is consistent. I'm unsure why, because OCP is able to "Validate" the quay image.

```
[owain@localhost ~]$ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         1 errors
 \__/¯¯\__/    Operator:       disabled
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Containers:       cilium
                  cilium-operator
Errors:           cilium    cilium    daemonsets.apps "cilium" not found
```
Hi, any updates on this?
Can anyone help point us in a direction here? The issue seems to still exist.
@v1k0d3n When I deployed Cilium via OLM on OpenShift (bare metal), I had to manually add Cilium's service accounts to the privileged SCC, but the namespace was still flooded with events related to the policy issues (visible via `oc get events`). I never got it fully functioning on OpenShift, though, due to some issues with Hubble, which I posted in the cilium hubble repo.
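For reference, the SCC grant described above can be sketched as follows. The service-account names and the `cilium` namespace are assumptions based on a typical OLM install (check `oc get sa -n cilium` first); the commands are echoed rather than executed so they can be reviewed before running against a cluster:

```shell
# Hypothetical service accounts from an OLM-based Cilium install -- adjust
# to whatever `oc get sa -n cilium` actually reports.
NS=cilium
for sa in cilium cilium-operator cilium-olm; do
  # Grant each service account access to the privileged SCC.
  echo "oc adm policy add-scc-to-user privileged -z $sa -n $NS"
done
```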
The Cilium CLI insists the DaemonSet does not exist and that other components are not configured, but they do exist and are configured. Perhaps the CLI doesn't work with OLM installations. The "Supported Environments" section of the README doesn't specifically mention OpenShift, so I'm left assuming it is unsupported.
Supported Environments
minikube
kind
EKS
self-managed
GKE
AKS
k3s
Rancher
$ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         1 errors
 \__/¯¯\__/    Operator:       disabled
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Containers:      cilium
                 cilium-operator
Cluster Pods:    0/446 managed by Cilium
Errors:          cilium    cilium    daemonsets.apps "cilium" not found
$ oc get ds
NAME     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
cilium   12        12        12      12           12          <none>          6h54m
$ oc get pods
NAME                               READY   STATUS    RESTARTS   AGE
cilium-2vsgk                       1/1     Running   0          3h43m
cilium-75sl2                       1/1     Running   0          3h43m
cilium-7g92r                       1/1     Running   0          3h43m
cilium-b8zc5                       1/1     Running   0          3h43m
cilium-dcvv4                       1/1     Running   0          3h43m
cilium-gs7f6                       1/1     Running   0          3h43m
cilium-kqqdc                       1/1     Running   0          3h43m
cilium-kvq27                       1/1     Running   0          3h43m
cilium-olm-56b8648b4f-v8mcj        1/1     Running   0          3h43m
cilium-operator-55c9dd779d-grxcc   1/1     Running   0          3h43m
cilium-operator-55c9dd779d-kgl68   1/1     Running   0          3h43m
cilium-ptk27                       1/1     Running   0          3h43m
cilium-v2p4q                       1/1     Running   0          3h43m
cilium-wpl26                       1/1     Running   0          3h43m
cilium-znggn                       1/1     Running   0          3h43m
hubble-relay-6584f5545c-99p9n      1/1     Running   0          3h43m
hubble-ui-95d74d44c-cqsqx          3/3     Running   0          3h43m
It's probably because Cilium isn't installed in the namespace the CLI looks in by default (kube-system). It's necessary to provide the actual namespace to the CLI:
E.g.
cilium status --namespace=cilium
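A small sketch of that diagnosis: find which namespace actually holds the `cilium` DaemonSet, then pass it to the CLI. The here-doc below stands in for real `oc get ds -A` output; on a live cluster you would pipe the command itself into the same `awk` filter:

```shell
# Sample `oc get ds -A` output (an assumption -- replace with the real call).
ds_list=$(cat <<'EOF'
NAMESPACE       NAME          DESIRED
cilium          cilium        12
openshift-dns   dns-default   12
EOF
)

# Pick the namespace (column 1) of the DaemonSet named "cilium" (column 2).
ns=$(echo "$ds_list" | awk '$2 == "cilium" {print $1}')
echo "cilium status --namespace=$ns"
# prints: cilium status --namespace=cilium
```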
Yes, this is the reason. We can close this issue now.
Getting the following issue when trying to install Cilium on an AWS deployment of OpenShift v4.6. I am able to install Cilium through the OperatorHub without problems; however, when I run the install command on the command line (`cilium install --cluster-name=x`), I consistently run into this error. It seems the cilium-agent containers cannot be deployed. I can't find any reference to this issue in the docs.