cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0
20.01k stars 2.94k forks source link

Unable to enqueue endpoint policy {policy|bandwidth} event #13398

Closed tgraf closed 3 years ago

tgraf commented 4 years ago

These are regular errors in the CI. Possibly harmless but the error messages will put users off.

cilium-combined-logs.txt:2020-10-04T21:27:00.315258716Z level=error msg="Unable to enqueue endpoint policy visibility event" containerID=9f680a5847 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3479 error="unable to Enqueue event" identity=8771 ipv4=10.116.2.10 ipv6= k8sPodName=cilium-monitoring/grafana-6d49bd9ff7-s8zsd subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.315269913Z level=error msg="Unable to enqueue endpoint policy bandwidth event" containerID=9f680a5847 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3479 error="unable to Enqueue event" identity=8771 ipv4=10.116.2.10 ipv6= k8sPodName=cilium-monitoring/grafana-6d49bd9ff7-s8zsd subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.316380614Z level=error msg="Unable to enqueue endpoint policy visibility event" containerID=75396cf013 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=787 error="unable to Enqueue event" identity=3774 ipv4=10.116.2.224 ipv6= k8sPodName=kube-system/stackdriver-metadata-agent-cluster-level-8496498546-vsbgf subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.316403177Z level=error msg="Unable to enqueue endpoint policy visibility event" containerID=6ffa38454e datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=1089 error="unable to Enqueue event" identity=56231 ipv4=10.116.2.115 ipv6= k8sPodName=kube-system/heapster-gke-5846699544-cflsc subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.316424857Z level=error msg="Unable to enqueue endpoint policy bandwidth event" containerID=6ffa38454e datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=1089 error="unable to Enqueue event" identity=56231 ipv4=10.116.2.115 ipv6= k8sPodName=kube-system/heapster-gke-5846699544-cflsc subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.316483187Z level=error msg="Unable to enqueue endpoint policy bandwidth event" containerID=75396cf013 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=787 error="unable to Enqueue event" identity=3774 ipv4=10.116.2.224 ipv6= k8sPodName=kube-system/stackdriver-metadata-agent-cluster-level-8496498546-vsbgf subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.318154577Z level=error msg="Unable to enqueue endpoint policy visibility event" containerID=57e2774a40 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=2955 error="unable to Enqueue event" identity=8764 ipv4=10.116.2.38 ipv6= k8sPodName=cilium-monitoring/prometheus-855fb44bbb-vc222 subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.318168963Z level=error msg="Unable to enqueue endpoint policy bandwidth event" containerID=57e2774a40 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=2955 error="unable to Enqueue event" identity=8764 ipv4=10.116.2.38 ipv6= k8sPodName=cilium-monitoring/prometheus-855fb44bbb-vc222 subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.318764838Z level=error msg="Unable to enqueue endpoint policy visibility event" containerID=c1bc6d8904 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=573 error="unable to Enqueue event" identity=33435 ipv4=10.116.2.123 ipv6= k8sPodName=kube-system/kube-dns-5c446b66bd-98cgs subsys=endpoint
cilium-combined-logs.txt:2020-10-04T21:27:00.318815149Z level=error msg="Unable to enqueue endpoint policy bandwidth event" containerID=c1bc6d8904 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=573 error="unable to Enqueue event" identity=33435 ipv4=10.116.2.123 ipv6= k8sPodName=kube-system/kube-dns-5c446b66bd-98cgs subsys=endpoint

Related: #13359

christarazi commented 4 years ago

I was able to trace this down. These messages are triggered when Cilium is restoring endpoints. For background, being unable to enqueue an event occurs if the event itself is nil or if the queue has not been initialized. Both the visibility policy and the bandwidth event are triggered when the daemon begins fetching the endpoint's metadata via the per-endpoint controller. The queue is initialized when the endpoint itself is regenerated or when EndpointManager "exposes" the endpoint.

The flow is as follows:

1) Cilium starts up, notices there are old endpoints 2) Begins restoring the endpoints 3) Cilium attempts to validate the endpoints that are being restored 4) As part of the validation, a controller begins fetching metadata, triggering visibility policy and bandwidth policy events 5) After an endpoint is restored, it's regenerated, at which point its queue is initialized