cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0
19.71k stars 2.88k forks source link

cilium agent crashes when bgpp node selector has an empty list in its matchExpressions.values field #23227

Closed rio closed 1 year ago

rio commented 1 year ago

Is there an existing issue for this?

What happened?

I attempted to select all nodes for bgp peering by setting the following matchExpression:

nodeSelector:
  matchExpressions:
  - key: kubernetes.io/hostname
    operator: NotIn
    values: []

The API accepts it but all the cilium-agents immediately started crash looping.

Cilium Version

1.13.0-rc4

Kernel Version

Linux sh 5.15.86-talos #1 SMP Wed Jan 11 12:07:19 UTC 2023 aarch64 Linux

Kubernetes Version

serverVersion: buildDate: "2022-12-08T19:51:45Z" compiler: gc gitCommit: b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d gitTreeState: clean gitVersion: v1.26.0 goVersion: go1.19.4 major: "1" minor: "26" platform: linux/arm64

Sysdump

No response

Relevant log output

cilium-agent level=info msg="Cilium BGP Control Plane Controller woken for reconciliation" component=Controller.Run subsys=bgp-control-plane
cilium-agent level=error msg="Failed to convert CiliumBGPPeeringPolicy's NodeSelector to a label.Selector interface" component=PolicySelection error="values: Invalid value: []string(nil): for 'in', 'notin' operato
rs, values set can't be empty" subsys=bgp-control-plane
cilium-agent level=info msg="Auto-enabling \"enable-node-port\", \"enable-external-ips\", \"bpf-lb-sock\", \"enable-host-port\", \"enable-session-affinity\" features" subsys=daemon
cilium-agent panic: runtime error: invalid memory address or nil pointer dereference
cilium-agent [signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xe7d030]
cilium-agent
cilium-agent goroutine 15 [running]:
cilium-agent github.com/cilium/cilium/pkg/bgpv1/agent.PolicySelection({0x4000481b20?, 0x4000d37838?}, 0x0, {0x4000b90468, 0x1, 0x4000d378f8?})
cilium-agent     /go/src/github.com/cilium/cilium/pkg/bgpv1/agent/controller.go:300 +0x240
cilium-agent github.com/cilium/cilium/pkg/bgpv1/agent.(*Controller).Reconcile(0x400050aa20, {0x31728d0, 0x4000b8d600})
cilium-agent     /go/src/github.com/cilium/cilium/pkg/bgpv1/agent/controller.go:345 +0x234
cilium-agent github.com/cilium/cilium/pkg/bgpv1/agent.(*Controller).Run(0x400050aa20, {0x31728d0, 0x4000b8d600})
cilium-agent     /go/src/github.com/cilium/cilium/pkg/bgpv1/agent/controller.go:259 +0x1d8
cilium-agent github.com/cilium/cilium/pkg/bgpv1/agent.(*Controller).Start.func3({0x31728d0?, 0x4000b8d600?})
cilium-agent     /go/src/github.com/cilium/cilium/pkg/bgpv1/agent/controller.go:201 +0x34
cilium-agent github.com/cilium/workerpool.(*WorkerPool).run.func1()
cilium-agent     /go/src/github.com/cilium/cilium/vendor/github.com/cilium/workerpool/workerpool.go:174 +0x78
cilium-agent created by github.com/cilium/workerpool.(*WorkerPool).run
cilium-agent     /go/src/github.com/cilium/cilium/vendor/github.com/cilium/workerpool/workerpool.go:171 +0x40

Anything else?

No response

Code of Conduct

borkmann commented 1 year ago

Cc @dylandreimerink

dylandreimerink commented 1 year ago

Thank you for the bug report. The label selectors indeed don't seem to like empty lists as evidenced by the values: Invalid value: []string(nil): for 'in', 'notin' operators, values set can't be empty error message.

However, I agree that this is an edge case that should be handled better given that we proceed to use the nil value after we have concluded it isn't valid. (The fix in this specific location is to add a continue within the if statement that checks the error, but we should also investigate the rest of the package for similar oversights)

Looking at the code it seems like this bug is also present in v1.12.