aojea opened this issue 1 year ago
Yep, we don't support `--nodeport-addresses`:
- https://github.com/cilium/cilium/issues/12508.
Anyway, I'd expect the device runtime detection to pick up the new IP addr, but I think it can react only to new ifaces (cc @joamaki)?
One fix would be to extend the NodePort service lookup by adding an additional lookup in a map with node local addrs, but I think there were some performance concerns (cc @borkmann @aspsk).
> One fix would be to extend the NodePort service lookup by adding an additional lookup in a map with node local addrs, but I think there were some performance concerns
Yeah, getting the addresses can be done à la kube-proxy, see https://github.com/kubernetes/kubernetes/pull/103116#pullrequestreview-694419686. As you may know, kube-proxy reconciles every X seconds, and that is when it picks up the new addresses ... or there is the netlink notification approach https://github.com/vishvananda/netlink/blob/v1.1.0/addr_linux.go#L301 ... I don't know well how cilium needs to cascade this information to the Service.
I don't think the time between an address being added to the system and the service being updated is critical; the problem is that new addresses are never added ... and adding addresses to loopback is a common practice.
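For illustration, a minimal sketch of the netlink notification approach mentioned above, using `AddrSubscribe` from github.com/vishvananda/netlink; this is not Cilium's or kube-proxy's actual code, and error handling is trimmed for brevity:

```go
// Watch for address add/remove events via netlink, the alternative to
// kube-proxy-style periodic reconciliation discussed above.
package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

func main() {
	updates := make(chan netlink.AddrUpdate)
	done := make(chan struct{})
	defer close(done)

	if err := netlink.AddrSubscribe(updates, done); err != nil {
		panic(err)
	}

	for u := range updates {
		action := "removed from"
		if u.NewAddr {
			action = "added to"
		}
		// A real consumer would recompute the NodePort frontends here.
		fmt.Printf("address %s %s ifindex %d\n", u.LinkAddress.IP, action, u.LinkIndex)
	}
}
```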
> Anyway, I'd expect the device runtime detection to pick up the new IP addr, but I think it can react only to new ifaces (cc @joamaki)?
Agree. Even if this works after restarting the cilium-agent, it will be a very bad experience if users have to restart the cilium-agent every time they want to add a new IP to the nodes.
Yep, currently we only recompute the NodePort frontends when new devices are added or removed (when `--enable-runtime-device-detection` is on). We're working on refactoring the service load-balancing control-plane, and supporting changing IP addresses will be fairly easy to do as part of this work. I can't yet give good estimates on when this would land as it's a fairly long list of changes we need to churn through.
> I can't yet give good estimates on when this would land as it's a fairly long list of changes we need to churn through.
Knowing that it is on your radar is a good thing; if you have workable items to speed this up, please let me know.
@joamaki can we close this as fixed by https://github.com/cilium/cilium/pull/24677?
> @joamaki can we close this as fixed by #24677?
Oops, forgot to reply to this. No, we can't close it yet. #24677 added the facilities to react to these changes, and in addition to that we'll need to change a bunch of places to deal with 1) changing devices and addresses, and 2) allowing more than one IP per device for NodePort (which might break some use-cases). Working on these changes right now.
Same issue here. Any updates?
We want to use a GCE load balancer to expose NodePort apps (e.g. MySQL/Postgres TCP ports) to external access, but since GCE injects a local route instead of adding a new iface, we cannot access any NodePort ports via the load balancer's IP.
@zzjin We are aiming to have this solved in v1.16. For earlier releases you can use the experimental "enableRuntimeDeviceDetection" option. Be aware that not all features support it (for example bandwidth management).
@joamaki Hi, tested with cilium 1.16.0-pre.1 and `enableRuntimeDeviceDetection: true`, but it still does not work.
It seems that GCE assigns the LB IP via the local IP route table, which means there is still one iface with only one IP, plus one more IP route bound to it.
```
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:92:00:04 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    inet 10.146.0.4/32 metric 100 scope global dynamic ens4
       valid_lft 3580sec preferred_lft 3580sec
    inet6 fe80::4001:aff:fe92:4/64 scope link
       valid_lft forever preferred_lft forever
3: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP group default qlen 1000
    link/ether ba:16:e6:26:59:89 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b816:e6ff:fe26:5989/64 scope link
       valid_lft forever preferred_lft forever
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP group default qlen 1000
    link/ether 12:e5:47:3f:62:ad brd ff:ff:ff:ff:ff:ff
    inet 100.64.18.136/32 scope global cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::10e5:47ff:fe3f:62ad/64 scope link
       valid_lft forever preferred_lft forever
5: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether ea:02:80:3f:74:59 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e802:80ff:fe3f:7459/64 scope link
       valid_lft forever preferred_lft forever
```
```
$ ip route show table local
local 10.146.0.4 dev ens4 proto kernel scope host src 10.146.0.4
local 34.84.193.26 dev ens4 proto 66 scope host
local 100.64.18.136 dev cilium_host proto kernel scope host src 100.64.18.136
```
I can access the NodePort via `10.146.0.4`, but not via `34.84.193.26`.
@zzjin thanks a lot for testing! We currently choose the NodePort frontend IP addresses from the IPs assigned to the devices, and by default only one IPv4 and one IPv6 per device is used (private addresses preferred). This logic can now be changed with the `--nodeport-addresses` option (soon also a helm value: https://github.com/cilium/cilium/pull/31672). If the address only appears in the routing table then it's not seen by Cilium.
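For illustration only, a sketch of the default selection rule described above (one IPv4 per device, private addresses preferred); this is not Cilium's actual implementation, and the helper name is made up:

```go
// Pick a single IPv4 NodePort frontend from a device's addresses,
// preferring a private (RFC 1918) address over a public one.
package main

import (
	"fmt"
	"net/netip"
)

func pickNodePortFrontend(addrs []netip.Addr) (netip.Addr, bool) {
	var fallback netip.Addr
	ok := false
	for _, a := range addrs {
		if !a.Is4() {
			continue // the same rule would be applied separately for IPv6
		}
		if a.IsPrivate() {
			return a, true // private addresses win
		}
		if !ok {
			fallback, ok = a, true
		}
	}
	return fallback, ok
}

func main() {
	addrs := []netip.Addr{
		netip.MustParseAddr("34.84.193.26"), // public, only used if no private addr exists
		netip.MustParseAddr("10.146.0.4"),   // private, selected
	}
	fmt.Println(pickNodePortFrontend(addrs))
}
```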
@aojea in the description of this issue mentioned that the LB addresses should be assigned to the loopback device, in which case Cilium can be made to include them by specifying `devices` (needs this fix though: https://github.com/cilium/cilium/pull/31200). In the `ip addr` listing above, however, `lo` doesn't seem to have the address assigned. Is this unexpected?
If the IP is only added as a route then we perhaps need to make `pkg/datapath/tables/node_address.go` use the routing table as the source for node addresses instead of the device addresses...
@joamaki Manually setting `--nodeport-addresses` in the configmap and restarting cilium still does not work.
Verified this myself now; the GCE load-balancer IP is indeed added only as a route in the local table. The solution, I think, is as I said above: we need to derive the addresses for NodePort use from the routing table and not from the device addresses. Effectively the change would be to switch `pkg/datapath/tables/node_address.go` to look at both `Table[Device]` (for selecting which devices we care about) and `Table[Route]` for the local host-scoped routes. We'll see if we can land this change for v1.16 as well, but can't promise it. (EDIT: Did not make it)
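A rough sketch of that idea, reading local-table routes via github.com/vishvananda/netlink and golang.org/x/sys/unix; this is only an illustration of the approach, not the actual `node_address.go` change:

```go
// Derive candidate node addresses from the "local" routing table rather
// than from interface addresses, so that route-only IPs (like the GCE LB
// IP above) are also picked up.
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

func localTableAddrs() ([]net.IP, error) {
	filter := &netlink.Route{
		Table: unix.RT_TABLE_LOCAL,
		Type:  unix.RTN_LOCAL,
	}
	routes, err := netlink.RouteListFiltered(netlink.FAMILY_V4, filter,
		netlink.RT_FILTER_TABLE|netlink.RT_FILTER_TYPE)
	if err != nil {
		return nil, err
	}
	var addrs []net.IP
	for _, r := range routes {
		// "local 34.84.193.26 dev ens4 proto 66 scope host" shows up here
		// even though the address is not assigned to any interface.
		if r.Dst != nil {
			addrs = append(addrs, r.Dst.IP)
		}
	}
	return addrs, nil
}

func main() {
	addrs, err := localTableAddrs()
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs)
}
```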
I am a bit puzzled by the kube-proxy code, which does `net.InterfaceAddrs()` and thus only interrogates the addresses. Do I understand correctly that kube-proxy doesn't support this either? Or does it work regardless of its nodeport address logic not picking up on it?
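For reference, a minimal snippet showing what `net.InterfaceAddrs()` reports: only addresses assigned to interfaces, so an IP present purely as a local route (like the GCE LB IP above) would not be listed.

```go
// Print the addresses visible to net.InterfaceAddrs().
package main

import (
	"fmt"
	"net"
)

func main() {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		panic(err)
	}
	for _, a := range addrs {
		fmt.Println(a.String()) // e.g. "10.146.0.4/32", but never "34.84.193.26"
	}
}
```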
What happened?
A Service of type NodePort exposes the Pod backends via a NodePort on the node addresses, forwarding any traffic that arrives at the frontend given by NodeAddress:NodePort.
The key here is the concept of node addresses. After a quick skim of the code, it seems that cilium only considers the node addresses that are associated with the Service (ExternalIP, LoadBalancerIP) or the ones associated with the Node object (Node.Status.InternalIP or ExternalIP), but there are situations where a node can get new IPs that are not reflected in Kubernetes. Advanced-networking Kubernetes users tend to have more complex setups with multiple NICs, and they need to receive all the NodePort traffic on all the NICs, or just some subset of them.
Kube-proxy has a flag that allows users to define the range of addresses they want to use for NodePorts. It defaults to all local addresses, and every time a new address is added to the host, it is considered for the NodePort if it matches the range.
I expect cilium to behave the same as kube-proxy, so that new addresses on the nodes can be used to receive NodePort traffic, and also to have the capability to define which ranges of the local addresses should be allowed.
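As an illustration of the kube-proxy `--nodeport-addresses` semantics described above, a sketch of the matching rule (not kube-proxy's actual code):

```go
// An address is eligible as a NodePort frontend only if it falls inside one
// of the configured CIDR ranges; with no ranges configured, every local
// address is eligible.
package main

import (
	"fmt"
	"net/netip"
)

func allowedForNodePort(addr netip.Addr, ranges []netip.Prefix) bool {
	if len(ranges) == 0 {
		return true // default: all local addresses
	}
	for _, p := range ranges {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	ranges := []netip.Prefix{netip.MustParsePrefix("10.0.0.0/8")}
	fmt.Println(allowedForNodePort(netip.MustParseAddr("10.146.0.4"), ranges))   // true
	fmt.Println(allowedForNodePort(netip.MustParseAddr("34.84.193.26"), ranges)) // false
}
```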
How to repro
You can repro with GKE: create a NodePort service and manually create a GCE load balancer. GCE load balancers add the LB frontend IP to the loopback interface of all the nodes, and then route the traffic to that IP through the nodes. The observation is that all the healthchecks directed to the GCE LB IP and the NodePort are dropped; if my intuition is correct, this new IP is not considered by cilium, hence it will not forward the traffic to it. However, it is really easy to repro with kind:
1. Create a kind cluster with cilium
2. Create a NodePort service with working backends
3. Test that the Service is working
4. Add a new IP to the eth0 interface; it now fails
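A tiny connectivity check for the last two steps; the IPs and port below are made up, so substitute your node's original eth0 IP, the newly added IP, and the service's NodePort:

```go
// Dial each node address on the NodePort and report whether it answers.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Hypothetical values: original node IP vs. the newly added IP.
	for _, addr := range []string{"172.18.0.2:30080", "172.18.0.99:30080"} {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			fmt.Printf("%s: FAIL (%v)\n", addr, err)
			continue
		}
		conn.Close()
		fmt.Printf("%s: OK\n", addr)
	}
}
```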