cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

Node Annotations are Only Evaluated on Startup and not Actively Watched During Runtime #676

Open Feder1co5oave opened 5 years ago

Feder1co5oave commented 5 years ago

On an existing Kubernetes cluster (created with kubeadm) with kube-router running successfully in route reflector mode (rr.server annotation on the 3 master nodes, rr.client annotation on all the workers), my workflow for joining new worker nodes is as follows:

for each new node:

Turns out the nodes previously joined to the cluster will receive routes for all the new nodes, but the new nodes will only receive routes for the "old" nodes. Restarting kube-router on the route reflector nodes solved this issue.

My troubleshooting suggests that whenever a new node joins, the kube-router daemons on the RR servers peer with it right away, before the rr.client annotation has been added to it. The new peer is therefore assumed to be in full-mesh mode, even though it will actually run in RR mode and peer only with the RR servers. BGP route reflection allows BGP daemons in the same AS to peer in either full-mesh or RR mode. Export policies are such that RR servers will reflect advertisements:

So the new nodes, assumed to be forming full-mesh, don't get advertisements about other new nodes. Restarting RR servers forces them to reload the node list and annotations, and correctly peer with the new nodes in RR mode.
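The asymmetry described above matches the standard route reflection rules from RFC 4456. The following is a minimal sketch of those rules only, not kube-router's actual export policy code; the names `peerKind`, `client`, `nonClient`, and `reflects` are made up for illustration.

```go
package main

import "fmt"

// peerKind is how a route reflector classifies an iBGP peer.
type peerKind int

const (
	nonClient peerKind = iota // ordinary (full-mesh) iBGP peer
	client                    // route reflector client
)

// reflects reports whether a route reflector re-advertises a route learned
// from a peer of kind `from` to a different peer of kind `to`, following the
// RFC 4456 rules. Hypothetical helper, not taken from kube-router.
func reflects(from, to peerKind) bool {
	if from == client {
		// Routes learned from a client go to clients and non-clients alike.
		return true
	}
	// Routes learned from a non-client go to clients only.
	return to == client
}

func main() {
	// Old workers were classified as clients. New workers were peered before
	// their annotation existed, so the RR treats them as non-clients.
	fmt.Println("old -> new:", reflects(client, nonClient))
	fmt.Println("new -> old:", reflects(nonClient, client))
	fmt.Println("new -> new:", reflects(nonClient, nonClient))
}
```

Under these rules the only combination that is not reflected is non-client to non-client, which is the "new node to new node" case reported here.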

To fix this, kube-router should watch for annotation changes on nodes, and update its internal information about which nodes are forming full-mesh, and which are joined to a RR cluster. While waiting for a fix to be implemented, I suggest the workaround to restart RR servers be documented for newcomers!
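As a rough illustration of what such a watch would have to decide, the sketch below derives a node's peering mode from its annotations and reports whether an update changes it. It is not kube-router code. The `kube-router.io/rr.client` key appears later in this thread; the `kube-router.io/rr.server` key is assumed to follow the same prefix, and `peerMode` and `needsRepeer` are hypothetical helpers. In a real controller, `needsRepeer` would be called from a node update handler with the old and new annotation maps.

```go
package main

import "fmt"

// Annotation keys. The rr.server key is an assumption based on the
// rr.client key quoted in this thread.
const (
	rrServerAnnotation = "kube-router.io/rr.server"
	rrClientAnnotation = "kube-router.io/rr.client"
)

// peerMode derives how a node should be peered from its annotations.
// The annotation value is treated as an opaque string.
func peerMode(annotations map[string]string) string {
	if _, ok := annotations[rrServerAnnotation]; ok {
		return "rr-server"
	}
	if _, ok := annotations[rrClientAnnotation]; ok {
		return "rr-client"
	}
	return "full-mesh"
}

// needsRepeer reports whether a node update changed the peering mode or the
// value of either route reflector annotation.
func needsRepeer(oldAnn, newAnn map[string]string) bool {
	return peerMode(oldAnn) != peerMode(newAnn) ||
		oldAnn[rrServerAnnotation] != newAnn[rrServerAnnotation] ||
		oldAnn[rrClientAnnotation] != newAnn[rrClientAnnotation]
}

func main() {
	before := map[string]string{} // node just joined, no annotation yet
	after := map[string]string{rrClientAnnotation: "42"}

	fmt.Println(peerMode(before), "->", peerMode(after))
	fmt.Println("re-peer needed:", needsRepeer(before, after))
}
```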

aauren commented 4 years ago

Closed via #677

ofen commented 4 years ago

Is it possible to make this happen automatically (without a manual pod restart)?

murali-reddy commented 4 years ago

@ofen reopening, as the following still makes sense to fix:

kube-router should watch for annotation changes on nodes, and update its internal information about which nodes are forming full-mesh, and which are joined to a RR cluster. While waiting for a fix to be implemented, I suggest the workaround to restart RR servers be documented for newcomers!

zhaixigui commented 4 years ago

Why can't it automatically recognize the newly added node and propagate its routes? Why do we have to do something as dangerous as killing the RR?

zhaixigui commented 4 years ago

Do we have any choice but to kill the RR?

Feder1co5oave commented 4 years ago

@zhaixigui the new nodes are recognized, but they are assumed to be in full mesh mode, and there's no clean way to do the switch to RR mode other than restart kube-router on both RR server and client to reload the nodes' information.

Killing RR servers should not bring any disruption as long as you enabled soft-restart. I've done it several times without any repercussions.

zhaixigui commented 4 years ago

@Feder1co5oave would this be a bad idea? When the RR server sees a "kube-router.io/rr.client=42" annotation on the most recently joined node, it first calls DeleteNeighbor to remove that node, then calls AddNeighbor to add it back as an RR client, and the RR then automatically propagates its routes?
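The delete-then-add sequence suggested here could look roughly like the sketch below. Only the method names DeleteNeighbor and AddNeighbor come from the comment; the `neighborManager` interface, its signatures, the `recorder` fake, and `repeerAsClient` are hypothetical and do not reflect the real BGP server API used by kube-router.

```go
package main

import "fmt"

// neighborManager is a hypothetical stand-in for the BGP server API.
// The signatures are invented for illustration.
type neighborManager interface {
	DeleteNeighbor(addr string) error
	AddNeighbor(addr string, rrClient bool, clusterID string) error
}

// recorder is a fake neighborManager that records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) DeleteNeighbor(addr string) error {
	r.calls = append(r.calls, "delete "+addr)
	return nil
}

func (r *recorder) AddNeighbor(addr string, rrClient bool, clusterID string) error {
	r.calls = append(r.calls,
		fmt.Sprintf("add %s rrClient=%t clusterID=%s", addr, rrClient, clusterID))
	return nil
}

// repeerAsClient drops the existing session with addr and re-adds the peer
// as a route reflector client, stopping at the first error.
func repeerAsClient(m neighborManager, addr, clusterID string) error {
	if err := m.DeleteNeighbor(addr); err != nil {
		return fmt.Errorf("delete neighbor %s: %w", addr, err)
	}
	if err := m.AddNeighbor(addr, true, clusterID); err != nil {
		return fmt.Errorf("add neighbor %s as RR client: %w", addr, err)
	}
	return nil
}

func main() {
	r := &recorder{}
	// 10.0.0.7 is a placeholder address for the newly annotated node.
	if err := repeerAsClient(r, "10.0.0.7", "42"); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range r.calls {
		fmt.Println(c)
	}
}
```

Tearing the session down briefly withdraws the routes learned over it, so whether this is disruptive in practice would depend on graceful restart settings, which this sketch does not model.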

zhaixigui commented 4 years ago

@zhaixigui the new nodes are recognized, but they are assumed to be in full mesh mode, and there's no clean way to do the switch to RR mode other than restart kube-router on both RR server and client to reload the nodes' information.

Killing RR servers should not bring any disruption as long as you enabled soft-restart. I've done it several times without any repercussions.

Is soft-restart the same as GracefulRestart?

zhaixigui commented 4 years ago

Does calico have the same problem?

Feder1co5oave commented 4 years ago

I don't know. Calico has a great reputation as network plugin and also great features. When I considered it for my clusters I found it is more complicated and has more moving parts than kube-router, so I ended up choosing the latter because of its "simplicity".

zhaixigui commented 4 years ago

@Feder1co5oave would this be a bad idea? When the RR server sees a "kube-router.io/rr.client=42" annotation on the most recently joined node, it first calls DeleteNeighbor to remove that node, then calls AddNeighbor to add it back as an RR client, and the RR then automatically propagates its routes?

@Feder1co5oave, is this a bad idea? Is there a better idea or solution?

Feder1co5oave commented 4 years ago

@zhaixigui not at all; in fact it is exactly the solution I proposed in my first post. But I'm not a gopher, and this would involve some modifications to the control flow, i.e. watching nodes. I believe we'll have to wait for a volunteer to write a PR.

sebltm commented 3 weeks ago

I think it's as easy as doing something like this: https://github.com/cloudnativelabs/kube-router/pull/1723

I've tested it in my local environment and it seems to work as expected.