cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0
2.31k stars 468 forks

BGP Route Reflector Feature #192

Closed: ut0mt8 closed this issue 6 years ago

ut0mt8 commented 7 years ago

First, a big thanks for this very promising project. This is what I expect to be the future of Kubernetes networking :)

kube-router can be enabled to peer with an external router. This is fine. But in my understanding, the external router has to peer with every node/kube-router instance. For a small cluster this is OK, but for a bigger one (say 100 nodes) it's not really doable.

First problem: it doesn't scale. A full mesh of 100 nodes plus one external router is a big waste and will certainly cause convergence issues. Second problem: from the external router's point of view, adding a node requires changing the BGP configuration (on a Linux box with Quagga/BIRD this can be scripted, but on real network equipment it cannot).
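To put rough numbers on the full-mesh concern: n iBGP speakers in a full mesh need n(n-1)/2 sessions, while r route reflectors need one session per client per reflector plus a small mesh among the reflectors themselves. A throwaway back-of-the-envelope sketch (my own arithmetic, not kube-router code):

```go
package main

import "fmt"

// fullMeshSessions returns the number of iBGP sessions in a full mesh of n speakers.
func fullMeshSessions(n int) int {
	return n * (n - 1) / 2
}

// rrSessions returns the number of iBGP sessions when the remaining clients peer
// only with r route reflectors, which still fully mesh among themselves.
func rrSessions(n, r int) int {
	clients := n - r
	return clients*r + fullMeshSessions(r)
}

func main() {
	fmt.Println(fullMeshSessions(100)) // 100-node full mesh: 4950 sessions
	fmt.Println(rrSessions(100, 2))    // 100 nodes behind 2 RRs: 197 sessions
}
```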

So it would be really cool if an instance of kube-router could be specialized as a route reflector, both for BGP scaling inside the Kubernetes network and to allow simple peering with external routers.

Does that make sense?

murali-reddy commented 7 years ago

@ut0mt8 thanks for your comments. Adding a route reflector certainly makes sense. Kube-router will have to support route reflectors at some point, as adoption increases and users need larger cluster deployments. I will see how much effort it takes and prioritize this feature.

ut0mt8 commented 6 years ago

I'm waiting for this :) Currently I think I can use two RRs with two external BGP peers and disable the mesh. The only problem is that adding a new node needs a manual reconfiguration of the RRs. (I could surely automate this by watching the API server.)

murali-reddy commented 6 years ago

Sorry, this request has not made any progress. There are two critical issues (#202, slow convergence, and #220, routes getting deleted during rolling upgrades) that are affecting users. Once they are addressed, I hope kube-router's routing will be on solid ground, on which more advanced features like route reflectors can be added.

ut0mt8 commented 6 years ago

#202 could be handled with RRs :) Doing a full mesh of >50 BGP speakers is a bad idea in every context (I was a network engineer in my previous job).

SEJeff commented 6 years ago

@ut0mt8 you can now do a single-AS-per-rack model with the current kube-router, which is an alternative to using reflectors, depending on your topology.

We specify the peering using node annotations such as:

net.kuberouter.node.bgppeer.address=10.2.3.4
net.kuberouter.node.bgppeer.asn=654321
net.kuberouter.nodeasn=654322
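For reference, annotations like these can be applied with kubectl. The node name below is a placeholder and the values are the same bogus examples as above; this obviously only does something against a live cluster:

```shell
# hypothetical node name; values match the example annotations above
kubectl annotate node worker-1 \
  net.kuberouter.node.bgppeer.address=10.2.3.4 \
  net.kuberouter.node.bgppeer.asn=654321 \
  net.kuberouter.nodeasn=654322
```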

Your kube-router daemonset can have args such as:

    spec:
      containers:
      - args:
        - --run-router=true
        - --run-firewall=false
        - --run-service-proxy=true
        - --cluster-cidr=10.2.3.0/18
        - --nodes-full-mesh=false
        - --enable-overlay=false
        - --kubeconfig=/etc/kubernetes/kubeconfig
cjbrigato commented 6 years ago

@SEJeff, just wanted to warn that what you've put here is not a private 16-bit ASN but a public 32-bit one ;)

Adding to the issue, I tried such a configuration but ran into nothing working at all:

[root@prdinfk8s501-adm ~]# kubectl describe node prdinfk8s501.dedale.tf1.fr|grep nodeasn
Annotations:        net.kuberouter.nodeasn=65432
[root@prdinfk8s501-adm ~]# kubectl  logs kube-router-4b4bd -n kube-system|tail -n1
E1130 16:51:10.462062       1 network_routes_controller.go:172] Failed to start node BGP server: Could not find ASN number for the node. Node needs to be annotated with ASN number details to start BGP server.

Any insight ?

SEJeff commented 6 years ago

Thanks @cjbrigato. Those are just made-up bogus numbers and IP addresses I put there as an example.

SEJeff commented 6 years ago

@jdconti might have some pointers for you.

What other annotations do you have on this node for kube-router? What topology are you using?

murali-reddy commented 6 years ago

@cjbrigato there was a breaking change; we streamlined the annotations. You should use kube-router.io/node.asn=

please see https://github.com/cloudnativelabs/kube-router/blob/master/Documentation/upgrading.md#v0017-aka-v010-rc1

Also see the documentation https://github.com/cloudnativelabs/kube-router/blob/master/Documentation/bgp.md#node-to-node-peering-without-full-mesh

cjbrigato commented 6 years ago

Haaa, the k8s ecosystem is always moving quite fast. I always try to monitor everything I can, even little PRs and issues, but I definitely didn't see this one. Thanks a lot!

@murali-reddy I spotted an interesting one. Please tell me if it should be redirected to another issue or a new one, since I'm getting quite "off-topic" here.

Following the streamlining of the annotations, here is a buggy and, I guess, unintended behavior:

time="2017-11-30T19:59:18Z" level=warning msg="sent notification" Data="as number mismatch expected 64512, received 65432" Key=10.247.33.255 Topic=Peer

It happens when you have both the new annotations AND the old ones, with mismatching ASNs. Shouldn't the old annotations no longer have any effect on kube-router's behavior?

Annotations:        kube-router.io/node.asn=65432
                    net.kuberouter.nodeasn=64512
murali-reddy commented 6 years ago

@cjbrigato yes, net.kuberouter.nodeasn should not have any involvement. Try deleting the kube-router pods, so that GoBGP starts from scratch on the new pods with the correct ASN.

If you are still running into issues, please open a different issue.
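If anyone hits the same thing, something like this should clear the stale annotation and recycle the pods. The trailing '-' on kubectl annotate removes an annotation; the label selector is an assumption based on the stock kube-router daemonset manifests:

```shell
# remove the legacy annotation (trailing '-' deletes it), then restart kube-router
kubectl annotate node prdinfk8s501.dedale.tf1.fr net.kuberouter.nodeasn-
kubectl delete pod -n kube-system -l k8s-app=kube-router
```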

cjbrigato commented 6 years ago

You're right. Those were relics, not annotation involvement at all. Everything is working as expected now. Thanks again, and I'll stop off-topic'ing this issue.

h3po commented 6 years ago

@ut0mt8

Second problem : for the external router point adding a node will require a reconfiguration of the bgp configuration (on a linux box with quagga/bird, this can be scripted, but on a real network equipment no).

Actually, I picked kube-router for my first cluster because I discovered that bgpd in the Quagga distribution by Cumulus Networks can be configured to peer by subnet instead of by a list of IPs. So no scripting required and no hardcoded IPs anywhere. I'm running the cumulusnetworks/quagga image; here is my Jinja2 template for bgpd.conf:

    hostname {{inventory_hostname}}
    router bgp 64513
      bgp router-id {{int_ip}}
      neighbor dynamicnodes peer-group
      neighbor dynamicnodes remote-as 64512
      bgp listen range {{int_cidr}} peer-group dynamicnodes

where int_cidr is the subnet that my DHCP serves to the nodes.

ut0mt8 commented 6 years ago

Interesting. This reminds me of BGP unnumbered. But it does not apply in my case, as by nature an RR is a 1-to-N operation. Anyway, I will look at the kube-router code to see how difficult it would be to implement. On paper it is only a matter of gobgp automation, which kube-router already does for the most part.

Matty9191 commented 6 years ago

+1 to adding a highly available set of route reflectors.

ut0mt8 commented 6 years ago

Well, I've begun to write something, but I've figured out that it was not as easy as I thought at the start.

Matty9191 commented 6 years ago

Once it is complete it will be an incredible feature. The point @ut0mt8 made about having to define all your neighbors upstream is a deal breaker for us. In the interim I am going to look into @SEJeff's single-AS-per-rack solution.

ut0mt8 commented 6 years ago

While thinking about the implementation, I found a couple of questions/issues for which I have no definitive answer.

First, which node/worker/running kube-router instance should I choose to be the RR server, and how? A flag, or, more elegantly, an annotation? I suppose choosing the kube-router running on a controller makes sense? Anyway, it could be configurable...

Second, and related, how do the other nodes/instances know that they are RR clients, and how do they find their RR server? By reading annotations?

I was thinking of annotating the RR server with kube-router.io/rr.server and the RR clients with kube-router.io/rr.client. It would work like this: each kube-router RR server peers with the other RR servers (without reflection) and with all RR clients (with reflection enabled). Conversely, a kube-router RR client only peers with RR servers.

One thing I find annoying is that kube-router is not really made to work outside of a pod (honestly I hate this; why is launching a control-plane component of a k8s cluster the good old way not possible)?

ut0mt8 commented 6 years ago

Pseudo code included (I think I'm not too far off):

var rrClient bool
var rrServer bool
var clusterID string

// find the annotations on this node
for _, node := range nodes.Items {
  nodeIP, _ := utils.GetNodeIP(&node)
  if nodeIP.String() == nrc.nodeIP.String() {
    if id, ok := node.ObjectMeta.Annotations["kube-router.io/rr-server"]; ok {
      rrServer = true
      clusterID = id
    }
    if id, ok := node.ObjectMeta.Annotations["kube-router.io/rr-client"]; ok {
      rrClient = true
      clusterID = id
    }
  }
}
for _, node := range nodes.Items {

  ...

  // we are an rr-client: peer only with rr-servers
  if rrClient {
    if _, ok := node.ObjectMeta.Annotations["kube-router.io/rr-server"]; !ok {
      continue
    }
  }

  ...

  // we are an rr-server: peer with rr-clients, with reflection enabled
  if rrServer {
    if _, ok := node.ObjectMeta.Annotations["kube-router.io/rr-client"]; ok {
      // add the RR options with our cluster ID
      n.RouteReflector = config.RouteReflector{
        Config: config.RouteReflectorConfig{
          RouteReflectorClient:    true,
          RouteReflectorClusterId: config.RrClusterIdType(clusterID),
        },
        State: config.RouteReflectorState{
          RouteReflectorClient:    true,
          RouteReflectorClusterId: config.RrClusterIdType(clusterID),
        },
      }
    }
  }
}

ut0mt8 commented 6 years ago

I'm almost done; I just have to deal with network policies.

murali-reddy commented 6 years ago

@ut0mt8 Awesome!! Let me know when you are ready with a private image. I am happy to test.

ut0mt8 commented 6 years ago

@murali-reddy : I think I'm good.

We can test with the ut0mt8/kube-router:rr4 image. The code is here https://github.com/ut0mt8/kube-router/blob/feature_rr/app/controllers/network_routes_controller.go

My first tests are positive. Any comments or feedback are welcome.

ut0mt8 commented 6 years ago

Tests with dual RRs for redundancy also look good. I encourage everyone interested to test and review my patch.

Ah, and a little howto :)

It works with annotations:

kube-router.io/rr.server=NUMBER
kube-router.io/rr.client=NUMBER

Obviously the number should be the same for the adjacency to form.

Note: for an optimal setup we should review the default internal BGP timers, which are way too high. Note 2: I think we should also review how all the options, like ibgp-full-mesh or per-rack ASN, work together; I don't know if all combinations behave well. They can certainly make for crazy setups, however.
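For example, annotating one RR server and two clients could look like this (node names are placeholders, I'm assuming the dotted annotation keys, and 42 is an arbitrary cluster ID shared by server and clients):

```shell
# hypothetical node names; the cluster ID value must match on server and clients
kubectl annotate node master-1 kube-router.io/rr.server=42
kubectl annotate node worker-1 kube-router.io/rr.client=42
kubectl annotate node worker-2 kube-router.io/rr.client=42
```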

murali-reddy commented 6 years ago

@ut0mt8 thanks! I will try to set it up and test it. Would you mind raising a PR?

ut0mt8 commented 6 years ago

@murali-reddy done! I'm open to any review/remarks :)

murali-reddy commented 6 years ago

Fixed as part of #325.

If there are any possible improvements, we will open a separate issue.