cilium / design-cfps

Repo to store Cilium CFP design docs
Apache License 2.0

CFP: 34841 BGP Route Learning #58

Closed: dswaffordcw closed this 2 days ago

dswaffordcw commented 2 months ago

This PR adds a CFP for BGP Route Learning (GH Issue https://github.com/cilium/cilium/issues/34841).

joestringer commented 2 months ago

cc @cilium/sig-bgp

dswaffordcw commented 5 days ago

Hi @harsimran-pabla @YutaroHayakawa @rastislavs

I presented this proposal last week to several of the Datapath maintainers (@joestringer, Jordan Rife, Hemanth Malla, Bowei Du) at the Cilium Developer's Summit, which took place alongside KubeCon. Slides: SwaffordBGPNov2024.pdf

The feedback was unanimous -- in opposition.

I focused the discussion on your last point @harsimran-pabla regarding what benefits might be unlocked by bringing RIB programming and more generically routing decisions directly into the Dataplane.

Notes:

Concerns raised:

I then switched focus to my original proposal of packaging an off-the-shelf BGP daemon as part of Cilium and wiring up configuration to make it transparent. We revisited the in-line and parallel BGP-speaker deployment models. Bowei Du suggested that this could easily be accomplished by making BIRD, etc. part of a Cilium deployment-specific Helm chart. Until then, I had envisioned needing to manage BIRD, etc. independently of K8s; @harsimran-pabla talked about a specific example where that might even be a requirement. For CoreWeave's use case, we don't have a dependency on BGP prior to K8s. I couldn't think of a strong argument against that model, and it reminded me that the configuration for the BGP peers would be deterministic anyway.

@harsimran-pabla I know you mentioned before that BGP Route Learning was a feature your group was interested in offering. I think Rastislav also mentioned that on one of the GitHub issues. With this feedback, I'm curious to hear your thoughts. From my side, I plan to test out the co-located model of adding BIRD to the Cilium K8s deployment.

YutaroHayakawa commented 5 days ago

> I then switched focus to my original proposal of packaging an off-the-shelf BGP daemon as part of Cilium and wiring up configuration to make it transparent. We revisited the in-line and parallel BGP-speaker deployment models. Bowei Du suggested that this could easily be accomplished by making BIRD, etc. part of a Cilium deployment-specific Helm chart.

Hmm, in this case, users can just deploy their own DaemonSet instead of packaging it as part of the Cilium chart, no? That way they can choose whatever implementation they want: FRR, BIRD, any proprietary router. What kind of value can we offer by packaging it? Could you elaborate more on "wiring up configuration to make it transparent"?

dswaffordcw commented 5 days ago

@YutaroHayakawa What I had in mind would be similar to how GoBGP is used under the covers: the user does not need to think about it or even be aware of it.

Cilium is already quite complex. Instead of asking the user to think about and manage the deployment separately, what I had in mind was this:

When Cilium is started with BGP enabled (bgpControlPlane.enabled), and when BGP is configured (by the presence of the Custom Resource CiliumBGPClusterConfig):

1) A BIRD configuration is internally generated from CiliumBGPClusterConfig. Instead of configuring GoBGP peers from CiliumBGPClusterConfig, the defined peers are configured on the BIRD instance. Cilium's GoBGP instance(s) would peer with BIRD, and BIRD would peer with the outside network.
2) A container running BIRD is started in Cilium's namespace.
3) The configuration generated in step 1 is loaded into the BIRD container at startup.
4) Cilium's GoBGP configuration contains a single, internally configured peer: the internal BIRD container.
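To make step 1 concrete, here is a hypothetical sketch. All names, ASNs, and addresses below are invented, and the BIRD 2.x syntax is approximate rather than validated. A CiliumBGPClusterConfig such as:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: bgp-cluster-config
spec:
  bgpInstances:
  - name: instance-65001
    localASN: 65001
    peers:
    - name: tor-switch
      peerASN: 65000
      peerAddress: 192.168.0.1   # example external peer
```

could be translated into a BIRD fragment along these lines, with the external peer moved onto BIRD and a single internal session back to GoBGP:

```
# Hypothetical generated BIRD 2.x configuration (sketch only)
protocol bgp tor_switch {
    local as 65001;
    neighbor 192.168.0.1 as 65000;          # external peer from the CR
    ipv4 { import all; export all; };
}
protocol bgp cilium_gobgp {
    local as 65001;
    neighbor 127.0.0.1 port 1790 as 65001;  # internal iBGP session to Cilium's GoBGP
    ipv4 { import all; export all; };
}
```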

The cilium bgp status and cilium bgp routes CLI commands would be altered to render status from the internal BIRD container instead of Cilium's GoBGP instance, and/or extended to provide visibility to both.

In this approach, the user does not need to think about or configure BIRD. That is now an implementation detail managed by Cilium. The purpose of the BIRD instance is to be a peer in-between Cilium and the network, for the sole purpose of performing RIB programming and relaying Cilium-generated routes.
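For the RIB-programming part specifically, BIRD's kernel protocol already provides this out of the box. A minimal illustrative sketch in BIRD 2.x syntax:

```
# Install BIRD's best routes into the Linux kernel routing table
protocol device { }          # learn interface state from the kernel
protocol kernel {
    ipv4 {
        export all;          # push routes learned via BGP into the kernel RIB
    };
}
```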

Now, what I described may be a nightmare to implement. I'll admit, I am fairly new to this side of Kubernetes. Dynamically starting a container with BIRD based on Cilium's bgpControlPlane.enabled configuration alone may not be possible. In that case, we could always spin up an instance of BIRD, but that may be wasteful.

YutaroHayakawa commented 4 days ago

Thanks for your explanation! So, basically, your idea is to configure BIRD with CiliumBGPClusterConfig instead of GoBGP, but BIRD still peers with the local GoBGP to get the routes, correct?

I understand your point that with the colocated BGP speaker setup, we need another orchestration for BIRD, and you don't want to orchestrate two different BGP speakers. However, your idea sounds almost identical to introducing another backend BGP speaker (BIRD) to BGP Control Plane 😅. Your idea that "a BIRD configuration is internally generated from CiliumBGPClusterConfig" is essentially what BGP Control Plane does for GoBGP.

To be fair, we tried to make the BGPv2 API speaker-agnostic, and we even have an abstraction layer that hides implementation details of the underlying BGP speaker (https://github.com/cilium/cilium/blob/main/pkg/bgpv1/agent/routermgr.go). However, I'd say we are surely relying on GoBGP's behavior implicitly, so you'd have a hard time dealing with this abstraction.

Also, I'm not sure we're ready to introduce another speaker, since it doubles the maintenance cost. For example, when we introduce any new feature, we need to make sure it works with both speakers. I'm not sure it's worth that cost just for importing routes.

> Dynamically starting a container with BIRD based on Cilium's bgpControlPlane.enabled configuration alone may not be possible. In that case, we could always spin up an instance of BIRD, but that may be wasteful.

This is possible with Helm's subchart feature, for example. You can maintain your own parent chart that packages Cilium's chart, refer to Cilium's Helm values, and conditionally render the BIRD DaemonSet.
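As a sketch of that subchart approach (chart name, image, and values are hypothetical; only the Helm mechanics are the point), the parent chart would declare Cilium as a dependency:

```yaml
# Chart.yaml of a hypothetical parent chart
apiVersion: v2
name: cilium-with-bird
version: 0.1.0
dependencies:
  - name: cilium
    version: "~1.16.0"              # example version constraint
    repository: https://helm.cilium.io
```

and a template in the parent chart can read the subchart's values (exposed under the `cilium` key) to render the DaemonSet conditionally:

```yaml
# templates/bird-daemonset.yaml (hypothetical)
{{- if .Values.cilium.bgpControlPlane.enabled }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: bird
spec:
  selector:
    matchLabels:
      app: bird
  template:
    metadata:
      labels:
        app: bird
    spec:
      hostNetwork: true             # BIRD peers over the node's network
      containers:
      - name: bird
        image: example.io/bird:2.15 # hypothetical image
        volumeMounts:
        - name: config
          mountPath: /etc/bird
      volumes:
      - name: config
        configMap:
          name: bird-config         # rendered BIRD configuration
{{- end }}
```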

This still doesn't solve the problem of two orchestrators. You still need to render BIRD's configuration yourself, but I'd say that's much easier than adapting the BGP Control Plane APIs: you don't need to deal with supporting all the configuration knobs of those APIs that are unnecessary for you.

joestringer commented 4 days ago

It was great to meet you last week @dswaffordcw. This idea to also perform route imports in addition to route exports is an interesting use case to consider, as it makes different assumptions about the underlying network than the existing use cases do.

At least from my perspective, I'm not yet fully sure about the scope of desired changes for route learning: how far into implementing BGP we would need to go, or how much abstraction we would need to build in order to grant users the flexibility they'd like in route configuration. Considering that the BIRD configuration for BGP references 35 RFCs and has dozens of pages of configuration guide, it's a little hard to understand what is "good enough" for an abstraction if we extend the Cilium APIs to solve this use case. The proposal probably needs to get a bit more concrete in order to explore that question.

I thought that as an initial step, it could help to deploy a BIRD sidecar with Cilium, which would ideally provide the capabilities without too much development effort. Folks who come to Cilium looking for BGP are often already well familiar with existing BGP daemons, so in a sense I figured this would minimize the change in configuration language for them. An example issue/discussion/blog in the community with the full configuration and steps could also help to demonstrate the use case and help us figure out exactly how much of the defined BGP functionality we are looking to implement in Cilium APIs. By itself this could already be useful, even if the usability is not yet ideal. That could then serve as a basis for furthering this CFP.

I do want to leave open the possibility to go down the track of extending Cilium BGP APIs for this use case, as long as we feel that there is sufficient interest and active maintenance involvement from the Cilium development community. Though I would defer to @YutaroHayakawa , @harsimran-pabla and @rastislavs and the other folks involved in those APIs to provide the technical guidance around integrating the functionality into the Cilium APIs.

As for having a Cilium abstraction programming into BIRD, I'm not sure it would make sense to build/maintain that functionality in Cilium when we're already maintaining a GoBGP-based backend. I understood from the discussion last week that GoBGP is currently missing the logic to program Linux routing tables, but this seems like a solvable problem; I see GoBGP already imports vishvananda/netlink for kernel API interactions, for instance. Given where Cilium developers are currently spending effort, this seems like it would be best aligned with keeping the architecture as simple as possible.

dswaffordcw commented 2 days ago

It was nice to meet you as well @joestringer! I really enjoyed your developer summit.

Thank you Joe, and @YutaroHayakawa, for your detailed feedback. I agree. At this point, it seems best to experiment further with the co-deployment options and withdraw the proposal for the time being.

If you come across people requesting this functionality in the future, please direct them to this CFP. I'd love to hear from others who are interested about their use case.