antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0
1.65k stars 362 forks source link

Gobgp client binary integration to support debugging #6541

Open rajnkamr opened 1 month ago

rajnkamr commented 1 month ago

Gobgp client can be used to interact with the BGP daemon.

antoninbas commented 1 month ago

I assume you mean adding the gobgp binary published as part of official releases (https://github.com/osrg/gobgp/releases) to the antrea-agent image. We should consider a few things, in particular: 1) what would be the impact on the antrea-agent image size? 2) are there viable alternatives? a) add relevant commands to antctl? b) add the binary to antrea/toolbox instead, and suggest an approach based on ephemeral containers for troubleshooting?

We need to consider the above questions before committing to a solution.

rajnkamr commented 1 month ago

We had an initial discussion but not a detailed one, Initially we would like to have official gobgp client binary as a part of agent image(IMO)

1.gobgp size is 23 MB (not much impact on agent image) 2.We will initially support few bgp antctl commands(#6209) but these commands are not enough for debugging. 3.Adding image as a part of antrea/toolbox and using ephemeral containers for debugging can also be considered , however it would need users to be familiar with using ephemeral containers !

To start with, it might be preferred to keep this binary as part of image and then later decide if we have got enough antctl commands to support debugging and add debugging support via ephemeral containers !

antoninbas commented 1 month ago

When I download the binary from https://github.com/osrg/gobgp/releases (which is probably the right approach if we include it in the image), it seems to be ~15MB:

root@e07c2a9c4d25:/# ls -l gobgp
-rwxr-xr-x 1 1001 127 15638528 Jul  1 10:46 gobgp

I agree that it should be acceptable, given that the antrea-agent image size if ~300MB (uncompressed). It's easy to keep including new things and grow the image size however, which is why IMO it is legitimate to wonder if this binary tool is really needed. We don't do this automatically for everything. For example, we do not include the wg command-line tool to troubleshoot Wireguard (I don't know if it's the best analogy), but we do include "all" OVS utility tools. In my view the main use case would be the ability to dump the full configuration of gobgp, and it seems we could do that using antctl + the gobgp API without too much effort (also useful for support bundle). If more advanced troubleshooting is needed in the field, other solutions can be used (ephemeral container or other). Am I missing something? Including more third-party binaries can also lead to additional CVE alerts that we have to triage.

Not saying this is necessary always the model to follow, but I think that Cilium is providing the necessary commands for debugging BGP in its CLI tool (https://docs.cilium.io/en/latest/cmdref/cilium-dbg_bgp/) but is not including gobgp in its cilium/cilium container image (unless I missed it), even though they also use the gobgp library for BGP functions.

rajnkamr commented 1 month ago

Using the pre-built GoBGP binary from https://github.com/osrg/gobgp/releases offers the advantage of a smaller image size, approximately 15MB, and allows us to directly download the binary rather than building it ourselves. However, there are several disadvantages to using the pre-built binary beyond just security concerns. These include reduced control over the build process and dependencies, which could mean missing out on future optimizations or specific configurations achievable with a custom build.

The GoBGP binary tool is mainly preferred for advanced live debugging, such as setting advanced log levels and monitoring. Given its critical role in maintaining connectivity, which is as important as OVS, it might be preferable to include the binary as part of the agent image to ensure robust production support.

Alternatively, using ephemeral containers is a viable solution. However, having built-in advanced debugging capabilities within the agent image could be more beneficial for real-time troubleshooting and maintaining network stability.

antoninbas commented 1 month ago

I think it's a terrible idea to manage a custom build for a third-party binary without a solid reason, at least for an OSS project like Antrea. While it offers flexibility (e.g., resolve relevant CVEs more quickly by controlling dependencies), it just becomes cumbersome. Plus we need to ensure that the binary we build works properly. Note that we don't have a custom build for CNI plugins for example.

Either we add necessary functionality to antctl (my preferred solution), or we include the official binary in the image.

rajnkamr commented 1 month ago

Given the significant effort required to manage a custom build for GoBGP, it is practical to exclude this solution. Although antctl support for BGP is already being implemented in #6209 and offers a consistent approach, using the official GoBGP binary ensures we leverage a tested and stable version. This minimizes the risk of build errors or inconsistencies and provides the immediate, robust BGP support that complements the ongoing developments in antctl.