cilium / cilium-cli

CLI to install, manage & troubleshoot Kubernetes clusters running Cilium
https://cilium.io
Apache License 2.0

connectivity: Add pod-to-pod-no-frag to check MTU misconfigurations #2610

Closed brb closed 5 days ago

brb commented 1 week ago

CI run - https://github.com/cilium/cilium/pull/33286. Only the ci-ipsec and ci-eks failures are relevant. I will fix them in a cilium/cilium PR that bumps the CLI version.

brb commented 1 week ago

> Is this really the best place to have this kind of test? I don't think I understand the point of moving away from Ginkgo-style tests if we are going to port the same logic and testing pattern into the CLI.

@christarazi I get your concerns regarding flakiness. Unfortunately, with the Ginkgo deprecation we don't have any other place to add such e2e tests. There was an effort to create an E2E framework during the hyperjump, but it turned out to be a massive investment. Maybe this topic can be picked up again by some working group.

At the same time, IMO such tests help end users diagnose non-obvious Cilium connectivity issues (which a misconfigured MTU can lead to). And what I hear from users is that they rely extensively on the cilium-cli connectivity test to verify that their infra is deployed correctly.

(Ginkgo/gomega as a testing framework is / was a terrible foundation for Cilium's E2E tests. The argument in https://www.reddit.com/r/golang/comments/1azj63h/comment/ks1srp2/ resonates well with us.)

brb commented 5 days ago

> Couldn't it be validated at endpoint creation time that the appropriate MTU value was passed to the container?

Unfortunately, nope. Between the two pods there are multiple network devices involved (some managed by Cilium), and an MTU misconfiguration on any of them might not be reflected in an endpoint's MTU.
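To illustrate the point, here is a minimal sketch in Go. It is not code from this PR; the `hop` type, the device names, and the MTU values are all made up for illustration. It models the pod-to-pod path as a list of devices and shows that a packet sized to the endpoint's MTU can still fail to pass without fragmentation when a device in the middle has a smaller MTU.

```go
package main

import "fmt"

// hop is a hypothetical model of one network device on the pod-to-pod path.
type hop struct {
	name string
	mtu  int
}

// pathMTU returns the smallest MTU along the path and the name of the
// device that imposes it. It returns (0, "") for an empty path.
func pathMTU(hops []hop) (int, string) {
	if len(hops) == 0 {
		return 0, ""
	}
	lowest := hops[0]
	for _, h := range hops[1:] {
		if h.mtu < lowest.mtu {
			lowest = h
		}
	}
	return lowest.mtu, lowest.name
}

// fitsWithoutFragmentation reports whether an IP packet of packetSize bytes
// can traverse every hop when fragmentation is not allowed.
func fitsWithoutFragmentation(packetSize int, hops []hop) bool {
	m, _ := pathMTU(hops)
	return packetSize <= m
}

func main() {
	// Assumed example path: both endpoints advertise 1500, but a device
	// in between is (mis)configured with a smaller MTU.
	hops := []hop{
		{name: "pod-a eth0", mtu: 1500},
		{name: "host-side veth", mtu: 1500},
		{name: "tunnel device", mtu: 1450},
		{name: "pod-b eth0", mtu: 1500},
	}

	endpointMTU := hops[0].mtu
	m, limiting := pathMTU(hops)
	fmt.Printf("endpoint MTU: %d, path MTU: %d (limited by %s)\n", endpointMTU, m, limiting)
	fmt.Printf("endpoint-sized packet fits without fragmentation: %v\n",
		fitsWithoutFragmentation(endpointMTU, hops))
}
```

Checking only the endpoint's MTU would look at the first and last entries, which is why an end-to-end no-fragmentation probe can catch what an endpoint-creation check would miss.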

> Ideally, that should be done in a BPF program unit test IIUC

The BPF unit tests do not involve netdevs. Perhaps we should start thinking about introducing integration tests in which cilium-agent loads BPF progs via pkg/datapath and we then inject a packet (e.g., via libpcap). That could replace the test cases from this PR. Maybe a topic for the next hive time (if yes, which WG?)?