The upstream sig-network tests (curated by the kubernetes organization):
Although we cant deprecate these tests - b/c of their inertia, we need a better tool to diagnose Service / kube-proxy implementations across providers. Examples of such providers are:
The implementations of kube proxy diverge over time, knowing wether loadbalancing is failing due to the source or target of traffic, wether all node proxies or just a few are broken, and wether configurations like node-local endpoints, or terminating endpoint scenarios are causing specific issues, becomes increasingly important for comparing services.
To solve the problem of how we can reason about different services, on different providers, with a low cognitive overhead, we'll Visualize service availability in a k8s cluster using... Tables!
With table tests we can rapidly compare what may or may not be broken and generate hypothesis, as is done in the network policy tests in upstream k8s, https://kubernetes.io/blog/2021/04/20/defining-networkpolicy-conformance-cni-providers/.
As an example, the below test probes 3 endpoints, from 3 endpoints. Assuming uniform spreading of services, this can detect potential problems in service loadbalancing that are specific to nodes sources / targets...
- name-x/a name-x/b name-x/c
name-x/a . X X
name-x/b . X X
name-x/c . X X
In this case, we can see that pods "b" and "c" in namespace x are not reachable from ANY pod, meaning that theres a problem
with any kube-proxy , on any node, i.e. the loadbalancing is fundamentally not working (the Xs
are failures).
This suite of tests validates objects rules and various scenarios using Kubernetes services as a way of control tests heuristics, as for now the following tests are available:
Covers features like: hairpin, session affinity, headless service, hostNetwork, connections via TCP and UDP, NodePortLocal, services with annotations, etc...
This is just an initial experimental repo but we'd like to fully implement this as a KEP and add test coverage to upstream K8s, if there is consensus in the sig-network group for this.
Create a local cluster using the Kind, make sure your cluster have 1+ nodes, install MetalLB:
$ kind create cluster --config=hack/kind-multi-worker.yaml
$ hack/install_metallb.sh
To run the tests directly you can use:
$ make test
To build the binary and run it, use:
$ make build
$ ./svc-test
install sonobuoy: https://github.com/vmware-tanzu/sonobuoy#installation
$ make sonobuoy-run
after finished
$ make sonobuoy-retrieve
The binary supports flags to run only UDP stale endpoints, examples:
go test -v ./tests/ -labels="type=udp_stale_endpoint"
Other flags include -debug
for verbose output and -namespace
for pick one to run tests on, when not specified
a new random namespace is created.
Download the Kubernetes repository and build the tests binary
$ make WHAT=test/e2e/e2e.test
+++ [1103 15:29:10] Building go targets for linux/amd64:
./vendor/k8s.io/code-generator/cmd/prerelease-lifecycle-gen
> non-static build: k8s.io/kubernetes/./vendor/k8s.io/code-generator/cmd/prerelease-lifecycle-gen
Generating prerelease lifecycle code for 28 targets
+++ [1103 15:29:13] Building go targets for linux/amd64:
./vendor/k8s.io/code-generator/cmd/deepcopy-gen
> non-static build: k8s.io/kubernetes/./vendor/k8s.io/code-generator/cmd/deepcopy-gen
Generating deepcopy code for 236 targets
+++ [1103 15:29:19] Building go targets for linux/amd64:
./vendor/k8s.io/code-generator/cmd/defaulter-gen
> non-static build: k8s.io/kubernetes/./vendor/k8s.io/code-generator/cmd/defaulter-gen
Generating defaulter code for 94 targets
+++ [1103 15:29:26] Building go targets for linux/amd64:
./vendor/k8s.io/code-generator/cmd/conversion-gen
> non-static build: k8s.io/kubernetes/./vendor/k8s.io/code-generator/cmd/conversion-gen
Generating conversion code for 130 targets
+++ [1103 15:29:38] Building go targets for linux/amd64:
./vendor/k8s.io/kube-openapi/cmd/openapi-gen
> non-static build: k8s.io/kubernetes/./vendor/k8s.io/kube-openapi/cmd/openapi-gen
Generating openapi code for KUBE
Generating openapi code for AGGREGATOR
Generating openapi code for APIEXTENSIONS
Generating openapi code for CODEGEN
Generating openapi code for SAMPLEAPISERVER
+++ [1103 15:29:47] Building go targets for linux/amd64:
test/e2e
> non-static build: k8s.io/kubernetes/test/e2e
There will be a new compiled binary under _ouput/bin/e2e.test
with around 150Mb, you
can use it to run your sig-network tests as:
_output/bin/e2e.test -ginkgo.focus="\[sig-network\]" \
-ginkgo.skip="\[Feature:(Networking-IPv6|Example|Federation|PerformanceDNS)\]|LB.health.check|LoadBalancer|load.balancer|GCE|NetworkPolicy|DualStack" \
--provider=local \
--kubeconfig=.kube/config
We use kubeconfig fetched from below, whichever works from the top of the list:
$HOME/.kube/config
This test requires mode than 1 Nodes, and it guarantees the proper spread of pods across the existent nodes creating len(nodes) pods. The examples above have 4 nodes (3 workers + 1 master).
On this example we have a pod-1
, pod-2
, pod-3
and pod-4
, the first lines of the probe.go in the logging
shows pod-1
probing the other probes on port 80
, this probe is repeat across all other pods, the reachability
matrix shows the result of the connections outcomes.
{"level":"info","ts":1621793385.240396,"caller":"manager/helper.go:45","msg":"Validating reachability matrix... (== FIRST TRY ==)"}
{"level":"info","ts":1621793385.3714652,"caller":"manager/probe.go:114","msg":"kubectl exec pod-1 -c cont-80-tcp -n x-12348 -- /agnhost connect s-x-12348-pod-3.x-12348.svc.cluster.local:80 --timeout=5s --protocol=tcp"}
{"level":"info","ts":1621793385.3720098,"caller":"manager/probe.go:114","msg":"kubectl exec pod-1 -c cont-80-tcp -n x-12348 -- /agnhost connect s-x-12348-pod-2.x-12348.svc.cluster.local:80 --timeout=5s --protocol=tcp"}
{"level":"info","ts":1621793385.385194,"caller":"manager/probe.go:114","msg":"kubectl exec pod-1 -c cont-80-tcp -n x-12348 -- /agnhost connect s-x-12348-pod-1.x-12348.svc.cluster.local:80 --timeout=5s --protocol=tcp"}
{"level":"info","ts":1621793385.4150555,"caller":"manager/probe.go:114","msg":"kubectl exec pod-1 -c cont-80-tcp -n x-12348 -- /agnhost connect s-x-12348-pod-4.x-12348.svc.cluster.local:80 --timeout=5s --protocol=tcp"}
...
// repreat for all pods
reachability: correct:16, incorrect:0, result=true
52 <nil>
expected:
- x-12348/pod-1 x-12348/pod-2 x-12348/pod-3 x-12348/pod-4
x-12348/pod-1 . . . .
x-12348/pod-2 . . . .
x-12348/pod-3 . . . .
x-12348/pod-4 . . . .
176 <nil>
observed:
- x-12348/pod-1 x-12348/pod-2 x-12348/pod-3 x-12348/pod-4
x-12348/pod-1 . . . .
x-12348/pod-2 . . . .
x-12348/pod-3 . . . .
x-12348/pod-4 . . . .
176 <nil>
comparison:
- x-12348/pod-1 x-12348/pod-2 x-12348/pod-3 x-12348/pod-4
x-12348/pod-1 . . . .
x-12348/pod-2 . . . .
x-12348/pod-3 . . . .
x-12348/pod-4 . . . .