Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

Research integration tests makeover for Gateway Discovery testing #3631

pmalek opened this issue 1 year ago (status: Open)

pmalek commented 1 year ago

Problem statement

Currently we run integration tests by starting the controller manager locally: https://github.com/Kong/kubernetes-ingress-controller/blob/3ce7c14afd2099e991c4dcd3370f23ef8c742a5e/internal/util/test/controller_manager.go#L32-L101

This has worked well for some time, with the Proxy and Admin API exposed as LoadBalancer-type Services (which can therefore be reached from outside the cluster).

With Gateway Discovery being introduced, which relies on EndpointSlice lookup of cluster-internal Services, we have no option of reusing this workflow: a locally running controller manager would be able to call the Kubernetes API server and get the Admin API endpoint addresses, but those addresses would be unreachable when KIC tried to push configuration to them.
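
As a quick illustration of the mismatch (assuming the Service names used later in this thread, i.e. the Admin API Service kong-kong-admin in the kong-system namespace): the EndpointSlice endpoints are Pod IPs, which are only routable from inside the cluster network, so a locally running manager can list them via the API server but cannot dial them:

# List the EndpointSlices backing the Admin API Service; the standard
# kubernetes.io/service-name label selects the slices belonging to a Service.
# The ENDPOINTS column shows Pod IPs (e.g. 10.244.x.x on kind), which are
# not reachable from the developer's machine without extra tooling.
$ kubectl get endpointslices -n kong-system \
    -l kubernetes.io/service-name=kong-kong-admin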

Proposed solution

Try to use a package or a tool that allows running a process locally while reaching cluster-internal addresses.

One such tool is https://github.com/metalbear-co/mirrord which allows just that.

This approach, though, would require us to run the controller manager as a separate process (via mirrord exec ...) and would, among other things, make it impossible (or very difficult) to use debuggers to debug the integration tests.

Acceptance criteria

czeslavo commented 1 year ago

Another tool that might satisfy our needs is telepresence (its alsoProxySubnets setting seems like it could do the job? https://www.getambassador.io/docs/telepresence/latest/reference/config#alsoproxysubnets).

pmalek commented 1 year ago

I've spent some time with mirrord and it seems it might not be the best match for our needs since it wraps the specified binary. I managed to get as far as:

➜ 1 ~/code_/kubernetes-ingress-controller git:(main) mirrord exec --no-telemetry --target-namespace kong-system --target pod/debug ./bin/manager -- \
        --anonymous-reports=false \
        --kong-admin-svc kong-system/kong-kong-admin \
        --publish-service kong-system/kong-kong-proxy
⠦ mirrord cli starting
  ✓ ready to launch process
  ✓ layer extracted
  ✓ agent pod created
  ✓ pod is ready
INFO[0000] diagnostics server disabled
INFO[0000] starting controller manager                   commit=40cf3279 logger=setup release=v2.8.1-199-g40cf3279 repo="git@github.com:Kong/kubernetes-ingress-controller.git"
INFO[0000] the ingress class name has been set           logger=setup value=kong
INFO[0000] getting enabled options and features          logger=setup
INFO[0000] getting the kubernetes client configuration   logger=setup
W0228 19:10:43.600560   98273 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
INFO[0000] getting the kong admin api client configuration  logger=setup
W0228 19:10:43.608773   98273 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
ERRO[0000] failed to create kong client(s)               error="endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kong-system:default\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" in the namespace \"kong-system\""

but I left the default service account permissions issue seen above unsolved.
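
For the record, the forbidden error above looks like plain missing RBAC on the service account that the in-cluster config resolves to, rather than a mirrord limitation. A minimal sketch of how the permission could be granted (the Role/RoleBinding names are made up for illustration, and GD may require more rules than this single one):

# Allow the default service account in kong-system to read EndpointSlices.
$ kubectl create role endpointslice-reader \
    --verb=get,list,watch \
    --resource=endpointslices.discovery.k8s.io \
    -n kong-system
$ kubectl create rolebinding endpointslice-reader \
    --role=endpointslice-reader \
    --serviceaccount=kong-system:default \
    -n kong-system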


Another tool I've noticed which might be better suited is: https://github.com/telepresenceio/telepresence.

This is promising: with the client installed, it's as easy as:

$ telepresence helm install # installs the agent in the cluster
$ telepresence connect
$ curl kong-kong-proxy.kong-system
{"message":"no Route matched with those values"}%

With that I was able to run the manager, though some flags needed to be specified additionally:

./bin/manager \
        --anonymous-reports=false \
        --kong-admin-svc kong-system/kong-kong-admin --kong-admin-svc-port-names kong-admin-tls --election-namespace kong-system --kong-admin-tls-skip-verify \
        --publish-service kong-system/kong-kong-proxy \
        --kubeconfig <YOUR_LOCAL_KUBECONFIG_FILE>

We could either use the CLI directly or try to figure out how to use their Go module: https://github.com/telepresenceio/telepresence/tree/release/v2/pkg/client
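
If we went with the CLI, a rough sketch of what wrapping the integration test run could look like is below (the go test invocation and paths are assumptions for illustration, not the repo's actual Makefile targets). Unlike mirrord exec, telepresence connect changes routing on the host rather than wrapping the binary, so the test process and any attached debugger run as usual:

# Illustrative only: wrap an integration test run in a telepresence session.
$ telepresence helm install    # one-time: install the traffic manager in the cluster
$ telepresence connect         # route cluster subnets through the local daemon
$ go test ./test/integration/...
$ telepresence quit            # tear the session down afterwards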

czeslavo commented 1 year ago

I had some thoughts on that and I suggest we postpone this makeover to the next release (or completely abandon it; more in the whys below). Whys:

  1. It'll keep us busy with work that, in my opinion, doesn't really provide value in the current situation (our integration test suite doesn't cover any specific feature that's affected by GD; these are mostly tests that configure particular CRDs and assert they are configured as expected in the Gateway). Making the integration tests use GD will make them more complex (the setup requires more dependencies, exposing the whole suite to their bugs; I'd expect more flakiness to be introduced because of that).
  2. We already have quite good test coverage of GD on its own. We have:
    • an E2E test that ensures the feature as a whole works as expected (we can scale the proxy Deployment and all of the Gateways are properly configured).
    • unit tests covering the core functionality of the discovery, which gets addresses from the Admin API Service's EndpointSlices (1, 2); 1️⃣ on the diagram below.
    • unit tests covering the client manager that's responsible for keeping track of the current list of Gateways to be configured, both in GD and non-GD cases (1, 2, 3); 2️⃣ on the diagram below.
  3. If we want to spend more time on improving testing to feel safe releasing GD, we should rather spend it on the pieces of the diagram that lack unit test coverage (that is 4️⃣, plus some parts of 1️⃣ that should be covered by https://github.com/Kong/kubernetes-ingress-controller/issues/3568; 3️⃣ is an external dependency, go-kong). Migrating the integration tests to use GD won't improve this coverage, as those tests are focused on other features that are not dependent on GD. I think that we could even completely abandon the idea of migrating the integration tests to GD, as that will make the test suite more complex while not bringing much value, given that GD is covered on other levels.

Diagram of dependencies to better surface the places that GD touches: (attached image)

pmalek commented 1 year ago

Nice writeup. I do agree with most of the points and I am leaning towards pushing this issue into a later milestone for the reasons you've already mentioned.

With #3635 under review, we can create an issue to track the effort to improve 4️⃣ and then push this issue to v2.10. @czeslavo Do you mind creating that issue and adding which areas you see there that need improved testing?


> I think that we could even completely abandon the idea of migrating the integration tests to GD, as that will make the test suite more complex while not bringing much value, given that GD is covered on other levels.

This I'm not completely sold on. I do believe that there's merit in researching this. For this reason I'd keep this open.

czeslavo commented 1 year ago

Yeah, I'm totally ok with leaving this issue and reconsidering it for the next milestone. 👍

For coverage of 4️⃣ I've created https://github.com/Kong/kubernetes-ingress-controller/issues/3634.