aws / aws-application-networking-k8s

A Kubernetes controller for Amazon VPC Lattice
https://www.gateway-api-controller.eks.aws.dev/
Apache License 2.0
164 stars 49 forks source link

Inject pod readiness gate #606

Closed erikfuller closed 7 months ago

erikfuller commented 7 months ago

What type of PR is this? feature

Which issue does this PR fix: https://github.com/aws/aws-application-networking-k8s/issues/596

What does this PR do / Why do we need it: This PR adds a pod mutating webhook to inject the controller's pod readiness gate.

One of the biggest changes is that webhooks must be network-reachable by the API server. Previously, the controller only required communication from itself to the API server. This means local testing against a remote cluster is very challenging. Most of my testing was done by pushing a custom ECR image for the controller and deploying it to my test cluster.

Webhook invocation is only over TLS, so the controlller must now present a TLS certificate to the API server. This certificate is validated by its DNS name(s) and must also pass trust validation checks. More details are in the pod-readiness-gates.md file.

For the TLS cert we require a standard kubernetes tls secret called webhook-cert. It must be present in the same namespace as the controller itself. The certificate is mounted as read-only by the controller container at /etc/webhook-cert.

By default, the webhook includes a namespace label selector, meaning the webhook will only be called for namespaces which include the label aws-application-networking-k8s/pod-readiness-gate-inject with value enabled.

Otherwise, this work is based off the webhook work in aws-load-balancer-controller and aws-app-mesh-controller-for-k8s. Note that, unlike the load balancer controller, the webhook configuration here is created by default. Note however, the webhook logic is only enabled by the namespace label described above, otherwise the webhook will not execute or inject the controller readiness gate.

What is left to be done? These will follow in separate PRs:

  1. The actual pod readiness gate implementation
  2. Helm chart update to support the webhook and configure the TLS certificate
  3. Update to build scripts to automatically run webhook e2e tests

Testing done on this change: The general testing process:

# after build and deploy image to ECR 
# deploy controller (and webhook) to cluster
kubectl create -f deploy.yaml 

# recreate the webhook secret
kubectl delete secret tls $WEBHOOK_SECRET_NAME --namespace $WEBHOOK_NAMESPACE
kubectl create secret tls $WEBHOOK_SECRET_NAME --namespace $WEBHOOK_NAMESPACE --cert=${CERT_FILE} --key=${KEY_FILE}

# patch the webhook with the cert details to enable trust from API server
kubectl patch mutatingwebhookconfigurations.admissionregistration.k8s.io aws-appnet-gwc-mutating-webhook --namespace $WEBHOOK_NAMESPACE --type='json' -p="[{'op': 'replace', 'path': '/webhooks/0/clientConfig/caBundle', 'value': '${CERT_B64}'}]"
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-appnet-gwc-mutating-webhook patched

# run manual tests OR webhook-e2e-tests
make webhook-e2e-test

Automation added to e2e: Since webhooks must be invoked by the API server, they do not fit into the standard make e2e-test, at least for local development and testing. Instead, I have created a new make target make webhook-e2e-test. These tests can be run in a local dev environment against a remote cluster once the controller is deployed and the webhook secret updated.

make webhook-e2e-test
=== RUN   TestIntegration
Running Suite: WebhookIntegration - /.../aws-application-networking-k8s/test/suites/webhook
===============================================================================================================
Random Seed: 1709056486

Will run 2 of 2 specs
------------------------------
[SynchronizedBeforeSuite] 
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:23
[SynchronizedBeforeSuite] PASSED [0.000 seconds]
------------------------------
Readiness Gate Inject create deployment in untagged namespace, no readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:35
{"level":"info","ts":"2024-02-27T09:54:46.310-0800","caller":"test/framework.go:264","msg":"Deleting objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.409-0800","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.425-0800","caller":"test/framework.go:186","msg":"Creating objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.447-0800","caller":"test/framework.go:186","msg":"Creating objects: *v1.Deployment/sparr-1-inmjcpummz"}
{"level":"info","ts":"2024-02-27T09:54:46.506-0800","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:untagged-test-pod]"}
• [0.215 seconds]
------------------------------
Readiness Gate Inject create deployment in tagged namespace, has readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:56
{"level":"info","ts":"2024-02-27T09:54:46.526-0800","caller":"test/framework.go:186","msg":"Creating objects: *v1.Deployment/finge-2-hicyysf5br"}
{"level":"info","ts":"2024-02-27T09:54:46.569-0800","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:tagged-test-pod]"}
{"level":"info","ts":"2024-02-27T09:54:46.588-0800","caller":"test/framework.go:264","msg":"Deleting objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.608-0800","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
• [11.283 seconds]
------------------------------
[SynchronizedAfterSuite] 
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:39
[SynchronizedAfterSuite] PASSED [0.000 seconds]
------------------------------

Ran 2 of 2 Specs in 11.500 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (11.50s)
PASS
ok      github.com/aws/aws-application-networking-k8s/test/suites/webhook   12.869s

Will this PR introduce any new dependencies?: We now have a dependency on the presence of the webhook secret. Note that, if the webhook is not invoked

Will this break upgrades or downgrades. Has updating a running cluster been tested?: I tested deploying with the previous build's deploy.yaml, then applying the new deploy.yaml which includes the webhook configuration, without issue.

Does this PR introduce any user-facing change?: Yes

Support for automatically injecting pod readiness gate. For more details see docs/guides/pod-readiness-gates.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.