What does this PR do / Why do we need it:
This PR adds a pod mutating webhook to inject the controller's pod readiness gate.
One of the biggest changes is that webhooks must be network-reachable by the API server. Previously, the controller only required communication from itself to the API server. This means local testing against a remote cluster is very challenging. Most of my testing was done by pushing a custom ECR image for the controller and deploying it to my test cluster.
Webhook invocation is only over TLS, so the controlller must now present a TLS certificate to the API server. This certificate is validated by its DNS name(s) and must also pass trust validation checks. More details are in the pod-readiness-gates.md file.
For the TLS cert we require a standard kubernetes tls secret called webhook-cert. It must be present in the same namespace as the controller itself. The certificate is mounted as read-only by the controller container at /etc/webhook-cert.
By default, the webhook includes a namespace label selector, meaning the webhook will only be called for namespaces which include the label aws-application-networking-k8s/pod-readiness-gate-inject with value enabled.
Otherwise, this work is based off the webhook work in aws-load-balancer-controller and aws-app-mesh-controller-for-k8s. Note that, unlike the load balancer controller, the webhook configuration here is created by default. Note however, the webhook logic is only enabled by the namespace label described above, otherwise the webhook will not execute or inject the controller readiness gate.
What is left to be done?
These will follow in separate PRs:
The actual pod readiness gate implementation
Helm chart update to support the webhook and configure the TLS certificate
Update to build scripts to automatically run webhook e2e tests
Testing done on this change:
The general testing process:
# after build and deploy image to ECR
# deploy controller (and webhook) to cluster
kubectl create -f deploy.yaml
# recreate the webhook secret
kubectl delete secret tls $WEBHOOK_SECRET_NAME --namespace $WEBHOOK_NAMESPACE
kubectl create secret tls $WEBHOOK_SECRET_NAME --namespace $WEBHOOK_NAMESPACE --cert=${CERT_FILE} --key=${KEY_FILE}
# patch the webhook with the cert details to enable trust from API server
kubectl patch mutatingwebhookconfigurations.admissionregistration.k8s.io aws-appnet-gwc-mutating-webhook --namespace $WEBHOOK_NAMESPACE --type='json' -p="[{'op': 'replace', 'path': '/webhooks/0/clientConfig/caBundle', 'value': '${CERT_B64}'}]"
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-appnet-gwc-mutating-webhook patched
# run manual tests OR webhook-e2e-tests
make webhook-e2e-test
Automation added to e2e:
Since webhooks must be invoked by the API server, they do not fit into the standard make e2e-test, at least for local development and testing. Instead, I have created a new make target make webhook-e2e-test. These tests can be run in a local dev environment against a remote cluster once the controller is deployed and the webhook secret updated.
make webhook-e2e-test
=== RUN TestIntegration
Running Suite: WebhookIntegration - /.../aws-application-networking-k8s/test/suites/webhook
===============================================================================================================
Random Seed: 1709056486
Will run 2 of 2 specs
------------------------------
[SynchronizedBeforeSuite]
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:23
[SynchronizedBeforeSuite] PASSED [0.000 seconds]
------------------------------
Readiness Gate Inject create deployment in untagged namespace, no readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:35
{"level":"info","ts":"2024-02-27T09:54:46.310-0800","caller":"test/framework.go:264","msg":"Deleting objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.409-0800","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.425-0800","caller":"test/framework.go:186","msg":"Creating objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.447-0800","caller":"test/framework.go:186","msg":"Creating objects: *v1.Deployment/sparr-1-inmjcpummz"}
{"level":"info","ts":"2024-02-27T09:54:46.506-0800","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:untagged-test-pod]"}
• [0.215 seconds]
------------------------------
Readiness Gate Inject create deployment in tagged namespace, has readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:56
{"level":"info","ts":"2024-02-27T09:54:46.526-0800","caller":"test/framework.go:186","msg":"Creating objects: *v1.Deployment/finge-2-hicyysf5br"}
{"level":"info","ts":"2024-02-27T09:54:46.569-0800","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:tagged-test-pod]"}
{"level":"info","ts":"2024-02-27T09:54:46.588-0800","caller":"test/framework.go:264","msg":"Deleting objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-02-27T09:54:46.608-0800","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
• [11.283 seconds]
------------------------------
[SynchronizedAfterSuite]
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:39
[SynchronizedAfterSuite] PASSED [0.000 seconds]
------------------------------
Ran 2 of 2 Specs in 11.500 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (11.50s)
PASS
ok github.com/aws/aws-application-networking-k8s/test/suites/webhook 12.869s
Will this PR introduce any new dependencies?:
We now have a dependency on the presence of the webhook secret. Note that, if the webhook is not invoked
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
I tested deploying with the previous build's deploy.yaml, then applying the new deploy.yaml which includes the webhook configuration, without issue.
Does this PR introduce any user-facing change?:
Yes
Support for automatically injecting pod readiness gate. For more details see docs/guides/pod-readiness-gates.md
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
What type of PR is this? feature
Which issue does this PR fix: https://github.com/aws/aws-application-networking-k8s/issues/596
What does this PR do / Why do we need it: This PR adds a pod mutating webhook to inject the controller's pod readiness gate.
One of the biggest changes is that webhooks must be network-reachable by the API server. Previously, the controller only required communication from itself to the API server. This means local testing against a remote cluster is very challenging. Most of my testing was done by pushing a custom ECR image for the controller and deploying it to my test cluster.
Webhook invocation is only over TLS, so the controlller must now present a TLS certificate to the API server. This certificate is validated by its DNS name(s) and must also pass trust validation checks. More details are in the
pod-readiness-gates.md
file.For the TLS cert we require a standard kubernetes tls secret called
webhook-cert
. It must be present in the same namespace as the controller itself. The certificate is mounted as read-only by the controller container at/etc/webhook-cert
.By default, the webhook includes a namespace label selector, meaning the webhook will only be called for namespaces which include the label
aws-application-networking-k8s/pod-readiness-gate-inject
with valueenabled
.Otherwise, this work is based off the webhook work in aws-load-balancer-controller and aws-app-mesh-controller-for-k8s. Note that, unlike the load balancer controller, the webhook configuration here is created by default. Note however, the webhook logic is only enabled by the namespace label described above, otherwise the webhook will not execute or inject the controller readiness gate.
What is left to be done? These will follow in separate PRs:
Testing done on this change: The general testing process:
Automation added to e2e: Since webhooks must be invoked by the API server, they do not fit into the standard
make e2e-test
, at least for local development and testing. Instead, I have created a new make targetmake webhook-e2e-test
. These tests can be run in a local dev environment against a remote cluster once the controller is deployed and the webhook secret updated.Will this PR introduce any new dependencies?: We now have a dependency on the presence of the webhook secret. Note that, if the webhook is not invoked
Will this break upgrades or downgrades. Has updating a running cluster been tested?: I tested deploying with the previous build's
deploy.yaml
, then applying the newdeploy.yaml
which includes the webhook configuration, without issue.Does this PR introduce any user-facing change?: Yes
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.