aws / aws-application-networking-k8s

A Kubernetes controller for Amazon VPC Lattice
https://www.gateway-api-controller.eks.aws.dev/
Apache License 2.0

Update Helm chart to include pod mutating webhook for readiness gates #612

Closed erikfuller closed 8 months ago

erikfuller commented 8 months ago

What type of PR is this? feature

Which issue does this PR fix: https://github.com/aws/aws-application-networking-k8s/issues/596

What does this PR do / Why do we need it:

  1. Updates the Helm install to include the pod readiness gate webhook.
  2. For manual installs, defers cert generation to install time and does NOT enable the webhook by default.
  3. Adds new scripts in scripts/ to provision the webhook TLS certificate and enable the webhook.
  4. Includes webhook-e2e-test in the GitHub workflow.
  5. Updates the docs.

Because the controller cannot start without a valid TLS secret in place, and we cannot provision a cert at build time without confusing GitHub with "secret" material, the webhook is now disabled by default for manual installs that use deploy.yaml. Instead, I have created new scripts (gen-webhook-secret.sh and patch-deploy-yaml.sh) to help provision the certificate and configure the webhook. The webhook is still enabled by default in the Helm chart.
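For readers unfamiliar with the install-time cert step, here is a minimal sketch of what provisioning a self-signed webhook certificate can look like. It is not the content of gen-webhook-secret.sh, which is not shown in this PR description. The service name `webhook-service` and secret name `webhook-cert` are hypothetical placeholders, and `-addext` assumes OpenSSL 1.1.1 or newer.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the actual gen-webhook-secret.sh from this PR.
# SERVICE and SECRET are hypothetical names chosen for illustration.
NAMESPACE="aws-application-networking-system"
SERVICE="webhook-service"
SECRET="webhook-cert"

# Generate a self-signed key/cert pair whose CN and SAN match the
# in-cluster DNS name of the (assumed) webhook service.
gen_webhook_cert() {
  local out_dir="$1"
  local dns_name="${SERVICE}.${NAMESPACE}.svc"
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout "${out_dir}/tls.key" \
    -out "${out_dir}/tls.crt" \
    -subj "/CN=${dns_name}" \
    -addext "subjectAltName=DNS:${dns_name}"
}

# Print (without running) the kubectl command that would store the
# pair as a TLS secret in the controller namespace.
print_secret_cmd() {
  local out_dir="$1"
  echo "kubectl create secret tls ${SECRET} --namespace ${NAMESPACE} --cert=${out_dir}/tls.crt --key=${out_dir}/tls.key"
}
```

A possible usage would be `dir="$(mktemp -d)"; gen_webhook_cert "$dir"; print_secret_cmd "$dir"`, then reviewing the printed command before running it against a cluster. Generating the pair at install time keeps the private key out of the repository, which is the concern this PR describes.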

Separately, the GitHub workflow looks like it's currently broken, so there is a small risk that this change breaks it further.

Testing done on this change: Installed a locally built Helm chart. Tested with and without the new environment variable set.

helm package helm
tar xvf aws-gateway-controller-chart-v1.0.3.tgz

helm install test-release ./aws-gateway-controller-chart --set=serviceAccount.create=true --namespace aws-application-networking-system --set=log.level=debug --set=awsRegion=$AWS_REGION --set=clusterVpcId=$CLUSTER_VPC_ID --set=awsAccountId=$AWS_ACCOUNT_ID --set clusterName=$CLUSTER_NAME --set=defaultServiceNetwork=test-gateway
NAME: test-release
LAST DEPLOYED: Mon Mar 11 19:30:22 2024
NAMESPACE: aws-application-networking-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
aws-gateway-controller-chart has been installed.
This chart deploys "<redacted>.dkr.ecr.us-west-2.amazonaws.com/controller:".

Check its status by running:
  kubectl --namespace aws-application-networking-system get pods -l "app.kubernetes.io/instance=test-release"

The controller is running in "cluster" mode.

Ran webhook e2e-tests

make webhook-e2e-test                                                                       
=== RUN   TestIntegration
Running Suite: WebhookIntegration - /.../aws-application-networking-k8s/test/suites/webhook
===============================================================================================================
Random Seed: 1710185534

Will run 2 of 2 specs
------------------------------
[SynchronizedBeforeSuite] 
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:23
[SynchronizedBeforeSuite] PASSED [0.000 seconds]
------------------------------
Readiness Gate Inject create deployment in untagged namespace, no readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:42
{"level":"info","ts":"2024-03-11T12:32:15.506-0700","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-03-11T12:32:15.526-0700","caller":"test/framework.go:186","msg":"Creating objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-03-11T12:32:15.621-0700","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:untagged-test-pod]"}
{"level":"info","ts":"2024-03-11T12:32:25.694-0700","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:untagged-test-pod]"}
• [11.019 seconds]
------------------------------
Readiness Gate Inject create deployment in tagged namespace, has readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:62
{"level":"info","ts":"2024-03-11T12:32:25.772-0700","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:tagged-test-pod]"}
{"level":"info","ts":"2024-03-11T12:32:25.798-0700","caller":"test/framework.go:264","msg":"Deleting objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-03-11T12:32:25.825-0700","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
• [11.325 seconds]
------------------------------
[SynchronizedAfterSuite] 
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:39
[SynchronizedAfterSuite] PASSED [0.000 seconds]
------------------------------

Ran 2 of 2 Specs in 22.345 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (22.35s)
PASS
ok      github.com/aws/aws-application-networking-k8s/test/suites/webhook   23.032s

Automation added to e2e: n/a

Will this PR introduce any new dependencies?: No

Will this break upgrades or downgrades? Has updating a running cluster been tested?: By design, the webhook does not install to the controller namespace.

kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io aws-appnet-gwc-mutating-webhook         
NAME                              WEBHOOKS   AGE
aws-appnet-gwc-mutating-webhook   1          8m18s

If a namespace is tagged with application-networking.k8s.aws/pod-readiness-gate-inject=enabled and the webhook exists, BUT the controller has been downgraded to a pre-webhook version, you will not be able to create new pods successfully. To work around this issue, remove the namespace tag or delete the webhook, then re-launch any affected pods or manually set the readiness gate value.
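The two workaround options above can be sketched as kubectl commands. This assumes the "tag" is a regular Kubernetes namespace label, and it uses the label key and webhook name exactly as they appear in this PR. The `run` helper and its `DRY_RUN` switch are hypothetical additions so the commands can be reviewed before they touch a cluster.

```shell
#!/usr/bin/env bash
# Sketch only. Label key and webhook name are copied from the PR text;
# the run helper and DRY_RUN variable are illustrative assumptions.
LABEL_KEY="application-networking.k8s.aws/pod-readiness-gate-inject"
WEBHOOK="aws-appnet-gwc-mutating-webhook"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Option 1: remove the injection label from a single namespace.
# A trailing "-" after the key tells kubectl to delete that label.
disable_injection_for_namespace() {
  local namespace="$1"
  run kubectl label namespace "${namespace}" "${LABEL_KEY}-"
}

# Option 2: delete the mutating webhook configuration, which is
# cluster-scoped and therefore affects every namespace.
delete_webhook() {
  run kubectl delete mutatingwebhookconfiguration "${WEBHOOK}"
}
```

For example, `disable_injection_for_namespace my-namespace` prints the label-removal command, and `DRY_RUN=0 delete_webhook` would actually delete the webhook. Neither option touches pods that already exist, so those still need to be re-launched or have their readiness gate set manually, as noted above.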

Does this PR introduce any user-facing change?: Yes, but it is covered in the main PR https://github.com/aws/aws-application-networking-k8s/pull/606

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

erikfuller commented 8 months ago

Have some additional work coming - converting back to draft while I finalize.