aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] Webhook as a Lambda #747

Open jasonrichardsmith opened 5 years ago

jasonrichardsmith commented 5 years ago

We can run webhooks as Lambda functions instead of as a persistent service running continuously in the cluster.

I would like to add this as a deployment option. Links for reference:
https://github.com/aws-samples/amazon-api-gateway-mutating-webhook-for-k8
https://github.com/nbrandaleone/eksClient/
https://github.com/kelseyhightower/denyenv-validating-admission-webhook

The webhook package just needs to be broken out here: https://github.com/aws/aws-app-mesh-inject/blob/master/pkg/webhook/server.go#L132

so that we can handle HTTP and Lambda in the same fashion.

Looking for feedback before I start.

stefanprodan commented 5 years ago

This means that a Lambda incident would impact a Kubernetes cluster, as new pods would be created without sidecars. Another issue is latency, as the Kubernetes API will have to cross VPCs just to reach the function, not to mention cold starts. Running the webhook outside Kubernetes would also prevent us from implementing the Injector CRD, since the webhook would not be able to use Kubernetes informers.

jasonrichardsmith commented 5 years ago

These are good points.

> a Lambda incident would impact a Kubernetes cluster as new pods would be created without sidecars

This could be a problem no matter where the webhook resides.

> Another issue is latency as the Kubernetes API will have to cross VPCs just to reach the function, not to mention cold starts.

The API server already crosses VPCs to reach the webhook (see https://eksworkshop.com/introduction/eks/eks_high_architecture/ ). Cold starts for Go are around 0.5 seconds, which is not bad considering that most pod start time is spent downloading images and that admission webhooks can have a timeout of up to 30 seconds.

> Running the webhook outside Kubernetes would prevent us from implementing the Injector CRD since the webhook will not be able to use Kubernetes informers.

I am not sure what this means. The webhook itself is only supposed to do one thing: take a pod spec, check whether a mutation is in order, and perform the mutation if it is. That is very close to stdin/stdout functionality, so Lambda would be a good fit here.
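To make the "pod spec in, mutation out" idea concrete, here is a minimal sketch of that step as a pure function. The `Container`, `Pod`, and `PatchOp` types and the `SidecarPatch` function are hypothetical names for this example (a real webhook would likely use the types from k8s.io/api/core/v1), and the sidecar name and image are placeholders. It returns an RFC 6902 JSON Patch that appends the sidecar when it is missing, and no patch otherwise.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Container and Pod keep only the fields this sketch needs (assumption).
type Container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type Pod struct {
	Spec struct {
		Containers []Container `json:"containers"`
	} `json:"spec"`
}

// PatchOp is a single JSON Patch (RFC 6902) operation.
type PatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// SidecarPatch decides whether the pod needs the sidecar and, if so, returns
// the patch that appends it. A nil patch means "no mutation required".
func SidecarPatch(rawPod []byte, sidecar Container) ([]PatchOp, error) {
	var pod Pod
	if err := json.Unmarshal(rawPod, &pod); err != nil {
		return nil, fmt.Errorf("decode pod: %w", err)
	}
	for _, c := range pod.Spec.Containers {
		if c.Name == sidecar.Name {
			return nil, nil // already injected
		}
	}
	// "/-" appends to the end of the existing containers array.
	return []PatchOp{{Op: "add", Path: "/spec/containers/-", Value: sidecar}}, nil
}

func main() {
	// Placeholder sidecar; the name and image are illustrative only.
	sidecar := Container{Name: "envoy", Image: "example.com/envoy:latest"}
	pod := []byte(`{"spec":{"containers":[{"name":"app","image":"example.com/app:1.0"}]}}`)

	patch, err := SidecarPatch(pod, sidecar)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the function has no dependency on a server or on cluster state, it can be called the same way from an HTTP handler or from a Lambda handler. In a full webhook, the marshaled patch would be returned in the admission response with the patch type set to `JSONPatch`.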

Also, if your pods are long-lived and you mutate only once every 4 hours or less often, having a webhook pod running continuously seems like a waste of cluster resources. Using Lambda also moves some of the cluster control out of the cluster. Thinking long term, if a user introduces more dynamic admission controllers, they start to add up as cluster overhead for services that may not be called that often.

jaypipes commented 4 years ago

@jasonrichardsmith all good points. I think it would be interesting to see a webhook implemented as a Lambda. For a few long-lived pods, sure, there might be some cost savings by using a Lambda, though I don't believe admission webhooks/controllers are the source of too much resource consumption in the control plane at this point. Still, would be a cool experiment to see!

sftim commented 2 years ago

Perhaps Lambda function URLs make this possible?

Would be nice to have a ValidatingAdmissionAWSLambda CRD in a cluster so that the cluster's IAM role directly invokes the Lambda, and a resource policy on the Lambda can restrict the function so that no other principal is able to execute it.
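For the resource policy part, a sketch of what such a policy document might contain is shown below, built in Go. The `ValidatingAdmissionAWSLambda` CRD does not exist; the role and function ARNs, the `Sid`, and the `InvokePolicy` helper are all placeholders invented for this example. The statement grants `lambda:InvokeFunction` to a single IAM role, which is the shape a Lambda resource-based policy statement takes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement and Policy model only the IAM policy fields used in this sketch.
type Statement struct {
	Sid       string            `json:"Sid"`
	Effect    string            `json:"Effect"`
	Principal map[string]string `json:"Principal"`
	Action    string            `json:"Action"`
	Resource  string            `json:"Resource"`
}

type Policy struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// InvokePolicy (hypothetical helper) builds a resource-based policy that lets
// one IAM role invoke one function.
func InvokePolicy(roleARN, functionARN string) Policy {
	return Policy{
		Version: "2012-10-17",
		Statement: []Statement{{
			Sid:       "AllowClusterRoleInvoke",
			Effect:    "Allow",
			Principal: map[string]string{"AWS": roleARN},
			Action:    "lambda:InvokeFunction",
			Resource:  functionARN,
		}},
	}
}

func main() {
	// Placeholder ARNs; substitute the cluster role and function in use.
	p := InvokePolicy(
		"arn:aws:iam::111122223333:role/example-eks-cluster-role",
		"arn:aws:lambda:us-east-1:111122223333:function:example-admission-webhook",
	)
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Note that a resource policy only grants access; whether other principals in the same account can still invoke the function also depends on their identity-based policies.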