aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: Simplify CNI custom networking #867

Open mikestef9 opened 4 years ago

mikestef9 commented 4 years ago


Tell us about your request: Simplify and remove certain steps required to use custom networking with the VPC CNI plugin.

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Custom networking is a feature that allows you to run pods using subnets and security groups that are separate from those of the worker nodes; however, multiple setup steps are required.

Many of these steps should be simplified and/or automated.
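For reference, today's manual setup looks roughly like the sketch below (subnet, security group, and zone values are placeholders; the max pods step is noted as a comment only):

```sh
# 1. Enable custom networking in the aws-node daemonset.
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true

# 2. Create an ENIConfig per availability zone pointing at the pod subnet
#    (and, optionally, separate security groups).
cat <<EOF | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2a
spec:
  subnet: subnet-0123456789abcdef0
  securityGroups:
    - sg-0123456789abcdef0
EOF

# 3. Tell the plugin how to map nodes to ENIConfigs, e.g. by zone label.
kubectl -n kube-system set env daemonset/aws-node ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

# 4. Recalculate and set max pods for the node group, since the primary ENI
#    no longer carries pod IPs.
```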

Additionally, documentation is limited. We should add more content like this to the EKS docs.

stevehipwell commented 3 years ago
> ENIConfigs must be created for each availability zone. There should be an option to auto discover these subnets based on tags.

Is there any news on the above point? In a real-world cluster with both public and private subnets, an ENIConfig per AZ isn't enough; one is needed per subnet. To do this currently, you need to use a dynamic node label to work with multi-AZ ASGs.
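For context, that workaround looks roughly like this today (the label key and ENIConfig names are illustrative):

```sh
# Point the CNI plugin at a custom node label instead of the zone label.
kubectl -n kube-system set env daemonset/aws-node ENI_CONFIG_LABEL_DEF=vpc.example.com/eniConfig

# Each node group's bootstrap then has to label its nodes with the ENIConfig
# matching the node's own subnet, e.g. via the bootstrap arguments:
#   --kubelet-extra-args '--node-labels=vpc.example.com/eniConfig=pod-subnet-private-a'
```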

mikestef9 commented 3 years ago

Want to get some feedback on what we are thinking here.

For subnets where you want pods to run, tag the subnet with key vpc-cni:pod-subnet and value shared. Set AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG to True.

The VPC CNI plugin will periodically make a DescribeSubnets API call, filtering by the VPC ID of the cluster as well as by subnets having the tag key vpc-cni:pod-subnet. The plugin will loop through each subnet returned and create/update a map of availability zone to subnets.
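A sketch of what that tagging and lookup could look like with the AWS CLI (VPC and subnet IDs are placeholders):

```sh
# Tag each subnet that should carry pod ENIs.
aws ec2 create-tags --resources subnet-0123456789abcdef0 \
  --tags Key=vpc-cni:pod-subnet,Value=shared

# Roughly the query the plugin would run to build its AZ -> subnets map.
aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=vpc-0123456789abcdef0" "Name=tag-key,Values=vpc-cni:pod-subnet" \
  --query "Subnets[].{AZ:AvailabilityZone,Subnet:SubnetId,FreeIPs:AvailableIpAddressCount}"
```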

When a new node is launched and AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG is set to True, behavior will initially remain the same, with the CNI plugin looking for an ENIConfig. If one is found, it will use that configuration to provision additional ENIs on the node.

If no ENIConfig is found, the CNI plugin will query the map from the previous step and look up the subnets for the availability zone of the worker node.

The subnet field in ENIConfig will be made optional. If you are OK with having security groups copied from the primary ENI to secondary ENIs, then an ENIConfig is no longer required at all under this proposal. But if you do care about different security groups as well, you can still specify them in the ENIConfig and use a node label or annotation to point to that ENIConfig, like today. The upside is that there is no AZ dependency, and a single ENIConfig could potentially be used for all nodes if only security groups need to be specified. Further, security groups for pods also works with custom networking, so you can leverage that feature to specify even more fine-grained security groups if needed.
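Under this proposal, a security-groups-only ENIConfig might look like the sketch below (hypothetical: making spec.subnet optional is the proposed change, not current behavior; the annotation key shown is the plugin's default ENIConfig selector, and IDs are placeholders):

```sh
cat <<EOF | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: pod-security-groups
spec:
  securityGroups:
    - sg-0123456789abcdef0
EOF

# Nodes would still point at it the same way as today, e.g.:
kubectl annotate node ip-10-0-1-23.ec2.internal k8s.amazonaws.com/eniConfig=pod-security-groups
```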

Open Questions: How do we pick which subnet to use if multiple are found in the AZ of the worker node? Some initial ideas below.

Please let us know any feedback on this idea, or feel free to suggest any other ideas you feel would help simplify your workflow using custom networking today.

stevehipwell commented 3 years ago

@mikestef9 here are a couple of feedback points after implementing this with the current options.

I'm interested in why the current docs and future plans are AZ based instead of subnet based, which would match the reference architecture. Our requirements involve linking a separate secondary subnet to each of our public and private subnets. Currently we need to dynamically label our nodes (with the node's primary subnet) to achieve this, but it would work better if this could be achieved via subnet tags without us having to add any node-specific logic. An extension of the above pattern would use both the vpc-cni:pod-subnet=shared tag to enable the logic and vpc-cni:pod-subnet-for-worker-subnet=subnet-xxxxx to link a secondary subnet to the worker's primary subnet.
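A sketch of that suggested tagging scheme (both tag keys are hypothetical, proposed in this comment; subnet IDs are placeholders):

```sh
# Opt the secondary (pod) subnet into the discovery logic...
aws ec2 create-tags --resources subnet-0aaa1111111111111 \
  --tags Key=vpc-cni:pod-subnet,Value=shared

# ...and link it to the worker nodes' primary subnet.
aws ec2 create-tags --resources subnet-0aaa1111111111111 \
  --tags "Key=vpc-cni:pod-subnet-for-worker-subnet,Value=subnet-0bbb2222222222222"
```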

I'm also interested in whether it would be possible to have custom networking enabled only for the nodes with the label set, or to not lose the primary ENI if the custom networking refers back to the node's primary subnet.

Finally, it would be good if the max pods value could be set dynamically, as the required inputs for the calculation are available here.
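For reference, the calculation in question is roughly the following (a sketch using the documented max pods formula; the ENI and IP limits shown are the m5.large values as an example):

```sh
ENIS=3          # max ENIs for the instance type
IPS_PER_ENI=10  # max IPv4 addresses per ENI

# Default VPC CNI: every ENI carries pod IPs (one IP per ENI is reserved for the ENI itself).
echo $(( ENIS * (IPS_PER_ENI - 1) + 2 ))          # 29

# Custom networking: the primary ENI is not used for pods.
echo $(( (ENIS - 1) * (IPS_PER_ENI - 1) + 2 ))    # 20
```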

jwenz723 commented 3 years ago

I like the option:

> Have the value of the tag on the subnet be an integer instead of shared, and use that value as priority sorting mechanism.

This seems to be the most flexible.

This option could be integrated with either the Random option or the "look at availableIpAddressCount of each subnet and, for each node, choose the one with the most free IPs" option by saying that if more than one subnet has the same numerical rank (i.e. if subnet-A and subnet-B both have the value vpc-cni:pod-subnet=1), then the secondary strategy (Random or availableIpAddressCount) is performed.
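A sketch of what that combination could look like in subnet tags (hypothetical semantics from this thread; subnet IDs are placeholders):

```sh
# Both subnets share rank 1, so the tie would be broken by the secondary
# strategy (random, or most available IPs).
aws ec2 create-tags --resources subnet-0aaa1111111111111 --tags Key=vpc-cni:pod-subnet,Value=1
aws ec2 create-tags --resources subnet-0bbb2222222222222 --tags Key=vpc-cni:pod-subnet,Value=1

# A lower-priority fallback subnet.
aws ec2 create-tags --resources subnet-0ccc3333333333333 --tags Key=vpc-cni:pod-subnet,Value=2
```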

yoanisgil commented 3 years ago

Are you also considering adding a feature flag that allows toggling CNI custom networking on/off? This is the trickiest thing to do in the project I'm working on, as the Terraform EKS module has no way of exposing such functionality (because EKS does not expose the configuration of the aws-node daemonset).

1mamute commented 3 years ago

This is an absolute must. Configuring the VPC CNI is challenging and introduces a lot of overhead for operators. If you deployed the VPC CNI via the EKS add-on and tweak a setting, you need to patch aws-node and restart the daemonset manually.
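For context, that manual tweak looks roughly like this (a sketch; custom networking is just the example setting here, and the managed add-on may revert such edits on update):

```sh
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true

# Changing the pod template already triggers a rollout; to force one explicitly:
kubectl -n kube-system rollout restart daemonset/aws-node
```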

cazlo commented 1 year ago

A lot of this complexity is encapsulated in https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/vpc-cni-custom-networking/main.tf

Summary of the "extra complexity" AWS end-users must manage when applying custom networking to worker nodes:

This "extra complexity" negatively impacts the reliability of systems using custom networking. For example, the continued functionality of pods and scaling behavior depends on the user-managed ENIConfig resources being available and correctly configured. If we play the "chaos engineering" role for a minute and take away one ENIConfig, it completely breaks networking for new nodes that spin up.

Additionally, custom networking is not supported on Windows worker nodes, yet it requires "cluster level" configuration changes to the VPC CNI (setting env variables in the aws-node daemonset). This seemingly precludes "safe" use of custom networking for mixed-OS cluster workloads.

As custom networking provides advantages with regard to:

The aws-ia examples and the documentation available at https://aws.github.io/aws-eks-best-practices/networking/index/ have greatly helped with this process; however, there is still much complexity to manage when using custom networking.

To make this process easier for future devs I would love to see the following:

stevehipwell commented 1 year ago

@cazlo not that this answers your main concerns, but it might help you out. You shouldn't need the kubectl binary on your machine to use the kubectl Terraform provider; you'd only need it if you have to run arbitrary commands via either a provisioner or a shell resource.

I think IP prefix mode should be the default behaviour for the VPC CNI, which would solve a lot of the configuration issues out of the box. Custom networking could also default to a tagging strategy like the one I suggested above. Then, if node ENI IPs are no longer a constraint, node bootstrap shouldn't need to care about the networking specifics, which is useful as, by definition, bootstrap can't see the K8s components until it's configured and connected.
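For anyone reading along, enabling prefix mode today is a one-liner (a sketch; it requires Nitro-based instances and a recent VPC CNI version):

```sh
kubectl -n kube-system set env daemonset/aws-node ENABLE_PREFIX_DELEGATION=true
```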

On your other points (I'm completely ignoring Windows here) you are constrained by the terraform-aws-eks module, which underpins the blueprints module, and the EKS managed addons. If you're not using managed node groups you should be able to get around most of this by using self-managed addons.

autarchprinceps commented 1 year ago

Setting ENI_CONFIG_LABEL_DEF via the EKS add-on isn't even supported, only AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG. I'm not sure under what circumstances an add-on throws away manual changes to envs (a simple version update thankfully doesn't seem to), but if you offer the option to set envs via the add-on, you should at least support all those featured in the official docs.aws.amazon.com user guides.
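For reference, this is roughly how the supported variable is set through the managed add-on (a sketch; the cluster name is a placeholder, and the accepted keys are defined by the add-on's configuration schema, which `aws eks describe-addon-configuration` can show):

```sh
aws eks update-addon --cluster-name my-cluster --addon-name vpc-cni \
  --configuration-values '{"env":{"AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG":"true"}}'
```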

jayasuryakumar-dh commented 11 months ago

I am following this document https://repost.aws/knowledge-center/eks-custom-subnet-for-pod to have pods use IPs from the ENIConfig subnet rather than the node subnet.

Below are my specs:

- EKS cluster version: 1.24
- amazon-k8s-cni-init:v1.15.0-eksbuild.2
- amazon-k8s-cni:v1.15.0-eksbuild.2

I followed the same steps in the document:

1. Set the env variable with `kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true`. The aws-node pods restarted and are running.
2. Created ENIConfig objects with the same name as the AZs (eu-west-1a, eu-west-1b, eu-west-1c) and without security groups.
3. I wanted to automatically label new nodes with the ENIConfig object.

But when I set the following env variable, `kubectl set env daemonset aws-node -n kube-system ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone`, the aws-node pods are restarted and fail with the following error:

```
Warning  Unhealthy  23s   kubelet            Readiness probe failed: {"level":"info","ts":"2023-10-05T13:40:15.581Z","caller":"/root/sdk/go1.20.4/src/runtime/proc.go:250","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning  Unhealthy  13s   kubelet            Readiness probe failed: {"level":"info","ts":"2023-10-05T13:40:25.587Z","caller":"/root/sdk/go1.20.4/src/runtime/proc.go:250","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning  Unhealthy  3s    kubelet            Readiness probe failed: {"level":"info","ts":"2023-10-05T13:40:35.605Z","caller":"/root/sdk/go1.20.4/src/runtime/proc.go:250","msg":"timeout: failed to connect service \":50051\" within 5s"}
```

Can you please share some information? Am I missing something, or is any extra configuration needed?

sjastis commented 5 months ago

Appreciate your feedback on how we can simplify the default experience for IP address management in the VPC CNI. Starting with VPC CNI v1.18, we support automatic subnet discovery and dynamic address allocation based on IP address utilization across available subnets. To learn more, here is the blog post: https://aws.amazon.com/blogs/containers/amazon-vpc-cni-introduces-enhanced-subnet-discovery/
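A sketch of how a subnet is opted into that discovery, based on the linked blog post (verify the exact tag key and value against the post/docs; the subnet ID is a placeholder):

```sh
aws ec2 create-tags --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/role/cni,Value=1
```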

For use cases that do not require running pods on a different subnet with separate security groups, we believe the new feature (also enabled by default) provides a simpler experience. Check it out and let us know how we can improve the default experience further.