aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [Request]: Allow on-premises / non-AWS virtual machines / other nodes to join EKS control plane #1812

Open doctorpangloss opened 2 years ago

doctorpangloss commented 2 years ago


Tell us about your request

Better support for on-premises nodes connecting to the EKS control plane.

  1. Remove the instance ID check from the AWS cloud controller in Kubernetes (see https://github.com/kubernetes/cloud-provider-aws/issues/441#issuecomment-1220874828)
  2. Create a built-in policy / role for the aws-auth configmap in EKS so that nodes can join more easily (today this means hand-editing aws-auth; see the sketch after this list).
  3. Improve your support for different vendors' routers / VPN appliances to make it easier to peer networks with an AWS VPC. For example, you should provide a correct fail-over script for Mikrotik routers, comparable to the configuration you authored for Vyatta-derived devices, as per your VPN Gateway configuration.
  4. Make it possible for the control plane to reach non-VPC addresses (support case 10547190871). This issue occurs even though Reachability Analyzer shows that the control plane's ENI can correctly reach external LAN addresses.
  5. Add common CNIs like Calico and Flannel to the control plane, or improve CNI-Genie (https://github.com/cni-genie/CNI-Genie), so the control plane can reach pods that were not assigned VPC IPs and so VPC and other CNI IP address assignment can be mixed and matched (see the note in https://projectcalico.docs.tigera.io/getting-started/kubernetes/managed-public-cloud/eks).
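
For context on item 2: the closest thing available today is editing the aws-auth configmap yourself. Here is a minimal sketch with the Kubernetes Python client; the IAM role ARN is a placeholder, and the username/groups values follow the documented EKS node mapping:

```python
# Sketch: append a role mapping to the aws-auth ConfigMap so that nodes assuming
# a given IAM role (placeholder ARN below) are allowed to join the cluster.
import yaml
from kubernetes import client, config

config.load_kube_config()  # uses the current kubeconfig context
v1 = client.CoreV1Api()

cm = v1.read_namespaced_config_map("aws-auth", "kube-system")
map_roles = yaml.safe_load(cm.data.get("mapRoles", "[]")) or []

map_roles.append({
    "rolearn": "arn:aws:iam::111122223333:role/on-prem-node-role",  # placeholder
    "username": "system:node:{{EC2PrivateDNSName}}",
    "groups": ["system:bootstrappers", "system:nodes"],
})

v1.patch_namespaced_config_map(
    "aws-auth", "kube-system",
    {"data": {"mapRoles": yaml.safe_dump(map_roles)}},
)
```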

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? The objective is to join on-premises nodes to the EKS control plane. This makes it easier to use EKS to meet demand through scaling, and it allows sophisticated users to work around out-of-date hardware and software on AWS.

Are you currently working around this issue? I am currently using on-premises nodes with EKS, with some degraded capabilities: limitations on the control plane's ability to reach the on-premises nodes, and the flimsiness of pretending to be an actual AWS instance.

Additional context You already support this technologically; you just need to align your product teams around it.

Tonkonozhenko commented 9 months ago

@doctorpangloss could you share a small guide on how you managed to connect on-prem nodes to EKS?

vectornguyen76 commented 9 months ago

Can you tell me about the current progress? I have the same problem when adding an edge node to an EKS cluster. @doctorpangloss

doctorpangloss commented 8 months ago

@vectornguyen76 @Tonkonozhenko here you go - https://appmana.com/blog/hacking-aws-on-premises-eks-nodes

pdf commented 8 months ago

@vectornguyen76 @Tonkonozhenko here you go - https://appmana.com/blog/hacking-aws-on-premises-eks-nodes

Provisioning and retaining EC2 instances so that you can impersonate them on your external workers is a super-ugly hack, not really a solution.

doctorpangloss commented 8 months ago

Provisioning and retaining EC2 instances

On the one hand, if you are trying to use on-premises nodes, I can see that it's disappointing that you'll need at least one EC2 instance anywhere.

On the other hand, if you are trying to use AWS without ever "provisioning and retaining" an EC2 instance, well, you better start believing in EC2 instances, because nearly everything in AWS is made more useful by them. You will only need one of the very cheapest possible instances to admit all your nodes, and it can be a preexisting EKS node, which you will have anyway to run CoreDNS.

pdf commented 8 months ago

You will only need one of the very cheapest possible instances to admit all your nodes

Do you not need one EC2 instance per on-prem node for each kubelet to impersonate? Even if you can get away with this hack using only a single node, it's clearly a hack, and there's zero guarantee that your cluster doesn't break one day due to changes committed to the AWS provider. Not viable for production IMO.

doctorpangloss commented 8 months ago

You will only need one of the very cheapest possible instances to admit all your nodes

Do you not need one EC2 instance per on-prem node for each kubelet to impersonate?

You only need at most one EC2 instance ID; that one ID will work for all your on-premises nodes, since it can appear multiple times in multiple workers' arguments.
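
To make that concrete, here is a rough sketch of deriving the one shared provider ID with boto3. The instance ID below is a placeholder, and passing it to each worker via the kubelet's --provider-id flag is an assumption on my part; the blog post above has the actual details.

```python
# Sketch: look up one existing EC2 instance (e.g. the single EKS node you keep
# around) and print the provider ID string that every on-prem kubelet would reuse.
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: your one real EC2 instance

ec2 = boto3.client("ec2")
reservations = ec2.describe_instances(InstanceIds=[INSTANCE_ID])["Reservations"]
zone = reservations[0]["Instances"][0]["Placement"]["AvailabilityZone"]

# The AWS cloud provider identifies nodes by provider IDs of the form
# aws:///<availability-zone>/<instance-id>; the same value can be repeated
# on every on-prem worker.
print(f"--provider-id=aws:///{zone}/{INSTANCE_ID}")
```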

Of course this is a hack, but I happily used it in production for years. It's the end user's prerogative. The limitations on how the control plane can reach your nodes are much more impactful.