aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: VPC endpoint support for EKS API #298

Closed tdmalone closed 1 year ago

tdmalone commented 5 years ago

Tell us about your request

VPC endpoint support for EKS, so that worker nodes can register with an EKS-managed cluster without requiring outbound internet access.

Which service(s) is this request for?

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Worker nodes based on the EKS AMI run bootstrap.sh to connect themselves to the cluster. As part of this process, aws eks describe-cluster is called, which currently requires outbound internet access.

I'd love to be able to turn off outbound internet access but still easily bootstrap worker nodes without providing additional configuration.
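Providing that additional configuration looks roughly like the following. This sketch assumes, based on our reading of the EKS-optimized Amazon Linux AMI, that /etc/eks/bootstrap.sh skips the aws eks describe-cluster call when both --apiserver-endpoint and --b64-cluster-ca are passed. The function name render_user_data and all example values are hypothetical, and the endpoint and CA still have to be fetched once from somewhere that can reach the EKS API.

```python
import shlex

# Assumed location of the bootstrap script on the EKS-optimized Amazon Linux AMI.
BOOTSTRAP_PATH = "/etc/eks/bootstrap.sh"


def render_user_data(cluster_name: str, apiserver_endpoint: str, b64_cluster_ca: str) -> str:
    """Render EC2 user data that bootstraps a worker node with the API server
    endpoint and CA supplied up front, so no describe-cluster lookup is needed."""
    args = [
        BOOTSTRAP_PATH,
        cluster_name,
        "--apiserver-endpoint", apiserver_endpoint,
        "--b64-cluster-ca", b64_cluster_ca,
    ]
    # shlex.quote protects against spaces or shell metacharacters in the values.
    command = " ".join(shlex.quote(arg) for arg in args)
    return "#!/bin/bash\nset -o errexit\n" + command + "\n"


if __name__ == "__main__":
    print(render_user_data(
        "my-cluster",                                       # hypothetical cluster name
        "https://EXAMPLE.gr7.us-east-1.eks.amazonaws.com",  # hypothetical endpoint
        "LS0tLS1CRUdJTi...",                                # hypothetical, truncated CA bundle
    ))
```

The trade-off is exactly the one described above: nodes no longer need the EKS API at boot, but the endpoint and CA become configuration that must be kept in sync with the cluster.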

Are you currently working around this issue?

Additional context

devonkinghorn commented 4 years ago

Is there any news on this?

michael-burt commented 4 years ago

Any updates on this issue?

mikestef9 commented 4 years ago

If you use EKS Managed Nodes, the bootstrapping process avoids the aws eks describe-cluster API call, so you can launch workers into a private subnet without outbound internet access as long as you set up the other required PrivateLink endpoints correctly.

michael-burt commented 4 years ago

Thanks Mike. Unfortunately managed nodes are not an option because they cannot be scaled to 0. We run some machine learning workloads that require scaling up ASGs with expensive VMs (x1.32xlarge) and we need to be able to scale them back to 0 once the workloads have completed.

mikestef9 commented 4 years ago

Thanks for the feedback. Can you open a separate GH issue with that feature request for Managed Node Groups?

Will keep this issue open as it's something we are researching.

dsw88 commented 4 years ago

@mikestef9 I'm interested in the managed nodes solution. What do you mean by "you can launch workers into a private subnet without outbound internet access as long as you set up the other required PrivateLink endpoints correctly"?

Which PrivateLink endpoints are you referring to? Just the other service endpoints such as SQS and SNS that the applications running on the cluster may happen to use? Or do you mean that there are particular PrivateLink endpoints required to run EKS in private subnets with no internet gateway?

mikestef9 commented 4 years ago

Hi @dsw88,

In order for the worker node to join the cluster, you will need to configure VPC endpoints for ECR, EC2, and S3.

See this GH repo https://github.com/jpbarto/private-eks-cluster created by an AWS Solutions Architect for a reference implementation. Note that only 1.13 and above EKS clusters have a kubelet version that is compatible with the ECR VPC endpoint.
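A small Python sketch that turns the services named here (ECR, EC2, S3) into VPC endpoint service names for a region and reports which ones are missing. It assumes ECR needs both its ecr.api and ecr.dkr endpoints and that S3 is reached through a gateway endpoint; the helper names are hypothetical, and the official private-clusters documentation may list further endpoints depending on the workload.

```python
from typing import Dict, Iterable, List


def required_endpoint_services(region: str) -> Dict[str, str]:
    """Map each VPC endpoint service name to the endpoint type assumed here."""
    prefix = f"com.amazonaws.{region}"
    return {
        f"{prefix}.ec2": "Interface",
        f"{prefix}.ecr.api": "Interface",  # ECR service API, e.g. GetAuthorizationToken
        f"{prefix}.ecr.dkr": "Interface",  # Docker Registry API used for image pulls
        f"{prefix}.s3": "Gateway",         # ECR keeps image layers in S3
    }


def missing_endpoints(region: str, existing: Iterable[str]) -> List[str]:
    """Required service names that have no endpoint in `existing`, sorted."""
    return sorted(set(required_endpoint_services(region)) - set(existing))
```

For example, passing the service names of a VPC's current endpoints to missing_endpoints("eu-west-1", ...) lists what still has to be created before nodes can join.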

dsw88 commented 4 years ago

@mikestef9 Thanks so much for the info, and thanks for the pointer to the private EKS cluster reference repository!

I have one final question that I'm having a hard time figuring out how to deal with: How can I configure other hosts in this same private VPC to be able to talk to the cluster? Knowing the private DNS name isn't a huge deal, because I can just hard-code it into whatever needs to talk to the cluster. A bigger problem, however, is how a host in the private VPC can authenticate with the cluster.

Currently when I use the AWS API to set up a kubeconfig with EKS, it includes the following snippet in the generated kubeconfig file:

    - name: arn:aws:eks:REGION:ACCOUNT_ID:cluster/CLUSTER_NAME
      user:
        exec:
          apiVersion: client.authentication.k8s.io/v1alpha1
          args:
          - --region
          - REGION
          - eks
          - get-token
          - --cluster-name
          - CLUSTER_NAME
          command: aws
          env: null

As you can see, it called the EKS API to get a token that authenticates it with the cluster. That definitely presents a problem since my hosts in the private VPC also don't have access to the EKS API. Is there another way that I can authenticate to the cluster without EKS API access?
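For context on the question above: as far as we understand, aws eks get-token does not itself call the EKS API. It signs an STS GetCallerIdentity request locally and passes the presigned URL to the cluster, whose authenticator then checks it against STS. If that holds, a host in the private VPC needs IAM credentials and a route to the cluster's private endpoint, but not the EKS API. A minimal Python sketch of that token format, assuming the aws-iam-authenticator conventions (k8s-aws-v1. prefix, signed x-k8s-aws-id header carrying the cluster name, 60-second expiry), a regional STS endpoint in the commercial partition, and that STS accepts the empty-body payload hash; the function and parameter names are hypothetical.

```python
import base64
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote

TOKEN_PREFIX = "k8s-aws-v1."  # assumed prefix of aws-iam-authenticator tokens


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def _enc(value: str) -> str:
    # RFC 3986 encoding as SigV4 requires ('/' and ';' are encoded as well).
    return quote(value, safe="-_.~")


def eks_token(cluster_name: str, region: str, access_key: str, secret_key: str,
              session_token: Optional[str] = None,
              now: Optional[datetime.datetime] = None) -> str:
    """Return a bearer token: a SigV4-presigned STS GetCallerIdentity URL,
    base64url-encoded without padding. Computed locally; no EKS API call."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"sts.{region}.amazonaws.com"  # regional STS endpoint (assumption)
    scope = f"{date_stamp}/{region}/sts/aws4_request"

    params = {
        "Action": "GetCallerIdentity",
        "Version": "2011-06-15",
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": "60",
        "X-Amz-SignedHeaders": "host;x-k8s-aws-id",
    }
    if session_token:
        params["X-Amz-Security-Token"] = session_token
    # Canonical query string: parameters sorted by name, names and values encoded.
    query = "&".join(f"{_enc(k)}={_enc(v)}" for k, v in sorted(params.items()))

    # The cluster name travels in the signed x-k8s-aws-id header.
    canonical_headers = f"host:{host}\nx-k8s-aws-id:{cluster_name}\n"
    payload_hash = hashlib.sha256(b"").hexdigest()  # GET request, empty body
    canonical_request = "\n".join(
        ["GET", "/", query, canonical_headers, "host;x-k8s-aws-id", payload_hash])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Signing key derivation: secret -> date -> region -> service -> terminator.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "sts", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    url = f"https://{host}/?{query}&X-Amz-Signature={signature}"
    encoded = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")
    return TOKEN_PREFIX + encoded
```

Nothing in this function touches the network, which is the point of the sketch: the only runtime dependency is valid IAM credentials (for example from the instance role).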

zucler commented 4 years ago

See this GH repo https://github.com/jpbarto/private-eks-cluster created by an AWS Solutions Architect for a reference implementation. Note that only 1.13 and above EKS clusters have a kubelet version that is compatible with the ECR VPC endpoint.

It seems that this repo uses unmanaged nodes though. I tried deploying it and it brought up a cluster without any nodes listed under the EKS web console. Is this correct?

vranystepan commented 4 years ago

@mikestef9 Thank you very much for this clue. Now I have a working setup with managed worker groups and no access to the Internet 🎉

I was not sure if it's feasible as the documentation says:

Amazon EKS managed node groups can be launched in both public and private subnets. The only requirement is for the subnets to have outbound internet access. Amazon EKS automatically associates a public IP to the instances started as part of a managed node group to ensure that these instances can successfully join a cluster.

Well, apparently it is. If someone needs working Terraform recipes, ping me at stepan@vrany.dev.

mikestef9 commented 4 years ago

@vranystepan great to hear you have this working. As part of our fix for #607 we will make sure to get our documentation updated.

duckie commented 4 years ago

This is still a real issue.

I need to actually create and delete new clusters from private subnets with no NAT or egress gateways. I can create private endpoints for apparently every AWS service but EKS. This is a deep pain for some customers, as we have to build complicated workarounds to route traffic to the EKS service, whereas every other AWS service is easily exposed with a private endpoint.

evanlurvey commented 3 years ago

I agree with @duckie this issue should not be closed yet. EKS support is laughable.

dsw88 commented 3 years ago

I agree that VPC endpoints are still very important, and this issue should be kept open. It is possible to run EKS clusters in private subnets with no internet egress, but it is not possible to manage those clusters from within that private VPC. We are limited in the tooling we can develop around EKS for lifecycle actions such as creating, updating, and deleting clusters because we can't perform those actions inside our private VPC. Please consider implementing a VPC endpoint for EKS! Thanks!

amitkarpe commented 3 years ago

Hi, is there any workaround for this issue? We should be able to create and manage EKS clusters in a private VPC. In our situation (due to security policies), our bastion server (and VPC) don't have public access. In that case, how can we create an EKS cluster? We are using Terraform to provision EKS.

taro-cmd commented 3 years ago

Is there a status update on this issue? This is a real problem for vendors that only use bootstrap.sh to perform automated EKS deployments, because our environments are private. Is anyone working on an EKS private endpoint? Thanks

torengaw commented 3 years ago

We have the problem too. We've built a private cluster for a private VPC with CDK (the VPC is connected to a Transit Gateway). CDK makes use of a custom resource Lambda to do the kubeconfig update. When the cluster endpointAccess is private (or public and private), this Lambda is associated with the VPC (via ENIs). The Lambda function calls "aws eks update-kubeconfig" from "inside" the VPC, but is unable to access the cluster endpoint and fails with a timeout. All necessary VPC endpoints (according to the official EKS docs) are set (ecr.api, ecr.dkr, s3, ...).

xor007 commented 3 years ago

+1. Making fully private clusters that are custom CloudFormation resources is actually not possible without this: a Lambda in a VPC cannot get kubectl tokens.

ctrongminh commented 3 years ago

+1. In my case, I cannot use CodeBuild with an attached VPC (all subnets are private) to call the private EKS cluster via "aws eks update-kubeconfig".

The result is: Connect timeout on endpoint URL: "https://eks.<region>.amazonaws.com/clusters/xxxxx"
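As far as we can tell, the step failing in these setups is aws eks update-kubeconfig itself, which calls the EKS DescribeCluster API to look up the cluster endpoint and CA bundle. A hedged workaround sketch: fetch those two values once from a machine with EKS API access, then render the kubeconfig without the API. The function name is hypothetical; the output is JSON, which kubectl should accept because JSON is valid YAML; and the exec apiVersion v1beta1 is an assumption (the snippet quoted earlier in this thread uses v1alpha1, which recent kubectl releases no longer support).

```python
import json


def render_kubeconfig(cluster_name: str, endpoint: str, b64_ca: str, region: str) -> str:
    """Build a kubeconfig from an endpoint and CA obtained out of band, so that
    aws eks update-kubeconfig (and its DescribeCluster call) is not needed."""
    user = {
        "exec": {
            "apiVersion": "client.authentication.k8s.io/v1beta1",  # assumption
            "command": "aws",
            "args": ["--region", region, "eks", "get-token", "--cluster-name", cluster_name],
        }
    }
    config = {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster_name,
            "cluster": {"server": endpoint, "certificate-authority-data": b64_ca},
        }],
        "users": [{"name": cluster_name, "user": user}],
        "contexts": [{
            "name": cluster_name,
            "context": {"cluster": cluster_name, "user": cluster_name},
        }],
        "current-context": cluster_name,
    }
    return json.dumps(config, indent=2)
```

A Lambda or CodeBuild job could write this string to a file and point KUBECONFIG at it; kubectl then only needs to reach the cluster's private endpoint.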

nhsk4u commented 2 years ago

When I create a cluster with no internet access, I get the error below. Is there any update on VPC endpoint support for the EKS API?

Command used to create cluster:

    aws eks create-cluster \
      --region ap-southeast-1 \
      --name CP-EKS-TEST-NHSK \
      --kubernetes-version 1.21 \
      --role-arn arn:aws:iam::4103:role/nhsk \
      --resources-vpc-config subnetIds=subnet-063b9,subnet-04,securityGroupIds=sg-03

Error Message: connect timeout on endpoint url: "https://eks.ap-southeast-1.amazonaws.com/clusters"

laurecs commented 2 years ago

I need this as well. Is there a solution or a current workaround yet?

djjames72 commented 2 years ago

Commenting as well. An EKS VPC Endpoint would be a huge help. Have there been any updates recently?

deitch commented 2 years ago

@mikestef9

If you use EKS Managed Nodes, the bootstrapping process avoids the aws eks describe-cluster API call, so you can launch workers into a private subnet without outbound internet access as long as you set up the other required PrivateLink endpoints correctly.

Mike, what are the "other required endpoints"? Is there a list somewhere that says, "here are all of the endpoints that a managed node requires"?

Xat59 commented 2 years ago

@deitch imho the following VPC endpoints are required:

deitch commented 2 years ago

Cool, thanks. Are the ECR endpoints only needed if you use containers from ECR, or are they a general requirement?

This should be documented formally somewhere in AWS.

Xat59 commented 2 years ago

With EKS, ECR is required to bootstrap nodes. And because ECR stores images in S3 under the hood, you also need access to S3. You can take a look at this EKS documentation: https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html

deitch commented 2 years ago

Much appreciated.

malikdraz commented 2 years ago

Are there any updates on this, team?

bogdando commented 2 years ago

Cluster autoscaler, when running in a private EKS cluster, also experiences that problem:

    managed_nodegroup_cache.go:133] Failed to query the managed nodegroup foo for the cluster bar while looking for labels/taints: RequestError: send request failed
    caused by: Get "https://eks.<region>.amazonaws.com/clusters/bar/node-groups/foo": dial tcp <*public_IP*>:443: i/o timeout

After reading https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html, I think there could be a workaround for that: "DHCP options set for your VPC must include AmazonProvidedDNS in its domain name servers list". But I'm not sure which domain name to configure in the DHCP options. Should it be eks.<region>.amazonaws.com?
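Whatever DNS setup ends up being used, a quick check from a node is whether the EKS API hostname resolves to private addresses (for example the ENIs of an interface VPC endpoint) or to public ones, as in the i/o timeout log above. A minimal Python sketch; the helper names are hypothetical, the hostname pattern eks.<region>.amazonaws.com is taken from the log, and it assumes the VPC uses typical private (RFC 1918) ranges.

```python
import ipaddress
import socket
from typing import Iterable, List


def resolve(host: str, port: int = 443) -> List[str]:
    """Distinct IP addresses that `host` resolves to from this machine."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # sockaddr[0] is the address; drop any IPv6 zone suffix such as "%eth0".
    return sorted({info[4][0].split("%")[0] for info in infos})


def all_private(addresses: Iterable[str]) -> bool:
    """True only if there is at least one address and every address is in a
    private range; False if any address is public."""
    addrs = list(addresses)
    return bool(addrs) and all(ipaddress.ip_address(a).is_private for a in addrs)
```

On a node, all_private(resolve("eks.us-east-1.amazonaws.com")) returning False would suggest that API traffic is still headed for the public endpoint.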

mikestef9 commented 1 year ago

Amazon EKS now supports AWS PrivateLink for the EKS management APIs.

A few call outs:
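A minimal sketch of creating such an endpoint: the keyword arguments below are meant for the create_vpc_endpoint call of a boto3 EC2 client. It assumes the endpoint service name follows the usual com.amazonaws.<region>.eks pattern (worth verifying for your region), and the function name and the VPC, subnet, and security group IDs are hypothetical placeholders.

```python
from typing import Dict, List, Sequence, Union


def eks_interface_endpoint_request(
    region: str,
    vpc_id: str,
    subnet_ids: Sequence[str],
    security_group_ids: Sequence[str],
) -> Dict[str, Union[str, bool, List[str]]]:
    """Keyword arguments for EC2 create_vpc_endpoint that would create an
    interface endpoint for the EKS management API in the given VPC."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.eks",  # assumed naming pattern
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # With private DNS on, eks.<region>.amazonaws.com should resolve to the
        # endpoint's private IPs from inside the VPC.
        "PrivateDnsEnabled": True,
    }
```

Usage would be along the lines of boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**request). Private DNS is what lets existing tooling keep using the unchanged eks.<region>.amazonaws.com hostname; it also requires DNS support and DNS hostnames to be enabled on the VPC.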

laserpedro commented 3 days ago

Hello @mikestef9, maybe this doc should be updated to enable EKS Pod Identity to work with limited internet access.

mikestef9 commented 3 days ago

Yes, good call. Will get that updated.