aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(eks): Support isolated VPCs #12171

Open iliapolo opened 3 years ago

iliapolo commented 3 years ago

Provisioning clusters inside an isolated VPC (i.e., no internet access) is not currently supported. This is because the Lambda functions that operate the cluster need to invoke the EKS service, which does not offer a VPC endpoint.

See https://github.com/aws/containers-roadmap/issues/298

Use Case

We've seen users mentioning their environment uses an isolated VPC.

Other

Adding some information here to possibly facilitate alternative approaches.

If you have a proxy setup, you can inject proxy information to the handlers via custom environment variables.

const proxy = "https://proxy.mycompany.com:8080/";
new eks.Cluster(this, 'Cluster', {
  ...,

  kubectlEnvironment: {
    HTTPS_PROXY: proxy,
  },

  clusterHandlerEnvironment: {
    HTTPS_PROXY: proxy
  }
})
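
To keep the two environment maps above in sync, one option is a small helper that builds them from a single proxy URL. This is only a sketch: `proxyEnvironment` and the no-proxy host list are hypothetical names, not part of aws-cdk-lib, and emitting both upper- and lowercase variable names is an assumption about what different HTTP clients read. Whether a given handler honors these variables depends on its runtime and HTTP client.

```typescript
// Hypothetical helper (not part of aws-cdk-lib): builds one environment map
// that can be passed to both kubectlEnvironment and clusterHandlerEnvironment.
function proxyEnvironment(
  proxyUrl: string,
  noProxyHosts: string[] = [],
): { [key: string]: string } {
  // Assumption: some clients read the uppercase name, others the lowercase one,
  // so both are set to the same value.
  const env: { [key: string]: string } = {
    HTTPS_PROXY: proxyUrl,
    https_proxy: proxyUrl,
  };
  if (noProxyHosts.length > 0) {
    // Hosts that should bypass the proxy, as a comma-separated list.
    const joined = noProxyHosts.join(',');
    env.NO_PROXY = joined;
    env.no_proxy = joined;
  }
  return env;
}
```

The returned object could then be used as the value of both `kubectlEnvironment` and `clusterHandlerEnvironment` in the snippet above.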

Also, following is a list of AWS services that our Lambda handlers interact with in order to operate the cluster. All of these services offer a VPC endpoint except for EKS.

Related: https://github.com/aws/aws-cdk/issues/10036

Once EKS does offer a VPC endpoint, it would be nice if we provisioned the necessary endpoints automatically whenever we identify that the VPC does not have internet access (no internet gateway or NAT).
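
A rough sketch of what "identify that the VPC does not have internet access" could look like, using a simplified route model. The `Route` and `SubnetInfo` types and both functions are hypothetical and are not CDK or EC2 API types; the check only looks for a default route to an internet gateway or NAT.

```typescript
// Hypothetical, simplified route model; not a CDK or EC2 API type.
interface Route {
  destinationCidr: string;
  // Kind of target the route points at.
  target: 'igw' | 'nat' | 'tgw' | 'local';
}

interface SubnetInfo {
  subnetId: string;
  routes: Route[];
}

// True if the subnet has a default route to an internet gateway or a NAT.
function hasInternetRoute(subnet: SubnetInfo): boolean {
  return subnet.routes.some(
    (r) =>
      r.destinationCidr === '0.0.0.0/0' &&
      (r.target === 'igw' || r.target === 'nat'),
  );
}

// True if no subnet in the VPC has such a route.
function isVpcIsolated(subnets: SubnetInfo[]): boolean {
  return subnets.every((s) => !hasInternetRoute(s));
}
```

Note that a check like this would also classify a VPC that reaches the internet through a transit gateway or a proxy as isolated, so an explicit override would probably still be needed.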


This is a :rocket: Feature Request

BowlesCR commented 3 years ago

In my scenario, my "isolated" subnets aren't really isolated from the internet as I use a TGW to route traffic via an egress network. If you try for private and natGateways=0, CDK insists you call them isolated. If you call them isolated, you can't put EKS on them.

Is there a workaround to this, or could there be some sort of "I know what I'm doing" override added?

iliapolo commented 3 years ago

@BowlesCR

If you call them isolated, you can't put EKS on them.

If they are not actually isolated, you should be able to use them. Are you getting some kind of error?

This issue refers only to truly isolated subnets that have no internet access.

BowlesCR commented 3 years ago

Yes: jsii.errors.JSIIError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Public

aws_eks.Cluster() appears to be inferring the subnet type from the endpoint_access param, and aws_eks.EndpointAccess doesn't have an enum for ISOLATED.

I think I just found the correct way to do this, which is leave endpoint_access = PRIVATE, but manually specify vpc_subnets = vpc.isolated_subnets

As I'm thinking about this more... I think my complaint is more properly lodged with the natGateways=0 requires ISOLATED logic... I fear that calling them isolated will lead to someone making a poor assumption about their (lack of) internet access down the road.

iliapolo commented 3 years ago

I think my complaint is more properly lodged with the natGateways=0 requires ISOLATED logic

I agree about that. Might be worth opening a separate issue for the ec2 package. I'm still a little fuzzy though on the error you get from the EKS construct.

appears to be inferring the subnet type from the endpoint_access param

It doesn't really do that. The only logic pertaining to subnets is that we try and select the private subnets from the configured VPC, but we actually treat ISOLATED as PRIVATE there.

Would help if you could attach the full stack trace and/or code snippet.

BowlesCR commented 3 years ago

Sure thing. Stack trace (file paths lightly sanitized):

jsii.errors.JavaScriptError: 
  Error: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Public
      at Vpc.selectSubnetObjectsByType (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:206:19)
      at Vpc.selectSubnetObjects (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:172:28)
      at Vpc.selectSubnets (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-ec2/lib/vpc.js:59:30)
      at /tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-eks/lib/cluster.js:265:77
      at Array.map (<anonymous>)
      at new Cluster (/tmp/jsii-kernel-Jsmxfu/node_modules/@aws-cdk/aws-eks/lib/cluster.js:265:59)
      at /tmp/tmpyat3mqfa/lib/program.js:2720:58
      at Kernel._wrapSandboxCode (/tmp/tmpyat3mqfa/lib/program.js:3148:24)
      at Kernel._create (/tmp/tmpyat3mqfa/lib/program.js:2720:34)
      at Kernel.create (/tmp/tmpyat3mqfa/lib/program.js:2461:29)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./CDK-Infrastructure/app.py", line 382, in <module>
    main()
  File "./CDK-Infrastructure/app.py", line 338, in main
    eks_stack = EksStack(
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_runtime.py", line 83, in __call__
    inst = super().__call__(*args, **kwargs)
  File "./CDK-Infrastructure/cdk_infrastructure/cdk_eks/cdk_eks_stack.py", line 48, in __init__
    self.cluster = eks.Cluster(
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_runtime.py", line 83, in __call__
    inst = super().__call__(*args, **kwargs)
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/aws_cdk/aws_eks/__init__.py", line 7895, in __init__
    jsii.create(Cluster, self, [scope, id, props])
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_kernel/__init__.py", line 265, in create
    response = self.provider.create(
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_kernel/providers/process.py", line 348, in create
    return self._process.send(request, CreateResponse)
  File "./CDK-Infrastructure/.env/lib64/python3.9/site-packages/jsii/_kernel/providers/process.py", line 330, in send
    raise JSIIError(resp.error) from JavaScriptError(resp.stack)
jsii.errors.JSIIError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Public
Subprocess exited with error 1

cdk_eks_stack.py Line 48:

        self.cluster = eks.Cluster(
            self,
            "Cluster",
            cluster_name=cluster_name,
            vpc=vpc,
            version=eks.KubernetesVersion.V1_18,
            default_capacity=0,
            endpoint_access=eks.EndpointAccess.PRIVATE,
            masters_role=adminRole,
            secrets_encryption_key=secrets_key,
            security_group=security_group,
            # vpc_subnets=vpc.isolated_subnets,
        )

iliapolo commented 3 years ago

OK, I understand now. Yes, your solution is appropriate: without the vpc_subnets config we try to select PRIVATE subnets and fail. Thanks.
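
For readers following along, here is a simplified model of the behavior described above. It is not the actual CDK implementation: `SubnetGroup`, `selectSubnetIdsByType`, and `resolveClusterSubnets` are hypothetical names, and the only details taken from this thread are the default selection of 'Private' subnets and the wording of the error.

```typescript
type SubnetGroupType = 'Public' | 'Private' | 'Isolated';

// Hypothetical, simplified stand-in for a VPC's subnet groups.
interface SubnetGroup {
  type: SubnetGroupType;
  subnetIds: string[];
}

// Collect subnet IDs of one type; fail if the VPC has none of that type.
function selectSubnetIdsByType(
  groups: SubnetGroup[],
  wanted: SubnetGroupType,
): string[] {
  const ids: string[] = [];
  const available: string[] = [];
  for (const group of groups) {
    if (available.indexOf(group.type) === -1) {
      available.push(group.type);
    }
    if (group.type === wanted) {
      for (const id of group.subnetIds) {
        ids.push(id);
      }
    }
  }
  if (ids.length === 0) {
    throw new Error(
      `There are no '${wanted}' subnet groups in this VPC. ` +
        `Available types: ${available.sort().join(',')}`,
    );
  }
  return ids;
}

// With explicit subnets the default selection is skipped entirely;
// without them the model falls back to selecting 'Private' subnets.
function resolveClusterSubnets(
  groups: SubnetGroup[],
  explicitSubnetIds?: string[],
): string[] {
  if (explicitSubnetIds !== undefined) {
    return explicitSubnetIds;
  }
  return selectSubnetIdsByType(groups, 'Private');
}
```

In this model, a VPC with only Public and Isolated groups fails unless the subnets are passed explicitly, which mirrors the workaround of setting vpc_subnets.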

BowlesCR commented 3 years ago

Excellent. Thank you for taking a look.

ArtiomL commented 2 years ago

Hi, Is there a way to disable the handler lambdas as part of the deployment? And if not - would it make sense to add this as a selectable option?

pahud commented 1 year ago

At this moment (cdk 2.63.0), it's possible to deploy an EKS cluster with a private endpoint and a nodegroup in PRIVATE_WITH_EGRESS subnets. Check out the sample below:

    const cluster = new eks.Cluster(this, 'Cluster', {
      vpc,
      version: eks.KubernetesVersion.V1_24,
      // private endpoint only
      endpointAccess: eks.EndpointAccess.PRIVATE,
      vpcSubnets: [
        { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      ],
      // lambda handler with vpc access
      placeClusterHandlerInVpc: true,
      kubectlLayer: new KubectlLayer(this, 'KubectlLayer'),
      defaultCapacity: 0,
    });
    // nodegroup in private subnets with egress, so it can reach the internet without any VPC endpoints
    cluster.addNodegroupCapacity('NG', {
      subnets: vpc.selectSubnets({ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, }),
    })

emanserav commented 1 year ago

hi @pahud, can we get some details on why the EKS cluster cannot currently be deployed on isolated subnets? What could be the issue(s) within the Lambda functions? Why would the Lambda functions need internet access? Shouldn't it be enough for them to reach the cluster's private endpoint (and any other endpoints needed inside the VPC)? Thank you.

emanserav commented 1 year ago

If I am using isolated subnets in the VPC, with no NAT but with all outgoing traffic directed through a proxy, is there a way to pass this proxy setup to the EKS Cluster construct (or somehow to the Lambdas deployed as part of it)? This is for the case where, rather than working out why the Lambdas deployed by the EKS Cluster construct need internet access, we just set up the proxy for those Lambdas so they can reach the internet.

Here is the error I am currently getting when trying to create the EKS cluster. It happens in the Lambdas deployed as part of the EKS Cluster L2 construct when they try to update the k8s cluster auth manifest (logical ID AwsAuthmanifest): Received response status [FAILED] from custom resource. Message returned: Error: b'\nConnect timeout on endpoint URL: "https://sts.amazonaws.com/"\nUnable to connect to the server: getting credentials: exec: executable aws failed with exit code 255 (Client.Timeout exceeded while awaiting headers)\n' Logs:

(I am also wondering why the global STS endpoint https://sts.amazonaws.com/ is used: I already have a regional STS endpoint in the VPC, and I expected the Lambdas to try to reach that one.)

pahud commented 1 year ago

hi @pahud, can we get some details on why the EKS cluster cannot currently be deployed on isolated subnets? What could be the issue(s) within the Lambda functions? Why would the Lambda functions need internet access? Shouldn't it be enough for them to reach the cluster's private endpoint (and any other endpoints needed inside the VPC)? Thank you.

Yes, technically the EKS cluster can be associated with isolated subnets, but the primary consideration is this: if your Lambda function is associated with isolated subnets, it can access the control plane but won't be able to access the EKS service API until some private endpoints are enabled or http_proxy is configured. It's still unclear to us how to configure this correctly in CDK, so I would suggest using PRIVATE_WITH_EGRESS subnets for vpcSubnets and making sure your Lambda function is not associated with isolated subnets unless an appropriate http_proxy configuration is in place.

vpcSubnets: [
        { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      ],

ClaudiusMZ commented 1 year ago

If I am using isolated subnets in the VPC, with no NAT but with all outgoing traffic directed through a proxy, is there a way to pass this proxy setup to the EKS Cluster construct (or somehow to the Lambdas deployed as part of it)? This is for the case where, rather than working out why the Lambdas deployed by the EKS Cluster construct need internet access, we just set up the proxy for those Lambdas so they can reach the internet.

Here is the error I am currently getting when trying to create the EKS cluster. It happens in the Lambdas deployed as part of the EKS Cluster L2 construct when they try to update the k8s cluster auth manifest (logical ID AwsAuthmanifest): Received response status [FAILED] from custom resource. Message returned: Error: b'\nConnect timeout on endpoint URL: "https://sts.amazonaws.com/"\nUnable to connect to the server: getting credentials: exec: executable aws failed with exit code 255 (Client.Timeout exceeded while awaiting headers)\n' Logs:

(I am also wondering why the global STS endpoint https://sts.amazonaws.com/ is used: I already have a regional STS endpoint in the VPC, and I expected the Lambdas to try to reach that one.)


I ran into the same issue. This should help you:

  const cluster = new eks.Cluster(this, 'your-cluster', {
    clusterName:                '<your-cluster-name>',
    version:                    eks.KubernetesVersion.V1_27,
    kubectlLayer:               new KubectlV27Layer(this, 'kubectl-v27-layer'),
    endpointAccess:             eks.EndpointAccess.PRIVATE,
    vpc:                        vpc,
    vpcSubnets:                 [{ subnets: [<isolated_subnet_1>, <isolated_subnet_2>] }],
    ...
    placeClusterHandlerInVpc:   true,
    clusterHandlerEnvironment:  { AWS_STS_REGIONAL_ENDPOINTS: 'regional'},
    kubectlEnvironment:         { AWS_STS_REGIONAL_ENDPOINTS: 'regional'}
  });

Requires: a regional VPC endpoint for STS.
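
To illustrate why this setting matters, here is a minimal sketch of how the STS hostname changes with AWS_STS_REGIONAL_ENDPOINTS. It is a simplified model rather than the logic of any particular SDK or CLI: the function names are hypothetical, it covers only the standard AWS partition, and it treats an unset variable as 'legacy' (an assumption; newer SDK and CLI versions may default to regional, and real legacy mode falls back to the global endpoint only for some regions).

```typescript
type StsEndpointMode = 'legacy' | 'regional';

// Assumption: anything other than the exact value 'regional' means legacy.
function stsModeFromEnv(
  env: { [key: string]: string | undefined },
): StsEndpointMode {
  return env.AWS_STS_REGIONAL_ENDPOINTS === 'regional' ? 'regional' : 'legacy';
}

// Simplified: legacy always maps to the global endpoint in this sketch.
function stsEndpointUrl(region: string, mode: StsEndpointMode): string {
  if (mode === 'regional') {
    return `https://sts.${region}.amazonaws.com/`;
  }
  return 'https://sts.amazonaws.com/';
}
```

The global hostname is the one that appears in the timeout error quoted above. A regional hostname is what an STS interface endpoint in the VPC can serve, provided private DNS resolves it to the endpoint.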

emanserav commented 1 year ago

Thank you @ClaudiusMZ for your input, but that didn't help. Before your comment (using cdk 2.89.0) I tried to insert my proxy (by setting the common proxy environment variables http_proxy, https_proxy, noproxy) in the 2 fields:

and I was getting "Received response status [FAILED] from custom resource. Message returned: Error: connect ETIMEDOUT <some PUBLIC_IP>:443 at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16) (RequestId: )"

So when I saw your comment I thought you might have a good point, and that the handlers would go directly to the regional STS endpoint as expected. However, I got exactly the same error (in case it matters, cdk was now 2.93.0).

Therefore, I still suspect that the handlers neither pick up the proxy nor reach the STS endpoints defined in the VPC directly (without a proxy), as you suggested.

Thinking more about what I wrote above, I am now guessing WHY internet access is needed (my original question), which brings us back to how to set up the proxy: the EKS control plane is AWS managed, so the only way for it to reach STS is through the internet.

I will add, for others who bump into this, that I am talking specifically about CDK in TypeScript. My gut tells me that with CDK in Python the proxy might work (who knows?!). The proxy setup in Node.js may not be as nice as in Python. Or is it, and am I still missing something very easy here?

github-actions[bot] commented 12 months ago

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

vishwanjalijadhav commented 10 months ago

We are also trying to create the EKS cluster through CDK in private subnets (the VPC has internet access via a proxy), using our enterprise proxy. We get the error below if we set 'placeClusterHandlerInVpc' to true and set the proxy in the cluster handler environment:

| Custom::AWSCDK-EKS-Cluster | copitoekscluster1683824D Received response status [FAILED] from custom resource. Message returned: Error: connect ETIMEDOUT 52.94.204.134:443 at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16) (RequestId: b840dd7e-0ec4-41c9-add2-848bfd523de9)

Cluster instantiation code looks like below:

const eksCluster = new eks.FargateCluster(this, 'copito-eks-cluster', {
  version: eks.KubernetesVersion.V1_27,
  kubectlEnvironment: {
    https_proxy: 'enterprise proxy url',
  },
  clusterHandlerEnvironment: {
    https_proxy: 'enterprise proxy url',
  },
  clusterHandlerSecurityGroup: proxysecurityGroup,
  mastersRole: masterRole,
  clusterName: props?.clusterName,
  vpc: copitoDevVPC,
  endpointAccess: eks.EndpointAccess.PRIVATE,
  vpcSubnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
  placeClusterHandlerInVpc: true,
  clusterLogging: [
    eks.ClusterLoggingTypes.API,
    eks.ClusterLoggingTypes.AUTHENTICATOR,
    eks.ClusterLoggingTypes.SCHEDULER,
    eks.ClusterLoggingTypes.CONTROLLER_MANAGER,
  ],
});

If we don't use 'placeClusterHandlerInVpc', i.e. set it to false, we get the error below:

Exception: b'Unable to connect to the server: proxyconnect tcp: x509: certificate signed by unknown authority\n'
Traceback (most recent call last):
  File "/var/task/index.py", line 20, in handler
    return patch_handler(event, context)
  File "/var/task/patch/__init__.py", line 50, in patch_handler
    kubectl([ 'patch', resource_name, '-n', resource_namespace, '-p', patch_json, '--type', patch_type ])
  File "/var/task/patch/__init__.py", line 66, in kubectl
    raise Exception(output)

Our enterprise proxy instance shows successful connections to both the EKS and STS endpoints, so I believe there is no issue with the proxy.

caretak3r commented 10 months ago

I got this partially working, by creating the necessary VPC endpoints required to get the lambdas to communicate properly without modifying any security groups. I used what @ClaudiusMZ provided above:

place_cluster_handler_in_vpc=True,  # Place the cluster handler in the VPC
cluster_handler_environment={"AWS_STS_REGIONAL_ENDPOINTS": "regional"},
kubectl_environment={"AWS_STS_REGIONAL_ENDPOINTS": "regional"},

Our VPC has two subnets that are fully private: no NAT gateways, no IGWs, the EKS API is PRIVATE, and the only routes are to local and a VPC gateway endpoint for S3. I created the following VPC endpoints:

[image: list of the VPC endpoints created]

Seems... excessive, but I went through the woes of figuring out what the lambdas needed in order to complete setting up the cluster, and provisioning kube manifests and some helm charts. In this case, I have to host the charts and images in a private ECR in the same VPC. So far it looks possible to accomplish.

The CDK documentation leaves a lot to be desired in terms of what is being done behind the scenes in the L2 constructs I am using, but by digging into the TypeScript CDK codebase I was able to make sense of some of it.