robd003 opened this issue 1 year ago
Hi @robd003 - thanks for opening this issue. I'm creating a cluster now and attempting to reproduce your issue.
Can you share which version of `eksctl` you have?
I'm using:
```
❯ eksctl version
0.138.0-rc.0
```
I'm noticing in your config that your nodegroup has `privateNetworking` set to true, but your subnet configuration is:

```yaml
subnets:
  - pub-us-east-2a
  - priv-us-east-2a
```

which makes me think that your Bottlerocket node is failing to reach that public endpoint since you have only private networking enabled.

> When placing nodegroups inside a private subnet, `privateNetworking` must be set to true on the nodegroup

So I'm unsure if that is supported. This example makes me think that you might need a separate nodegroup to enable public access.
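As a rough illustration of the separate-nodegroup idea (a sketch only, with placeholder cluster/nodegroup names; pinning a nodegroup to specific existing subnets would still go in the config file rather than on the command line):

```bash
# Sketch: keep one nodegroup on private subnets, leave a second one public-facing.
eksctl create nodegroup --cluster my-cluster --name private-ng \
  --node-type m7g.xlarge --nodes 1 --node-private-networking

eksctl create nodegroup --cluster my-cluster --name public-ng \
  --node-type m7g.xlarge --nodes 1
```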
I'll be back in a few minutes with some results!
I'm getting the following error when attempting to use 1 public and 1 private subnet from your example:

```
❯ eksctl create cluster -f 3064-repro.yaml
2023-05-01 16:44:43 [ℹ] eksctl version 0.138.0-rc.0
2023-05-01 16:44:43 [ℹ] using region us-west-2
2023-05-01 16:44:43 [✖] unable to use given VPC (vpc-xxx) and subnets (private:map[private-subnet:{subnet-xxx us-west-2a 192.168.128.0/19 0 }] public:map[public-subnet:{subnet-xxx us-west-2a 192.168.0.0/19 0 }])
Error: insufficient number of subnets, at least 2x public and/or 2x private subnets are required
```

Have you been able to reproduce this with the given cluster config? I'll keep trying with another private subnet in the `vpc` field.
I'm using eksctl 0.139.0
I was unable to add the private subnet unless I had privateNetworking set to true for the nodegroup.
The part that confused me is that the EKS cluster has both private and public access, so you would think that the nodes would be able to connect regardless.
The main issue I was seeing was that Bottlerocket was unable to get its private DNS name via pluto autodiscovery. Did you also see that error in the "Get Console Log" on the node instances?
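(For anyone following along: the same console output can be pulled without the web console. A quick sketch with a hypothetical instance ID, assuming a standard AWS CLI setup:)

```bash
# Fetch the node's serial console output and look for the pluto/sundog/kubelet errors.
# The instance ID below is a placeholder.
aws ec2 get-console-output \
  --instance-id i-0123456789abcdef0 \
  --latest --query Output --output text | grep -iE 'pluto|sundog|kubelet'
```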
> I'm getting the following error when attempting to use 1 public and 1 private subnet from your example:
>
> `Error: insufficient number of subnets, at least 2x public and/or 2x private subnets are required`
>
> Have you been able to reproduce this with the given cluster config? I'll keep trying with another private subnet in the `vpc` field.
I'm using an existing VPC and subnets, so I get past that part of eksctl.
In the example I pasted I just cut it down to a single AZ for the sake of brevity. Try defining 2+ AZs and it should work.
I was able to reproduce the issue:
Here's my `eksctl` cluster config:
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: 3064-repro
  region: us-west-2
  version: '1.26'

vpc:
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  id: "vpc-xxx"
  subnets:
    public:
      public-subnet:
        id: "subnet-xxx" # In AZ us-west-2a with CIDR - 192.168.0.0/19
                         # and has internet gateway / route table attached
      another-public-subnet:
        id: "subnet-xxx" # In AZ us-west-2b with CIDR - 192.168.64.0/19
                         # and has internet gateway / route table attached
    private:
      private-subnet:
        id: "subnet-xxx" # In AZ us-west-2a with CIDR - 192.168.128.0/19
      another-private-subnet:
        id: "subnet-xxx" # In AZ us-west-2b with CIDR - 192.168.192.0/19

iam:
  withOIDC: true

nodeGroups:
  - name: test-nodegroup-3064
    instanceType: m7g.xlarge
    amiFamily: Bottlerocket
    privateNetworking: true
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
      withAddonPolicies:
        autoScaler: true
        cloudWatch: false
        externalDNS: true
    maxSize: 4
    subnets:
      - public-subnet
      - private-subnet
    labels:
      env: prod
    desiredCapacity: 1
```
I had to provision and specify 2 existing private and 2 existing public subnets to get `eksctl` to use my custom, existing VPC.
Here's the run of `eksctl`:

```
❯ eksctl create cluster -f 3064-repro.yaml
2023-05-01 17:00:06 [ℹ] eksctl version 0.138.0-rc.0
2023-05-01 17:00:06 [ℹ] using region us-west-2
2023-05-01 17:00:07 [✔] using existing VPC (vpc-xxx) and subnets (private:map[another-private-subnet:{subnet-xxx us-west-2b 192.168.192.0/19 0 } private-subnet:{subnet-xxx us-west-2a 192.168.128.0/19 0 }] public:map[another-public-subnet:{subnet-xxx us-west-2b 192.168.64.0/19 0 } public-subnet:{subnet-xxx us-west-2a 192.168.0.0/19 0 }])
2023-05-01 17:00:07 [!] custom VPC/subnets will be used; if resulting cluster doesn't function as expected, make sure to review the configuration of VPC/subnets
2023-05-01 17:00:07 [ℹ] nodegroup "test-nodegroup-3064" will use "ami-03afaac8605e281d8" [Bottlerocket/1.26]
2023-05-01 17:00:07 [ℹ] using Kubernetes version 1.26
2023-05-01 17:00:07 [ℹ] creating EKS cluster "3064-repro" in "us-west-2" region with un-managed nodes
2023-05-01 17:00:07 [ℹ] 1 nodegroup (test-nodegroup-3064) was included (based on the include/exclude rules)
2023-05-01 17:00:07 [ℹ] will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
2023-05-01 17:00:07 [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
2023-05-01 17:00:07 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=3064-repro'
2023-05-01 17:00:07 [ℹ] Kubernetes API endpoint access will use provided values {publicAccess=true, privateAccess=true} for cluster "3064-repro" in "us-west-2"
2023-05-01 17:00:07 [ℹ] CloudWatch logging will not be enabled for cluster "3064-repro" in "us-west-2"
2023-05-01 17:00:07 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-2 --cluster=3064-repro'
2023-05-01 17:00:07 [ℹ]
2 sequential tasks: { create cluster control plane "3064-repro",
2 sequential sub-tasks: {
4 sequential sub-tasks: {
wait for control plane to become ready,
associate IAM OIDC provider,
2 sequential sub-tasks: {
create IAM role for serviceaccount "kube-system/aws-node",
create serviceaccount "kube-system/aws-node",
},
restart daemonset "kube-system/aws-node",
},
create nodegroup "test-nodegroup-3064",
}
}
2023-05-01 17:00:07 [ℹ] building cluster stack "eksctl-3064-repro-cluster"
2023-05-01 17:00:08 [ℹ] deploying stack "eksctl-3064-repro-cluster"
2023-05-01 17:00:38 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:01:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:02:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:03:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:04:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:05:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:06:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:07:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:08:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:09:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:10:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:11:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:12:08 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-cluster"
2023-05-01 17:14:09 [ℹ] building iamserviceaccount stack "eksctl-3064-repro-addon-iamserviceaccount-kube-system-aws-node"
2023-05-01 17:14:09 [ℹ] deploying stack "eksctl-3064-repro-addon-iamserviceaccount-kube-system-aws-node"
2023-05-01 17:14:09 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-addon-iamserviceaccount-kube-system-aws-node"
2023-05-01 17:14:39 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-addon-iamserviceaccount-kube-system-aws-node"
2023-05-01 17:14:39 [ℹ] serviceaccount "kube-system/aws-node" already exists
2023-05-01 17:14:39 [ℹ] updated serviceaccount "kube-system/aws-node"
2023-05-01 17:14:40 [ℹ] daemonset "kube-system/aws-node" restarted
2023-05-01 17:14:40 [ℹ] building nodegroup stack "eksctl-3064-repro-nodegroup-test-nodegroup-3064"
2023-05-01 17:14:40 [ℹ] --nodes-min=1 was set automatically for nodegroup test-nodegroup-3064
2023-05-01 17:14:40 [!] public subnet public-subnet is being used with `privateNetworking` enabled, please ensure this is the desired behaviour
2023-05-01 17:14:40 [ℹ] deploying stack "eksctl-3064-repro-nodegroup-test-nodegroup-3064"
2023-05-01 17:14:40 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-nodegroup-test-nodegroup-3064"
2023-05-01 17:15:10 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-nodegroup-test-nodegroup-3064"
2023-05-01 17:15:48 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-nodegroup-test-nodegroup-3064"
2023-05-01 17:17:40 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-nodegroup-test-nodegroup-3064"
2023-05-01 17:19:16 [ℹ] waiting for CloudFormation stack "eksctl-3064-repro-nodegroup-test-nodegroup-3064"
2023-05-01 17:19:16 [ℹ] waiting for the control plane to become ready
2023-05-01 17:19:16 [!] failed to determine authenticator version, leaving API version as default v1alpha1: failed to parse versions: unable to parse first version "unversioned": Invalid character(s) found in major number "unversioned"
2023-05-01 17:19:16 [✔] saved kubeconfig as "/home/ubuntu/.kube/config"
2023-05-01 17:19:16 [ℹ] no tasks
2023-05-01 17:19:16 [✔] all EKS cluster resources for "3064-repro" have been created
2023-05-01 17:19:16 [ℹ] adding identity "arn:aws:iam::994959692891:role/eksctl-3064-repro-nodegroup-test-NodeInstanceRole-3QFDL9P4KWS" to auth ConfigMap
2023-05-01 17:19:16 [ℹ] nodegroup "test-nodegroup-3064" has 0 node(s)
2023-05-01 17:19:16 [ℹ] waiting for at least 1 node(s) to become ready in "test-nodegroup-3064"
```

It just hangs waiting for the node to come up, and I see the following failure in the boot log:

```
[ 27.562077] sundog[1145]: Setting generator 'pluto private-dns-name' failed with exit code 1 - stderr: Error describing instance 'i-0b01925b70255a154': dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 3.1s: HTTP connect timeout occurred after 3.1s: timed out (DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 3.1s }) } }))
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Send signal to CloudFormation Stack.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
[DEPEND] Dependency failed for Sets the hostname.
```
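For context, the failing `pluto private-dns-name` step needs to reach the regional EC2 API from the node. This is not pluto's actual code, just a rough shell approximation of the calls involved, to show why an isolated private subnet breaks it:

```bash
# IMDS is link-local, so this part works even in an isolated subnet.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)

# This call has to reach ec2.us-west-2.amazonaws.com, which an isolated private
# subnet cannot do without a NAT gateway or an EC2 interface VPC endpoint -
# hence the connect timeout in the boot log above.
aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].PrivateDnsName' --output text
```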
But this warning log from `eksctl` gives me pause:

```
2023-05-01 17:14:40 [!] public subnet public-subnet is being used with `privateNetworking` enabled, please ensure this is the desired behaviour
```
I also see you opened https://github.com/weaveworks/eksctl/issues/6563. I'm not sure what this `privateNetworking` setting is doing. We should also confirm any assumptions about that setting with that team.
Also going to attempt to reproduce this with a 1.25 cluster since we moved to an in-tree cloud provider for 1.26 and this may be related to that.
1.25 with this configuration fails to bring up the kubelet:

```
Starting Kubelet...
[ OK ] Finished Isolates multi-user.target.
[ OK ] Finished Send boot success.
[FAILED] Failed to start Kubelet.
See 'systemctl status kubelet.service' for details.
[ OK ] Reached target Multi-User System.
```
@jpmcb Are you able to get the logs to see why it failed?
I'm having trouble getting the SSM agent to connect - I've added the necessary endpoints in the private subnet, but I'm wondering if the Bottlerocket network configuration selects the wrong interface and gets a DHCP lease that can't hit those endpoints.
Any thoughts @zmrow or @yeazelm?
@jpmcb Hi! Just found this issue from AWS support case. Is there any ETA for resolving this?
Hi @svyatoslavmo - thanks for the update. Can you provide some more detail on what you're attempting to do with a private and public subnet?

This use case is somewhat abnormal since a private subnet with no gateway will never be able to pull down images from ECR (or another image registry). This is especially relevant for the admin and control containers, which are required to perform debugging operations (and that is making it difficult to determine the kubelet failure logs).

For example, with both the private and public subnet attached to the node group (and `privateNetworking` set), I was able to pull these logs from the instance's console output:

```
[ 776.392160] host-ctr[1277]: time="2023-05-02T16:23:56Z" level=error msg="retries exhausted: failed to resolve reference \"ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-admin:v0.10.0\": RequestError: send request failed\ncaused by: Post \"https://api.ecr.us-west-2.amazonaws.com/\": dial tcp 52.119.173.252:443: i/o timeout" ref="ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-admin:v0.10.0"
[ 776.403281] host-ctr[1277]: time="2023-05-02T16:23:56Z" level=fatal msg="retries exhausted: failed to resolve reference \"ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-admin:v0.10.0\": RequestError: send request failed\ncaused by: Post \"https://api.ecr.us-west-2.amazonaws.com/\": dial tcp 52.119.173.252:443: i/o timeout"
[ 779.398958] host-ctr[1278]: time="2023-05-02T16:23:59Z" level=error msg="retries exhausted: failed to resolve reference \"ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-control:v0.7.1\": RequestError: send request failed\ncaused by: Post \"https://api.ecr.us-west-2.amazonaws.com/\": dial tcp 52.119.173.252:443: i/o timeout" ref="ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-control:v0.7.1"
```

Does this work on different node operating systems?
The more typical use case I've seen is where a Kubernetes cluster has separate nodegroups, where a subset is attached to the wider internet and other groups are segmented away. I'm not sure how common the case is where a single nodegroup has both private and public subnets.
I thought maybe there's something `eksctl` is doing with the `privateNetworking: true` key, so I used the following:
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: private-networking
  region: us-west-2
  version: '1.25'

iam:
  withOIDC: true

nodeGroups:
  - name: test-nodegroup
    instanceType: m7g.xlarge
    amiFamily: Bottlerocket
    privateNetworking: true
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    desiredCapacity: 1
    ssh:
      allow: true
      publicKeyName: abc123
```
which only attaches the nodegroup to the eksctl-created private subnets. Everything comes up fine, including pulling container images and starting the kubelet. So it must be a symptom of attaching both the private and public subnets.
`privateNetworking` usually behaves in the following manner:

(1) - when the subnets are not pre-existing and user-defined, but rather left unspecified in the config file and hence created by `eksctl`. Here, for `privateNetworking: true`, `eksctl` will only assign private subnets to the nodegroup, and the reverse also applies.

(2) - when the subnets already exist and are defined by the user, `privateNetworking` is only used for some validation purposes; it falls under the user's responsibility to assign only private subnets to that nodegroup in order to achieve the desired behaviour (hence why you're only seeing this warning: `2023-05-01 17:14:40 [!] public subnet public-subnet is being used with privateNetworking enabled, please ensure this is the desired behaviour`).
@jpmcb your use case falls under scenario (1). The important aspect here is not that you didn't specify public subnets, but rather that the private subnets are eksctl-created, which means a NAT gateway is also being created and attached to those subnets.

@robd003 your use case falls under scenario (2). I think the issue was highlighted in one of the above messages - "a private subnet with no gateway will never be able to pull down images from ECR (or another image registry)". If one of your worker nodes is deployed within the private subnet, you'll run into this problem. I think it's possible to continue specifying both public and private subnets for your nodegroup, as long as the private one has a gateway configured. If you continue specifying both, you can drop the `privateNetworking: true` flag, as you are not really achieving private networking for the entire nodegroup anyway.
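One way to check the "has a gateway configured" part for an existing private subnet (a sketch with a placeholder subnet ID):

```bash
# Show the routes for the route table associated with the private subnet.
# An isolated subnet will have only the local VPC route, while a working
# private subnet has a 0.0.0.0/0 route pointing at a NAT gateway (nat-...).
# (If nothing comes back, the subnet uses the VPC's main route table.)
aws ec2 describe-route-tables \
  --filters Name=association.subnet-id,Values=subnet-0123456789abcdef0 \
  --query 'RouteTables[].Routes[]' --output table
```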
> @robd003 your use case falls under scenario (2). I think the issue was highlighted in one of the above messages - a private subnet with no gateway will never be able to pull down images from ECR (or another image registry). If one of your worker nodes is deployed within the private subnet, you'll run into this problem.

Thanks @TiberiuGC so much for the insight! I thought the solution here would be to specify VPC endpoints, which keeps us from having to give the subnet a NAT or internet gateway. In my test environment I gave my private subnets VPC endpoints to ECR and SSM and made the security groups allow all traffic, but I still was not able to get my nodes to hit ECR to pull down the images.

I thought it might be something weird with my subnet, but even after attaching an internet gateway to the subnet, deploying a test Ubuntu image works fine while the Bottlerocket node still can't come up. Any thoughts?
Thanks for the help guys. I'll try just sticking with a public-only cluster for now. Bottlerocket has been great during my testing so far!
> I thought the solution here would be to specify VPC endpoints [...] I gave my private subnets VPC endpoints to ECR and SSM and made the security groups allow all traffic, but I still was not able to get my nodes to hit ECR to pull down the images. [...] even after attaching an internet gateway to the subnet, deploying a test Ubuntu image works fine while the Bottlerocket node still can't come up. Any thoughts?
Unfortunately nothing comes to mind instantly ... this may require further investigation.
> 1.25 with this configuration fails to bring up the kubelet:
>
> ```
> Starting Kubelet...
> [ OK ] Finished Isolates multi-user.target.
> [ OK ] Finished Send boot success.
> [FAILED] Failed to start Kubelet.
> See 'systemctl status kubelet.service' for details.
> [ OK ] Reached target Multi-User System.
> ```
I have the exact same problem (EKS 1.25, `amazon/bottlerocket-aws-k8s-1.25-x86_64-v1.13.5-33225cc9`), except I create everything using Terraform.

I would like to remove the NAT (& EIGW) from my private subnets (make them fully offline) for security reasons, but I get the same error as above, and SSH connection is impossible (`port 22: Connection refused`).
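(Side note, not from the original report: since Bottlerocket's control container runs the SSM agent, a shell over SSM Session Manager is often still possible on such nodes, provided the `ssm`, `ssmmessages`, and `ec2messages` endpoints are reachable from the subnet. A sketch with a placeholder instance ID:)

```bash
# Open an interactive shell on the node through SSM Session Manager.
# Requires the Session Manager plugin installed locally; instance ID is a placeholder.
aws ssm start-session --target i-0123456789abcdef0
```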
My VPC endpoints are defined as follows:
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.default.id
service_name = "com.amazonaws.${data.aws_region.current.name}.s3"
route_table_ids = [
aws_route_table.internet_private.id,
aws_route_table.internet_public.id
]
}
resource "aws_vpc_endpoint" "offline" {
for_each = toset(["sts", "ecr.dkr", "ec2", "autoscaling", "eks", "ssm"])
vpc_id = aws_vpc.default.id
service_name = "com.amazonaws.${data.aws_region.current.name}.${each.key}"
subnet_ids = local.subnets_private_ids
vpc_endpoint_type = "Interface"
}
Once the node has joined the cluster, I can remove the NAT (& EIGW) and everything seems to work.
@awoimbee - thanks for surfacing this.
That does look similar to the above. Do those subnets have VPC endpoints to ECR to pull down the container images to start the admin/control containers? You won't be able to get SSH access unless the admin container can be pulled down and started, since it's what serves SSH clients.
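A quick way to answer that (a sketch with a placeholder VPC ID): list which interface endpoints exist in the VPC and whether private DNS is enabled on them, which turns out to matter in the follow-up below.

```bash
# List the VPC's endpoints with their service names and private DNS setting.
# The VPC ID is a placeholder.
aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'VpcEndpoints[].{Service:ServiceName,Type:VpcEndpointType,PrivateDns:PrivateDnsEnabled,State:State}' \
  --output table
```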
Hi, I had multiple mistakes in the above snippet:

- `private_dns_enabled` was missing
- the `ecr.api` endpoint was missing (containers could not be pulled) -> I wonder why dkr and api are 2 separate endpoints if we always need both?

So, here is my final definition to get truly offline EKS nodes (it has been working for me for a week):
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.default.id
service_name = "com.amazonaws.${data.aws_region.current.name}.s3"
route_table_ids = [
aws_route_table.internet_private.id,
aws_route_table.internet_public.id
]
tags = {
Name = "gw-${var.name}-s3-vpc-endpoint"
module = local.module
}
}
resource "aws_security_group" "vpc_endpoint" {
description = "Security group for VPC endpoints"
name = "vpc-endpoint-${var.name}"
vpc_id = aws_vpc.default.id
tags = {
Name = "VPCEndpoint"
module = local.module
}
ingress {
cidr_blocks = ["0.0.0.0/0"]
from_port = 0
to_port = 0
protocol = "-1"
}
}
resource "aws_vpc_endpoint" "offline" {
for_each = toset(["sts", "ecr.dkr", "ecr.api", "ec2", "autoscaling", "eks", "ssm"])
vpc_id = aws_vpc.default.id
service_name = "com.amazonaws.${data.aws_region.current.name}.${each.key}"
subnet_ids = [local.subnets_private_ids[0]]
vpc_endpoint_type = "Interface"
private_dns_enabled = true
security_group_ids = [aws_security_group.vpc_endpoint.id]
tags = {
Name = "iedp-${var.name}-${each.key}-vpc-endpoint"
module = local.module
}
}
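As a follow-up check on that design (a sketch, assuming us-west-2): with `private_dns_enabled = true`, the normal service hostnames should resolve to private endpoint IPs from inside the VPC, so nothing on the nodes themselves needs to be reconfigured.

```bash
# Run from a host inside the VPC; expect private (RFC 1918) addresses.
dig +short api.ecr.us-west-2.amazonaws.com
dig +short ec2.us-west-2.amazonaws.com
```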
Hi All,
We've run into an issue which isn't the exact scenario but has similarities. We have all private subnets, however we use a secondary CIDR block for pod IPs.
Cluster with MNG running happily on 1.25. Upgrade to 1.26 runs successfully, but on nodegroup upgrade to the new AMI, nodes won't join the cluster, with errors in the instance log:
```
[ 304.743884] sundog[1345]: Setting generator 'pluto private-dns-name' failed with exit code 1 - stderr: Timed out retrieving private DNS name from EC2: deadline has elapsed
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Send signal to CloudFormation Stack.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
[DEPEND] Dependency failed for Sets the hostname.
```
We do some CIS hardening using a bootstrap container. Could that potentially be causing an issue or is this error happening before it runs?
+1
I've experienced the same issue when my SG was missing a 0.0.0.0/0 egress rule (a misconfiguration) needed to access the AWS APIs. For pure private subnets, @awoimbee's solution seems to be the best.
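(For reference, a quick way to spot that kind of missing egress rule - a sketch with a placeholder security group ID:)

```bash
# Inspect the egress rules on the node / endpoint security group; an empty
# IpPermissionsEgress list (or one without the needed HTTPS / all-traffic rule)
# would explain the AWS API timeouts. The group ID is a placeholder.
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[].IpPermissionsEgress' --output json
```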
Image I'm using: Bottlerocket OS 1.13.4 (aws-k8s-1.26)
What I expected to happen: AWS instance has two subnets, a public subnet and a private subnet (without a NAT gateway)
What actually happened:
How to reproduce the problem: Try to launch an EKS cluster with Bottlerocket for EKS 1.26 with two subnets, one public and one private.
DHCP options for the subnets:
Example eksctl config:
Full boot log: https://gist.github.com/robd003/7f05a5f76bf241f047a99ab3135f6a03