eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

[Bug] entire availability zone is removed if one of the instances is missing #7064

Open matti opened 1 year ago

matti commented 1 year ago

What were you trying to accomplish?

Create managed nodegroups with a list of instances

What happened?

I have a list of instances as follows from ec2-instance-selector:

c5a.8xlarge c6i.8xlarge c6in.8xlarge m5.8xlarge m6i.8xlarge m7i-flex.8xlarge m7i.8xlarge r5.8xlarge r5b.8xlarge r5n.8xlarge r6i.8xlarge

In https://github.com/eksctl-io/eksctl/pull/6464, @TiberiuGC fixed https://github.com/eksctl-io/eksctl/issues/6461 by skipping every availability zone in which at least one of the requested instance types is not available. As a result, eksctl now logs:

        skipping eu-north-1a from selection because it doesn't support the following instance type(s): r5b.8xlarge

This is very unfortunate, because I use ec2-instance-selector only to get a list of instance types that match certain criteria. I don't care which exact instance types are returned, as long as they fit those criteria.

So now my node groups are missing one availability zone.

matti commented 1 year ago

Here is a sample of the ec2-instance-selector invocation I use to get the list I want. I cannot use eksctl's built-in instance selector because it is bare-bones and doesn't support all of the filtering options that ec2-instance-selector provides.

        ec2-instance-selector \
          --region="$region" \
          --availability-zones $zones \
          --vcpus="$vcpus" \
          --memory-min="$memory" \
          --hypervisor nitro \
          --cpu-architecture x86_64 \
          --deny-list "^vt.|^inf.|d\.|en\.|dn\." \
          --gpus 0 \
          --network-performance-max "$network_performance_max" \
          --root-device-type ebs \
          --usage-class="$class" \
          --price-per-hour-max "$price_max" \
          --max-results 100

TiberiuGC commented 1 year ago

Hi @matti. If we allow eu-north-1a to be selected, EKS may actually try to create r5b.8xlarge instances within this AZ, as it does not have any kind of filtering/validation mechanism to avoid that. By skipping the AZ, eksctl is merely preventing you from running into the error caught in https://github.com/eksctl-io/eksctl/issues/6461. So, even if we don't skip the AZ, you may occasionally run into a nodegroup creation error.

With that in mind, what would be your desired behaviour?
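
The skipping behaviour described above can be sketched in shell. This is only an illustration, not eksctl's actual implementation (eksctl is written in Go); the function name `select_zones` and its input format are assumptions made for this example. It reads `zone instance-type` pairs on stdin, such as the rows one might get from `aws ec2 describe-instance-type-offerings --location-type availability-zone --query 'InstanceTypeOfferings[].[Location,InstanceType]' --output text` for the relevant region, and prints only the zones that offer every requested type.

```shell
#!/bin/sh
# select_zones TYPE...
# Hypothetical helper (not part of eksctl or the AWS CLI).
# stdin:  one "zone instance-type" pair per line (whitespace separated).
# stdout: zones that offer every requested type, sorted.
# stderr: one "skipping ..." message per zone that lacks a requested type.
select_zones() {
  awk -v wanted="$*" '
    BEGIN { n = split(wanted, types, " ") }
    { offered[$1, $2] = 1; zones[$1] = 1 }
    END {
      for (z in zones) {
        missing = ""
        for (i = 1; i <= n; i++)
          if (!((z, types[i]) in offered))
            missing = missing (missing == "" ? "" : ", ") types[i]
        if (missing == "")
          print z
        else
          print "skipping " z ": missing " missing > "/dev/stderr"
      }
    }
  ' | sort
}
```

With offerings in which only eu-north-1a lacks r5b.8xlarge, `select_zones c5a.8xlarge r5b.8xlarge` would print the other zones and report eu-north-1a on stderr, which is the trade-off discussed here: the nodegroup can be created, but in fewer zones than requested.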

matti commented 1 year ago

Error out instead, since I am requesting something that cannot be fulfilled.

matti commented 1 year ago

The current behaviour is, in my opinion, unexpected: it "silently" (unless I follow the logs) removes an AZ that I expect to be used.
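
Until something like this exists in eksctl, a fail-fast check could be run before nodegroup creation. The sketch below is an assumption about how such a check might look, not an eksctl feature; `require_types_everywhere` is a hypothetical name. It takes a comma-separated zone list and the instance types, reads `zone instance-type` offering pairs on stdin (for example from `aws ec2 describe-instance-type-offerings --location-type availability-zone` with a suitable `--query` and `--output text`), and exits non-zero if any type is missing from any requested zone.

```shell
#!/bin/sh
# require_types_everywhere ZONES_CSV TYPE...
# Hypothetical preflight check (not part of eksctl or the AWS CLI).
# stdin:  one "zone instance-type" pair per line (whitespace separated).
# Returns 0 if every TYPE is offered in every zone of ZONES_CSV,
# otherwise prints one error per missing combination to stderr and returns 1.
require_types_everywhere() {
  zones=$1
  shift
  awk -v zones="$zones" -v wanted="$*" '
    BEGIN { nz = split(zones, zs, ","); nt = split(wanted, ts, " ") }
    { offered[$1, $2] = 1 }
    END {
      bad = 0
      for (i = 1; i <= nz; i++)
        for (j = 1; j <= nt; j++)
          if (!((zs[i], ts[j]) in offered)) {
            print "error: " ts[j] " is not offered in " zs[i] > "/dev/stderr"
            bad = 1
          }
      exit bad
    }
  '
}
```

In a script this would be used as a guard, for instance `... | require_types_everywhere "$zones" $types || exit 1`, so that an unfulfillable request stops the run instead of quietly shrinking the set of zones.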
