Open carlosrodlop opened 3 months ago
gp3
At least one node should be deployed in the same AZ as the one defined in the topology constraint of the SC (storage class).
Idea: for the node group that uses gp3 as its storage class, split it into 2 different node groups, one of which sets subnet_ids to match the gp3 SC topology. For example, cb_apps = cb_apps_aza + cb_azbc
Ref: https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/submodules/eks-managed-node-group?tab=inputs
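A minimal sketch of the split described above, assuming the resulting map is passed to the `eks_managed_node_groups` input of the terraform-aws-modules/eks module. The variable names, instance type, and sizes are hypothetical placeholders, not the project's actual values. Keeping `min_size = 1` on the single-AZ group means the autoscaler cannot scale that group to zero, so at least one node stays in the AZ the gp3 storage class is pinned to.

```hcl
# Hypothetical inputs: subnets grouped by AZ (names are assumptions).
variable "subnet_ids_aza" {
  description = "Subnets in the AZ that the gp3 storage class topology is constrained to."
  type        = list(string)
}

variable "subnet_ids_azbc" {
  description = "Subnets in the remaining AZs."
  type        = list(string)
}

locals {
  # Settings shared by both halves of the former cb_apps node group.
  # The instance type is a placeholder, not taken from the issue.
  cb_apps_common = {
    instance_types = ["m5.xlarge"]
  }

  # Intended to be passed as: eks_managed_node_groups = local.eks_managed_node_groups
  eks_managed_node_groups = {
    # Single-AZ group: same AZ as the gp3 storage class topology.
    cb_apps_aza = merge(local.cb_apps_common, {
      subnet_ids   = var.subnet_ids_aza
      min_size     = 1 # never scale this AZ down to zero nodes
      max_size     = 3
      desired_size = 1
    })

    # Remaining AZs: may scale to zero when idle.
    cb_azbc = merge(local.cb_apps_common, {
      subnet_ids   = var.subnet_ids_azbc
      min_size     = 0
      max_size     = 6
      desired_size = 0
    })
  }
}
```

Whether the second group should be allowed to scale to zero depends on the workloads that do not use gp3 volumes; the values here are only illustrative.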
[Node Group, single AZ] seems possible: https://tanmay-bhat.medium.com/how-to-migrate-a-node-group-from-multi-az-to-single-az-in-aws-eks-73b0dc553ed. However, it would be more interesting to ensure that the autoscaler does not delete nodes from a particular AZ (the AZ your EBS controllers are constrained to) and to share the node pools between EBS and EFS controllers: https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/submodules/eks-managed-node-group?tab=inputs ==> placement_group_az
Description
Versions
Module version [Required]:
Terraform version:
Provider version(s):
Reproduction Code [Required]
Steps to reproduce the behaviour:
It is random behaviour.
After recovering from hibernation and re-provisioning team-b, the following error can be read from the Kubernetes events
It is not related to the issue explained in the article "Autoscaling issue when provisioning controllers in Multi AZ Environment", because the storage class already uses the WaitForFirstConsumer volume binding mode.
Expected behavior
Team-b recovers successfully from hibernation
Actual behavior
Team-b does not recover successfully from Hibernation
Terminal Output Screenshot(s)
Additional context
Explore the option of using allowed topologies, as in https://github.com/jenkins-infra/aws/blob/09548bf41176b32fb91f1a3c915829032e4e8ec1/eks-public-cluster.tf#L247-L282, which is aligned with:
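Separately from the reference above, and only as an illustration of the allowed-topologies option, a gp3 storage class pinned to one AZ could be sketched with the Terraform Kubernetes provider as follows. This is not the content of the linked file: the resource name, storage class name, and the `gp3_zone` variable are assumptions, and the topology key shown is the one used by the EBS CSI driver.

```hcl
# Hypothetical input: the single AZ that gp3 volumes are restricted to.
variable "gp3_zone" {
  description = "Availability zone allowed for gp3 volumes (assumed input)."
  type        = string
}

resource "kubernetes_storage_class_v1" "gp3_single_az" {
  metadata {
    name = "gp3-single-az" # hypothetical name
  }

  storage_provisioner = "ebs.csi.aws.com"
  reclaim_policy      = "Delete"

  # Delay volume creation until a pod is scheduled, as the issue already does.
  volume_binding_mode = "WaitForFirstConsumer"

  parameters = {
    type = "gp3"
  }

  # Restrict provisioning to one AZ so that re-provisioned pods
  # are always scheduled where their existing volume lives.
  allowed_topologies {
    match_label_expressions {
      key    = "topology.ebs.csi.aws.com/zone"
      values = [var.gp3_zone]
    }
  }
}
```

This constraint only helps if a node group can always place a node in that zone, which is why it pairs with a node group restricted to the same AZ.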