env0 / k8s-modules


Chore - more instance types #37

Closed chpl closed 3 months ago

chpl commented 3 months ago

Problem

Our node group fails to scale, with the following error:

Launching a new EC2 instance. Status Reason: Could not launch Spot Instances.
UnfulfillableCapacity - Unable to fulfill capacity due to your request configuration.
Please adjust your request and try again. Launching EC2 instance failed.

Solution

Add more instance types to choose from
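
A rough sketch of what the change looks like, assuming the node group is declared through the `eks_managed_node_groups` input of the terraform-aws-modules/eks module. The wrapper in this repo may expose it under a different name, and `capacity_type` is inferred from the error message rather than taken from the actual configuration:

```hcl
# Hypothetical excerpt; only the instance_types list reflects this PR.
eks_managed_node_groups = {
  deployment = {
    # Assumption: the group runs on Spot, as the
    # "Could not launch Spot Instances" error suggests.
    capacity_type = "SPOT"

    # Previously only t3a.2xlarge was listed. With several types,
    # a launch can still succeed when one type has no Spot capacity.
    instance_types = [
      "t3a.2xlarge",
      "t3a.xlarge",
      "t3.2xlarge",
      "t3.xlarge",
    ]
  }
}
```

Note that changing `instance_types` on an existing managed node group forces a replacement rather than an in-place update.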

env0-dev[bot] commented 3 months ago

🚀  env0 had composed a PR Plan for environment K8S Agent - KuShield / Kushield EKS Cluster (kushield_eks_cluster):

Plan: 1 to add, 4 to change, 1 to destroy.
Plan Details

```diff
# module.eks.module.eks.module.eks_managed_node_group["deployment"].aws_eks_node_group.this[0] has changed
! resource "aws_eks_node_group" "this" {
      id = "kushield-new:deployment"
!     status = "DEGRADED" -> "ACTIVE"
      tags = {
          "Name" = "deployment"
      }
      # (14 unchanged attributes hidden)

      # (4 unchanged blocks hidden)
  }

Unless you have made equivalent changes to your configuration, or ignored the
relevant attributes using ignore_changes, the following plan may include
actions to undo or respond to these changes.

─────────────────────────────────────────────────────────────────────────────

OpenTofu used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!   update in-place
+/- create replacement and then destroy
 <= read (data resources)

OpenTofu will perform the following actions:

# module.autoscaler.data.aws_eks_node_group.node_group will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_eks_node_group" "node_group" {
+     ami_type = (known after apply)
+     arn = (known after apply)
+     capacity_type = (known after apply)
+     cluster_name = "kushield-new"
+     disk_size = (known after apply)
+     id = (known after apply)
+     instance_types = (known after apply)
+     labels = (known after apply)
+     launch_template = (known after apply)
+     node_group_name = "deployment"
+     node_role_arn = (known after apply)
+     release_version = (known after apply)
+     remote_access = (known after apply)
+     resources = (known after apply)
+     scaling_config = (known after apply)
+     status = (known after apply)
+     subnet_ids = (known after apply)
+     tags = (known after apply)
+     taints = (known after apply)
+     version = (known after apply)
  }

# module.autoscaler.module.eks-cluster-autoscaler.data.aws_iam_policy_document.this[0] will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_iam_policy_document" "this" {
+     id = (known after apply)
+     json = (known after apply)
+     minified_json = (known after apply)

+     statement {
+         actions = [
+             "autoscaling:DescribeAutoScalingGroups",
+             "autoscaling:DescribeAutoScalingInstances",
+             "autoscaling:DescribeLaunchConfigurations",
+             "autoscaling:DescribeScalingActivities",
+             "autoscaling:DescribeTags",
+             "autoscaling:SetDesiredCapacity",
+             "autoscaling:TerminateInstanceInAutoScalingGroup",
+             "ec2:DescribeImages",
+             "ec2:DescribeInstanceTypes",
+             "ec2:DescribeLaunchTemplateVersions",
+             "ec2:GetInstanceTypesFromInstanceRequirements",
+             "eks:DescribeNodegroup",
          ]
+         effect = "Allow"
+         resources = [
+             "*",
          ]
+         sid = "Autoscaling"
      }
  }

# module.autoscaler.module.eks-cluster-autoscaler.data.aws_iam_policy_document.this_irsa[0] will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_iam_policy_document" "this_irsa" {
+     id = (known after apply)
+     json = (known after apply)
+     minified_json = (known after apply)

+     statement {
+         actions = [
+             "sts:AssumeRoleWithWebIdentity",
          ]
+         effect = "Allow"

+         condition {
+             test = "StringEquals"
+             values = [
+                 "system:serviceaccount:cluster-autoscaler:cluster-autoscaler",
              ]
+             variable = "oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368:sub"
          }

+         principals {
+             identifiers = [
+                 "arn:aws:iam::343806850935:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368",
              ]
+             type = "Federated"
          }
      }
  }

# module.autoscaler.module.eks-cluster-autoscaler.data.aws_region.current will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_region" "current" {
+     description = (known after apply)
+     endpoint = (known after apply)
+     id = (known after apply)
+     name = (known after apply)
  }

# module.autoscaler.module.eks-cluster-autoscaler.data.utils_deep_merge_yaml.values[0] will be read during apply
# (config refers to values not yet known)
<= data "utils_deep_merge_yaml" "values" {
+     id = (known after apply)
+     input = (known after apply)
+     output = (known after apply)
  }

# module.autoscaler.module.eks-cluster-autoscaler.aws_iam_policy.this[0] will be updated in-place
! resource "aws_iam_policy" "this" {
      id = "arn:aws:iam::343806850935:policy/cluster-autoscaler-irsa-cluster-autoscaler"
      name = "cluster-autoscaler-irsa-cluster-autoscaler"
!     policy = jsonencode(
          {
-             Statement = [
-                 {
-                     Action = [
-                         "eks:DescribeNodegroup",
-                         "ec2:GetInstanceTypesFromInstanceRequirements",
-                         "ec2:DescribeLaunchTemplateVersions",
-                         "ec2:DescribeInstanceTypes",
-                         "ec2:DescribeImages",
-                         "autoscaling:TerminateInstanceInAutoScalingGroup",
-                         "autoscaling:SetDesiredCapacity",
-                         "autoscaling:DescribeTags",
-                         "autoscaling:DescribeScalingActivities",
-                         "autoscaling:DescribeLaunchConfigurations",
-                         "autoscaling:DescribeAutoScalingInstances",
-                         "autoscaling:DescribeAutoScalingGroups",
                      ]
-                     Effect = "Allow"
-                     Resource = "*"
-                     Sid = "Autoscaling"
                  },
              ]
-             Version = "2012-10-17"
          }
      ) -> (known after apply)
      tags = {}
      # (6 unchanged attributes hidden)
  }

# module.autoscaler.module.eks-cluster-autoscaler.aws_iam_role.this[0] will be updated in-place
! resource "aws_iam_role" "this" {
!     assume_role_policy = jsonencode(
          {
-             Statement = [
-                 {
-                     Action = "sts:AssumeRoleWithWebIdentity"
-                     Condition = {
-                         StringEquals = {
-                             "oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368:sub" = "system:serviceaccount:cluster-autoscaler:cluster-autoscaler"
                          }
                      }
-                     Effect = "Allow"
-                     Principal = {
-                         Federated = "arn:aws:iam::343806850935:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368"
                      }
                  },
              ]
-             Version = "2012-10-17"
          }
      ) -> (known after apply)
      id = "cluster-autoscaler-irsa-cluster-autoscaler"
      name = "cluster-autoscaler-irsa-cluster-autoscaler"
      tags = {}
      # (8 unchanged attributes hidden)
  }

# module.autoscaler.module.eks-cluster-autoscaler.helm_release.this[0] will be updated in-place
! resource "helm_release" "this" {
      id = "cluster-autoscaler"
!     metadata = [
-         {
-             app_version = "1.27.2"
-             chart = "cluster-autoscaler"
-             first_deployed = 1714592145
-             last_deployed = 1714592145
-             name = "cluster-autoscaler"
-             namespace = "cluster-autoscaler"
-             notes = <<-EOT
                  To verify that cluster-autoscaler has started, run:

                    kubectl --namespace=cluster-autoscaler get pods -l "app.kubernetes.io/name=aws-cluster-autoscaler,app.kubernetes.io/instance=cluster-autoscaler"
              EOT
-             revision = 1
-             values = jsonencode(
                  {
-                     autoDiscovery = {
-                         clusterName = "kushield-new"
                      }
-                     awsRegion = "us-east-1"
-                     rbac = {
-                         create = true
-                         serviceAccount = {
-                             annotations = {
-                                 "eks.amazonaws.com/role-arn" = "arn:aws:iam::343806850935:role/cluster-autoscaler-irsa-cluster-autoscaler"
                              }
-                             create = true
-                             name = "cluster-autoscaler"
                          }
                      }
                  }
              )
-             version = "9.33.0"
          },
      ] -> (known after apply)
      name = "cluster-autoscaler"
!     values = [
-         <<-EOT
              autoDiscovery:
                clusterName: kushield-new
              awsRegion: us-east-1
              rbac:
                create: true
                serviceAccount:
                  annotations:
                    eks.amazonaws.com/role-arn: arn:aws:iam::343806850935:role/cluster-autoscaler-irsa-cluster-autoscaler
                  create: true
                  name: cluster-autoscaler
          EOT,
      ] -> (known after apply)
      # (26 unchanged attributes hidden)
  }

# module.efs_csi_driver.module.efs_csi_role.data.aws_caller_identity.current will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_caller_identity" "current" {
+     account_id = (known after apply)
+     arn = (known after apply)
+     id = (known after apply)
+     user_id = (known after apply)
  }

# module.efs_csi_driver.module.efs_csi_role.data.aws_iam_policy_document.this[0] will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_iam_policy_document" "this" {
+     id = (known after apply)
+     json = (known after apply)
+     minified_json = (known after apply)

+     statement {
+         actions = [
+             "sts:AssumeRoleWithWebIdentity",
          ]
+         effect = "Allow"

+         condition {
+             test = "StringEquals"
+             values = [
+                 "sts.amazonaws.com",
              ]
+             variable = "oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368:aud"
          }
+         condition {
+             test = "StringEquals"
+             values = [
+                 "system:serviceaccount:kube-system:efs-csi-controller-sa",
+                 "system:serviceaccount:kube-system:efs-csi-node-sa",
              ]
+             variable = "oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368:sub"
          }

+         principals {
+             identifiers = [
+                 "arn:aws:iam::343806850935:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368",
              ]
+             type = "Federated"
          }
      }
  }

# module.efs_csi_driver.module.efs_csi_role.data.aws_partition.current will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_partition" "current" {
+     dns_suffix = (known after apply)
+     id = (known after apply)
+     partition = (known after apply)
+     reverse_dns_prefix = (known after apply)
  }

# module.efs_csi_driver.module.efs_csi_role.data.aws_region.current will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_region" "current" {
+     description = (known after apply)
+     endpoint = (known after apply)
+     id = (known after apply)
+     name = (known after apply)
  }

# module.efs_csi_driver.module.efs_csi_role.aws_iam_role.this[0] will be updated in-place
! resource "aws_iam_role" "this" {
!     assume_role_policy = jsonencode(
          {
-             Statement = [
-                 {
-                     Action = "sts:AssumeRoleWithWebIdentity"
-                     Condition = {
-                         StringEquals = {
-                             "oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368:aud" = "sts.amazonaws.com"
-                             "oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368:sub" = [
-                                 "system:serviceaccount:kube-system:efs-csi-controller-sa",
-                                 "system:serviceaccount:kube-system:efs-csi-node-sa",
                              ]
                          }
                      }
-                     Effect = "Allow"
-                     Principal = {
-                         Federated = "arn:aws:iam::343806850935:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/552F599E8C96AA4363438C8900259368"
                      }
                  },
              ]
-             Version = "2012-10-17"
          }
      ) -> (known after apply)
      id = "kushield-new_AmazonEKS_EFS_CSI_DriverRole"
      name = "kushield-new_AmazonEKS_EFS_CSI_DriverRole"
      tags = {}
      # (8 unchanged attributes hidden)
  }

# module.eks.module.eks.module.eks_managed_node_group["deployment"].aws_eks_node_group.this[0] must be replaced
+/- resource "aws_eks_node_group" "this" {
!     arn = "arn:aws:eks:us-east-1:343806850935:nodegroup/kushield-new/deployment/eec7a9b3-9315-727e-5a33-e0c61e00e37e" -> (known after apply)
!     disk_size = 0 -> (known after apply)
!     id = "kushield-new:deployment" -> (known after apply)
!     instance_types = [ # forces replacement
          "t3a.2xlarge",
+         "t3a.xlarge",
+         "t3.2xlarge",
+         "t3.xlarge",
      ]
-     labels = {} -> null
+     node_group_name_prefix = (known after apply)
!     release_version = "1.29.3-20240424" -> (known after apply)
!     resources = [
-         {
-             autoscaling_groups = [
-                 {
-                     name = "eks-deployment-eec7a9b3-9315-727e-5a33-e0c61e00e37e"
                  },
              ]
-             remote_access_security_group_id = ""
          },
      ] -> (known after apply)
!     status = "ACTIVE" -> (known after apply)
      tags = {
          "Name" = "deployment"
      }
      # (8 unchanged attributes hidden)

!     launch_template {
          id = "lt-00535462c138ef24e"
!         name = "deployment-20240507141339729700000001" -> (known after apply)
!         version = "1" -> "2"
      }

!     scaling_config {
!         desired_size = 4 -> 2
          # (2 unchanged attributes hidden)
      }

!     update_config {
-         max_unavailable = 0 -> null
          # (1 unchanged attribute hidden)
      }

      # (1 unchanged block hidden)
  }

Plan: 1 to add, 4 to change, 1 to destroy.
╷
│ Warning: Value for undeclared variable
│
│ The root module does not declare a variable named
│ "service_image_pull_secret" but a value was found in file
│ "env0.auto.tfvars.json". If you meant to use this value, add a "variable"
│ block to the configuration.
│
│ To silence these warnings, use TF_VAR_... environment variables to provide
│ certain "global" settings to all configurations in your organization. To
│ reduce the verbosity of these warnings, use the -compact-warnings option.
╵
╷
│ Warning: Value for undeclared variable
│
│ The root module does not declare a variable named "instance_type" but a
│ value was found in file "env0.auto.tfvars.json". If you meant to use this
│ value, add a "variable" block to the configuration.
│
│ To silence these warnings, use TF_VAR_... environment variables to provide
│ certain "global" settings to all configurations in your organization. To
│ reduce the verbosity of these warnings, use the -compact-warnings option.
╵
╷
│ Warning: Values for undeclared variables
│
│ In addition to the other similar warnings shown, 5 other variable(s)
│ defined without being declared.
╵
```

To apply this plan, use the following comment:

env0 apply -e kushield_eks_cluster

Full PR Plan logs on env0

Wassap124 commented 3 months ago

Why would more types solve the issue?

Wassap124 commented 3 months ago

what would dictate which type would be used?

chpl commented 3 months ago

@Wassap124

Why would more types solve the issue?

There are more instance types to choose from when a single instance type is unavailable - see here.

what would dictate which type would be used?

similar to the type we already use

Wassap124 commented 3 months ago

similar to the type we already use

what does that mean?

chpl commented 3 months ago

@Wassap124

We use t3a.2xlarge instances: AMD processor, 8 vCPUs, 32 GiB of memory.

I added