ben851 opened this issue 7 months ago
Merged to staging
Merged to prod and migrated the workload to the new nodes. Need to submit a PR for old node removal.
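For reference, a minimal sketch of how the workload migration can be done per node (the node name is a placeholder, and the drain flags depend on what runs on the node):

```
# Sketch only: <old-node-name> is a placeholder for each node in the old group.
# Stop new pods from being scheduled on the old node.
kubectl cordon <old-node-name>

# Evict running pods so they reschedule onto the new node group.
kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
```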
New PR created https://github.com/cds-snc/notification-terraform/pull/1245
New nodes were created in production.
Old nodes were deleted in staging this morning; will do a release to complete this in prod today.
Node groups are as expected:

```
$ aws eks list-nodegroups --cluster-name notification-canada-ca-production-eks-cluster
{
    "nodegroups": [
        "notification-canada-ca-production-eks-primary-node-group-k8s"
    ]
}
```
Subnets were verified with `aws ec2 describe-subnets`.
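A sketch of the kind of check used for that verification (the Name tag filter is an assumption for illustration, not the exact command that was run):

```
# Sketch: list the cluster's subnets with their CIDR blocks and free IP counts.
aws ec2 describe-subnets \
  --filters "Name=tag:Name,Values=notification-canada-ca-production-*" \
  --query 'Subnets[].{Name:Tags[?Key==`Name`]|[0].Value,Cidr:CidrBlock,FreeIPs:AvailableIpAddressCount}' \
  --output table
```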
Description
As a developer of Notify, I would like our system to be able to accommodate scaling up in the future so that we can grow without having to rearchitect our infrastructure.
Currently the private subnets for the EKS nodes are /24, which leaves room for only 256 addresses per subnet (251 usable once AWS reserves its five per subnet). We have received a warning (below) from AWS stating that we are running out of IPs in production and that there may be service interruptions when they apply patches.
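The shortage can be confirmed per subnet by looking at `AvailableIpAddressCount` (the subnet IDs below are placeholders):

```
# Sketch: AvailableIpAddressCount shows how many free IPs remain in each node subnet.
aws ec2 describe-subnets \
  --subnet-ids subnet-aaaa1111 subnet-bbbb2222 subnet-cccc3333 \
  --query 'Subnets[].[SubnetId,CidrBlock,AvailableIpAddressCount]' \
  --output table
```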
WHY are we building?
We received a warning from AWS that we are running out of IPs in production
WHAT are we building?
VALUE created by our solution
Increased reliability when patching, and headroom to scale the system up further.
Acceptance Criteria
QA Steps
Appendix
Amazon EKS detected cluster health issues in your AWS account 296255494825.
The following is a list of affected clusters with their cluster arns, cluster health status and corresponding cluster health issue(s): arn:aws:eks:ca-central-1:296255494825:cluster/notification-canada-ca-production-eks-cluster : IMPAIRED : Not Enough Free IP Addresses In Subnet.
The health of an EKS cluster is a shared responsibility between AWS and customers. You must resolve these issues to maintain operational stability for your EKS cluster(s). Cluster health issues can prevent Amazon EKS from patching your clusters or prevent you from upgrading to newer Kubernetes versions.
Starting on 2024-04-15, Amazon EKS will patch clusters to the latest supported platform version [1]. Clusters that are unstable due to outstanding health issues may experience loss in connectivity between the Kubernetes control plane instances and worker nodes where your workload runs. To avoid this, we recommend that you resolve outstanding cluster health issues [2] before this date.
You can also view your affected clusters in the 'Affected resources' tab in your AWS Health Dashboard or by using the DescribeCluster API [3].
[1] https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html#cluster-health-status
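For reference, the health issues the notice refers to can also be pulled from the CLI; a sketch using the cluster name from above:

```
# Sketch: DescribeCluster returns a health block listing outstanding issues.
aws eks describe-cluster \
  --name notification-canada-ca-production-eks-cluster \
  --query 'cluster.health.issues'
```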