HarshadRanganathan / harshadranganathan.github.io

Personal Website
MIT License
0 stars 2 forks source link

EKS Upgrades #98

Open HarshadRanganathan opened 2 years ago

HarshadRanganathan commented 2 years ago

Kubernetes 1.19

For more information about Kubernetes 1.19, see the official release announcement.

  • Starting with 1.19, Amazon EKS no longer adds the kubernetes.io/cluster/my-cluster tag to subnets passed in when clusters are created. This subnet tag is only required if you want to influence where the Kubernetes service controller or AWS Load Balancer Controller places Elastic Load Balancers. For more information about the requirements of subnets passed to Amazon EKS during cluster creation, see updates to Amazon EKS VPC and subnet requirements and considerations.

  • You're no longer required to provide a security context for non-root containers that must access the web identity token file for use with IAM roles for service accounts. For more information, see IAM roles for service accounts andproposal for file permission handling in projected service account volume on GitHub.

  • The pod identity webhook has been updated to address the missing startup probes GitHub issue. The webhook also now supports an annotation to control token expiration. For more information, see the GitHub pull request.

  • CoreDNS version 1.8.0 is the recommended version for Amazon EKS 1.19 clusters. This version is installed by default in new Amazon EKS 1.19 clusters. For more information, see Managing the CoreDNS add-on.

  • Amazon EKS optimized Amazon Linux 2 AMIs include the Linux kernel version 5.4 for Kubernetes version 1.19. For more information, see Changelog on GitHub.

  • The CertificateSigningRequest API has been promoted to stable certificates.k8s.io/v1 with the following changes:

    • spec.signerName is now required. You can't create requests for kubernetes.io/legacy-unknown with the certificates.k8s.io/v1 API.

    • You can continue to create CSRs with the kubernetes.io/legacy-unknown signer name with the certificates.k8s.io/v1beta1 API.

    • You can continue to request that a CSR to is signed for a non-node server cert, webhooks (for example, with the certificates.k8s.io/v1beta1 API). These CSRs aren't auto-approved.

    • To approve certificates, a privileged user requires kubectl 1.18.8 or later.

    For more information about the certificate v1 API, see Certificate Signing Requests in the Kubernetes documentation.

The following Amazon EKS Kubernetes resources are critical for the Kubernetes control plane to work. We recommend that you don't delete or edit them.

Permission | Kind | Namespace | Reason -- | -- | -- | -- eks:certificate-controller | Rolebinding | kube-system | Impacts signer and approver functionality in the control plane. eks:certificate-controller | Role | kube-system | Impacts signer and approver functionality in the control plane. eks:certificate-controller | ClusterRolebinding | All | Impacts kubelet's ability to request server certificates, which affects certain cluster functionality like kubectl exec and kubectl logs.

The following Kubernetes features are now supported in Kubernetes 1.19 Amazon EKS clusters:

  • The ExtendedResourceToleration admission controller is enabled. This admission controller automatically adds tolerations for taints to pods requesting extended resources, such as GPUs. This way, you don't have to manually add the tolerations. For more information, see ExtendedResourceToleration in the Kubernetes documentation.

  • Elastic Load Balancers (CLB and NLB) provisioned by the in-tree Kubernetes service controller support filtering the nodes included as instance targets. This can help prevent reaching target group limits in large clusters. For more information, see the related GitHub issue and the service.beta.kubernetes.io/aws-load-balancer-target-node-labels annotation under Other ELB annotations in the Kubernetes documentation.

  • Pod Topology Spread has reached stable status. You can use topology spread constraints to control how pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability, as well as efficient resource utilization. For more information, see Pod Topology Spread Constraints in the Kubernetes documentation.

  • The Ingress API has reached general availability. For more information, see Ingress in the Kubernetes documentation.

  • EndpointSlices are enabled by default. EndpointSlices are a new API that provides a more scalable and extensible alternative to the Endpoints API for tracking IP addresses, ports, readiness, and topology information for Pods backing a Service. For more information, see Scaling Kubernetes Networking With EndpointSlices in the Kubernetes blog.

  • Secret and ConfigMap volumes can now be marked as immutable. This significantly reduces load on the API server if there are many Secret and ConfigMap volumes in the cluster. For more information, see ConfigMap and Secret in the Kubernetes documentation.

For the complete Kubernetes 1.19 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md.

HarshadRanganathan commented 2 years ago

Kubernetes 1.20 For more information about Kubernetes 1.20, see the official release announcement.

1.20 brings new default roles and users. You can find more information in Default EKS Kubernetes roles and users. Ensure that you are using a supported cert-manager version.

The following Kubernetes features are now supported in Kubernetes 1.20 Amazon EKS clusters:

API Priority and Fairness has reached beta status and is enabled by default. This allows kube-apiserver to categorize incoming requests by priority levels.

RuntimeClass has reached stable status. The RuntimeClass resource provides a mechanism for supporting multiple runtimes in a cluster and surfaces information about that container runtime to the control plane.

Process ID Limits has now graduated to general availability.

kubectl debug has reached beta status. kubectl debug supports common debugging workflows directly from kubectl.

The Docker container runtime has been phased out. The Kubernetes community has written a blog post about this in detail with a dedicated FAQ page. Docker-produced images can continue to be used and will work as they always have. You can safely ignore the dockershim deprecation warning message printed in kubelet startup logs. Amazon EKS will eventually move to containerd as the runtime for the Amazon EKS optimized Amazon Linux 2 AMI. You can follow the containers roadmap issue for more details.

Pod Hostname as FQDN has graduated to beta status. This feature allows setting a pod’s hostname to its Fully Qualified Domain Name (FQDN), giving the ability to set the hostname field of the kernel to the FQDN of a pod.

The client-go credential plugins can now be passed in the current cluster information via the KUBERNETES_EXEC_INFO environment variable. This enhancement allows Go clients to authenticate using external credential providers, such as a key management system (KMS).

For the complete Kubernetes 1.20 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md.

HarshadRanganathan commented 2 years ago

Kubernetes 1.21 Kubernetes 1.21 is now available in Amazon EKS. For more information about Kubernetes 1.21, see the official release announcement.

Important BoundServiceAccountTokenVolume graduated to beta and is enabled by default in Kubernetes version 1.21. This feature improves security of service account tokens by allowing workloads running on Kubernetes to request JSON web tokens that are audience, time, and key bound. Service account tokens now have an expiration of one hour. In previous Kubernetes versions, they didn't have an expiration. This means that clients that rely on these tokens must refresh the tokens within an hour. The following Kubernetes client SDKs refresh tokens automatically within the required time frame:

Go version 0.15.7 and later

Python version 12.0.0 and later

Java version 9.0.0 and later

JavaScript version 0.10.3 and later

Ruby master branch

Haskell version 0.3.0.0

C# version 7.0.5 and later

If your workload is using an older client version, then you must update it. To enable a smooth migration of clients to the newer time-bound service account tokens, Kubernetes version 1.21 adds an extended expiry period to the service account token over the default one hour. For Amazon EKS clusters, the extended expiry period is 90 days. Your Amazon EKS cluster's Kubernetes API server rejects requests with tokens older than 90 days. We recommend that you check your applications and their dependencies to make sure that the Kubernetes client SDKs are the same or later than the versions listed previously. For instructions about how to identify pods that are using stale tokens, see Kubernetes service accounts.

Dual-stack networking support (IPv4 and IPv6 addresses) on pods, services, and nodes reached beta status. However, Amazon EKS and the Amazon VPC CNI plugin for Kubernetes don't currently support dual stack networking.

The Amazon EKS Optimized Amazon Linux 2 AMI now contains a bootstrap flag to enable the containerd runtime as a Docker alternative. This flag allows preparation for the removal of Docker as a supported runtime in the next Kubernetes release. For more information, see Enable the containerd runtime bootstrap flag. This can be tracked through the container roadmap on Github.

Managed node groups support for Cluster Autoscaler priority expander.

Newly created managed node groups on Amazon EKS version 1.21 clusters use the following format for the underlying Auto Scaling group name:

eks-managed-node-group-name-uuid

This enables using the priority expander feature of Cluster Autoscaler to scale node groups based on user defined priorities. A common use case is to prefer scaling spot node groups over on-demand groups. This behavior change solves the containers roadmap issue #1304.

The following Kubernetes features are now supported in Amazon EKS 1.21 clusters:

CronJobs (previously ScheduledJobs) have now graduated to stable status. With this change, users perform regularly scheduled actions such as backups and report generation.

Immutable Secrets and ConfigMaps have now graduated to stable status. A new, immutable field was added to these objects to reject changes. This rejection protects the cluster from updates that can unintentionally break the applications. Because these resources are immutable, kubelet doesn't watch or poll for changes. This reduces kube-apiserver load and improving scalability and performance.

Graceful Node Shutdown has now graduated to beta status. With this update, the kubelet is aware of node shutdown and can gracefully terminate that node's pods. Before this update, when a node shutdown, its pods didn't follow the expected termination lifecycle. This caused workload problems. Now, the kubelet can detect imminent system shutdown through systemd, and inform running pods so they terminate gracefully.

Pods with multiple containers can now use the kubectl.kubernetes.io/default-container annotation to have a container preselected for kubectl commands.

PodSecurityPolicy is being phased out. PodSecurityPolicy will still be functional for several more releases according to Kubernetes deprecation guidelines. For more information, see PodSecurityPolicy Deprecation: Past, Present, and Future and the AWS blog.

For the complete Kubernetes 1.21 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md.

HarshadRanganathan commented 2 years ago

Kubernetes 1.18 For more information about Kubernetes 1.18, see the official release announcement.

The following Kubernetes features are now supported in Kubernetes 1.18 Amazon EKS clusters:

Topology Manager has reached beta status. This feature allows the CPU and Device Manager to coordinate resource allocation decisions, optimizing for low latency with machine learning and analytics workloads. For more information, see Control Topology Management Policies on a node in the Kubernetes documentation.

Server-side Apply is updated with a new beta version. This feature tracks and manages changes to fields of all new Kubernetes objects. This helps you to know what changed your resources and when. For more information, see What is Server-side Apply? in the Kubernetes documentation.

A new pathType field and a new IngressClass resource has been added to the Ingress specification. These features make it simpler to customize Ingress configuration, and are supported by the AWS Load Balancer Controller (formerly called the ALB Ingress Controller). For more information, see Improvements to the Ingress API in Kubernetes1.18 in the Kubernetes documentation.

Configurable horizontal pod autoscaling behavior. For more information, see Support for configurable scaling behavior in the Kubernetes documentation.

New clusters contain updated default values in externalTrafficPolicy. HealthyThresholdCount and UnhealthyThresholdCount are 2 each, and HealthCheckIntervalSeconds is reduced to ten seconds. Clusters created in older versions and upgraded retain the old values.

For the complete Kubernetes 1.18 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md.

HarshadRanganathan commented 1 year ago

Kubernetes 1.22 Kubernetes 1.22 is now available in Amazon EKS. For more information about Kubernetes 1.22, see the official release announcement.

Important BoundServiceAccountTokenVolume graduated to stable and enabled by default in Kubernetes version 1.22. This feature improves security of service account tokens. It allows workloads that are running on Kubernetes to request JSON web tokens that are audience, time, and key bound. Service account tokens now have an expiration of one hour. In previous Kubernetes versions, they didn't have an expiration. This means that clients that rely on these tokens must refresh the tokens within an hour. The following Kubernetes client SDKs refresh tokens automatically within the required timeframe:

Go version 0.15.7 and later

Python version 12.0.0 and later

Java version 9.0.0 and later

JavaScript version 0.10.3 and later

Ruby master branch

Haskell version 0.3.0.0

C# version 7.0.5 and later

If your workload is using an older client version, then you must update it. To enable a smooth migration of clients to the newer time-bound service account tokens, Kubernetes version 1.22 adds an extended expiry period to the service account token over the default one hour. For Amazon EKS clusters, the extended expiry period is 90 days. Your Amazon EKS cluster's Kubernetes API server rejects requests with tokens older than 90 days. We recommend that you check your applications and their dependencies. Make sure that the Kubernetes client SDKs are the same or later than the versions listed previously. For instructions about how to identify pods that are using stale tokens, see Kubernetes service accounts.

Kubernetes 1.22 removes a number of APIs that are no longer available. You might need to make changes to your application before you upgrade to Amazon EKS version 1.22. Follow the Kubernetes version 1.22 prerequisites carefully before updating your cluster.

The Ingress API versions extensions/v1beta1 and networking.k8s.io/v1beta1 have been removed in Kubernetes 1.22. If you're using the AWS Load Balancer Controller, you must upgrade to at least version 2.4.1 before you upgrade your Amazon EKS clusters to version 1.22. Additionally, you must modify Ingress manifests to use apiVersion networking.k8s.io/v1. This has been available since Kubernetes version 1.19). For more information about changes between Ingress v1beta1 and v1, see the Kubernetes documentation. The AWS Load Balancer Controller controller sample manifest uses the v1 spec.

The Amazon EKS legacy Windows support controllers use the admissionregistration.k8s.io/v1beta1 API that was removed in Kubernetes 1.22. If you're running Windows workloads, you must remove legacy Windows support and enable Windows support before upgrading to Amazon EKS version 1.22.

The CertificateSigningRequest (CSR) API version certificates.k8s.io/v1beta1 was removed in Kubernetes version 1.22. You must migrate manifests and API clients to use the certificates.k8s.io/v1 CSR API. This API has been available since version 1.19. For instructions on how to use CSR in Amazon EKS, see Certificate signing.

The CustomResourceDefinition API version apiextensions.k8s.io/v1beta1 was removed in Kubernetes 1.22. Make sure that all custom resource definitions in your cluster are updated to v1. API version v1 custom resource definitions are required to have Open API v3 schema validation defined. For more information, see the Kubernetes documentation.

If you're using App Mesh, you must upgrade to at least App Mesh controller v1.4.3 or later before you upgrade to Amazon EKS version 1.22. Older versions of the App Mesh controller use v1beta1 CustomResourceDefinition API version and aren't compatible with Kubernetes version 1.22 and later.

Amazon EKS version 1.22 enables the EndpointSliceTerminatingCondition feature by default, which includes pods in terminating state within EndpointSlices. If you set enableEndpointSlices to True (disabled by default) in the AWS Load Balancer Controller, you must upgrade to at least AWS Load Balancer Controller version 2.4.1+ before upgrading to Amazon EKS version 1.22.

Starting with Amazon EKS version 1.22, kube-proxy is configured by default to expose Prometheus metrics outside the pod. This behavior change addresses the request made in containers roadmap issue #657 .

The initial launch of Amazon EKS version 1.22 uses etcd version 3.4 as a backend, and is not affected by the possibility of data corruption present in etcd version 3.5.

Starting with Amazon EKS 1.22, Amazon EKS is decoupling AWS cloud specific control logic from core control plane code to the out-of-tree AWS Kubernetes Cloud Controller Manager. This is in line with the upstream Kubernetes recommendation. By decoupling the interoperability logic between Kubernetes and the underlying cloud infrastructure, the cloud-controller-manager component enables cloud providers to release features at a different pace compared to the main Kubernetes project. This change is transparent and requires no action. However, a new log stream named cloud-controller-manager now appears under the ControllerManager log type when enabled. For more information, see Amazon EKS control plane logging.

Starting with Amazon EKS 1.22, Amazon EKS is changing the default AWS Security Token Service endpoint used by IAM roles for service accounts (IRSA) to be the regional endpoint instead of the global endpoint to reduce latency and improve reliability. You can optionally configure IRSA to use the global endpoint in Configuring the AWS Security Token Service endpoint for a service account.

The following Kubernetes features are now supported in Kubernetes 1.22 Amazon EKS clusters:

Server-side Apply graduates to GA - Server-side Apply helps users and controllers manage their resources through declarative configurations. It allows them to create or modify objects declaratively by sending their fully specified intent. After being in beta for a couple releases, Server-side Apply is now generally available.

Warning mechanism for use of unsupported APIs - Use of unsupported APIs produces warnings visible to API consumers, and metrics visible to cluster administrators.

For the complete Kubernetes 1.22 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#changelog-since-v1210.

HarshadRanganathan commented 1 year ago

Kubernetes 1.23 Kubernetes 1.23 is now available in Amazon EKS. For more information about Kubernetes 1.23, see the official release announcement.

Important The Kubernetes in-tree to container storage interface (CSI) volume migration feature is enabled. This feature enables the replacement of existing Kubernetes in-tree storage plugins for Amazon EBS with a corresponding Amazon EBS CSI driver. For more information, see Kubernetes 1.17 Feature: Kubernetes In-Tree to CSI Volume Migration Moves to Beta on the Kubernetes blog.

The feature translates in-tree APIs to equivalent CSI APIs and delegates operations to a replacement CSI driver. With this feature, if you use existing StorageClass, PersistentVolume, and PersistentVolumeClaim objects that belong to these workloads, there likely won't be any noticeable change. The feature enables Kubernetes to delegate all storage management operations from the in-tree plugin to the CSI driver. If you use Amazon EBS volumes in an existing cluster, install the Amazon EBS CSI driver in your cluster before you update your cluster to version 1.23. If you don't install the driver before updating an existing cluster, interruptions to your workloads might occur. If you plan to deploy workloads that use Amazon EBS volumes in a new 1.23 cluster, install the Amazon EBS CSI driver in your cluster before deploying the workloads your cluster. For instructions on how to install the Amazon EBS CSI driver on your cluster, see Amazon EBS CSI driver. For frequently asked questions about the migration feature, see Amazon EBS CSI migration frequently asked questions.

Kubernetes stopped supporting dockershim in version 1.20 and removed dockershim in version 1.24. For more information, see Kubernetes is Moving on From Dockershim: Commitments and Next Steps in the Kubernetes blog. Amazon EKS will end support for dockershim starting in Amazon EKS version 1.24. Starting with Amazon EKS version 1.24, Amazon EKS official AMIs will have containerd as the only runtime.

Even though Amazon EKS version 1.23 continues to support dockershim, we recommend that you start testing your applications now to identify and remove any Docker dependencies. This way, you are prepared to update your cluster to version 1.24. For more information about dockershim removal, see Amazon EKS ended support for Dockershim.

Kubernetes graduated IPv4/IPv6 dual-stack networking for Pods, services, and nodes to general availability. However, Amazon EKS and the Amazon VPC CNI plugin for Kubernetes don't support dual-stack networking. Your clusters can assign IPv4 or IPv6 addresses to Pods and services, but can't assign both address types.

Kubernetes graduated the Pod Security Admission (PSA) feature to beta. The feature is enabled by default. For more information, see Pod Security Admission in the Kubernetes documentation. PSA replaces the Pod Security Policy (PSP) admission controller. The PSP admission controller isn't supported and is scheduled for removal in Kubernetes version 1.25.

The PSP admission controller enforces Pod security standards on Pods in a namespace based on specific namespace labels that set the enforcement level. For more information, see Pod Security Standards (PSS) and Pod Security Admission (PSA) in the Amazon EKS best practices guide.

The kube-proxy image deployed with clusters is now the minimal base image maintained by Amazon EKS Distro (EKS-D). The image contains minimal packages and doesn't have shells or package managers.

Kubernetes graduated ephemeral containers to beta. Ephemeral containers are temporary containers that run in the same namespace as an existing Pod. You can use them to observe the state of Pods and containers for troubleshooting and debugging purposes. This is especially useful for interactive troubleshooting when kubectl exec is insufficient because either a container has crashed or a container image doesn't include debugging utilities. An example of a container that includes a debugging utility is distroless images. For more information, see Debugging with an ephemeral debug container in the Kubernetes documentation.

Kubernetes graduated the HorizontalPodAutoscaler autoscaling/v2 stable API to general availability. The HorizontalPodAutoscaler autoscaling/v2beta2 API is deprecated. It will be unavailable in 1.26.

The goaway-chance option in the Kubernetes API server helps prevent HTTP/2 client connections from being stuck on a single API server instance, by randomly closing a connection. When the connection is closed, the client will try to reconnect, and will likely land on a different API server as a result of load balancing. Amazon EKS version 1.23 has enabled goaway-chance flag. If your workload running on Amazon EKS cluster uses a client that is not compatible with HTTP GOAWAY, we recommend that you update your client to handle GOAWAY by reconnecting on connection termination.

For the complete Kubernetes 1.23 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#changelog-since-v1220.

HarshadRanganathan commented 1 year ago

Kubernetes 1.24 Kubernetes 1.24 is now available in Amazon EKS. For more information about Kubernetes 1.24, see the official release announcement.

Important Starting with Kubernetes 1.24, new beta APIs aren't enabled in clusters by default. By default, existing beta APIs and new versions of existing beta APIs continue to be enabled. Amazon EKS follows the same behavior as upstream Kubernetes 1.24. The feature gates that control new features for both new and existing API operations are enabled by default. This is in alignment with upstream Kubernetes. For more information, see KEP-3136: Beta APIs Are Off by Default on GitHub.

Support for Container Runtime Interface (CRI) for Docker (also known as Dockershim) is removed from Kubernetes 1.24. Amazon EKS official AMIs have containerd as the only runtime. Before moving to Amazon EKS 1.24 or higher, you must remove any reference to bootstrap script flags that aren't supported anymore. You must also make sure that IP forwarding is enabled for your worker nodes. For more information, see Amazon EKS ended support for Dockershim.

If you already have Fluentd configured for Container Insights, then you must migrate Fluentd to Fluent Bit before updating your cluster. The Fluentd parsers are configured to only parse log messages in JSON format. Unlike dockerd, the containerd container runtime has log messages that aren't in JSON format. If you don't migrate to Fluent Bit, some of the configured Fluentd's parsers will generate a massive amount of errors inside the Fluentd container. For more information on migrating, see Set up Fluent Bit as a DaemonSet to send logs to CloudWatch Logs.

In Kubernetes 1.23 and earlier, kubelet serving certificates with unverifiable IP and DNS Subject Alternative Names (SANs) are automatically issued with unverifiable SANs. These unverifiable SANs are omitted from the provisioned certificate. In version 1.24 and later clusters, kubelet serving certificates aren't issued if any SAN can't be verified. This prevents kubectl exec and kubectl logs commands from working. For more information, see Certificate signing considerations before upgrading your cluster to Kubernetes 1.24.

When upgrading an Amazon EKS 1.23 cluster that uses Fluent Bit, you must make sure that it's running k8s/1.3.12 or later. You can do this by reapplying the latest applicable Fluent Bit YAML file from GitHub. For more information, see Setting up Fluent Bit in the Amazon CloudWatch User Guide.

You can use Topology Aware Hints to indicate your preference for keeping traffic in zone when cluster worker nodes are deployed across multiple availability zones. Routing traffic within a zone can help reduce costs and improve network performance. By default, Topology Aware Hints are enabled in Amazon EKS 1.24. For more information, see Topology Aware Hints in the Kubernetes documentation.

The PodSecurityPolicy (PSP) is scheduled for removal in Kubernetes 1.25. PSPs are being replaced with Pod Security Admission (PSA). PSA is a built-in admission controller that uses the security controls that are outlined in the Pod Security Standards (PSS) . PSA and PSS are both beta features and are enabled in Amazon EKS by default. To address the removal of PSP in version 1.25, we recommend that you implement PSS in Amazon EKS. For more information, see Implementing Pod Security Standards in Amazon EKS on the AWS blog.

The client.authentication.k8s.io/v1alpha1 ExecCredential is removed in Kubernetes 1.24. The ExecCredential API was generally available in Kubernetes 1.22. If you use a client-go credential plugin that relies on the v1alpha1 API, contact the distributor of your plugin on how to migrate to the v1 API.

For Kubernetes 1.24, we contributed a feature to the upstream Cluster Autoscaler project that simplifies scaling Amazon EKS managed node groups to and from zero nodes. Previously, for the Cluster Autoscaler to understand the resources, labels, and taints of a managed node group that was scaled to zero nodes, you needed to tag the underlying Amazon EC2 Auto Scaling group with the details of the nodes that it was responsible for. Now, when there are no running nodes in the managed node group, the Cluster Autoscaler calls the Amazon EKS DescribeNodegroup API operation. This API operation provides the information that the Cluster Autoscaler requires of the managed node group's resources, labels, and taints. This feature requires that you add the eks:DescribeNodegroup permission to the Cluster Autoscaler service account IAM policy. When the value of a Cluster Autoscaler tag on the Auto Scaling group powering an Amazon EKS managed node group conflicts with the node group itself, the Cluster Autoscaler prefers the value of the Auto Scaling group tag. This is so that you can override values as needed. For more information, see Autoscaling.

If you intend to use Inferentia or Trainium instance types with Amazon EKS 1.24, you must upgrade to the AWS Neuron device plugin version 1.9.3.0 or later. For more information, see Neuron K8 release [1.9.3.0] in the AWS Neuron Documentation.

Containerd has IPv6 enabled for Pods, by default. It applies node kernel settings to Pod network namespaces. Because of this, containers in a Pod bind to both IPv4 (127.0.0.1) and IPv6 (::1) loopback addresses. IPv6 is the default protocol for communication. Before updating your cluster to version 1.24, we recommend that you test your multi-container Pods. Modify apps so that they can bind to all IP addresses on loopback interfaces. The majority of libraries enable IPv6 binding, which is backward compatible with IPv4. When it’s not possible to modify your application code, you have two options:

Run an init container and set disable ipv6 to true (sysctl -w net.ipv6.conf.all.disable ipv6=1).

Configure a mutating admission webhook to inject an init container alongside your application Pods.

If you need to block IPv6 for all Pods across all nodes, you might have to disable IPv6 on your instances.

The goaway-chance option in the Kubernetes API server helps prevent HTTP/2 client connections from being stuck on a single API server instance, by randomly closing a connection. When the connection is closed, the client will try to reconnect, and will likely land on a different API server as a result of load balancing. Amazon EKS version 1.24 has enabled goaway-chance flag. If your workload running on Amazon EKS cluster uses a client that is not compatible with HTTP GOAWAY, we recommend that you update your client to handle GOAWAY by reconnecting on connection termination.

For the complete Kubernetes 1.24 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#changelog-since-v1230.

HarshadRanganathan commented 5 months ago

Kubernetes 1.25 is now available in Amazon EKS. For more information about Kubernetes 1.25, see the official release announcement.

Important Starting with Kubernetes version 1.25, you will no longer be able to use Amazon EC2 P2 instances with the Amazon EKS optimized accelerated Amazon Linux AMIs out of the box. These AMIs for Kubernetes versions 1.25 or later will support NVIDIA 525 series or later drivers, which are incompatible with the P2 instances. However, NVIDIA 525 series or later drivers are compatible with the P3, P4, and P5 instances, so you can use those instances with the AMIs for Kubernetes version 1.25 or later. Before your Amazon EKS clusters are upgraded to version 1.25, migrate any P2 instances to P3, P4, and P5 instances. You should also proactively upgrade your applications to work with the NVIDIA 525 series or later. We plan to back port the newer NVIDIA 525 series or later drivers to Kubernetes versions 1.23 and 1.24 in late January 2024.

PodSecurityPolicy (PSP) is removed in Kubernetes 1.25. PSPs are replaced with Pod Security Admission (PSA) and Pod Security Standards (PSS). PSA is a built-in admission controller that implements the security controls outlined in the PSS . PSA and PSS are graduated to stable in Kubernetes 1.25 and are enabled in Amazon EKS by default. If you have PSPs in your cluster, make sure to migrate from PSP to the built-in Kubernetes PSS or to a policy-as-code solution before upgrading your cluster to version 1.25. If you don't migrate from PSP, you might encounter interruptions to your workloads. For more information, see the Pod security policy (PSP) removal FAQ.

Kubernetes version 1.25 contains changes that alter the behavior of an existing feature known as API Priority and Fairness (APF). APF serves to shield the API server from potential overload during periods of heightened request volumes. It does this by placing restrictions on the number of concurrent requests that can be processed at any given time. This is achieved through the application of distinct priority levels and limits to requests originating from various workloads or users. This approach ensures that critical applications or high-priority requests receive preferential treatment, while simultaneously preventing lower priority requests from overwhelming the API server. For more information, see API Priority and Fairness in the Kubernetes documentation or API Priority and Fairness in the EKS Best Practices Guide.

These updates were introduced in PR #10352 and PR #118601. Previously, APF treated all types of requests uniformly, with each request consuming a single unit of the concurrent request limit. The APF behavior change assigns higher units of concurrency to LIST requests due to the exceptionally heavy burden put on the API server by these requests. The API server estimates the number of objects that will be returned by a LIST request. It assigns a unit of concurrency that is proportional to the number of objects returned.

Upon upgrading to Amazon EKS version 1.25 or higher, this updated behavior might cause workloads with heavy LIST requests (that previously functioned without issue) to encounter rate limiting. This would be indicated by an HTTP 429 response code. To avoid potential workload disruption due to LIST requests being rate limited, we strongly encourage you to restructure your workloads to reduce the rate of these requests. Alternatively, you can address this issue by adjusting the APF settings to allocate more capacity for essential requests while reducing the capacity allocated to non-essential ones. For more information about these mitigation techniques, see Preventing Dropped Requests in the EKS Best Practices Guide.

Amazon EKS 1.25 includes enhancements to cluster authentication that contain updated YAML libraries. If a YAML value in the aws-auth ConfigMap found in the kube-system namespace starts with a macro, where the first character is a curly brace, you should add quotation marks (“ ”) before and after the curly braces ({ }). This is required to ensure that aws-iam-authenticator version v0.6.3 accurately parses the aws-auth ConfigMap in Amazon EKS 1.25.

The beta API version (discovery.k8s.io/v1beta1) of EndpointSlice was deprecated in Kubernetes 1.21 and is no longer served as of Kubernetes 1.25. This API has been updated to discovery.k8s.io/v1. For more information, see EndpointSlice in the Kubernetes documentation. The AWS Load Balancer Controller v2.4.6 and earlier used the v1beta1 endpoint to communicate with EndpointSlices. If you're using the EndpointSlices configuration for the AWS Load Balancer Controller, you must upgrade to AWS Load Balancer Controller v2.4.7 before upgrading your Amazon EKS cluster to 1.25. If you upgrade to 1.25 while using the EndpointSlices configuration for the AWS Load Balancer Controller, the controller will crash and result in interruptions to your workloads. To upgrade the controller, see What is the AWS Load Balancer Controller?.

SeccompDefault is promoted to beta in Kubernetes 1.25. By setting the --seccomp-default flag when you configure kubelet, the container runtime uses its RuntimeDefaultseccomp profile, rather than the unconfined (seccomp disabled) mode. The default profiles provide a strong set of security defaults, while preserving the functionality of the workload. Although this flag is available, Amazon EKS doesn't enable this flag by default, so Amazon EKS behavior is effectively unchanged. If you want to, you can start enabling this on your nodes. For more details, see the tutorial Restrict a Container's Syscalls with seccomp in the Kubernetes documentation.

Support for the Container Runtime Interface (CRI) for Docker (also known as Dockershim) was removed from Kubernetes 1.24 and later. The only container runtime in Amazon EKS official AMIs for Kubernetes 1.24 and later clusters is containerd. Before upgrading to Amazon EKS 1.24 or later, remove any reference to bootstrap script flags that aren't supported anymore. For more information, see Amazon EKS ended support for Dockershim.

The support for wildcard queries was deprecated in CoreDNS 1.8.7 and removed in CoreDNS 1.9. This was done as a security measure. Wildcard queries no longer work and return NXDOMAIN instead of an IP address.

The goaway-chance option in the Kubernetes API server helps prevent HTTP/2 client connections from being stuck on a single API server instance, by randomly closing a connection. When the connection is closed, the client will try to reconnect, and will likely land on a different API server as a result of load balancing. Amazon EKS version 1.25 has enabled goaway-chance flag. If your workload running on Amazon EKS cluster uses a client that is not compatible with HTTP GOAWAY, we recommend that you update your client to handle GOAWAY by reconnecting on connection termination.

For the complete Kubernetes 1.25 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#changelog-since-v1240.

HarshadRanganathan commented 5 months ago

Kubernetes 1.26 is now available in Amazon EKS. For more information about Kubernetes 1.26, see the official release announcement.

Important Kubernetes 1.26 no longer supports CRI v1alpha2. This results in the kubelet no longer registering the node if the container runtime doesn't support CRI v1. This also means that Kubernetes 1.26 doesn't support containerd minor version 1.5 and earlier. If you're using containerd, you need to upgrade to containerd version 1.6.0 or later before you upgrade any nodes to Kubernetes 1.26. You also need to upgrade any other container runtimes that only support the v1alpha2. For more information, defer to the container runtime vendor. By default, Amazon Linux and Bottlerocket AMIs include containerd version 1.6.6.

Before you upgrade to Kubernetes 1.26, upgrade your Amazon VPC CNI plugin for Kubernetes to version 1.12 or later. If you don't upgrade to Amazon VPC CNI plugin for Kubernetes version 1.12 or later, the Amazon VPC CNI plugin for Kubernetes will crash. For more information, see Working with the Amazon VPC CNI plugin for Kubernetes Amazon EKS add-on.

The goaway-chance option in the Kubernetes API server helps prevent HTTP/2 client connections from being stuck on a single API server instance, by randomly closing a connection. When the connection is closed, the client will try to reconnect, and will likely land on a different API server as a result of load balancing. Amazon EKS version 1.26 has enabled goaway-chance flag. If your workload running on Amazon EKS cluster uses a client that is not compatible with HTTP GOAWAY, we recommend that you update your client to handle GOAWAY by reconnecting on connection termination.

HarshadRanganathan commented 5 months ago

Kubernetes 1.27 is now available in Amazon EKS. For more information about Kubernetes 1.27, see the official release announcement.

Important The support for the alpha seccomp annotations seccomp.security.alpha.kubernetes.io/pod and container.seccomp.security.alpha.kubernetes.io annotations was removed. The alpha seccomp annotations was deprecated in 1.19, and with their removal in 1.27, seccomp fields will no longer auto-populate for Pods with seccomp annotations. Instead, use the securityContext.seccompProfile field for Pods or containers to configure seccomp profiles. To check whether you are using the deprecated alpha seccomp annotations in your cluster, run the following command:

kubectl get pods --all-namespaces -o json | grep -E 'seccomp.security.alpha.kubernetes.io/pod|container.seccomp.security.alpha.kubernetes.io' The --container-runtime command line argument for the kubelet was removed. The default container runtime for Amazon EKS has been containerd since 1.24, which eliminates the need to specify the container runtime. From 1.27 onwards, Amazon EKS will ignore the --container-runtime argument passed to any bootstrap scripts. It is important that you don't pass this argument to --kubelet-extra-args in order to prevent errors during the node bootstrap process. You must remove the --container-runtime argument from all of your node creation workflows and build scripts.

The kubelet in Kubernetes 1.27 increased the default kubeAPIQPS to 50 and kubeAPIBurst to 100. These enhancements allow the kubelet to handle a higher volume of API queries, improving response times and performance. When the demands for Pods increase, due to scaling requirements, the revised defaults ensure that the kubelet can efficiently manage the increased workload. As a result, Pod launches are quicker and cluster operations are more effective.

You can use more fine grained Pod topology to spread policies such as minDomain. This parameter gives you the ability to specify the minimum number of domains your Pods should be spread across. nodeAffinityPolicy and nodeTaintPolicy allow for an extra level of granularity in governing Pod distribution. This is in accordance to node affinities, taints, and the matchLabelKeys field in the topologySpreadConstraints of your Pod's specification. This permits the selection of Pods for spreading calculations following a rolling upgrade.

Kubernetes1.27 promoted to beta a new policy mechanism for StatefulSets that controls the lifetime of their PersistentVolumeClaims(PVCs). The new PVC retention policy lets you specify if the PVCs generated from the StatefulSet spec template will be automatically deleted or retained when the StatefulSet is deleted or replicas in the StatefulSet are scaled down.

The goaway-chance option in the Kubernetes API server helps prevent HTTP/2 client connections from being stuck on a single API server instance, by randomly closing a connection. When the connection is closed, the client will try to reconnect, and will likely land on a different API server as a result of load balancing. Amazon EKS version 1.27 has enabled goaway-chance flag. If your workload running on Amazon EKS cluster uses a client that is not compatible with HTTP GOAWAY, we recommend that you update your client to handle GOAWAY by reconnecting on connection termination.

For the complete Kubernetes 1.27 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#changelog-since-v1260.

HarshadRanganathan commented 5 months ago

Kubernetes 1.28

Kubernetes 1.28 is now available in Amazon EKS. For more information about Kubernetes 1.28, see the official release announcement.

Kubernetes v1.28 expanded the supported skew between core node and control plane components by one minor version, from n-2 to n-3, so that node components (kubelet and kube-proxy) for the oldest supported minor version can work with control plane components (kube-apiserver, kube-scheduler, kube-controller-manager, cloud-controller-manager) for the newest supported minor version.

Metrics force_delete_pods_total and force_delete_pod_errors_total in the Pod GC Controller are enhanced to account for all forceful pods deletion. A reason is added to the metric to indicate whether the pod is forcefully deleted because it's terminated, orphaned, terminating with the out-of-service taint, or terminating and unscheduled.

The PersistentVolume (PV) controller has been modified to automatically assign a default StorageClass to any unbound PersistentVolumeClaim with the storageClassName not set. Additionally, the PersistentVolumeClaim admission validation mechanism within the API server has been adjusted to allow changing values from an unset state to an actual StorageClass name.

For the complete Kubernetes 1.28 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1270.

HarshadRanganathan commented 5 months ago

Kubernetes 1.29 is now available in Amazon EKS. For more information about Kubernetes 1.29, see the official release announcement.

Important The deprecated flowcontrol.apiserver.k8s.io/v1beta2 API version of FlowSchema and PriorityLevelConfiguration are no longer served in Kubernetes v1.29. If you have manifests or client software that uses the deprecated beta API group, you should change these before you upgrade to v1.29.

The .status.kubeProxyVersion field for Node objects is now deprecated, and the Kubernetes project is proposing to remove that field in a future release. The deprecated field is not accurate and has historically been managed by kubelet - which does not actually know the kube-proxy version, or even whether kube-proxy is running. If you've been using this field in client software, stop - the information isn't reliable and the field is now deprecated.

In Kubernetes 1.29 to reduce potential attack surface, the LegacyServiceAccountTokenCleanUp feature labels legacy auto-generated secret-based tokens as invalid if they have not been used for a long time (1 year by default), and automatically removes them if use is not attempted for a long time after being marked as invalid (1 additional year by default). To identify such tokens, a you can run:

kubectl get cm kube-apiserver-legacy-service-account-token-tracking -nkube-system For the complete Kubernetes 1.29 changelog, see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#changelog-since-v1280.