aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: Event-driven notification, auditing, and integration with Amazon EventBridge #2203

Open embano1 opened 9 months ago

embano1 commented 9 months ago

Tell us about your request

tl;dr Reliably subscribe to events across EKS Kubernetes clusters to audit, notify, and react to changes of arbitrary Kubernetes cluster resources, such as pods, deployments, and custom resources (CRD), using Amazon EventBridge Event Buses and low-code integrations with EventBridge rules and targets.

Background

Events represent immutable and observable state changes in a system. Kubernetes’ design is inherently event-driven: its components, referred to as controllers, use event-driven logic to continuously reconcile the desired state with the observed state of resources within the cluster. Controllers subscribe to events using the Kubernetes Watch API (ListWatch pattern), a continuous stream of state change events from observed resources, e.g., “ADD”, “UPDATE”, and “DELETE”. Controllers then act on these events, and the whole system resembles a choreography without a central orchestrator. For example, when a Deployment resource is created, the Deployment controller receives a deployment added event and creates a new ReplicaSet. Then, the ReplicaSet controller is triggered with a replicaset added event and creates the desired number of Pods, which in turn triggers the Kubernetes Scheduler to select a worker node, and so on.
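To make the choreography above concrete, here is a minimal in-memory sketch. It is not how Kubernetes is implemented: the `Bus` class, the controller functions, and the node name are hypothetical, and real controllers consume the Watch API through client libraries. The sketch uses the type strings `ADDED` and `MODIFIED`, which are the literal values the Watch API sends for what is called "ADD" and "UPDATE" above.

```python
from collections import defaultdict, deque


class Bus:
    """Toy stand-in for the watch stream: delivers events to subscribed handlers."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []  # every event delivered, in order

    def subscribe(self, kind, handler):
        self.handlers[kind].append(handler)

    def emit(self, kind, event_type, obj):
        self.queue.append((kind, event_type, obj))

    def run(self):
        # Drain the queue; handlers may emit follow-up events while we iterate.
        while self.queue:
            kind, event_type, obj = self.queue.popleft()
            self.log.append((kind, event_type, obj["name"]))
            for handler in self.handlers[kind]:
                handler(self, event_type, obj)


def deployment_controller(bus, event_type, deployment):
    # Reacts to a new Deployment by creating a ReplicaSet.
    if event_type == "ADDED":
        bus.emit("ReplicaSet", "ADDED",
                 {"name": deployment["name"] + "-rs", "replicas": deployment["replicas"]})


def replicaset_controller(bus, event_type, replicaset):
    # Reacts to a new ReplicaSet by creating the desired number of Pods.
    if event_type == "ADDED":
        for i in range(replicaset["replicas"]):
            bus.emit("Pod", "ADDED", {"name": f"{replicaset['name']}-{i}", "node": None})


def scheduler(bus, event_type, pod):
    # Reacts to an unscheduled Pod by assigning a (hypothetical) node.
    if event_type == "ADDED" and pod["node"] is None:
        pod["node"] = "node-1"
        bus.emit("Pod", "MODIFIED", pod)


def simulate(name, replicas):
    bus = Bus()
    bus.subscribe("Deployment", deployment_controller)
    bus.subscribe("ReplicaSet", replicaset_controller)
    bus.subscribe("Pod", scheduler)
    bus.emit("Deployment", "ADDED", {"name": name, "replicas": replicas})
    bus.run()
    return bus.log


if __name__ == "__main__":
    for entry in simulate("web", 2):
        print(entry)
```

No component calls another directly; each one only reacts to the events it subscribed to, which is the property that makes an external subscriber such as an event bus a natural fit.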

Today, developers use Kubernetes events (or higher-level abstractions, such as controller-runtime and kubebuilder) to add custom logic and extend the Kubernetes API (Custom Resource Definitions). Operators rely on Kubernetes events for troubleshooting, debugging, auditing, automation, and notification.

At a high level, events in Kubernetes can be grouped into four categories:

  1. Events emitted by Kubernetes cloud services, such as Amazon EKS, e.g., when a new cluster is provisioned or a new version is available
  2. Events emitted by the various Kubernetes and custom controllers, i.e., the Kubernetes events.k8s.io resource
  3. Events emitted by the Kubernetes Watch API for every state change in a resource, i.e., low-level ADD/UPDATE/DELETE events used to implement event-driven reconciliation logic in controllers
  4. Events emitted directly by an application running on Kubernetes, i.e., business events generated by a microservice (out of scope in this issue)

The idea is to leverage those events (categories 1-3 above) to simplify operations and development in Kubernetes (EKS) environments, and unlock new integrations with AWS and third-party services. The proposed user experience is to enable the integration with a single click during EKS cluster creation. Kubernetes events are then continuously streamed to a fully managed event bus in Amazon EventBridge. Event buses enable developers and operators to filter, transform, and route events to many AWS targets, such as Amazon CloudWatch, AWS Lambda, AWS Step Functions, and Amazon Kinesis, as well as external targets, such as partner integrations and webhooks, without writing code.
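As a rough illustration of the filtering step, the sketch below shows what a rule pattern for such events might look like and how it would select events. Everything about the event shape is an assumption, since the proposed integration does not exist yet: the `example.kubernetes` source, the `Kubernetes Watch Event` detail type, and the fields under `detail` are hypothetical. The `matches` function is a deliberately simplified stand-in for EventBridge pattern matching and supports only exact-value matching on nested fields.

```python
def matches(pattern, event):
    """Return True if `event` satisfies `pattern` (exact-value matching only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the corresponding nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values, any one of which may match.
            return False
    return True


# Hypothetical audit rule: Namespaces or Secrets being created or deleted.
AUDIT_PATTERN = {
    "source": ["example.kubernetes"],
    "detail": {
        "kind": ["Namespace", "Secret"],
        "type": ["ADDED", "DELETED"],
    },
}


def sample_event(kind, event_type, cluster="prod-1"):
    """Build a hypothetical event envelope for illustration."""
    return {
        "version": "0",
        "detail-type": "Kubernetes Watch Event",
        "source": "example.kubernetes",
        "detail": {"cluster": cluster, "kind": kind, "type": event_type, "name": "payments"},
    }


if __name__ == "__main__":
    print(matches(AUDIT_PATTERN, sample_event("Namespace", "ADDED")))
    print(matches(AUDIT_PATTERN, sample_event("Pod", "ADDED")))
```

Because the pattern does not constrain `detail.cluster`, the same rule would apply to events from every cluster that publishes to the bus.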

Operators can use this integration to 1/ audit state changes in their Kubernetes clusters, such as when secrets, namespaces, roles, or custom resources are modified, 2/ send notifications to external monitoring or ChatOps systems (e.g., Slack), 3/ create alerts using Amazon CloudWatch from event details, such as the number of pod restarts, 4/ trigger remediation actions, or 5/ combine these use cases with AI services like Amazon Bedrock to assist support personnel with explanations, e.g., what might have caused a NodeDiskPressure event.
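The notification and pod restart use cases could look roughly like the following Lambda-style handler. This is a sketch under assumptions: the event envelope (`detail.cluster`, `detail.object`) and the `RESTART_THRESHOLD` value are hypothetical, while `status.containerStatuses[].restartCount` is the field Kubernetes reports on Pod objects. The handler only builds a message payload of the form `{"text": ...}` and does not send it anywhere.

```python
RESTART_THRESHOLD = 3  # hypothetical alerting threshold


def total_restarts(pod):
    """Sum restartCount over all containers reported in the Pod status."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return sum(status.get("restartCount", 0) for status in statuses)


def handler(event, context=None):
    """Return a chat message payload if the Pod restarted too often, else None."""
    detail = event["detail"]
    pod = detail["object"]  # assumed: the full Pod object is embedded in the event
    restarts = total_restarts(pod)
    if restarts < RESTART_THRESHOLD:
        return None
    meta = pod["metadata"]
    return {
        "text": (
            f"Pod {meta['namespace']}/{meta['name']} in cluster "
            f"{detail['cluster']} has restarted {restarts} times"
        )
    }


def pod_event(restart_counts):
    """Build a hypothetical Pod MODIFIED event for illustration."""
    return {
        "detail": {
            "cluster": "prod-1",
            "type": "MODIFIED",
            "object": {
                "metadata": {"namespace": "shop", "name": "cart-1"},
                "status": {"containerStatuses": [{"restartCount": c} for c in restart_counts]},
            },
        }
    }


if __name__ == "__main__":
    print(handler(pod_event([2, 3])))
    print(handler(pod_event([0, 1])))
```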

Developers can use this integration to build custom integration logic between EKS and their AWS infrastructure, using AWS Lambda functions and the language of their choice, or orchestrate complex workflows using Step Functions. For example, updating a Route 53 zone when an Ingress resource is changed, updating VPC subnets, IP pools, or security groups when Nodes are added/removed, or reacting to IAM CloudTrail activities from their Kubernetes environment. Another benefit for developers is the ability to subscribe to and react to events from multiple Kubernetes clusters through a single Event Bus instead of managing controllers in each Kubernetes cluster. Furthermore, Kubernetes events can be easily distributed across accounts, and even external systems, using multiple Event Buses and API Destinations in EventBridge.
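The Route 53 example might be sketched as a function that turns an Ingress event into a change batch. The event envelope (`detail.type`, `detail.object`) is an assumption, as is the choice of CNAME records and the default TTL. The Ingress fields (`spec.rules[].host`, `status.loadBalancer.ingress[].hostname`) follow the Kubernetes Ingress API, and the returned dictionary uses the `ChangeBatch` shape accepted by the Route 53 `ChangeResourceRecordSets` operation. The function only builds the request and does not call AWS.

```python
def change_batch_for_ingress(event, ttl=300):
    """Build a Route 53 ChangeBatch that points each Ingress host at its load balancer."""
    detail = event["detail"]
    if detail["type"] not in ("ADDED", "MODIFIED"):
        return None  # deletions are out of scope for this sketch
    ingress = detail["object"]  # assumed: the full Ingress object is embedded

    lb_entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    targets = [entry["hostname"] for entry in lb_entries if "hostname" in entry]
    hosts = [rule["host"] for rule in ingress.get("spec", {}).get("rules", []) if "host" in rule]
    if not targets or not hosts:
        return None  # load balancer not provisioned yet, or no host rules

    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": host,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": targets[0]}],
                },
            }
            for host in hosts
        ]
    }


def ingress_event(event_type, hosts, lb_hostname=None):
    """Build a hypothetical Ingress event for illustration."""
    lb = [{"hostname": lb_hostname}] if lb_hostname else []
    return {
        "detail": {
            "type": event_type,
            "object": {
                "spec": {"rules": [{"host": h} for h in hosts]},
                "status": {"loadBalancer": {"ingress": lb}},
            },
        }
    }


if __name__ == "__main__":
    batch = change_batch_for_ingress(
        ingress_event("MODIFIED", ["shop.example.com"], "lb-123.example.net"))
    print(batch)
    # In a real Lambda function, the batch could be passed to boto3, e.g.
    # client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)
```

Keeping the handler a pure function of the event makes it easy to test without a cluster or an AWS account; the AWS call stays at the edge.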

When combining event-driven systems like Kubernetes with Amazon EventBridge, the range of integrations and use cases for developers and operators is broad, and many require only minimal effort to implement. Furthermore, EventBridge is tightly integrated with AWS IAM, allowing for fine-grained and centralized control over event consumption and routing across Kubernetes clusters.

Call to action

We are looking for your feedback on this proposal, and to hear any additional pain points encountered today that would not be solved by such a solution.

Which service(s) is this request for? EKS

Are you currently working around this issue? For event type [1] above, there is no workaround; the service team needs to add these events to EKS and send them to EventBridge. For event types [2] and [3], different community projects and some commercial offerings exist, however:

Additional context n/a

Attachments

Namespace ADDED event delivered to EventBridge using CloudWatch logs as target:

image

Integration idea when deploying an EKS cluster (mock):

image

jackivanov commented 3 months ago

We've been using the Argo Events resource event source in production for quite a while as we await native EKS events. It is a more or less stable solution with good community support.

embano1 commented 3 months ago

@jackivanov this sounds like a great use case and I definitely want to hear/learn more about this from you. Can we somehow connect?

trevermckee commented 2 months ago

@embano1 An EventBridge integration with EKS events (categories 1-3) would be immensely helpful for building some security features in our product around EKS. I would be happy to connect to describe these use cases in more depth.

embano1 commented 2 months ago

@trevermckee thx for your comment! Let's connect on LI and set up some time.