aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [Feature Request]: Support for Image Streaming in EKS to Accelerate Large Container Image Pulls #2448

Open suyog1pathak opened 1 month ago

suyog1pathak commented 1 month ago

Feature Request: Support for Image Streaming in EKS to Accelerate Large Container Image Pulls

Background:

In environments where large container images are used frequently, the time it takes to pull these images can significantly impact application startup times and cluster performance. In GKE (Google Kubernetes Engine), Image Streaming has been introduced to address this issue. With Image Streaming, container images are pulled on-demand as needed, rather than being fully downloaded before the container starts. This dramatically reduces startup times for large images, especially when applications don’t need the entire image at launch.
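The on-demand behavior described above can be illustrated with a minimal Python sketch. It is not how GKE Image Streaming or any real snapshotter is implemented; `LazyImage`, `fetch_from_remote`, `CHUNK_SIZE`, and the in-memory `REMOTE` blob are hypothetical names invented for this example, with `fetch_from_remote` standing in for a ranged request to a registry.

```python
from typing import Callable, Dict

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny so the example is easy to trace


class LazyImage:
    """Hypothetical illustration: fetch chunks of an image only when they are read."""

    def __init__(self, fetch_chunk: Callable[[int], bytes], size: int) -> None:
        self._fetch_chunk = fetch_chunk      # stand-in for a ranged registry request
        self._size = size                    # total image size in bytes
        self._cache: Dict[int, bytes] = {}   # chunks already fetched, kept locally
        self.fetch_count = 0                 # number of remote fetches performed

    def read(self, offset: int, length: int) -> bytes:
        """Return up to `length` bytes at `offset`, fetching only missing chunks."""
        end = min(offset + length, self._size)
        if offset >= end:
            return b""
        first = offset // CHUNK_SIZE
        last = (end - 1) // CHUNK_SIZE
        out = bytearray()
        for index in range(first, last + 1):
            if index not in self._cache:
                self._cache[index] = self._fetch_chunk(index)
                self.fetch_count += 1
            out += self._cache[index]
        start = offset - first * CHUNK_SIZE
        return bytes(out[start:start + (end - offset)])


# Assumed "remote" image content: a small startup portion followed by other data.
REMOTE = b"ENTRYPOINT-binary-and-rarely-used-data"


def fetch_from_remote(index: int) -> bytes:
    """Hypothetical remote fetch: return one chunk of the image by index."""
    return REMOTE[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]


image = LazyImage(fetch_from_remote, len(REMOTE))
startup_bytes = image.read(0, 10)  # the container "starts" after reading only this
```

In this sketch the container can begin work once the first few chunks arrive, and the remaining chunks are fetched only if something reads them. A full pull, by contrast, would download every chunk before starting.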

Here is the reference to GKE’s Image Streaming feature: Image Streaming in GKE

Request:

I would like to propose adding a similar feature to Amazon EKS that allows for streaming container images directly from registries. This feature would benefit users who work with large container images and need to improve their application startup times.

Key Benefits:

Current Workarounds:

Currently, users like myself have to implement workaround solutions, such as:

While these solutions work, they add complexity and introduce unnecessary overhead into the cluster setup. A native EKS feature similar to GKE Image Streaming would provide a clean, scalable, and efficient way to handle large container images in the cluster.

jlbutler commented 1 month ago

Thanks for opening this issue. The Seekable OCI (SOCI) project is functionally aligned with this request, and we have considered how we might make that work for EKS customers.

To help us with that, we would be interested to hear about how customers think about the tradeoffs between starting containers faster vs long-term stability and performance.

On performance, how should starting a container within seconds be weighed against the potential impact on local IO performance on container instances? Put another way, what would be a rough threshold for acceptable performance impact from streaming in exchange for very fast start times?

On stability, what would be the best way for EKS to help customers handle new types of errors not previously seen in the container runtime? For example, if the repository becomes unreachable while a workload is running and streaming fails, what sorts of errors or retries would you expect?

Thanks in advance for any feedback.

suyog1pathak commented 1 month ago

Performance: Fast Container Start Times vs. Local IO Impact

Stability: Handling Streaming Errors

I would also like to suggest incorporating a local caching solution that leverages AWS services like EFS for faster image retrieval and scaling. This would be particularly useful for large images, such as those used in machine learning (ML) models.

jlbutler commented 4 weeks ago

@suyog1pathak thanks so much for the detailed response.

I'd like to leave this issue open as a general request for image streaming, rather than for a specific implementation. However, we do have an existing issue specifically related to Seekable OCI (SOCI) that I failed to mention in my earlier response here.

Thanks again!