awslabs / amazon-eks-ami

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/
MIT No Attribution

Easiest/most straightforward way to "cache" some additional, custom Docker images into the AMI Build? #1273

Open armenr opened 1 year ago

armenr commented 1 year ago

Question:

Is there an easy/simple way to feed a list of additional container images to the AMI builder or builder scripts?

Use-case:

Using something like KEDA, we want to rapidly scale "worker" pods up and down, based on messages being published to a queue somewhere in Redis or RabbitMQ.

The constraint on our use-case:

The solution I'm contemplating:

Using the EKS-AMI builder, as it is, totally vanilla, but changing only 2 things:

  1. Increase the size of the root volume on the AMI (to accommodate the space needed by our gigantic images)
  2. Pass an additional list of those images to the builder so that it can pull and cache them in this stretch of the builder script: https://github.com/awslabs/amazon-eks-ami/blob/e39d71f6832221409cd9990ad85e870f6d621698/scripts/install-worker.sh#L435

Bonus (kind of a feature request)

It would be really cool to give users/consumers of this repo some simple way to pass in a text file, or a comma-separated list, of additional images for the "Cache Images" section of the install-worker.sh script to pull 😬
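To make the request concrete, here is a minimal sketch of the input handling: it accepts either a text file or a comma-separated list and turns it into one pull command per image. This is not how install-worker.sh works today. The function names and the example registry are hypothetical, and the sketch assumes containerd's `ctr` CLI with the `k8s.io` namespace. It only builds the command strings; registry authentication (for example for private ECR repositories) is not handled.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def parse_image_list(spec: str) -> list[str]:
    """Split a comma- or newline-separated string into image references.

    Blank entries and '#' comments are dropped; duplicates are removed
    while keeping first-seen order.
    """
    images: list[str] = []
    for line in spec.splitlines():
        line = line.split("#", 1)[0]  # drop anything after a '#'
        for item in line.split(","):
            item = item.strip()
            if item and item not in images:
                images.append(item)
    return images


def load_images(source: str) -> list[str]:
    """Accept either a path to a text file or an inline comma-separated list."""
    path = Path(source)
    if path.is_file():
        return parse_image_list(path.read_text())
    return parse_image_list(source)


def pull_commands(images: list[str], namespace: str = "k8s.io") -> list[str]:
    """Build one `ctr images pull` command line per image (assumed CLI usage)."""
    return [
        shlex.join(["ctr", "--namespace", namespace, "images", "pull", image])
        for image in images
    ]


if __name__ == "__main__":
    # Hypothetical image names, purely for illustration.
    for command in pull_commands(
        load_images("registry.example.com/worker:1.2.3,registry.example.com/base:2.0")
    ):
        print(command)
```

The root volume still has to be sized for the extra images, as in point 1 above.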

Environment:

Chili-Man commented 1 year ago

Hey @armenr, we ran into this issue as well. We actually ended up baking custom AMIs with container images pre-cached on them, but even so, the performance gains were not realized because of how root EBS volumes are lazy loaded. Even though we booted new EC2 instances with the pre-cached container images, the parts of the root volume containing those images still had to be downloaded on demand, which took effectively the same amount of time (sometimes longer) as downloading the container image from scratch. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
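For anyone who wants to measure the lazy-loading effect described above, a rough sketch: read every file under the image store once, so the blocks behind those files are fetched before a pod needs them. The linked AWS page describes initializing a whole volume by reading it; this sketch is a narrower, hypothetical variant that only touches one directory. The `/var/lib/containerd` path is an assumption about where the cached images live. It does not make the data arrive any faster; it only moves the download cost to the point where you run it.

```python
import os
import time


def warm_directory(root: str, chunk_size: int = 1024 * 1024) -> int:
    """Read every regular file under `root` once; return total bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    while chunk := handle.read(chunk_size):
                        total += len(chunk)
            except OSError:
                # Unreadable or vanished file: ignore it and keep going.
                continue
    return total


if __name__ == "__main__":
    # Assumed location of containerd's data on the node.
    started = time.monotonic()
    read_bytes = warm_directory("/var/lib/containerd")
    elapsed = time.monotonic() - started
    print(f"read {read_bytes} bytes in {elapsed:.1f}s")
```

Timing this on a freshly booted node and comparing it against a plain pull from the registry gives a direct comparison of the two paths.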

armenr commented 1 year ago

@Chili-Man ^^ I went down this rabbit-hole 3 weeks ago, after posting here. You are right, and thanks for chiming in. :)

Just like you mentioned, I ran into the same issue. In some cases, certain images were pulled from ECR faster than the node could start them from the pre-cached copies.

I was thinking about this: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-fast-snapshot-restore.html

^^ But it's too complicated and I'd like to keep whatever little hair there is left on my head.

I think, in this case, we're kinda stuck.

armenr commented 1 year ago

@Chili-Man - you think there's a way to put tarballs of images on EFS (or JuiceFS), and then mount that as a volume into the nodes where the pods run, and have that be where the EKS nodes look for/cache/store/use their container images from? 😈

bryantbiggs commented 1 year ago

I think what you might be looking for is https://github.com/awslabs/soci-snapshotter - at least, that's one possible option that does not require baking images into the AMI

bryantbiggs commented 1 year ago

xref:

armenr commented 1 year ago

@bryantbiggs - Thank you for sharing this. Sorry for sounding like a child, but could you explain how it solves the issue or fits in?

IF I've understood correctly - we customize the EKS AMI (ourselves, until this is implemented into the default image), ensure that the plugin is being used by containerd, and then launch nodes and see them pull and start images much faster, magically, since they're kinda "streaming" lazily from the ECR repo, AT time of creation?