HakjunMIN opened 6 months ago
Could you describe the process you use to build a custom node image? (I'm especially interested in whether it is based on an AKS node image, since that affects bootstrapping.)
As far as I know, the AKS node image cannot be customized, but I would like to do it the way AWS does, such as
I don't yet know how to build a custom node image, but if Packer or another tool can build an AKS node image, I would like that image to include pre-cached ML images.
My goal is to reduce the time it takes to pull large AI/ML images by using a local cache in a Karpenter environment.
The templates and scripts used to build AKS node images with Packer are all open source: https://github.com/Azure/AgentBaker.
That being said, step 1 is to enable artifact streaming on Karpenter nodes. I have a POC for this; I just didn't have time to set up the e2e test, as it's a bit more involved. https://github.com/Azure/karpenter-provider-azure/pull/121 was the POC.
Custom node images aren't in our immediate plans, as we are first focusing on reliability and stability, but artifact streaming may be a start.
One older project worth mentioning is kamino: https://github.com/jackfrancis/kamino?tab=readme-ov-file. The idea here, IIRC, is to follow a prototype pattern built around a conceptual "golden node". The golden node has your images cached; we then snapshot that node and use the resulting node image for all of your nodes, so every node starts with the things you need already cached.
When we do tackle something like this, I imagine we will go in a similar direction, so that the node image you use has everything we need on the AKS side and isn't doing too much, while you still get the cache performance improvement.
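The golden-node idea can be illustrated with a toy Python model. It assumes an image is nothing more than a list of content-addressed layers, and that a node only downloads layers missing from its local cache. `Node`, `build_golden_node`, `node_from_snapshot`, the digests, and the layer sizes are all hypothetical; `copy.deepcopy` merely stands in for snapshotting a node's disk. This is a sketch of the prototype pattern, not how kamino or AKS actually implement it.

```python
import copy
from dataclasses import dataclass, field

# Hypothetical images: lists of (layer_digest, size_in_mb) pairs.
# Digests and sizes are made-up placeholders, not real images.
ML_IMAGE = [("sha256:base", 80), ("sha256:cuda", 3000), ("sha256:torch", 2500)]
APP_IMAGE = [("sha256:base", 80), ("sha256:app", 50)]


@dataclass
class Node:
    """A toy node that only tracks which image layers are in its local cache."""
    cached_layers: set = field(default_factory=set)

    def pull(self, image):
        """Pull an image, downloading only layers not already cached.

        Returns the number of MB actually downloaded."""
        downloaded = 0
        for digest, size_mb in image:
            if digest not in self.cached_layers:
                downloaded += size_mb
                self.cached_layers.add(digest)
        return downloaded


def build_golden_node(images):
    """Prototype step: start from an empty node and pre-pull the images to cache."""
    golden = Node()
    for image in images:
        golden.pull(image)
    return golden


def node_from_snapshot(golden):
    """Clone the golden node; deepcopy stands in for snapshotting its disk."""
    return copy.deepcopy(golden)


golden = build_golden_node([ML_IMAGE])

cold_node = Node()                      # node built from a plain base image
warm_node = node_from_snapshot(golden)  # node built from the golden snapshot

print(cold_node.pull(ML_IMAGE))   # 5580
print(warm_node.pull(ML_IMAGE))   # 0
print(warm_node.pull(APP_IMAGE))  # 50 (shared base layer is already cached)
```

Because every new node is a copy of the prototype, the expensive pre-pull happens once when the golden image is built, and scale-out nodes only fetch layers that were not baked in.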
@Bryce-Soghigian Thank you very much. As you suggested, I will try artifact streaming first and then move on to kamino. I believe kamino could work well with Karpenter too; I'll certainly test it.
Kamino will not work with Karpenter in the project's current state, for a couple of reasons.
I will get started on adding artifact streaming support. There is a fair bit of work to do before we can support a kamino-style node image cache layer in Karpenter.
@Bryce-Soghigian Oh, understood. So artifact streaming doesn't support Karpenter yet? What is the approximate ETA for adding artifact streaming to Karpenter?
@HakjunMIN I created a separate issue to track the artifact streaming work https://github.com/Azure/karpenter-provider-azure/issues/266.
Long term, it would be great to do something in Karpenter similar to what you are describing here. However, we still need to work through many other things first. Please subscribe to the artifact streaming issue for further updates there.
@Bryce-Soghigian
The AWS approach linked below is a perfect way to implement this. Besides artifact streaming, it would be great if a custom snapshot could be used as the node class image. Could you add this to your backlog?
https://github.com/aws-samples/bottlerocket-images-cache?tab=readme-ov-file#with-karpenter
Tell us about your request
Currently, it looks like only predefined images are supported in the NodeClaim CRD. Could a custom image with pre-cached container image layers be supported there?
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
With Karpenter, we need to run AI/ML workloads based on NVIDIA images. When Karpenter scales out, image pulls need to be faster, which requires nodes with a local image cache.
Are you currently working around this issue?
We use artifact streaming in AKS, but it is much slower than a local cache. An in-cluster registry could also be used, but it is an operational burden.
Additional Context
No response
Attachments
No response
Community Note