Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] Allow configuring `serializeImagePulls` kubelet config for Windows nodes #3879

Closed lippertmarkus closed 2 weeks ago

lippertmarkus commented 1 year ago

Is your feature request related to a problem? Please describe. Windows images are big. Especially after autoscaling, when many Pods are scheduled on the new node simultaneously, the total pull time can be huge because the pulls are done serially. Example event (compare the pull time to the time including waiting):

Pulled     Successfully pulled image "<image>" in 5m0.343499s (22m24.5159081s including waiting)

Describe the solution you'd like: I want to be able to set serializeImagePulls=false in the kubelet configuration.

Describe alternatives you've considered: None.

Additional context: See also https://github.com/kubernetes/kubernetes/issues/108405
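
To illustrate why the "including waiting" figure grows so much with serial pulls, here is a minimal Python sketch. It is not how the kubelet is implemented. The function name `completion_times` and the pull durations are made up for illustration, and the model assumes each pull takes the same time no matter how many run at once (it ignores network and disk contention) and that pulls start in FIFO order.

```python
import heapq


def completion_times(pull_seconds, max_parallel):
    """Return the finish time of each pull when at most max_parallel run at once.

    Simplified model (assumption): a pull's duration does not depend on how
    many other pulls are running, and pulls start in the order given.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # Each heap entry is the time at which one pull slot becomes free.
    slots = [0.0] * max_parallel
    heapq.heapify(slots)
    finished = []
    for duration in pull_seconds:
        start = heapq.heappop(slots)  # earliest free slot
        end = start + duration
        heapq.heappush(slots, end)
        finished.append(end)
    return finished


if __name__ == "__main__":
    pulls = [300, 300, 300, 300, 300]  # five hypothetical 5-minute pulls
    # Serial pulls: the last Pod waits for all earlier pulls.
    print(completion_times(pulls, 1))           # [300.0, 600.0, 900.0, 1200.0, 1500.0]
    # Fully parallel pulls under the same assumptions.
    print(completion_times(pulls, len(pulls)))  # [300.0, 300.0, 300.0, 300.0, 300.0]
```

Under these assumptions the last Pod on a freshly scaled node waits 25 minutes with serial pulls, even though its own pull takes only 5 minutes.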

artificial-aidan commented 12 months ago

Could this be updated to allow configuring serializeImagePulls on all node types?

lippertmarkus commented 12 months ago

As far as I understand the docs, it should already be configurable for Linux. Windows nodes are limited to only 4 configurable kubelet settings, and serializeImagePulls is not one of them.

artificial-aidan commented 11 months ago

As I read them, serializeImagePulls isn't one of the Linux settings either: https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration?tabs=linux-node-pools#linux-kubelet-custom-configuration

Unless I'm misreading something.

lippertmarkus commented 11 months ago

Oh good catch, then we can generalize the request :)

MikeFear commented 11 months ago

Any updates on this? We run our e2e tests on Kubernetes on Linux nodes and therefore need to deploy a lot of microservices in a short timespan. We are currently bottlenecked by this limitation.

allyford commented 11 months ago

I've captured the feature request to add the serializeImagePulls parameter to custom kubelet config. We currently do not have an ETA for when this change will be made, but I'll update this issue when we do.

In the meantime, we're planning a release for #3928 for Linux this year.

Rob19999 commented 9 months ago

Would love this one as well. I would also recommend adding the maxParallelImagePulls setting so that pulls do not overload the network or disk, as described in the official documentation: https://kubernetes.io/blog/2023/05/15/speed-up-pod-startup/#maximum-parallel-image-pulls-will-help-secure-your-node-from-overloading-on-image-pulling

In the meantime, can we change the kubelet configuration in a different manner than described above to test these settings?
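
For reference, both settings are fields of the upstream kubelet configuration file (`KubeletConfiguration`, API version `kubelet.config.k8s.io/v1beta1`). The Python sketch below builds such a fragment and prints it as JSON. The helper name `kubelet_image_pull_config` is hypothetical, the checks only mirror the upstream rule that more than one parallel pull requires `serializeImagePulls: false`, and nothing here shows an AKS-supported way to apply the file to a node pool, which is what this issue is asking for.

```python
import json
from typing import Optional


def kubelet_image_pull_config(serialize: bool, max_parallel: Optional[int] = None) -> dict:
    """Build a minimal KubeletConfiguration fragment for image pull behaviour.

    Hypothetical helper for illustration; only the field names come from
    the upstream kubelet configuration API.
    """
    if max_parallel is not None:
        if max_parallel < 1:
            raise ValueError("maxParallelImagePulls must be a positive number")
        if serialize and max_parallel > 1:
            raise ValueError(
                "maxParallelImagePulls > 1 requires serializeImagePulls to be false"
            )
    config = {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        "serializeImagePulls": serialize,
    }
    if max_parallel is not None:
        # Upper bound on concurrent pulls, so parallel pulling does not
        # saturate the node's network or disk.
        config["maxParallelImagePulls"] = max_parallel
    return config


if __name__ == "__main__":
    # Example values only: parallel pulls enabled, capped at 4 at a time.
    print(json.dumps(kubelet_image_pull_config(False, 4), indent=2))
```

Leaving out `maxParallelImagePulls` while setting `serializeImagePulls` to false leaves the number of concurrent pulls unbounded, which is the overload risk the linked post describes.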

PlagueHO commented 7 months ago

@allyford - I've also got teams who need this feature. They are currently investigating Artifact Streaming in Azure Container Registry as a stopgap, but not all images are in ACR (many are on Docker Hub).

ravnalexquinn commented 4 months ago

+1 for this! My team would also find this really useful for Linux nodes.

Rob19999 commented 4 months ago

Having used this feature on an on-premises Kubernetes platform and Artifact Streaming on AKS, I think this feature has a bigger impact on image pull performance than Artifact Streaming. I noticed that you're also working on Artifact Streaming for Windows. I would like that feature too, but exposing these parameters seems like a small change by comparison, with a lot of gain.

k-koleda commented 4 months ago

Hey, I'm also interested in this feature. Windows images are extremely big, and we see performance gaps because scaling up a new node is slow. It would be great to have this feature available.

gabrimonfa commented 2 months ago

Any update on this? An ETA maybe?

Sometimes big image pulls in AKS are unbearably slow, and other pulls have to wait for them to complete. Parallel pulls may also help mitigate this issue. I've opened a ticket with support, but got no real workarounds or solutions apart from the suggestion to use ACR, which is not feasible in certain situations (and no, the issue does not depend on the registry; other well-known managed k8s services have no such issues at all).

artificial-aidan commented 2 months ago

This also makes using spot nodes almost impossible, because bringing up a new node can take longer than the spot eviction warning period.

pawelpabich commented 1 month ago

Thanks!

allyford commented 2 weeks ago

AKS is planning to release this feature as the default in k8s 1.31. It will support both Linux and Windows. See #4499 for details and updates. Closing this issue as a duplicate.