Open alexisbel1 opened 2 years ago
Hi alexisbel1, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.
I might be just a bot, but I'm told my suggestions are normally quite good, as such: 1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster. 2) Please abide by the AKS repo Guidelines and Code of Conduct. 3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics? 4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS. 5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue. 6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!
Triage required from @Azure/aks-pm
Action required from @Azure/aks-pm
+1
+1
+1 would also love to see this.
+1
+1
+1
+1 !
Karpenter is an open-source node provisioning project built for Kubernetes.
Looking at the project, it doesn't seem generic enough to be used across all of Kubernetes; it seems to only work with AWS.
One prerequisite would be that AKS can manage multiple instance types without defining multiple node pools.
Unfortunately, VMSS today only supports one type, but there is work being done by the VMSS team to allow for this.
All the bullet points you mentioned are in scope for cluster autoscaler, but you mentioned Karpenter has many advantages over CA. Could you be a bit more specific on those? What things would you like to accomplish on AKS?
The main advantage over CA is the ability to provision new VM types based on workload requirements (resources, taints...). CA will only scale VMs of the same type up and down within a VMSS (that is why it would need multiple VM types to be allowed in the same node group). If the VM type does not match the workload requirements (e.g. GPU), the pod won't be able to start.
Cluster API supports several providers: https://cluster-api.sigs.k8s.io/
For some context for others here:
Karpenter is new. And it was written by Amazon. However it's intended that additional cloud providers be added to it, just like cluster auto scaler. The source is open, and it's waiting for engineers to contribute.
It would ideally be the task of the Azure/AKS teams to provide the necessary resources to implement the Azure provider.
What differentiates it from the cluster auto scaler is it has no concept of Node Groups. There is no need to allocate a classification of a Node Group up front. Instead, it examines the requirements, and expects the cloud provider to be able to allocate exactly what it needs, from the smorgasbord of offerings the cloud provider might have.
That means if you need a machine with 32GB of ram, it'll go make one for you. If you need a spot instance it'll go make one for you. If it needs a node on AZ 2, it'll go make one for you. It doesn't require you to define all of the possible classes up front. Or it can consult the cloud provider for the most cost effective option that meets the requirements at the moment.
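To make the "no node groups" idea concrete, here is a minimal sketch of that selection step. It is not Karpenter's actual code or API: the `VMType` and `Requirements` classes, the `cheapest_match` helper, and every name and price in `CATALOG` are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VMType:
    """One offering in a hypothetical cloud catalog."""
    name: str
    cpus: int
    memory_gb: int
    zone: int
    spot: bool
    hourly_price: float  # invented number, not a real price


@dataclass(frozen=True)
class Requirements:
    """What the pending workload needs; None means 'no preference'."""
    cpus: int = 0
    memory_gb: int = 0
    zone: Optional[int] = None
    spot: Optional[bool] = None


def cheapest_match(catalog, req):
    """Return the cheapest offering that satisfies req, or None if nothing fits."""
    candidates = [
        vm for vm in catalog
        if vm.cpus >= req.cpus
        and vm.memory_gb >= req.memory_gb
        and (req.zone is None or vm.zone == req.zone)
        and (req.spot is None or vm.spot == req.spot)
    ]
    return min(candidates, key=lambda vm: vm.hourly_price, default=None)


# Hypothetical catalog: names, sizes, and prices are assumptions.
CATALOG = [
    VMType("general-4", 4, 16, 1, False, 0.20),
    VMType("general-8", 8, 32, 1, False, 0.40),
    VMType("general-8-z2", 8, 32, 2, False, 0.42),
    VMType("general-8-spot", 8, 32, 1, True, 0.10),
    VMType("memory-16", 16, 128, 2, False, 1.00),
]

if __name__ == "__main__":
    print(cheapest_match(CATALOG, Requirements(memory_gb=32)))
    print(cheapest_match(CATALOG, Requirements(memory_gb=32, zone=2)))
    print(cheapest_match(CATALOG, Requirements(cpus=8, spot=True)))
```

The point of the sketch is that nothing is declared up front: the requirements arrive with the workload, and the catalog is filtered at that moment. A node-group-based autoscaler would instead need a pre-created group for each of these combinations.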
This does present some architectural challenges as to how this would be surfaced in AKS. Would it just go and create VMs one by one? Would it still use a VMSS, but require arbitrary resource request support within a VMSS? How will network topology be defined in the former? Etc.
But it is a much more extensible approach than the way CA is built. At least for cloud providers. Azure obviously has hundreds of VM family, series, size, and disk capabilities, operating systems, etc, and the cartesian product of them all is massive.
I lead the Karpenter project. We'd love to collaborate on additional cloud providers and have done our best to factor out a simple and extensible cloud provider API to minimize the effort for other providers to adopt. If you're interested in chatting about the project, feel free to join in at our working group.
This would really be an interesting feature to support Azure/AKS
interesting and following this for future. Cannot wait to test this in AKS, whenever this feature is supported.
+1
+1 Really interesting feature
+1 for it. happy to collaborate
+1 looking forward to having Karpenter onboarded to AKS
Would be very helpful
+1 It would be amazing to have karpenter in AKS
Having recently implemented Karpenter on all of our non-prod EKS clusters and also moved to spot instances, we are seeing significant improvements in orchestration and cost savings with these often spiky workloads. Node counts and cost are down, and there has been no downside over 3 months now. We deploy a few nodes for Karpenter and set affinity, and let Karpenter do the rest from the Karpenter helm recipes and some node pool definitions. Karpenter's methods of determining node size and aligning nodes with pods for best behavior are really just unseen before on kube (imho, debates welcome!). This same logical handling for spot VMs/nodes would be incredibly helpful and useful for AKS. It doesn't really compare to other CA because it takes out so much guesswork, and could be used alongside CA if done properly. It's under an Apache 2.0 license. Hard +1
+1 This would really be an interesting feature to support Azure/AKS
+1, this would be a really nice improvement for AKS
+1
+1 After using Karpenter with AWS EKS, it is extremely painful to manage cluster autoscaling on Azure...
+1
It would be a massive improvement for AKS if AKS had Karpenter support.
+1 It would be amazing to use Karpenter on Azure.
Check out spot.io Ocean. It does bin packing on Azure w/ additional features on top.
Big +1 for Karpenter on AKS
Hard +1
+1
Hard +1
+1
Yes, I agree.
Provisioning nodes that meet the requirements of the pods should save a lot of overprovisioned CPUs and keep the planet healthy.
It's always a balance between small and large node pool SKUs (a balanced workload should be better). Yes, multiple node pools, with User node pools scaled to 0, help a bit. But assigning pods to nodes using node affinity cannot solve the user story below.
Example, with a 16-CPU SKU and five 4-CPU pods:
{[POD-4][POD-4][POD-4][POD-4]}{[POD-4][SPARE-12CPU]}
12 CPUs wasted.
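The arithmetic behind that example can be sketched as follows. This is a deliberately simplified model, assuming identical pods and counting only CPU (no memory, system reservations, or daemonsets); the function names and the list of alternative node sizes are hypothetical.

```python
import math


def fixed_sku_waste(pod_cpus, pod_count, node_cpus):
    """Nodes needed and CPUs left idle when every node has the same size."""
    pods_per_node = node_cpus // pod_cpus
    if pods_per_node == 0:
        raise ValueError("pod does not fit on this node size")
    nodes = math.ceil(pod_count / pods_per_node)
    return nodes, nodes * node_cpus - pod_count * pod_cpus


def right_sized_waste(pod_cpus, pod_count, node_sizes):
    """Fill nodes of the largest size, then pick the smallest size that
    fits the leftover pods. Returns the chosen node sizes and idle CPUs."""
    largest = max(node_sizes)
    pods_per_largest = largest // pod_cpus
    if pods_per_largest == 0:
        raise ValueError("pod does not fit on any node size")
    full_nodes, leftover_pods = divmod(pod_count, pods_per_largest)
    nodes = [largest] * full_nodes
    if leftover_pods:
        needed = leftover_pods * pod_cpus
        nodes.append(min(size for size in node_sizes if size >= needed))
    return nodes, sum(nodes) - pod_count * pod_cpus


if __name__ == "__main__":
    # Five 4-CPU pods on a single 16-CPU SKU: the case from the comment above.
    print(fixed_sku_waste(4, 5, 16))
    # Same pods when the last node may be a smaller (hypothetical) size.
    print(right_sized_waste(4, 5, [4, 8, 16]))
```

With a single 16-CPU SKU the second node carries one pod and 12 idle CPUs. If the provisioner is free to choose a 4-CPU node for the fifth pod, the idle capacity in this toy model drops to zero.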
cc: @palma21 / @brendandburns
+1
+1
I can't describe enough how badly this is needed.
+1
+1
+1
+1
+1
+1
+1
+1
If AKS could use https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/spot-priority-mix, which I understand is not possible at the moment, would that not resolve many of the issues related to this?
Karpenter is an open-source node provisioning project built for Kubernetes. Its goal is to improve the efficiency and cost of running workloads on Kubernetes clusters. Karpenter works by:
Karpenter has many advantages over cluster autoscaler. One prerequisite would be that AKS can manage multiple instance types without defining multiple node pools.
Currently, the only cloud provider that supports Karpenter is AWS.
It would be awesome to have AKS support it.