mateusmuller opened this issue 2 months ago
@mateusmuller Thanks for reporting. We are looking into this.
Hey @sud82, thanks for looking into this.
Sorry to be pushy, but would you have any updates about this issue?
Karpenter is our official tool for compute autoscaling (as it is for pretty much every EKS user). This might be a go/no-go decision for us.
Thanks!
Hi @mateusmuller, we have looked into this. I am adding our findings below.
When a user populates the `k8sNodeBlockList`, AKO sets node affinity with the key `kubernetes.io/hostname` to prevent pods from being scheduled on the nodes listed in the `k8sNodeBlockList`.
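For context, the affinity AKO injects into the pod spec looks roughly like this (a sketch of the standard Kubernetes `nodeAffinity` structure; the node names are hypothetical):

```yaml
# Sketch: node affinity of the kind AKO generates when
# k8sNodeBlockList is populated. Node names are made up.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: NotIn
              values:
                - ip-10-0-1-23.ec2.internal
                - ip-10-0-2-45.ec2.internal
```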
However, Karpenter restricts the use of the `kubernetes.io/hostname` label, as it may interfere with its internal scheduling mechanisms. You can find the relevant code reference here: karpenter/pkg/apis/v1beta1/labels.go at d5660acf4472db796d5f4fac58a147d14b320451 · kubernetes-sigs/karpenter
This issue does not occur when using the Kubernetes Cluster Autoscaler.
Recovery:
Once nodes are removed from the k8sNodeBlockList, the Karpenter autoscaler resumes normal operation and can scale the pods as expected.
Suggestion:
After migrating Aerospike pods from all nodes listed in the k8sNodeBlockList to other nodes, users should clear the k8sNodeBlockList from the spec.
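The suggested cleanup would look something like this in the cluster spec (a hedged sketch; the `apiVersion`/field names follow the AKO CRD, but the cluster name and size are hypothetical):

```yaml
# Sketch of an AerospikeCluster excerpt: once pods have migrated
# off the blocked nodes, clear the list so Karpenter can operate
# normally again.
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerocluster
spec:
  size: 3
  k8sNodeBlockList: []   # cleared after migration completes
```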
I know the suggestion seems a bit inconvenient, but Karpenter and the `k8sNodeBlockList` feature have conflicting requirements, so our options are very limited.
Can you please explain your use case where you want to use k8sNodeBlockList along with auto-scaling, what kind of node storage you have, and so on?
Hello @sud82,
> Can you please explain your use case where you want to use k8sNodeBlockList along with auto-scaling, what kind of node storage you have, and so on?
The use case for `k8sNodeBlockList` is the same as described in your doc:

> List of Kubernetes nodes that are disallowed for scheduling the Aerospike pods. Pods are not scheduled on these nodes and migrated from these nodes if already present.
When would I use this? When I want to rotate the nodes. Upon rotation, Karpenter can pull the latest AMI from AWS with security patches and/or new features. AFAIK, that's basic usage of the Kubernetes ecosystem.
We use ebs-csi for /opt/aerospike and local-static to expose the EC2 instance-store NVMe. The underlying storage system doesn't seem relevant here, though, unless I misunderstood something.
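For concreteness, the storage layout described above might look roughly like this in the `AerospikeCluster` spec (a sketch only; the storage class names come from the comment above, while sizes and the device path are made up, and exact field names should be checked against the AKO storage docs):

```yaml
# Sketch of the described storage layout in AKO terms.
storage:
  volumes:
    - name: workdir
      source:
        persistentVolume:
          storageClass: ebs-csi        # EBS-backed volume via the EBS CSI driver
          volumeMode: Filesystem
          size: 1Gi
      aerospike:
        path: /opt/aerospike
    - name: nvme
      source:
        persistentVolume:
          storageClass: local-static   # instance-store NVMe via a local static provisioner
          volumeMode: Block
          size: 100Gi
      aerospike:
        path: /dev/nvme0n1             # illustrative device path
```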
To be clear on how to move forward: there will be no changes from the AKO side to be compatible with Karpenter, is that correct?
If yes, I would recommend removing Karpenter from your autoscaling doc here, since it's not fully compatible with your features, and keeping only the Cluster Autoscaler.
Thanks for all the details @mateusmuller.
I wanted to say that at present there is no workaround or quick fix for this. We learned about it when you reported it, but we will try to find a solution in the future. We definitely want to support Karpenter.
The main issue here is that we use the `kubernetes.io/hostname` label to ensure that pods are not scheduled on the given k8s nodes (which is also your requirement), but Karpenter doesn't allow using that label. That's why this feature is not working.
We need to find a new way to disallow pods on the given k8s nodes, and that will take some time. We will also reach out to the Karpenter team to get their perspective. At present, it seems like a sweeping check in Karpenter.
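For what it's worth, one conceivable direction (purely an illustrative sketch, not a committed design) would be to key the affinity off a custom label stamped onto blocked nodes, since Karpenter only restricts its well-known labels such as `kubernetes.io/hostname`:

```yaml
# Hypothetical alternative: the operator (or an admin) labels
# blocked nodes with a custom key, and pods avoid nodes carrying it.
# The label key below is invented for illustration.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: aerospike.com/blocked   # hypothetical custom label
              operator: DoesNotExist
```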
Folks,

I updated a static `AerospikeCluster` manifest with a bunch of EKS nodes in `k8sNodeBlockList`. This triggered an update as expected. However, pod `aerospike-1-2` stays there forever, and Karpenter reports an error: basically, they don't allow `kubernetes.io/hostname` with `NodeAffinity`. I found an issue under the Karpenter repo describing the same problem, where they say the usage is wrong.
Can you please share your thoughts on whether this can be improved somehow? Thanks.