Closed: JohnPolansky closed this issue 3 weeks ago
Not natively, but you can do something like this:
{{- if .Values.scaledown.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: scale
  namespace: karpenter
data:
  scale.sh: |
    #!/bin/sh
    NODEPOOL_PREFIX={{ .Values.scaledown.nodepoolPrefix }}
    if [ "$1" = "down" ] ; then
      echo "Patching nodepools to scale down to 0"
      echo "Only considering pools matching prefix $NODEPOOL_PREFIX for scale down"
      for i in $(kubectl get nodepools --no-headers -o NAME | grep "$NODEPOOL_PREFIX") ; do
        kubectl patch $i --type merge --patch '{"spec": {"limits": {"cpu": "0"}}}'
      done
      kubectl delete nodeclaims --all &
      echo "Waiting for claims to be deleted... Sleeping for 300 seconds"
      sleep 300
      echo "Removing straggler pods that block node deletions"
      # Each line is "<namespace> <pod>", so the unquoted $line expands to "-n <namespace> <pod>"
      kubectl get pods --no-headers -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name | grep -v karpenter |\
      while read -r line; do
        kubectl delete pod --grace-period=0 --force -n $line
      done
    fi
    if [ "$1" = "up" ] ; then
      for i in $(kubectl get nodepools --no-headers -o NAME | grep "$NODEPOOL_PREFIX") ; do
        #TODO: figure out how to templatize the upper limit.
        # it's always 1000 in dev, but in prod it's different
        echo "Patching nodepools to scale up to {{ .Values.scaledown.originalNodepoolSize }}"
        kubectl patch $i --type merge --patch '{"spec": {"limits": {"cpu": "{{ .Values.scaledown.originalNodepoolSize }}"}}}'
      done
      echo "Waiting for claims to be created... Sleeping for 300 seconds"
      sleep 300
      echo "Deleting all pods to force fair scheduling"
      kubectl get pods --no-headers -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name | grep -v karpenter |\
      while read -r line; do
        kubectl delete pod --grace-period=0 --force -n $line
      done
    fi
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-nodepools
  namespace: karpenter
spec:
  schedule: "{{ .Values.scaledown.cronjob.downSchedule }}"
  {{- if .Values.scaledown.cronjob.timeZone }}
  timeZone: "{{ .Values.scaledown.cronjob.timeZone }}"
  {{- end }}
  jobTemplate:
    spec:
      template:
        spec:
          priorityClassName: system-node-critical
          serviceAccount: karpenter
          tolerations:
            - key: CriticalAddonsOnly
              operator: Exists
            - key: "arch"
              operator: "Equal"
              value: "arm64"
              effect: "NoSchedule"
          volumes:
            - name: config
              configMap:
                name: scale
                defaultMode: 0777
          containers:
            - name: kubectl
              image: {{ .Values.scaledown.cronjob.image }}
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - name: config
                  mountPath: "/scripts"
              command:
                - /bin/sh
                - -c
                - /scripts/scale.sh down
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-nodepools
  namespace: karpenter
spec:
  schedule: "{{ .Values.scaledown.cronjob.upSchedule }}"
  {{- if .Values.scaledown.cronjob.timeZone }}
  timeZone: "{{ .Values.scaledown.cronjob.timeZone }}"
  {{- end }}
  jobTemplate:
    spec:
      template:
        spec:
          priorityClassName: system-node-critical
          serviceAccount: karpenter
          tolerations:
            - key: CriticalAddonsOnly
              operator: Exists
            - key: "arch"
              operator: "Equal"
              value: "arm64"
              effect: "NoSchedule"
          volumes:
            - name: config
              configMap:
                name: scale
                defaultMode: 0777
          containers:
            - name: kubectl
              image: {{ .Values.scaledown.cronjob.image }}
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - name: config
                  mountPath: "/scripts"
              command:
                - /bin/sh
                - -c
                - /scripts/scale.sh up
          restartPolicy: OnFailure
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: karpenter-pod-admin
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: karpenter-pod-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: karpenter-pod-admin
subjects:
  - kind: ServiceAccount
    name: karpenter
    namespace: {{ .Release.Namespace }}
{{- end }}
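For reference, the template above expects a scaledown block in values.yaml roughly like the following; the specific values here are only illustrative, not anyone's actual configuration:

scaledown:
  enabled: true
  nodepoolPrefix: default
  originalNodepoolSize: "1000"
  cronjob:
    image: bitnami/kubectl:latest   # illustrative; any image that ships kubectl works
    downSchedule: "0 19 * * 1-5"    # e.g. scale down on weekday evenings
    upSchedule: "0 7 * * 1-5"       # e.g. scale back up on weekday mornings
    timeZone: "America/New_York"    # optional; omitted means the controller's local time zone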
Could you just delete and re-apply your NodePools? If all of your NodePools are deleting or deleted, Karpenter won't have anywhere to launch nodes, and eventually all the NodeClaims would be garbage collected, drained, and deleted. You wouldn't have to scale down your deployments, and you could simply spin your NodePools back up when you're ready to re-enable compute provisioning.
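As a minimal sketch of that flow, assuming the NodePools are kept in a nodepool.yaml manifest:

# Pause: delete the NodePools; Karpenter drains the nodes and garbage-collects the NodeClaims
kubectl delete nodepools --all

# Resume: re-create the NodePools so Karpenter can provision compute again
kubectl apply -f nodepool.yaml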
@njtran Hrm, I did try deleting the nodepools in one attempt, but I didn't think that would have enough impact on its own, and I didn't wait for long. I could give it a try and see how that works. To "re-enable", you said "spin up your nodepools"; I assume you mean kubectl apply -f nodepool.yaml to re-apply them.
@dcherniv Thanks for your detailed option. I need to take some time to sort out how it works and try it out, but it looks promising.
Hey all, first I wanted to say thanks for the various ideas; they were great. In the end we went with a somewhat similar solution that so far appears to be working very well for us. I can't say it will work for everyone, because it assumes our particular setup and node-group usage with Karpenter.
In our case we use an EKS node-group backed by an AWS auto-scaling group set to 5 nodes; this houses core services like Karpenter, CoreDNS, etc. Karpenter is then responsible for turning up nodes for everything else.
Our solution was this:
Then to resume we simply:
For us this solution appears to be working well. But I do want to stress 2 things.
Hope this helps someone and thanks!
Description
What problem are you trying to solve? We currently use Karpenter to manage around 15 clusters, but only 5 of them really need to run 24/7. The rest are various clusters used for development/testing. We are trying to find a way to reduce costs by stopping those clusters when they aren't needed, for example outside business hours. While it's possible to use --replicas 0 to stop pods so that Karpenter eventually removes the nodes that are no longer required, this requires updating dozens of deployments/STS, and to scale back up you have to specify the right number of replicas for each one. What we are trying to find is a way to simply tell Karpenter to "pause" and shut down all nodes until we "resume" it. As an example, for AWS node-groups you could use a command like:
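For an EKS managed node group, that kind of command might look roughly like this (cluster and node-group names are placeholders):

aws eks update-nodegroup-config \
  --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> \
  --scaling-config minSize=0,maxSize=1,desiredSize=0   # maxSize must stay >= 1 for a managed node group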
This causes the AWS node group to shut down all of its nodes, and all deployments/STS that should be running go into a Pending state. Is there a feature of Karpenter that does something similar? If we remove all the nodes, we effectively remove 90% of the cluster cost, so it seems like it could be a valuable feature.
Having a scheduled feature for this would be great too, but even a manual command we could build a cronjob around would be handy.
How important is this feature to you?