GoogleCloudPlatform / k8s-stackdriver


event-exporter prevents scaledown by cluster-autoscaler #359

Open matti opened 4 years ago

matti commented 4 years ago

The autoscaler reports no.scale.down.node.pod.has.local.storage -- apparently the hostPath-mounted SSL certs prevent the scale-down.

related: https://github.com/kubernetes/kubernetes/issues/69696
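
For anyone trying to confirm which kube-system pods are pinning a node, a rough check is to list the pods that mount hostPath volumes (a sketch only, assuming jq is available; the autoscaler's local-storage check also covers emptyDir volumes):

kubectl get pods -n kube-system -o json \
  | jq -r '.items[] | select(.spec.volumes[]? | has("hostPath")) | .metadata.name' \
  | sort -u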

acondrat commented 3 years ago

Same issue with stackdriver-metadata-agent. Is there a way to change the Deployment spec so the Pods get annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: "true"?

It seems to get reset every time we upgrade the cluster.
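
For reference, a one-off way to set that annotation on the pod template (a sketch only; the Deployment name is an assumption, check it with kubectl -n kube-system get deploy, and as noted above GKE tends to revert the change on upgrade):

kubectl -n kube-system patch deployment event-exporter-gke --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'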

matti commented 3 years ago

I'm currently tainting my nodes with googleStayAwayWithYourBrokenSuff=yes:NoExecute
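
For anyone wanting to try the same workaround, the taint is applied per node (hypothetical node name below); NoExecute evicts pods that don't tolerate it, which is the point here:

kubectl taint nodes gke-mypool-node-1 googleStayAwayWithYourBrokenSuff=yes:NoExecute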

guyguy333 commented 3 years ago

Is a controller ultimately the only option left to "force auto-inject" this annotation on event-exporter and stackdriver-metadata-agent when it's missing? I could write one if there is no better option.

guyguy333 commented 3 years ago

I pushed a controller here for anyone interested in a fix while waiting for a GKE fix: https://github.com/guyguy333/sorry-gke

With this controller and the kube-system PDBs in place, scale-down now fails with no.scale.down.node.no.place.to.move.pods instead.
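
To see which kube-system PDBs are involved when that message shows up, listing and describing them is a quick first step (just a diagnostic sketch):

kubectl -n kube-system get pdb
kubectl -n kube-system describe pdb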

matti commented 3 years ago

Doesn't this just apply it once...? When the GKE pods are updated, the annotation is lost again?


guyguy333 commented 3 years ago

As it's a controller, it tracks changes every time the Deployment is edited (manually, by GKE, ...). Every time an update is detected, it checks whether the annotation is present and adds it if not. Maybe I missed something, so let me know, but according to my tests it should be robust to GKE updates.

matti commented 3 years ago

Okay, I'm not familiar with kopf, but will it also update if that controller is momentarily not running? The code says "on update", which sounds like it will miss the event if it's not running at the time of the update.

Anyway, good work.


guyguy333 commented 3 years ago

Good point. Indeed, a better solution would be to add replicas and implement leader election.

matti commented 3 years ago

Just put that (well, almost that) code in a while-true loop and keep the pod running.
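
A minimal sketch of that keep-it-running approach, assuming the GKE Deployment names (adjust them to whatever kubectl -n kube-system get deploy shows); re-applying an identical patch is a no-op, so it does not trigger rollouts:

while true; do
  for d in event-exporter-gke stackdriver-metadata-agent-cluster-level; do
    # merge patch is idempotent; ignore deployments that don't exist on this cluster
    kubectl -n kube-system patch deployment "$d" --type merge \
      -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}' \
      || true
  done
  sleep 300
done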


pentago commented 2 years ago

Any progress on this one? It's annoying as hell..

MalibuKoKo commented 2 years ago

I have the same issue.

Here is what I tried:

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: calming-down
  namespace: kube-system
spec:
  schedule: "00 19 * * 1-5"
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            imagePullPolicy: IfNotPresent
            env:
              - name: REPLICAS
                value: "0"
            command:
              - sh
              - "-c"
              - |
                /bin/bash <<'EOF'
                  set +e
                  set -x
                  for n in $(kubectl get ns -ojson | jq -r '.items[].metadata.name'); do
                    for d in $(kubectl get deployments.apps -n $n -ojson|jq -r '.items[].metadata.name'); do
                      kubectl patch deployments.apps -n $n $d --type='json' -p='[{"op": "add", "path": "/spec/template/metadata/annotations/cluster-autoscaler.kubernetes.io~1safe-to-evict", "value":"true"}]'
                    done
                    for app in $(kubectl get deployments.apps --show-labels -n $n|sed -n "/k8s-app=/p"|sed -r "s/.*k8s-app=([^(,|$)]+).*/\1/g"|sort -u); do
                      jq -n --arg app $app  --arg apiVersion policy/v1beta1 --arg namespace $n '{"apiVersion":$apiVersion,"kind":"PodDisruptionBudget","metadata":{"name":("pdb-"+$app),"namespace":$namespace},"spec":{"maxUnavailable":1,"selector":{"matchLabels":{"k8s-app":$app}}}}'|kubectl apply -f -
                    done
                    for app in $(kubectl get replicasets.apps --show-labels -n $n|sed -n "/k8s-app=/p"|sed -r "s/.*k8s-app=([^(,|$)]+).*/\1/g"|sort -u); do
                      jq -n --arg app $app  --arg apiVersion policy/v1beta1 --arg namespace $n '{"apiVersion":$apiVersion,"kind":"PodDisruptionBudget","metadata":{"name":("pdb-"+$app),"namespace":$namespace},"spec":{"maxUnavailable":1,"selector":{"matchLabels":{"k8s-app":$app}}}}'|kubectl apply -f -
                    done
                    for app in $(kubectl get statefulsets.apps --show-labels -n $n|sed -n "/k8s-app=/p"|sed -r "s/.*k8s-app=([^(,|$)]+).*/\1/g"|sort -u); do
                      jq -n --arg app $app  --arg apiVersion policy/v1beta1 --arg namespace $n '{"apiVersion":$apiVersion,"kind":"PodDisruptionBudget","metadata":{"name":("pdb-"+$app),"namespace":$namespace},"spec":{"maxUnavailable":1,"selector":{"matchLabels":{"k8s-app":$app}}}}'|kubectl apply -f -
                    done
                    for app in $(kubectl get daemonsets.apps --show-labels -n $n|sed -n "/k8s-app=/p"|sed -r "s/.*k8s-app=([^(,|$)]+).*/\1/g"|sort -u); do
                      jq -n --arg app $app  --arg apiVersion policy/v1beta1 --arg namespace $n '{"apiVersion":$apiVersion,"kind":"PodDisruptionBudget","metadata":{"name":("pdb-"+$app),"namespace":$namespace},"spec":{"maxUnavailable":1,"selector":{"matchLabels":{"k8s-app":$app}}}}'|kubectl apply -f -
                    done
                    kubectl scale deploy      -n $n --replicas=$REPLICAS --all
                    kubectl scale replicasets -n $n --replicas=$REPLICAS --all
                    kubectl scale statefulset -n $n --replicas=$REPLICAS --all
                    for d in $(kubectl get daemonset -n $n -ojson | jq -r '.items[].metadata.name'); do 
                      kubectl patch daemonset -n $n $d -p '{"spec": {"template": {"spec": {"nodeSelector": {"google-are-you-kidding-me": "true"}}}}}'
                    done
                    kubectl delete pod -n $n --field-selector=status.phase==Succeeded
                  done
                  exit 0
                EOF
          serviceAccountName: calming-down-sa

NOTHING WORKS !!!

I found a lot of information on this topic, collected under the headings FAQ, Issues, Questions, Guideline, and Kubernetes Controller.

In the end I decided to use the OPTIMIZE_UTILIZATION autoscaling profile and to remove the logging.googleapis.com/kubernetes and monitoring.googleapis.com/kubernetes services.
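
For reference, those changes map to gcloud roughly as below (cluster name is hypothetical; gcloud container clusters update generally takes one change per invocation, and --logging-service/--monitoring-service are the legacy Stackdriver flags):

gcloud container clusters update my-cluster --autoscaling-profile optimize-utilization
gcloud container clusters update my-cluster --logging-service none
gcloud container clusters update my-cluster --monitoring-service none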

I use CronJobs to scale down / up, and that works!

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: calming-down
  namespace: kube-system
spec:
  schedule: "00 19 * * 1-5"
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            imagePullPolicy: IfNotPresent
            env:
              - name: REPLICAS
                value: "0"
            command:
              - sh
              - "-c"
              - |
                /bin/bash <<'EOF'
                  set +e
                  for n in $(kubectl get ns -ojson | jq -r '.items[].metadata.name'); do
                    kubectl scale deploy      -n $n --replicas=$REPLICAS --all
                    kubectl scale statefulset -n $n --replicas=$REPLICAS --all
                  done
                  exit 0
                EOF
          serviceAccountName: calming-down-sa
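
The calming-down-sa ServiceAccount referenced above is not shown; a minimal way to create it would be the following (cluster-admin is deliberately blunt here, scope it down for real use):

kubectl -n kube-system create serviceaccount calming-down-sa
kubectl create clusterrolebinding calming-down-sa \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:calming-down-sa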

scale-down-k8s-cluster

I find it very regrettable that these services are not implemented with the quality one would expect; their behavior works against the proper functioning of the autoscaler.

Judging by the number of web pages that discuss this problem, it has clearly wasted a lot of people's time.

rbabyuk-vs commented 2 years ago

what if you add 'PDB' for those apps? like here https://github.com/kubernetes/kubernetes/issues/69696#issuecomment-1018627861

P.S. yeah, I see this is different, you need to edit the deployment itself

rbabyuk-vs commented 2 years ago

I am about to try https://github.com/redhat-cop/resource-locker-operator but it feels like overkill to me. Argo CD can't do this kind of patch, and the Terraform patch resource doesn't look convenient to me either.

ledmonster commented 2 years ago

I have the same issue.

Even if I scale down the pods with

kubectl scale deploy --replicas=3 --all

they scale back up to the max within seconds. I only use CPU utilization and the value is around 2%/50%.
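
That 2%/50% reading is the HPA TARGETS format; an HPA will not go below its minReplicas and will override a manual kubectl scale, so checking the HPA spec (hypothetical names below) may explain the immediate scale-back-up:

kubectl get hpa --all-namespaces
kubectl -n my-namespace describe hpa my-app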

araminian commented 2 years ago

According to the GKE release notes, upgrading to GKE 1.22 should solve this problem.
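
If anyone wants to try that, the control-plane upgrade can be kicked off with gcloud (cluster name is hypothetical, version string borrowed from the comment below; check gcloud container get-server-config for what's actually available in your location):

gcloud container clusters upgrade my-cluster --master --cluster-version 1.22.10-gke.600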

red8888 commented 2 years ago

I'm on 1.22.10-gke.600 and I'm still seeing this for the event exporter: no.scale.down.node.pod.kube.system.unmovable

pentago commented 2 years ago

Does creating a PDB for event-exporter help?

red8888 commented 2 years ago

@pentago we should not have to manage event-exporter at all. GKE should create it in a non-broken state that doesn't require customer intervention.

pentago commented 2 years ago

Exactly. The problem is that it does, and that's the reality, since the company wants to earn money off of you.

alexppg commented 2 years ago

Another solution would be to simply be able to disable it and create it the right way ourselves. But whenever you destroy or change it, it just gets updated back...

kalhaarsavaj commented 1 month ago

Check here: https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler-visibility#:~:text=%22no.scale.down.node.pod.kube,enable%20cluster%20autoscaler%20to%20move%20Pods%20in%20the%20kube%2Dsystem%20namespace.
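
The linked page suggests adding a PodDisruptionBudget for the affected kube-system Pods so the autoscaler is allowed to move them. A minimal sketch (the k8s-app label value is an assumption; verify it with kubectl -n kube-system get deploy --show-labels):

kubectl -n kube-system create poddisruptionbudget event-exporter-pdb \
  --selector=k8s-app=event-exporter \
  --max-unavailable=1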