Closed by whites11 11 months ago
I1120 15:52:53.833531 1 static_autoscaler.go:235] Starting main loop
I1120 15:52:53.834103 1 filter_out_schedulable.go:65] Filtering out schedulables
I1120 15:52:53.834126 1 filter_out_schedulable.go:137] Filtered out 0 pods using hints
I1120 15:52:53.834137 1 filter_out_schedulable.go:176] 0 pods were kept as unschedulable based on caching
I1120 15:52:53.834145 1 filter_out_schedulable.go:177] 0 pods marked as unschedulable can be scheduled.
I1120 15:52:53.834154 1 filter_out_schedulable.go:87] No schedulable pods
I1120 15:52:53.834169 1 static_autoscaler.go:437] No unschedulable pods
I1120 15:52:53.834197 1 static_autoscaler.go:484] Calculating unneeded nodes
I1120 15:52:53.834211 1 pre_filtering_processor.go:57] Node ip-10-1-2-221.eu-central-1.compute.internal should not be processed by cluster autoscaler (no node group config)
I1120 15:52:53.834224 1 pre_filtering_processor.go:57] Node ip-10-1-1-57.eu-central-1.compute.internal should not be processed by cluster autoscaler (no node group config)
I1120 15:52:53.834233 1 pre_filtering_processor.go:57] Node ip-10-1-2-110.eu-central-1.compute.internal should not be processed by cluster autoscaler (no node group config)
I1120 15:52:53.834238 1 pre_filtering_processor.go:57] Node ip-10-1-2-102.eu-central-1.compute.internal should not be processed by cluster autoscaler (no node group config)
I1120 15:52:53.834243 1 pre_filtering_processor.go:57] Node ip-10-1-2-14.eu-central-1.compute.internal should not be processed by cluster autoscaler (no node group config)
I1120 15:52:53.834272 1 static_autoscaler.go:538] Scale down status: unneededOnly=false lastScaleUpTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 lastScaleDownDeleteTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 lastScaleDownFailTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I1120 15:52:53.834309 1 static_autoscaler.go:551] Starting scale down
I1120 15:52:53.834369 1 scale_down.go:917] No candidates for scale down
This works: as the log above shows, the nodes are not considered for scale-down ("no node group config").
After the upgrade, the autoscaler evaluates the nodes for scale-down again:
I1120 16:14:26.574106 1 cluster.go:167] node ip-10-1-2-110.eu-central-1.compute.internal may be removed
I1120 16:14:26.574114 1 cluster.go:139] ip-10-1-2-102.eu-central-1.compute.internal for removal
I1120 16:14:26.574206 1 cluster.go:150] node ip-10-1-2-102.eu-central-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: hubble-relay-77f95d8cdf-pcpzw
I1120 16:14:26.574229 1 scale_down.go:612] 1 nodes found to be unremovable in simulation, will re-check them at 2023-11-20 16:19:26.365123138 +0000 UTC m=+2247.772568188
I1120 16:14:26.574280 1 static_autoscaler.go:527] ip-10-1-2-130.eu-central-1.compute.internal is unneeded since 2023-11-20 16:14:26.365123138 +0000 UTC m=+1947.772568188 duration 0s
I1120 16:14:26.574303 1 static_autoscaler.go:527] ip-10-1-2-110.eu-central-1.compute.internal is unneeded since 2023-11-20 16:14:26.365123138 +0000 UTC m=+1947.772568188 duration 0s
I1120 16:14:26.574331 1 static_autoscaler.go:538] Scale down status: unneededOnly=false lastScaleUpTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 lastScaleDownDeleteTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 lastScaleDownFailTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I1120 16:14:26.574365 1 static_autoscaler.go:551] Starting scale down
I1120 16:14:26.574405 1 scale_down.go:828] ip-10-1-2-130.eu-central-1.compute.internal was unneeded for 0s
I1120 16:14:26.574422 1 scale_down.go:828] ip-10-1-2-110.eu-central-1.compute.internal was unneeded for 0s
I1120 16:14:26.574444 1 scale_down.go:917] No candidates for scale down
I1120 16:14:26.584675 1 delete.go:103] Successfully added DeletionCandidateTaint on node ip-10-1-2-130.eu-central-1.compute.internal
I1120 16:14:26.594633 1 delete.go:103] Successfully added DeletionCandidateTaint on node ip-10-1-2-110.eu-central-1.compute.internal
I1120 16:14:38.240438 1 reflector.go:536] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:320: Watch close - *v1.DaemonSet total 40 items received
I1120 16:14:48.808921 1 reflector.go:536] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Node total 19 items received
I1120 16:14:54.236736 1 reflector.go:536] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246: Watch close - *v1.Node total 20 items received
I1120 16:14:56.602084 1 static_autoscaler.go:235] Starting main loop
I1120 16:14:56.602556 1 taints.go:77] Removing autoscaler soft taint when creating template from node
I1120 16:14:56.624196 1 filter_out_schedulable.go:65] Filtering out schedulables
I1120 16:14:56.624213 1 filter_out_schedulable.go:137] Filtered out 0 pods using hints
I1120 16:14:56.624221 1 filter_out_schedulable.go:176] 0 pods were kept as unschedulable based on caching
I1120 16:14:56.624227 1 filter_out_schedulable.go:177] 0 pods marked as unschedulable can be scheduled.
I1120 16:14:56.624236 1 filter_out_schedulable.go:87] No schedulable pods
I1120 16:14:56.624256 1 static_autoscaler.go:437] No unschedulable pods
I1120 16:14:56.624285 1 static_autoscaler.go:484] Calculating unneeded nodes
I1120 16:14:56.624308 1 pre_filtering_processor.go:57] Node ip-10-1-1-57.eu-central-1.compute.internal should not be processed by cluster autoscaler (no node group config)
I1120 16:14:56.624345 1 scale_down.go:448] Node ip-10-1-2-110.eu-central-1.compute.internal - cpu utilization 0.446571
I1120 16:14:56.624386 1 scale_down.go:448] Node ip-10-1-2-130.eu-central-1.compute.internal - cpu utilization 0.386571
I1120 16:14:56.624400 1 scale_down.go:509] Scale-down calculation: ignoring 1 nodes unremovable in the last 5m0s
I1120 16:14:56.624440 1 cluster.go:139] ip-10-1-2-110.eu-central-1.compute.internal for removal
I1120 16:14:56.624719 1 cluster.go:322] Pod security-bundle/exception-recommender-866fd4dc6-nn97s can be moved to ip-10-1-2-102.eu-central-1.compute.internal
I1120 16:14:56.624785 1 cluster.go:322] Pod kube-system/vertical-pod-autoscaler-recommender-5488676cf9-p8zj7 can be moved to ip-10-1-2-130.eu-central-1.compute.internal
I1120 16:14:56.624834 1 cluster.go:322] Pod kube-system/external-dns-665f9b69df-h67mb can be moved to ip-10-1-2-102.eu-central-1.compute.internal
I1120 16:14:56.624876 1 cluster.go:322] Pod kube-system/vertical-pod-autoscaler-updater-7548f4f59d-4ls4k can be moved to ip-10-1-2-130.eu-central-1.compute.internal
I1120 16:14:56.624918 1 cluster.go:322] Pod kube-system/aws-pod-identity-webhook-app-76d7ccbf76-q7xr9 can be moved to ip-10-1-2-130.eu-central-1.compute.internal
I1120 16:14:56.624935 1 cluster.go:167] node ip-10-1-2-110.eu-central-1.compute.internal may be removed
I1120 16:14:56.624942 1 cluster.go:139] ip-10-1-2-130.eu-central-1.compute.internal for removal
I1120 16:14:56.625293 1 cluster.go:322] Pod kube-system/cert-manager-app-cainjector-6554fdb9b6-5vkjs can be moved to ip-10-1-2-102.eu-central-1.compute.internal
I1120 16:14:56.625340 1 cluster.go:322] Pod kube-system/cert-exporter-deployment-866d987dff-5wr67 can be moved to ip-10-1-2-110.eu-central-1.compute.internal
I1120 16:14:56.625378 1 cluster.go:322] Pod kube-system/vertical-pod-autoscaler-admission-controller-6987f7bcff-jjx2f can be moved to ip-10-1-2-102.eu-central-1.compute.internal
I1120 16:14:56.625426 1 cluster.go:322] Pod kube-system/prometheus-operator-app-kube-state-metrics-947999bf6-gwqkm can be moved to ip-10-1-2-110.eu-central-1.compute.internal
I1120 16:14:56.625552 1 cluster.go:322] Pod kube-system/cert-manager-app-webhook-78d4f6464d-b2k4w can be moved to ip-10-1-2-102.eu-central-1.compute.internal
I1120 16:14:56.625640 1 cluster.go:322] Pod kube-system/aws-pod-identity-webhook-app-76d7ccbf76-8p94z can be moved to ip-10-1-2-110.eu-central-1.compute.internal
I1120 16:14:56.625700 1 cluster.go:322] Pod kube-system/cert-manager-app-778c9d78f6-g4cvj can be moved to ip-10-1-1-57.eu-central-1.compute.internal
I1120 16:14:56.625760 1 cluster.go:322] Pod kube-system/coredns-workers-7c7f49dcf6-fgkv8 can be moved to ip-10-1-2-102.eu-central-1.compute.internal
I1120 16:14:56.625804 1 cluster.go:322] Pod kube-system/prometheus-operator-app-operator-f9fd58cdb-l9h8k can be moved to ip-10-1-2-102.eu-central-1.compute.internal
I1120 16:14:56.625854 1 cluster.go:322] Pod security-bundle/kyverno-policy-operator-5646d858cf-4vmff can be moved to ip-10-1-2-110.eu-central-1.compute.internal
I1120 16:14:56.625954 1 cluster.go:322] Pod kube-system/cilium-operator-85dd5884bb-hbr4p can be moved to ip-10-1-1-57.eu-central-1.compute.internal
I1120 16:14:56.626024 1 cluster.go:322] Pod kube-system/metrics-server-7f6744c45-hgzz6 can be moved to ip-10-1-2-110.eu-central-1.compute.internal
I1120 16:14:56.626067 1 cluster.go:167] node ip-10-1-2-130.eu-central-1.compute.internal may be removed
I1120 16:14:56.626106 1 static_autoscaler.go:527] ip-10-1-2-110.eu-central-1.compute.internal is unneeded since 2023-11-20 16:14:26.365123138 +0000 UTC m=+1947.772568188 duration 30.236939674s
I1120 16:14:56.626131 1 static_autoscaler.go:527] ip-10-1-2-130.eu-central-1.compute.internal is unneeded since 2023-11-20 16:14:26.365123138 +0000 UTC m=+1947.772568188 duration 30.236939674s
I1120 16:14:56.626152 1 static_autoscaler.go:538] Scale down status: unneededOnly=false lastScaleUpTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 lastScaleDownDeleteTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 lastScaleDownFailTime=2023-11-20 14:42:22.632355361 +0000 UTC m=-3575.960199665 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I1120 16:14:56.626175 1 static_autoscaler.go:551] Starting scale down
I1120 16:14:56.626211 1 scale_down.go:828] ip-10-1-2-110.eu-central-1.compute.internal was unneeded for 30.236939674s
I1120 16:14:56.626225 1 scale_down.go:828] ip-10-1-2-130.eu-central-1.compute.internal was unneeded for 30.236939674s
I1120 16:14:56.626246 1 scale_down.go:917] No candidates for scale down
Towards: https://github.com/giantswarm/giantswarm/issues/28720
The goal of this PR is to disable the cluster autoscaler on a node pool while the ASG is rolling nodes.
Cluster creation:
The CF stack for a NP does not contain the k8s.io/cluster-autoscaler/enabled tag, as desired.
The ASG gets correctly created without the tag as well.
The AWS operator then runs another reconciliation loop and adds the tag.
Upgrade:
The tag is removed as expected and CF does not add it back.
After CF is correctly updated, the tag is added back. Works nicely!
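The tag handling described above can be sketched roughly as follows. This is only an illustration of the intended behaviour, not the actual aws-operator code: the function name desiredASGTags, the rolling flag, and the tag value "true" are assumptions made for the example; only the tag key comes from this description.

```go
package main

// autoscalerEnabledTag is the tag key mentioned in this PR description.
const autoscalerEnabledTag = "k8s.io/cluster-autoscaler/enabled"

// desiredASGTags is a hypothetical helper. It returns a copy of the
// current ASG tags with the autoscaler tag removed while the ASG is
// rolling nodes, and present otherwise. The value "true" is an
// assumption made for this sketch.
func desiredASGTags(current map[string]string, rolling bool) map[string]string {
	out := make(map[string]string, len(current)+1)
	for k, v := range current {
		out[k] = v
	}
	if rolling {
		// Without the tag, the autoscaler no longer treats the ASG as a
		// node group, so its nodes are skipped for scale-down.
		delete(out, autoscalerEnabledTag)
	} else {
		out[autoscalerEnabledTag] = "true"
	}
	return out
}
```

Under these assumptions, calling desiredASGTags(tags, true) during an upgrade yields a tag set without the autoscaler key, and calling it with false once the CF stack update has finished adds the key back, leaving all other tags untouched.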