Forestsoft-de opened 9 months ago
You have probably reached the default max CPU/memory limits of the autoscaler. Add these parameters to your manifest:
- command:
- ./cluster-autoscaler
- --v=3
- ...
- --max-nodes-total=100
- --cores-total=0:36 # max 36 vCPUs total in the cluster
- --memory-total=0:48 # max 48G of memory total in the cluster
- ...
Regards
Thank you for the quick answer, but unfortunately no. Even with those parameters defined, the message is still the same:
1 orchestrator.go:546] Pod enbitcon/enbitcon-shopware6-64cfd4bbc8-w9jh2 can't be scheduled on vmware-ca-k8s, predicate checking error: Too many pods, Insufficient cpu, Insufficient memory; predicateName=NodeResourcesFit; reasons: Too many pods, Insufficient cpu, Insufficient memory; debugInfo=
We have 13 nodes configured overall, including 3 master nodes.
We also get this hint from the autoscaler:
20945 1 orchestrator.go:168] No expansion options
Is it required to define the node pools on the cluster autoscaler command line?
I think you have reached the max pods limit: 110 by default, it's a kubelet option.
Was your cluster built with kubeadm or with k3s?
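For reference, the kubelet limit this comment refers to is the maxPods field of the kubelet configuration (110 by default). A minimal sketch of the relevant fragment; how this file is wired up depends on the distribution:

```yaml
# Fragment of a KubeletConfiguration raising the per-node pod limit
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
```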
The cluster is built with rke2, Kubernetes version 1.28.3. The highest pod count on a node is currently 28.
rke2: but the vmware autoscaler can only pilot clusters built with kubeadm or k3s.
So the problem is why the autoscaler process won't scale up.
I need the full (verbose) log of the autoscaler where it should start a new node.
Hey @Fred78290, it seems that it tries to scale down first.
I1211 15:54:11.997326 1 orchestrator.go:546] Pod shopware6-5856fc9f6d-lrbqn can't be scheduled on vmware-ca-k8s, predicate checking error: Too many pods, Insufficient cpu, Insufficient memory; predicateName=NodeResourcesFit; reasons: Too many pods, Insufficient cpu, Insufficient memory; debugInfo=
I1211 15:54:11.997342 1 orchestrator.go:548] 5 other pods similar to enbitcon-shopware6-5856fc9f6d-lrbqn can't be scheduled on vmware-ca-k8s
I1211 15:54:11.997356 1 orchestrator.go:157] No pod can fit to vmware-ca-k8s
I1211 15:54:11.997371 1 orchestrator.go:168] No expansion options
I1211 15:54:11.997453 1 static_autoscaler.go:570] Calculating unneeded nodes
I1211 15:54:11.997461 1 externalgrpc_cloud_provider.go:70] Returning cached NodeGroups
I1211 15:54:11.997468 1 externalgrpc_node_group.go:64] Performing gRPC call NodeGroupTargetSize for node group vmware-ca-k8s
I1211 15:54:11.998217 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork07 - vsphere://423b23ba-96d4-bbf1-95a7-06471cd3c1ad
I1211 15:54:11.998658 1 pre_filtering_processor.go:57] Node enbitkubwork07 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:11.998677 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork09 - vsphere://423b189c-2027-b956-82e6-41fcf2e13861
I1211 15:54:11.999057 1 pre_filtering_processor.go:57] Node enbitkubwork09 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:11.999074 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork11 - vsphere://423b3ece-2fb3-37ca-5942-5a9b67088f43
I1211 15:54:11.999477 1 pre_filtering_processor.go:57] Node enbitkubwork11 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:11.999488 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkub03 - vsphere://423b7094-c4aa-915d-b331-209f9a2d9369
I1211 15:54:11.999918 1 pre_filtering_processor.go:57] Node enbitkub03 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:11.999930 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork05 - vsphere://423bb3c5-6f01-0271-d3f9-a581f49a63ad
I1211 15:54:12.000319 1 pre_filtering_processor.go:57] Node enbitkubwork05 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.000330 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork08 - vsphere://423b7da6-2be4-795e-fdd6-142abae7daa5
I1211 15:54:12.000702 1 pre_filtering_processor.go:57] Node enbitkubwork08 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.000713 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork14 - vsphere://423b92cd-257c-6db6-74f8-e3550a448ffa
I1211 15:54:12.001090 1 pre_filtering_processor.go:57] Node enbitkubwork14 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.001101 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkub02 -
I1211 15:54:12.001488 1 pre_filtering_processor.go:57] Node enbitkub02 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.001499 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork02 - vsphere://423b12bb-0c43-d8ee-4bf6-1d92a01869e3
I1211 15:54:12.001872 1 pre_filtering_processor.go:57] Node enbitkubwork02 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.001882 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork06 - vsphere://423b91a5-35ef-f751-21d1-1f1a23b616a9
I1211 15:54:12.002267 1 pre_filtering_processor.go:57] Node enbitkubwork06 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.002279 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork15 - vsphere://423b8956-8c64-3530-2081-e2835d9bc104
I1211 15:54:12.002683 1 pre_filtering_processor.go:57] Node enbitkubwork15 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.002693 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork03 -
I1211 15:54:12.003003 1 pre_filtering_processor.go:57] Node enbitkubwork03 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.003013 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork04 - vsphere://423b3e9e-d3d7-5c2d-7212-5ae3762891df
I1211 15:54:12.003392 1 pre_filtering_processor.go:57] Node enbitkubwork04 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.003408 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork10 - vsphere://423b12a6-6718-ce91-f421-4c0fdc92ccff
I1211 15:54:12.003907 1 pre_filtering_processor.go:57] Node enbitkubwork10 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.003919 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork12 - vsphere://423b5008-8a4a-67b8-ec5f-20d1f6194218
I1211 15:54:12.004292 1 pre_filtering_processor.go:57] Node enbitkubwork12 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.004303 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork13 - vsphere://423b0113-cc89-3546-7867-1ca164717e9e
I1211 15:54:12.004637 1 pre_filtering_processor.go:57] Node enbitkubwork13 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.004657 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkub01 -
I1211 15:54:12.005043 1 pre_filtering_processor.go:57] Node enbitkub01 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.005054 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkubwork01 - vsphere://423b1baa-7c36-13ed-9375-55ae6472aeed
I1211 15:54:12.005487 1 pre_filtering_processor.go:57] Node enbitkubwork01 should not be processed by cluster autoscaler (no node group config)
I1211 15:54:12.005588 1 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2023-12-11 14:41:47.796948167 +0000 UTC m=-3577.006949908 lastScaleDownDeleteTime=2023-12-11 14:41:47.796948167 +0000 UTC m=-3577.006949908 lastScaleDownFailTime=2023-12-11 14:41:47.796948167 +0000 UTC m=-3577.006949908 scaleDownForbidden=false scaleDownInCooldown=false
I1211 15:54:12.005636 1 static_autoscaler.go:642] Starting scale down
I1211 15:54:12.005666 1 externalgrpc_cloud_provider.go:70] Returning cached NodeGroups
I1211 15:54:12.005673 1 externalgrpc_node_group.go:64] Performing gRPC call NodeGroupTargetSize for node group vmware-ca-k8s
I1211 15:54:12.005969 1 externalgrpc_cloud_provider.go:70] Returning cached NodeGroups
I1211 15:54:12.006112 1 round_trippers.go:466] curl -v -XGET -H "Accept: application/json, */*" -H "User-Agent: cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status'
I1211 15:54:12.009808 1 round_trippers.go:553] GET https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status 200 OK in 3 milliseconds
I1211 15:54:12.009826 1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 3 ms Duration 3 ms
I1211 15:54:12.009835 1 round_trippers.go:577] Response Headers:
I1211 15:54:12.009846 1 round_trippers.go:580] Audit-Id: 4b15973f-2f9e-4861-9021-b3cc3a2163f4
I1211 15:54:12.009854 1 round_trippers.go:580] Cache-Control: no-cache, private
I1211 15:54:12.009862 1 round_trippers.go:580] Content-Type: application/json
I1211 15:54:12.009869 1 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 2dc46cef-3a30-4b0c-83a4-99191c0c6d6a
I1211 15:54:12.009877 1 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 1ca66005-8f45-4ee8-b668-8e9d2970b9fd
I1211 15:54:12.009884 1 round_trippers.go:580] Content-Length: 2272
I1211 15:54:12.009891 1 round_trippers.go:580] Date: Mon, 11 Dec 2023 15:54:10 GMT
I1211 15:54:12.010088 1 request.go:1212] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cluster-autoscaler-status","namespace":"kube-system","uid":"9f431900-dd10-4c26-8c7e-647941098a0a","resourceVersion":"16578552","creationTimestamp":"2023-12-11T15:41:46Z","annotations":{"cluster-autoscaler.kubernetes.io/last-updated":"2023-12-11 15:54:01.800158103 +0000 UTC"},"managedFields":[{"manager":"cluster-autoscaler","operation":"Update","apiVersion":"v1","time":"2023-12-11T15:54:00Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:status":{}},"f:metadata":{"f:annotations":{".":{},"f:cluster-autoscaler.kubernetes.io/last-updated":{}}}}}]},"data":{"status":"Cluster-autoscaler status at 2023-12-11 15:54:01.800158103 +0000 UTC:\nCluster-wide:\n Health: Healthy (ready=18 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=18 longUnregistered=0)\n LastProbeTime: 2023-12-11 15:54:01.504364602 +0000 UTC m=+756.700466504\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n ScaleUp: NoActivity (ready=18 registered=18)\n LastProbeTime: 2023-12-11 15:54:01.504364602 +0000 UTC m=+756.700466504\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n ScaleDown: NoCandidates (candidates=0)\n LastProbeTime: 2023-12-11 15:54:01.504364602 +0000 UTC m=+756.700466504\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n\nNodeGroups:\n Name: vmware-ca-k8s\n Health: Healthy (ready=0 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=100))\n LastProbeTime: 0001-01-01 00:00:00 +0000 UTC\n LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC\n ScaleUp: NoActivity (ready=0 cloudProviderTarget=0)\n LastProbeTime: 0001-01-01 00:00:00 +0000 UTC\n LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC\n ScaleDown: NoCandidates (candidates=0)\n LastProbeTime: 2023-12-11 15:54:01.504364602 +0000 UTC 
m=+756.700466504\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n\n"}}
I1211 15:54:12.010271 1 request.go:1212] Request Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cluster-autoscaler-status","namespace":"kube-system","uid":"9f431900-dd10-4c26-8c7e-647941098a0a","resourceVersion":"16578552","creationTimestamp":"2023-12-11T15:41:46Z","annotations":{"cluster-autoscaler.kubernetes.io/last-updated":"2023-12-11 15:54:12.006039829 +0000 UTC"},"managedFields":[{"manager":"cluster-autoscaler","operation":"Update","apiVersion":"v1","time":"2023-12-11T15:54:00Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:status":{}},"f:metadata":{"f:annotations":{".":{},"f:cluster-autoscaler.kubernetes.io/last-updated":{}}}}}]},"data":{"status":"Cluster-autoscaler status at 2023-12-11 15:54:12.006039829 +0000 UTC:\nCluster-wide:\n Health: Healthy (ready=18 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=18 longUnregistered=0)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n ScaleUp: NoActivity (ready=18 registered=18)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n ScaleDown: NoCandidates (candidates=0)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n\nNodeGroups:\n Name: vmware-ca-k8s\n Health: Healthy (ready=0 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=100))\n LastProbeTime: 0001-01-01 00:00:00 +0000 UTC\n LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC\n ScaleUp: NoActivity (ready=0 cloudProviderTarget=0)\n LastProbeTime: 0001-01-01 00:00:00 +0000 UTC\n LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC\n ScaleDown: NoCandidates (candidates=0)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC 
m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n\n"}}
I1211 15:54:12.010332 1 round_trippers.go:466] curl -v -XPUT -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "User-Agent: cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status'
I1211 15:54:12.015852 1 round_trippers.go:553] PUT https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status 200 OK in 5 milliseconds
I1211 15:54:12.015886 1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 5 ms Duration 5 ms
I1211 15:54:12.015895 1 round_trippers.go:577] Response Headers:
I1211 15:54:12.015904 1 round_trippers.go:580] Audit-Id: f4f3ea2c-efac-4fab-bf8f-5c1f2299acf6
I1211 15:54:12.015912 1 round_trippers.go:580] Cache-Control: no-cache, private
I1211 15:54:12.015919 1 round_trippers.go:580] Content-Type: application/json
I1211 15:54:12.015927 1 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 2dc46cef-3a30-4b0c-83a4-99191c0c6d6a
I1211 15:54:12.015934 1 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 1ca66005-8f45-4ee8-b668-8e9d2970b9fd
I1211 15:54:12.015942 1 round_trippers.go:580] Content-Length: 2272
I1211 15:54:12.015950 1 round_trippers.go:580] Date: Mon, 11 Dec 2023 15:54:10 GMT
I1211 15:54:12.016363 1 request.go:1212] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cluster-autoscaler-status","namespace":"kube-system","uid":"9f431900-dd10-4c26-8c7e-647941098a0a","resourceVersion":"16578643","creationTimestamp":"2023-12-11T15:41:46Z","annotations":{"cluster-autoscaler.kubernetes.io/last-updated":"2023-12-11 15:54:12.006039829 +0000 UTC"},"managedFields":[{"manager":"cluster-autoscaler","operation":"Update","apiVersion":"v1","time":"2023-12-11T15:54:10Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:status":{}},"f:metadata":{"f:annotations":{".":{},"f:cluster-autoscaler.kubernetes.io/last-updated":{}}}}}]},"data":{"status":"Cluster-autoscaler status at 2023-12-11 15:54:12.006039829 +0000 UTC:\nCluster-wide:\n Health: Healthy (ready=18 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=18 longUnregistered=0)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n ScaleUp: NoActivity (ready=18 registered=18)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n ScaleDown: NoCandidates (candidates=0)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n\nNodeGroups:\n Name: vmware-ca-k8s\n Health: Healthy (ready=0 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=100))\n LastProbeTime: 0001-01-01 00:00:00 +0000 UTC\n LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC\n ScaleUp: NoActivity (ready=0 cloudProviderTarget=0)\n LastProbeTime: 0001-01-01 00:00:00 +0000 UTC\n LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC\n ScaleDown: NoCandidates (candidates=0)\n LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC 
m=+767.003618571\n LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592\n\n"}}
I1211 15:54:12.016579 1 status.go:127] Successfully wrote status configmap with body "Cluster-autoscaler status at 2023-12-11 15:54:12.006039829 +0000 UTC:
Cluster-wide:
Health: Healthy (ready=18 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=18 longUnregistered=0)
LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571
LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592
ScaleUp: NoActivity (ready=18 registered=18)
LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571
LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592
ScaleDown: NoCandidates (candidates=0)
LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571
LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592
NodeGroups:
Name: vmware-ca-k8s
Health: Healthy (ready=0 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=100))
LastProbeTime: 0001-01-01 00:00:00 +0000 UTC
LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
ScaleUp: NoActivity (ready=0 cloudProviderTarget=0)
LastProbeTime: 0001-01-01 00:00:00 +0000 UTC
LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
ScaleDown: NoCandidates (candidates=0)
LastProbeTime: 2023-12-11 15:54:11.807516668 +0000 UTC m=+767.003618571
LastTransitionTime: 2023-12-11 15:41:57.89611969 +0000 UTC m=+33.092221592
These messages repeat constantly; no scale-up appears anywhere in the log.
I saw this: (no node group config)
I1211 15:54:12.004657 1 externalgrpc_cloud_provider.go:115] Performing gRPC call NodeGroupForNode for node enbitkub01 -
I1211 15:54:12.005043 1 pre_filtering_processor.go:57] Node enbitkub01 should not be processed by cluster autoscaler (no node group config)
It means that those nodes aren't attached to the node group handled by the vmware autoscaler.
Could you post the output of kubectl get no -o yaml
to see the annotations/labels attached to each node?
For your information, the autoscaler always tries to scale down, which floods the log.
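The requested dump, plus a quicker way to eyeball just the node metadata (both are standard kubectl; the jsonpath expression is only one way to slice it):

```shell
# Full dump of every node object
kubectl get nodes -o yaml > nodes.yaml

# Or just the name and annotations of each node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{.metadata.annotations}{"\n\n"}{end}'
```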
@Forestsoft-de
About your deployment: I just read that rke2 uses k3s, which means you have written a derived method to create the cluster. Do you plan to share it as open source?
Regards
Hey @Fred78290, it's ok, enbitkub01 shouldn't be managed by the autoscaler. It's one of our manually provisioned nodes, and also a master node :) I used ansible to bootstrap the cluster. It's already open source :)
I understand, but the vmware autoscaler has some requirements, such as an image to clone that conforms to k3s or kubeadm, some annotations in the cluster to find the managed nodes, etc.
Because you deployed the cluster with another method, I need to play the inspector to help you.
If you share the ansible file, I can try to reproduce the cluster creation process.
I'm not a fan of ansible either, but I can extract the essentials.
We built the cluster based on this ansible role: https://github.com/lablabs/ansible-role-rke2 The VMs were deployed manually with the help of UUIDs.
Thanks. The reason there is no scale-up is that your cluster was created by ansible, so the nodes are not tagged as members of the node group expected by the autoscaler. When the autoscaler needs to scale up the cluster, it needs to know which node group owns the nodes. The vmware autoscaler can't answer the request because the cluster was not created with the required annotations.
As explained in the README, cluster creation from vanilla kubeadm or k3s is done, as an example, with autoscaled-masterkube-vmware.
At your request, I'll integrate rke2 support soon.
The reason there is no scale-up is that your cluster was created by ansible, so the nodes are not tagged as members of the node group expected by the autoscaler.
That shouldn't be a problem, because the autoscaler shouldn't care about the other nodes. It should only scale my newly defined node group.
The vmware autoscaler can't answer the request because the cluster was not created with the required annotations.
Where can I find the required annotations? The masterkube project is very big :)
@Forestsoft-de
Yo, I have integrated rke2 support into vmware-autoscaler. A docker image, v1.27.11-rke2, is available to test.
For your case, I have updated the README with some information on integrating a foreign cluster, and described the mandatory node annotations for the integration.
Note that the config file has some changes specific to the kubernetes deployment method.
In your case, the most important thing is to add the mandatory annotations to your cluster and to build an image ready for use by vmware-autoscaler.
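For illustration only: attaching existing nodes to a node group usually comes down to annotating them, along the lines of the sketch below. The annotation keys and values here are placeholders, not the real ones; the authoritative list is in the updated vmware-autoscaler README.

```shell
# Hypothetical annotation keys -- check the README for the actual names.
kubectl annotate node enbitkubwork01 \
  cluster.autoscaler.nodegroup/name=vmware-ca-k8s \
  cluster.autoscaler.nodegroup/managed=true
```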
Thank you for the support, I appreciate it.
With the new config and new image, we get a new error message from cluster autoscaler:
Error on gRPC call Refresh: rpc error: code = Unimplemented desc = unknown service clusterautoscaler.cloudprovider.v1.externalgrpc.CloudProvider ││ E1218 16:10:08.673691 1 static_autoscaler.go:326] Failed to refresh cloud provider config: rpc error: code = Unimplemented desc = unknown service clusterautoscaler.cloudprovider.v1.externalgrpc.CloudProvider
Argh, it's a gRPC error. I have updated the gRPC version in vmware-autoscaler, so it's probably a compatibility error. I'll investigate.
Can you repost the updated config & deployment?
I only changed the distribution and used the rke2 key instead of kubeadm.
@Forestsoft-de
Hello, I have tested my docker image v1.27.11-rke2 with the vanilla cluster-autoscaler v1.28.2.
It works perfectly, so I'm providing my config here so you can see the differences.
Remark: you use template: true & linked: true, so the VM clone will probably fail.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: b2dc7a298dc5d69efa4b91cc58f2e2cad27cbf0bf815e62b509e78eea8aa4891
    cni.projectcalico.org/podIP: 10.42.0.2/32
    cni.projectcalico.org/podIPs: 10.42.0.2/32
  creationTimestamp: "2023-12-19T10:57:29Z"
  generateName: cluster-autoscaler-7d45bb6f4d-
  labels:
    k8s-app: cluster-autoscaler
    pod-template-hash: 7d45bb6f4d
  name: cluster-autoscaler-7d45bb6f4d-x7xtr
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: cluster-autoscaler-7d45bb6f4d
    uid: 751cfd1d-c3c2-4893-98f3-b75f3d085f49
  resourceVersion: "1311"
  uid: 962c6f0b-f702-44fb-b904-94f4d0f60c0d
spec:
  containers:
  - command:
    - /usr/local/bin/vsphere-autoscaler
    - --no-use-external-etcd
    - --use-vanilla-grpc
    - --use-controller-manager
    - --src-etcd-ssl-dir=/etc/etcd/ssl
    - --dst-etcd-ssl-dir=/etc/etcd/ssl
    - --config=/etc/cluster/kubernetes-vmware-autoscaler.json
    - --save=/var/run/cluster-autoscaler/vmware-autoscaler-state.json
    - --log-level=info
    image: fred78290/vsphere-autoscaler:v1.27.11-rke2
    imagePullPolicy: Always
    name: vsphere-autoscaler
    resources:
      limits:
        cpu: 100m
        memory: 300Mi
      requests:
        cpu: 100m
        memory: 300Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/cluster-autoscaler
      name: cluster-socket
    - mountPath: /etc/cluster
      name: config-cluster-autoscaler
    - mountPath: /etc/ssh
      name: autoscaler-ssh-keys
    - mountPath: /etc/etcd/ssl
      name: etcd-ssl
    - mountPath: /etc/kubernetes/pki
      name: kubernetes-pki
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wk5q8
      readOnly: true
  - command:
    - ./cluster-autoscaler
    - --v=1
    - --stderrthreshold=info
    - --cloud-provider=externalgrpc
    - --cloud-config=/etc/cluster/grpc-config.yaml
    - --nodes=0:9:true/vmware-dev-rke2
    - --max-nodes-total=9
    - --cores-total=0:16
    - --memory-total=0:48
    - --node-autoprovisioning-enabled
    - --max-autoprovisioned-node-group-count=1
    - --scale-down-enabled=true
    - --scale-down-delay-after-add=1m
    - --scale-down-delay-after-delete=1m
    - --scale-down-delay-after-failure=1m
    - --scale-down-unneeded-time=1m
    - --scale-down-unready-time=1m
    - --unremovable-node-recheck-timeout=1m
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
    imagePullPolicy: Always
    name: cluster-autoscaler
    resources:
      limits:
        cpu: 100m
        memory: 300Mi
      requests:
        cpu: 100m
        memory: 300Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/cluster-autoscaler
      name: cluster-socket
    - mountPath: /etc/ssl/certs/ca-certificates.crt
      name: ssl-certs
      readOnly: true
    - mountPath: /etc/cluster
      name: config-cluster-autoscaler
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wk5q8
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - /bin/sh
    - -c
    - rm -f /var/run/cluster-autoscaler/vmware.sock
    image: busybox
    imagePullPolicy: Always
    name: cluster-autoscaler-init
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/cluster-autoscaler
      name: cluster-socket
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wk5q8
      readOnly: true
  nodeName: vmware-dev-rke2-masterkube
  nodeSelector:
    master: "true"
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 65532
    fsGroupChangePolicy: OnRootMismatch
    runAsGroup: 65532
    runAsUser: 65532
  serviceAccount: cluster-autoscaler
  serviceAccountName: cluster-autoscaler
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: cluster-socket
  - configMap:
      defaultMode: 420
      name: config-cluster-autoscaler
    name: config-cluster-autoscaler
  - hostPath:
      path: /etc/ssl/certs/ca-certificates.crt
      type: ""
    name: ssl-certs
  - name: autoscaler-ssh-keys
    secret:
      defaultMode: 416
      secretName: autoscaler-ssh-keys
  - name: etcd-ssl
    secret:
      defaultMode: 416
      secretName: etcd-ssl
  - configMap:
      defaultMode: 420
      name: kubernetes-pki
    name: kubernetes-pki
  - name: kube-api-access-wk5q8
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T10:57:57Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T10:58:04Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T10:58:04Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T10:57:42Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://1add3ff9e6313424c475d5a8d6a4aec74e3d756e1d976eb1d5a2942d57c2a9a0
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
    imageID: registry.k8s.io/autoscaling/cluster-autoscaler@sha256:83de5778b666329b7fff80ec233e4a986d859792f0f7e3ae4bb2e3329cd2ff03
    lastState: {}
    name: cluster-autoscaler
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-12-19T10:58:03Z"
  - containerID: containerd://45a461747e6905f22dded40c3e7688ffae7f368c5a9615607db6cb0ad5df667e
    image: docker.io/fred78290/vsphere-autoscaler:v1.27.11-rke2
    imageID: docker.io/fred78290/vsphere-autoscaler@sha256:2ae98fe3799c065481bfc9f551f64ab4fa289325c13c67cb83e432e4fe13d069
    lastState: {}
    name: vsphere-autoscaler
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-12-19T10:58:00Z"
  hostIP: 10.0.0.120
  initContainerStatuses:
  - containerID: containerd://d6a1a2b4f7305a267e32324c529830e30f90160783ef1f2f8ff427ab7272fec0
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:5c63a9b46e7139d2d5841462859edcbbf57f238af891b6096578e5894cfe5ae2
    lastState: {}
    name: cluster-autoscaler-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://d6a1a2b4f7305a267e32324c529830e30f90160783ef1f2f8ff427ab7272fec0
        exitCode: 0
        finishedAt: "2023-12-19T10:57:56Z"
        reason: Completed
        startedAt: "2023-12-19T10:57:56Z"
  phase: Running
  podIP: 10.42.0.2
  podIPs:
  - ip: 10.42.0.2
  qosClass: Burstable
  startTime: "2023-12-19T10:57:42Z"
/etc/cluster/grpc-config.yaml
address: unix:/var/run/cluster-autoscaler/vmware.sock
/etc/cluster/kubernetes-vmware-autoscaler.json
{
  "use-external-etcd": false,
  "src-etcd-ssl-dir": "/etc/etcd/ssl",
  "dst-etcd-ssl-dir": "/etc/kubernetes/pki/etcd",
  "distribution": "rke2",
  "kubernetes-pki-srcdir": "/etc/kubernetes/pki",
  "kubernetes-pki-dstdir": "/etc/kubernetes/pki",
  "network": "unix",
  "listen": "/var/run/cluster-autoscaler/vmware.sock",
  "secret": "vmware",
  "minNode": 0,
  "maxNode": 9,
  "maxPods": 110,
  "maxNode-per-cycle": 2,
  "node-name-prefix": "autoscaled",
  "managed-name-prefix": "managed",
  "controlplane-name-prefix": "master",
  "nodePrice": 0.0,
  "podPrice": 0.0,
  "image": "jammy-kubernetes-rke2-v1.27.8+rke2r1-amd64",
  "optionals": {
    "pricing": false,
    "getAvailableMachineTypes": false,
    "newNodeGroup": false,
    "templateNodeInfo": false,
    "createNodeGroup": false,
    "deleteNodeGroup": false
  },
  "rke2": {
    "address": "192.168.1.120:9345",
    "token": "xyz",
    "ca": "sha256:xyz",
    "extras-args": [
      "--ignore-preflight-errors=All"
    ],
    "datastore-endpoint": "",
    "extras-commands": []
  },
  "default-machine": "large",
  "machines": {
    "tiny": {
      "memsize": 2048,
      "vcpus": 2,
      "disksize": 10240
    },
    "small": {
      "memsize": 4096,
      "vcpus": 2,
      "disksize": 20480
    },
    "medium": {
      "memsize": 4096,
      "vcpus": 4,
      "disksize": 20480
    },
    "large": {
      "memsize": 8192,
      "vcpus": 4,
      "disksize": 51200
    },
    "xlarge": {
      "memsize": 16384,
      "vcpus": 4,
      "disksize": 102400
    },
    "2xlarge": {
      "memsize": 16384,
      "vcpus": 8,
      "disksize": 102400
    },
    "4xlarge": {
      "memsize": 32768,
      "vcpus": 8,
      "disksize": 102400
    }
  },
  "node-labels": [
    "topology.kubernetes.io/region=home",
    "topology.kubernetes.io/zone=office",
    "topology.csi.vmware.com/k8s-region=home",
    "topology.csi.vmware.com/k8s-zone=office"
  ],
  "cloud-init": {
    "package_update": false,
    "package_upgrade": false,
    "runcmd": [
      "echo 1 > /sys/block/sda/device/rescan",
      "growpart /dev/sda 1",
      "resize2fs /dev/sda1",
      "echo '192.168.1.120 vmware-dev-rke2-masterkube vmware-dev-rke2-masterkube.aldunelabs.fr' >> /etc/hosts"
    ]
  },
  "ssh-infos": {
    "wait-ssh-ready-seconds": 180,
    "user": "kubernetes",
    "ssh-private-key": "/etc/ssh/id_rsa"
  },
  "vmware": {
    "vmware-dev-rke2": {
      "url": "https://administrator@aldunelabs.com:redacted@10.0.0.61/sdk",
      "uid": "administrator@aldunelabs.com",
      "password": "redacted",
      "insecure": true,
      "dc": "DC01",
      "datastore": "datastore",
      "resource-pool": "ALDUNE/Resources/FR",
      "vmFolder": "HOME",
      "timeout": 300,
      "template-name": "jammy-kubernetes-rke2-v1.27.8+rke2r1-amd64",
      "template": false,
      "linked": false,
      "allow-upgrade": false,
      "customization": "",
      "network": {
        "domain": "acme.com",
        "dns": {
          "search": [
            "acme.com"
          ],
          "nameserver": [
            "10.0.0.5"
          ]
        },
        "interfaces": [
          {
            "primary": false,
            "exists": true,
            "network": "VM Network",
            "adapter": "vmxnet3",
            "mac-address": "generate",
            "nic": "eth0",
            "dhcp": true,
            "use-dhcp-routes": true,
            "routes": []
          },
          {
            "primary": true,
            "exists": true,
            "network": "VM Private",
            "adapter": "vmxnet3",
            "mac-address": "generate",
            "nic": "eth1",
            "dhcp": true,
            "use-dhcp-routes": false,
            "address": "192.168.1.124",
            "gateway": "10.0.0.1",
            "netmask": "255.255.255.0",
            "routes": []
          }
        ]
      }
    }
  }
}
@Forestsoft-de I found the error: you are missing the argument --use-vanilla-grpc for vmware-autoscaler.
By default it uses my custom gRPC implementation instead of the upstream externalgrpc protocol.
Regards
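For anyone hitting the same problem, the flag belongs on the vsphere-autoscaler container, not on the cluster-autoscaler one. A minimal sketch of the relevant container spec (args and paths taken from the deployment manifest posted later in this thread, trimmed to the essentials):

```yaml
containers:
  - name: vsphere-autoscaler
    image: fred78290/vsphere-autoscaler:v1.27.1
    command:
      - /usr/local/bin/vsphere-autoscaler
      - --use-vanilla-grpc          # use the upstream externalgrpc protocol
      - --config=/etc/cluster/kubernetes-vmware-autoscaler.json
      - --save=/var/run/cluster-autoscaler/vmware-autoscaler-state.json
      - --log-level=debug
```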
Hey @Fred78290
it looks better now, but the vmware autoscaler has trouble creating the node group:
time="2023-12-19T12:28:41Z" level=info msg="Auto provision for nodegroup:vmware-ca-k8s, minSize:1, maxSize:100"
time="2023-12-19T12:28:41Z" level=info msg="New node group, ID:vmware-ca-k8s minSize:1, maxSize:100, machineType:large, node labels:map[], map[]"
time="2023-12-19T12:28:41Z" level=info msg="Create node group, ID:vmware-ca-k8s"
time="2023-12-19T12:28:41Z" level=debug msg="AutoScalerServerNodeGroup::addNodes, nodeGroupID:vmware-ca-k8s"
time="2023-12-19T12:28:41Z" level=debug msg="AutoScalerServerNodeGroup::addNodes, nodeGroupID:vmware-ca-k8s -> g.status != nodegroupCreated"
time="2023-12-19T12:28:41Z" level=error msg="node group vmware-ca-k8s not found"
time="2023-12-19T12:28:41Z" level=error msg="warning can't autoprovision node group, reason: node group vmware-ca-k8s not found"
time="2023-12-19T12:28:41Z" level=fatal msg="failed to create externalgrpc: node group vmware-ca-k8s not found"
What's wrong with my setup?
Current config kubernetes-vmware-autoscaler.json:
{
"distribution": "rke2",
"use-external-etcd": false,
"use-vanilla-grpc": true,
"use-controller-manager": true,
"src-etcd-ssl-dir": "/etc/etcd/ssl",
"dst-etcd-ssl-dir": "/etc/kubernetes/pki/etcd",
"kubernetes-pki-srcdir": "/etc/kubernetes/pki",
"kubernetes-pki-dstdir": "/etc/kubernetes/pki",
"network": "unix",
"listen": "/var/run/cluster-autoscaler/vmware.sock",
"secret": "vmware",
"minNode": {{ .Values.vsphere.minNodes }},
"maxNode": {{ .Values.vsphere.maxNodes }},
"maxNode-per-cycle": 1,
"node-name-prefix": "autoscaled",
"managed-name-prefix": "enbitkubwork",
"controlplane-name-prefix": "enbitkub0",
"nodePrice": 0,
"podPrice": 0,
"image": "enbitconkubworker",
"optionals": {
"pricing": false,
"getAvailableMachineTypes": false,
"newNodeGroup": false,
"templateNodeInfo": false,
"createNodeGroup": false,
"deleteNodeGroup": false
},
"rke2": {
"address": "{{.Values.kubeadm.address}}",
"token": "{{.Values.kubeadm.token}}",
"datastore-endpoint": "",
"extras-args": [
"--ignore-preflight-errors=All"
]
},
"default-machine": "large",
"machines": {
"tiny": {
"memsize": 2048,
"vcpus": 2,
"disksize": 10240
},
"small": {
"memsize": 4096,
"vcpus": 2,
"disksize": 20480
},
"medium": {
"memsize": 4096,
"vcpus": 4,
"disksize": 20480
},
"large": {
"memsize": 8192,
"vcpus": 4,
"disksize": 51200
},
"xlarge": {
"memsize": 16384,
"vcpus": 4,
"disksize": 102400
},
"2xlarge": {
"memsize": 16384,
"vcpus": 8,
"disksize": 102400
},
"4xlarge": {
"memsize": 32768,
"vcpus": 8,
"disksize": 102400
}
},
"node-labels": [
"topology.kubernetes.io/region=k8s-region",
"topology.kubernetes.io/zone=k8s-zone",
"topology.csi.vmware.com/k8s-region=k8s-region",
"topology.csi.vmware.com/k8s-zone=k8s-zone"
],
"cloud-init": {
"package_update": false,
"package_upgrade": false,
"runcmd": [
"/home/enbitconkub/nodesetup.sh"
]
},
"ssh-infos": {
"user": "root",
"ssh-private-key": "/etc/ssh/id_rsa"
},
"autoscaling-options": {
"scaleDownUtilizationThreshold": 0.5,
"scaleDownGpuUtilizationThreshold": 0.5
},
"vmware": {
"vmware-ca-k8s": {
"url": "https://{{.Values.vsphere.user}}:{{.Values.vsphere.password}}@{{.Values.vsphere.host}}/sdk",
"uid": "{{.Values.vsphere.user}}@{{.Values.vsphere.host}}",
"password": "{{.Values.vsphere.password}}",
"insecure": true,
"dc": "{{.Values.vsphere.datacenter}}",
"datastore": "{{.Values.vsphere.datastore}}",
"resource-pool": "{{.Values.vsphere.resourcePool}}",
"vmFolder": "{{.Values.vsphere.vmFolder}}",
"timeout": 300,
"template-name": "{{.Values.worker.image}}",
"template": false,
"linked": false,
"customization": "",
"network": {
"domain": "{{.Values.vsphere.domain}}",
"dns": {
"search": [
"{{.Values.vsphere.domain}}"
],
"nameserver": [
"{{.Values.vsphere.nameserver}}"
]
},
"interfaces": [
{
"primary": true,
"exists": true,
"network": "DMZ",
"adapter": "vmxnet3",
"mac-address": "generate",
"nic": "eth0",
"dhcp": true,
"use-dhcp-routes": true,
"routes": []
}
]
}
}
}
}
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "17"
meta.helm.sh/release-name: cluster-autoscaler
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2023-12-06T13:45:35Z"
generation: 17
labels:
app.kubernetes.io/instance: cluster-autoscaler
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: vsphere-autoscaler
app.kubernetes.io/version: 1.16.0
helm.sh/chart: vsphere-autoscaler-0.1.0
name: cluster-autoscaler-vsphere-autoscaler
namespace: kube-system
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: cluster-autoscaler
app.kubernetes.io/name: vsphere-autoscaler
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: cluster-autoscaler
app.kubernetes.io/name: vsphere-autoscaler
spec:
containers:
- command:
- /usr/local/bin/vsphere-autoscaler
- --no-use-external-etcd
- --src-etcd-ssl-dir=/etc/etcd/ssl
- --dst-etcd-ssl-dir=/etc/etcd/ssl
- --config=/etc/cluster/kubernetes-vmware-autoscaler.json
- --save=/var/run/cluster-autoscaler/vmware-autoscaler-state.json
- --log-level=debug
image: fred78290/vsphere-autoscaler:v1.27.1
imagePullPolicy: IfNotPresent
name: vsphere-autoscaler
resources: {}
securityContext: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/cluster-autoscaler
name: cluster-socket
- mountPath: /etc/cluster
name: config-cluster-autoscaler
- mountPath: /etc/ssh
name: autoscaler-ssh-keys
- mountPath: /etc/etcd/ssl
name: etcd-ssl
- mountPath: /etc/kubernetes/pki
name: kubernetes-pki
- command:
- ./cluster-autoscaler
- --v=3
- --stderrthreshold=info
- --cloud-provider=externalgrpc
- --cloud-config=/etc/cluster/cloud-config
- --max-nodes-total=100
- --node-autoprovisioning-enabled
- --max-autoprovisioned-node-group-count=1
- --scale-down-enabled=true
- --scale-down-delay-after-add=1m
- --scale-down-delay-after-delete=1m
- --scale-down-delay-after-failure=1m
- --scale-down-unneeded-time=1m
- --scale-down-unready-time=1m
- --unremovable-node-recheck-timeout=1m
image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
imagePullPolicy: IfNotPresent
name: cluster-autoscaler
resources:
limits:
cpu: 100m
memory: 300Mi
requests:
cpu: 100m
memory: 300Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/cluster-autoscaler
name: cluster-socket
- mountPath: /etc/ssl/certs/ca-certificates.crt
name: ssl-certs
readOnly: true
- mountPath: /etc/cluster
name: config-cluster-autoscaler
readOnly: true
dnsPolicy: ClusterFirst
initContainers:
- command:
- /bin/sh
- -c
- rm -f /var/run/cluster-autoscaler/vmware.sock
image: busybox
imagePullPolicy: Always
name: cluster-autoscaler-init
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/cluster-autoscaler
name: cluster-socket
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: cluster-autoscaler
serviceAccountName: cluster-autoscaler
terminationGracePeriodSeconds: 30
volumes:
- emptyDir: {}
name: cluster-socket
- name: config-cluster-autoscaler
secret:
defaultMode: 420
secretName: config-cluster-autoscaler
- hostPath:
path: /etc/ssl/certs/ca-certificates.crt
type: ""
name: ssl-certs
- name: autoscaler-ssh-keys
secret:
defaultMode: 420
secretName: autoscaler-ssh-keys
- name: etcd-ssl
secret:
defaultMode: 384
secretName: etcd-ssl
- configMap:
defaultMode: 420
name: kubernetes-pki
name: kubernetes-pki
@Forestsoft-de I think your cluster is missing the cluster.autoscaler.nodegroup/XX annotations.
Here is my cluster with the annotations:
apiVersion: v1
kind: Node
metadata:
annotations:
alpha.kubernetes.io/provided-node-ip: 10.0.0.120
cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
cluster.autoscaler.nodegroup/autoprovision: "false"
cluster.autoscaler.nodegroup/instance-id: 42379ee0-8bf6-8176-2f9f-98662124a6e7
cluster.autoscaler.nodegroup/name: vmware-dev-rke2
cluster.autoscaler.nodegroup/node-index: "0"
csi.volume.kubernetes.io/nodeid: '{"csi.vsphere.vmware.com":"42379ee0-8bf6-8176-2f9f-98662124a6e7"}'
etcd.rke2.cattle.io/local-snapshots-timestamp: "2023-12-19T14:07:17+01:00"
etcd.rke2.cattle.io/node-address: 10.0.0.120
etcd.rke2.cattle.io/node-name: vmware-dev-rke2-masterkube-453ab16f
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"8e:bd:89:74:3f:fa"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 10.0.0.120
node.alpha.kubernetes.io/ttl: "0"
rke2.io/encryption-config-hash: start-3fae24f157ea6fe411a6c24a79f8d59b4418197007f5f87c90c50fa5d52110b5
rke2.io/node-args: '["server","--kubelet-arg","cloud-provider=external","--kubelet-arg","fail-swap-on=false","--kubelet-arg","provider-id=vsphere://42379ee0-8bf6-8176-2f9f-98662124a6e7","--kubelet-arg","max-pods=110","--node-name","vmware-dev-rke2-masterkube","--advertise-address","192.168.1.120","--disable-cloud-controller","true","--cloud-provider-name","external","--disable","rke2-ingress-nginx","--disable","rke2-metrics-server","--disable","servicelb","--tls-san","192.168.1.120","--tls-san","vmware-dev-rke2-masterkube.aldunelabs.fr","--tls-san","vmware-dev-rke2-masterkube","--tls-san","192.168.1.121","--tls-san","vmware-dev-rke2-master-02.aldunelabs.fr","--tls-san","192.168.1.122","--tls-san","vmware-dev-rke2-master-03.aldunelabs.fr"]'
rke2.io/node-config-hash: O2OSXOLPRMI3CUOLPYQHGMYLW6QAYMWLREAAJDZEXSGVIOU4GVOQ====
rke2.io/node-env: '{}'
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-12-19T13:07:12Z"
finalizers:
- wrangler.cattle.io/node
- wrangler.cattle.io/managed-etcd-controller
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: vsphere-vm.cpu-2.mem-4gb.os-ubuntu
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: home
failure-domain.beta.kubernetes.io/zone: office
kubernetes.io/arch: amd64
kubernetes.io/hostname: vmware-dev-rke2-masterkube
kubernetes.io/os: linux
master: "true"
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
node.kubernetes.io/instance-type: vsphere-vm.cpu-2.mem-4gb.os-ubuntu
topology.csi.vmware.com/k8s-region: home
topology.csi.vmware.com/k8s-zone: office
topology.kubernetes.io/region: home
topology.kubernetes.io/zone: office
name: vmware-dev-rke2-masterkube
resourceVersion: "1518"
uid: 55f2c6c7-429c-4b2d-9d25-582acbebb3ce
spec:
apiVersion: v1
kind: Node
metadata:
annotations:
alpha.kubernetes.io/provided-node-ip: 10.0.0.123
cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
cluster.autoscaler.nodegroup/autoprovision: "false"
cluster.autoscaler.nodegroup/instance-id: 42379773-ee0a-0949-8412-df21612777e0
cluster.autoscaler.nodegroup/managed: "false"
cluster.autoscaler.nodegroup/name: vmware-dev-rke2
cluster.autoscaler.nodegroup/node-index: "1"
csi.volume.kubernetes.io/nodeid: '{"csi.vsphere.vmware.com":"42379773-ee0a-0949-8412-df21612777e0"}'
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"76:90:dd:8c:b4:8c"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 10.0.0.123
node.alpha.kubernetes.io/ttl: "0"
rke2.io/node-args: '["agent","--kubelet-arg","cloud-provider=external","--kubelet-arg","fail-swap-on=false","--kubelet-arg","provider-id=vsphere://42379773-ee0a-0949-8412-df21612777e0","--kubelet-arg","max-pods=110","--node-name","vmware-dev-rke2-worker-01","--server","https://192.168.1.120:9345","--token","********"]'
rke2.io/node-config-hash: VIHTC2YEJGW5F2YCX5UNJGUOJJA5TQC6JODPETJNCCQPENKBP6EQ====
rke2.io/node-env: '{}'
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-12-19T13:07:24Z"
finalizers:
- wrangler.cattle.io/node
- wrangler.cattle.io/managed-etcd-controller
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: vsphere-vm.cpu-4.mem-4gb.os-ubuntu
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: home
failure-domain.beta.kubernetes.io/zone: office
kubernetes.io/arch: amd64
kubernetes.io/hostname: vmware-dev-rke2-worker-01
kubernetes.io/os: linux
node-role.kubernetes.io/worker: "true"
node.kubernetes.io/instance-type: vsphere-vm.cpu-4.mem-4gb.os-ubuntu
topology.csi.vmware.com/k8s-region: home
topology.csi.vmware.com/k8s-zone: office
topology.kubernetes.io/region: home
topology.kubernetes.io/zone: office
worker: "true"
name: vmware-dev-rke2-worker-01
resourceVersion: "2257"
uid: d0331edb-18f9-40d9-a58d-9ff4144dbf6d
Nope, I've added it:
apiVersion: v1
kind: Node
metadata:
annotations:
alpha.kubernetes.io/provided-node-ip: 10.30.2.9
cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
cluster.autoscaler.nodegroup/autoprovision: "false"
cluster.autoscaler.nodegroup/instance-id: 423b9c94-7b4a-1ff9-eef5-e02ec9e9dc18
cluster.autoscaler.nodegroup/managed: "false"
cluster.autoscaler.nodegroup/name: vmware-ca-k8s
cluster.autoscaler.nodegroup/node-index: "0"
csi.volume.kubernetes.io/nodeid: '{"csi.vsphere.vmware.com":"423b9c94-7b4a-1ff9-eef5-e02ec9e9dc18"}'
.....
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: enbitkub01
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
I apologize, you found a bug.
As a workaround, set the minimum node size (minNode) to 0 instead of 1: the node group's "created" status is only set after the autoscaler tries to add nodes to reach minSize, but adding a node requires the node group to already have the "created" status.
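Concretely, the workaround in kubernetes-vmware-autoscaler.json looks like this (an excerpt showing only the two size keys; maxNode kept at the value from the config above):

```json
{
  "minNode": 0,
  "maxNode": 9
}
```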
Hey,
we enabled the debug option and see the following output:
Call server TargetSize: id:\"vmware-ca-k8s\"
The cluster autoscaler reports the following:
I1207 11:23:17.741823 1 orchestrator.go:546] Pod enbitcon/enbitcon-shopware6-64cfd4bbc8-45m4v can't be scheduled on vmware-ca-k8s, predicate checking error: Too many pods, Insufficient cpu, Insufficient memory; predicateName=NodeResourcesFit; reasons: Too many pods, Insufficient cpu, Insufficient memory; debugInfo=
Manifests:
Thanks a lot.