estafette / estafette-gke-node-pool-shifter

Kubernetes controller that can shift nodes from one node pool to another, to favour for example preemptibles over regular VMs
https://helm.estafette.io
MIT License

Support GKE regional cluster type #10

Open joaquindlz opened 5 years ago

joaquindlz commented 5 years ago

Hi,

This tool doesn't work on a GKE cluster running version 1.10.6-gke.11. I deployed it with Helm.

Config:

nodePool:
  # The name of the node pool to shift from
  from: "pool-1"

  # The name of the node pool to shift to
  to: "pool-2-preemptible"

  # The minimum number of nodes to keep in the node pool to shift from
  fromMinNode: 6

Log:

{"time":"2018-11-25T19:13:03Z","severity":"info","app":"estafette-gke-node-pool-shifter","version":"1.0.11","branch":"master","revision":"4976666d587383eed14694aa00764a5c6db37aea","buildDate":"2017-09-12T15:50:30Z","goVersion":"go1.9","nodePooldFrom":"pool-1","nodePooldTo":"pool-2-preemptible","message":"Starting estafette-gke-node-pool-shifter..."}
{"time":"2018-11-25T19:13:03Z","severity":"info","app":"estafette-gke-node-pool-shifter","version":"1.0.11","port":":9001","path":"/metrics","message":"Serving Prometheus metrics..."}
{"time":"2018-11-25T19:13:06Z","severity":"info","app":"estafette-gke-node-pool-shifter","version":"1.0.11","message":"Checking node pool to shift..."}
{"time":"2018-11-25T19:13:06Z","severity":"info","app":"estafette-gke-node-pool-shifter","version":"1.0.11","node-pool":"pool-1","message":"Node pool has 8 node(s), minimun wanted: 6 node(s)"}
{"time":"2018-11-25T19:13:06Z","severity":"info","app":"estafette-gke-node-pool-shifter","version":"1.0.11","node-pool":"pool-2-preemptible","message":"Attempting to shift one node..."}
{"time":"2018-11-25T19:13:06Z","severity":"info","app":"estafette-gke-node-pool-shifter","version":"1.0.11","node-pool":"pool-2-preemptible","message":"Adding 1 node to the pool, currently 0 node(s), expecting 1 node(s)"}
{"time":"2018-11-25T19:13:06Z","severity":"error","app":"estafette-gke-node-pool-shifter","version":"1.0.11","error":"googleapi: Error 404: Not found: projects/[my-project-name]/zones/us-central1-b/clusters/[my-cluster-name]., notFound","node-pool":"pool-2-preemptible","message":"Error resizing node pool"}

joaquindlz commented 5 years ago

Sorry, the issue is caused by the cluster's zone configuration. My cluster is regional, so it runs in us-central1 rather than in a single zone such as us-central1-b. Does this tool support regional clusters?
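
The 404 above comes from a request path that names a zone (zones/us-central1-b) for a cluster that lives in a region. As a minimal sketch, not the tool's actual code, the GKE v1 API also accepts location-based resource names of the form projects/*/locations/*/clusters/*/nodePools/*, where the location may be either a zone or a region. The helper names isZone and nodePoolName, the project and cluster placeholders, and the dash-counting heuristic are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isZone reports whether a GCP location string looks like a zone
// (for example "us-central1-b") rather than a region ("us-central1").
// Heuristic assumption: zone names have three dash-separated parts,
// region names have two.
func isZone(location string) bool {
	return len(strings.Split(location, "-")) == 3
}

// nodePoolName builds a location-based resource name. The same format
// is used for zonal and regional clusters, so the caller does not need
// a separate "zones/..." code path.
func nodePoolName(project, location, cluster, pool string) string {
	return fmt.Sprintf("projects/%s/locations/%s/clusters/%s/nodePools/%s",
		project, location, cluster, pool)
}

func main() {
	// "my-project" and "my-cluster" are placeholders, not real names.
	for _, loc := range []string{"us-central1-b", "us-central1"} {
		fmt.Println(isZone(loc), nodePoolName("my-project", loc, "my-cluster", "pool-2-preemptible"))
	}
}
```

Passing the cluster's own location through unchanged, instead of deriving a zone from a node, is what would let one code path serve both cluster types under these assumptions.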

etiennetremel commented 5 years ago

Good to hear you found the root cause. I just had a look, and the way we currently handle instance groups would not work. For a regional cluster, GKE creates one managed instance group per zone for each node pool, which for us-central1 would be:

$ gcloud compute instance-groups list
NAME                                                LOCATION       SCOPE  NETWORK  MANAGED  INSTANCES
gke-standard-cluster-1-default-pool-bd01c69c-grp    us-central1-a  zone   default  Yes      1
gke-standard-cluster-preemptible-pool-47ae2a9d-grp  us-central1-a  zone   default  Yes      1
gke-standard-cluster-1-default-pool-940826f8-grp    us-central1-c  zone   default  Yes      1
gke-standard-cluster-preemptible-pool-dcb56b72-grp  us-central1-c  zone   default  Yes      1
gke-standard-cluster-1-default-pool-4ce86136-grp    us-central1-b  zone   default  Yes      1
gke-standard-cluster-preemptible-pool-45a6ba9a-grp  us-central1-b  zone   default  Yes      1

There are two ways we could tackle this problem:

a. change the logic in estafette-gke-node-pool-shifter to handle a regional cluster
b. deploy one estafette-gke-node-pool-shifter instance per node pool pair, which in the case above would be 3 instances
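
To make the "one pair per zone" idea concrete, here is a rough sketch that groups the instance groups from the listing above by zone and matches each to the "from" or "to" pool. The types instanceGroup and poolPair and the function pairByZone are hypothetical, and matching a pool by a substring of the instance group name is an assumption based only on the names shown above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// instanceGroup mirrors the NAME and LOCATION columns of
// `gcloud compute instance-groups list`.
type instanceGroup struct {
	Name string
	Zone string
}

// poolPair holds the instance groups backing the two pools in one zone.
type poolPair struct {
	From string
	To   string
}

// pairByZone groups instance groups by zone and assigns each one to the
// "from" or "to" side by looking for "-<pool>-" in its name. Groups that
// match neither pool are ignored.
func pairByZone(groups []instanceGroup, fromPool, toPool string) map[string]poolPair {
	pairs := make(map[string]poolPair)
	for _, g := range groups {
		p := pairs[g.Zone]
		switch {
		case strings.Contains(g.Name, "-"+fromPool+"-"):
			p.From = g.Name
		case strings.Contains(g.Name, "-"+toPool+"-"):
			p.To = g.Name
		default:
			continue // skip groups that belong to other pools
		}
		pairs[g.Zone] = p
	}
	return pairs
}

func main() {
	groups := []instanceGroup{
		{"gke-standard-cluster-1-default-pool-bd01c69c-grp", "us-central1-a"},
		{"gke-standard-cluster-preemptible-pool-47ae2a9d-grp", "us-central1-a"},
		{"gke-standard-cluster-1-default-pool-940826f8-grp", "us-central1-c"},
		{"gke-standard-cluster-preemptible-pool-dcb56b72-grp", "us-central1-c"},
		{"gke-standard-cluster-1-default-pool-4ce86136-grp", "us-central1-b"},
		{"gke-standard-cluster-preemptible-pool-45a6ba9a-grp", "us-central1-b"},
	}
	pairs := pairByZone(groups, "default-pool", "preemptible-pool")

	// Sort the zones so the output order is stable.
	zones := make([]string, 0, len(pairs))
	for z := range pairs {
		zones = append(zones, z)
	}
	sort.Strings(zones)
	for _, z := range zones {
		fmt.Printf("%s: %s -> %s\n", z, pairs[z].From, pairs[z].To)
	}
}
```

With the six groups above this yields three pairs, one per zone, which matches the count of 3 mentioned in option b.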

joaquindlz commented 5 years ago

Thank you for your reply @etiennetremel. Taking the example you mentioned, how would you deploy one estafette-gke-node-pool-shifter instance per node pool pair?

etiennetremel commented 5 years ago

Unfortunately, we still need to change the logic in the app; right now we assume the cluster only uses 2 node pools. GetProjectDetailsFromNode in the gcloud.go file would be the first place to look. There is little chance that I can help in the next few weeks, so if you feel like looking at it, be my guest.

zvictor commented 5 years ago

deploy one estafette-gke-node-pool-shifter instance per node pool pair, in the case above it would be 3

I tried doing that, but I ran into naming conflicts:

Error: release pool-shifter-default-pool failed: secrets "estafette-gke-node-pool-shifter" already exists

orishoshan commented 5 years ago

It doesn't look like there's a way to handle moving just 1 node for a regional cluster (node pool) - you can't control each zone independently.

However, it seems the current code after PR #4 actually fixes this issue by using the correct API. Could you please rebuild and upload a new Helm package? @etiennetremel @JorritSalverda
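
The point about not controlling each zone independently can be illustrated with some arithmetic. The sketch below assumes that a regional node pool's size is expressed as a node count per zone, so one resize step changes the total by the number of zones. The functions totalNodes and shiftStep are hypothetical, the per-zone minimum is an illustrative reading of fromMinNode, and none of this is the tool's actual logic.

```go
package main

import "fmt"

// totalNodes returns the cluster-wide size of a regional node pool whose
// size is set as a node count per zone (assumption stated above).
func totalNodes(perZone, zones int) int {
	return perZone * zones
}

// shiftStep models one shift: the "from" pool loses one node per zone and
// the "to" pool gains one node per zone. It refuses to go below the
// per-zone minimum of the "from" pool.
func shiftStep(fromPerZone, toPerZone, fromMinPerZone int) (newFrom, newTo int, ok bool) {
	if fromPerZone <= fromMinPerZone {
		return fromPerZone, toPerZone, false
	}
	return fromPerZone - 1, toPerZone + 1, true
}

func main() {
	const zones = 3 // us-central1-a, -b and -c in the listing above
	from, to := 3, 0

	newFrom, newTo, ok := shiftStep(from, to, 2)
	fmt.Println("shifted:", ok)
	fmt.Println("from pool total:", totalNodes(from, zones), "->", totalNodes(newFrom, zones))
	fmt.Println("to pool total:", totalNodes(to, zones), "->", totalNodes(newTo, zones))
}
```

Under this assumption the smallest possible shift on a three-zone cluster moves three nodes at once, which is why a single-node shift cannot be expressed for a regional node pool.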

ademariag commented 4 years ago

Hi there, are there any plans to support regional clusters?

sangamgo commented 3 years ago

Are regional clusters still not supported? @ademariag, did you have any luck with regional clusters? @orishoshan @zvictor @joaquindlz