google-terraform-modules / terraform-google-kubernetes-engine

Create a Google Kubernetes Engine cluster (stable)
https://registry.terraform.io/modules/google-terraform-modules/kubernetes-engine/google/
MIT License

Node pools aren't being set with automatic node repair, node upgrades, or autoscaling #26

Open lesv opened 6 years ago

lesv commented 6 years ago

My input:

module "gke-cluster" {
  source = "google-terraform-modules/kubernetes-engine/google"
  version = "1.19.1"

  general = {
    name = "${var.cluster_name}"
    env  = "${var.environment}"
    zone = "${var.gcp_zone}"
  }

  master = {
    enable_kubernetes_alpha = true
    username = "admin"
    password = "${random_string.password.result}"
  }

  default_node_pool = {
    node_count = 3
    machine_type = "${var.node_machine_type}"
    disk_size_gb = "${var.node_disk_size}"
    disk_type = "pd-ssd"
    oauth_scopes =   "https://www.googleapis.com/auth/compute,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/pubsub,https://www.googleapis.com/auth/datastore"

#    autoscaling {   ## I've tried both with this and with it commented out.
#      min_node_count = 1
#      max_node_count = 10
#    }

#    management {  ## Defaults to true, so it should just work, but it isn't as of 10/30 PM
#     auto_repair = true
#     auto_upgrade= true
#    }
  }

  node_pool = []
}

I get: (screenshot from 2018-10-30 at 10:46:52 PM)

With things commented out looking in terraform.tfstate:

                "google_container_cluster.new_container_cluster": {
                    "type": "google_container_cluster",
                    "depends_on": [
                        "data.google_container_engine_versions.region",
                        "local.name_prefix"
                    ],
                    "primary": {
                        "id": "knative-dev-us-west1-c-master",
                        "attributes": {
 .
 .
 .
                            "id": "knative-dev-us-west1-c-master",
 .
 .
 .
                            "name": "knative-dev-us-west1-c-master",
 .
 .
 .
                            "node_pool.#": "1",
                            "node_pool.0.autoscaling.#": "0",
                            "node_pool.0.initial_node_count": "3",
                            "node_pool.0.instance_group_urls.#": "1",
                            "node_pool.0.instance_group_urls.0": "https://www.googleapis.com/compute/v1/projects/lesv-008/zones/us-west1-c/instanceGroupManagers/gke-knative-dev-us-west1-default-pool-68956134-grp",
                            "node_pool.0.management.#": "1",
                            "node_pool.0.management.0.auto_repair": "false",
                            "node_pool.0.management.0.auto_upgrade": "false",
                            "node_pool.0.max_pods_per_node": "0",
                            "node_pool.0.name": "default-pool",
 .
 .
 .

I would expect either to be able to set it, or, following the comments in the code, to get that as the default. I'll look again in the morning in case it's operator error, as I'm very much a newbie with Terraform, GKE, and Knative (though I've built several clusters by hand).

lesv commented 6 years ago

I also tried just setting:

      min_node_count = 1
      max_node_count = 10

      auto_repair = true
      auto_upgrade= true

It failed inside default_node_pool, but worked inside a node_pool.

I tried creating just a single node_pool and commenting out default_node_pool, but that gave me two node pools, and the default one had some really bad defaults.
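Roughly, the node_pool entry that worked looked like this (a sketch from memory; the exact keys this module forwards into the autoscaling/management blocks are an assumption on my part, and "work-pool" is just a placeholder name):

  node_pool = [
    {
      name         = "work-pool"
      node_count   = 3
      machine_type = "${var.node_machine_type}"
      disk_size_gb = "${var.node_disk_size}"
      disk_type    = "pd-ssd"

      # These flat keys were accepted here, unlike in default_node_pool
      min_node_count = 1
      max_node_count = 10

      auto_repair  = true
      auto_upgrade = true
    },
  ]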

lesv commented 6 years ago

So, I tried again, and still no success.

module "gke-cluster" {
  source = "google-terraform-modules/kubernetes-engine/google"
  version = "1.19.1"

  general = {
    name = "${var.cluster_name}"
    env  = "${var.environment}"
    zone = "${var.gcp_zone}"
  }

  master = {
#    enable_kubernetes_alpha = true # disables autoRepair & autoUpgrade
    username = "admin"
    password = "${random_string.password.result}"

    disable_kubernetes_dashboard = false
    monitoring_service = "monitoring.googleapis.com"
    maintenance_window = "02:15"
  }

  default_node_pool = {
    node_count = 3
    machine_type = "${var.node_machine_type}"
    disk_size_gb = "${var.node_disk_size}"
    disk_type = "pd-ssd"
    oauth_scopes = "${join(",", var.scopes )}"

    min_node_count = 1
    max_node_count = 10

    auto_repair = true
    auto_upgrade= true
  }
}
perriea commented 6 years ago

Currently there is no way to enable autoscaling or auto-repair on the default node pool with the Google provider ...

Nothing in the doc: https://www.terraform.io/docs/providers/google/r/container_cluster.html#disk_size_gb

And nothing in the code: https://github.com/terraform-providers/terraform-provider-google/blob/51e63bfff2d2acba78bdbb35227669b820a4d61e/google/node_config.go

Personally, I often delete the default pool, but I think this should be filed as an issue against the provider.
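For illustration, the pattern I use with the raw provider looks roughly like this (a sketch, not this module's interface; remove_default_node_pool and google_container_node_pool come from the Google provider itself, and the variable names are borrowed from the config above):

resource "google_container_cluster" "primary" {
  name = "${var.cluster_name}"
  zone = "${var.gcp_zone}"

  # Let GKE create the default pool, then drop it so only the pool below remains.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-pool"
  zone       = "${var.gcp_zone}"
  cluster    = "${google_container_cluster.primary.name}"
  node_count = 3

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type = "${var.node_machine_type}"
    disk_size_gb = "${var.node_disk_size}"
    disk_type    = "pd-ssd"
    oauth_scopes = "${var.scopes}"
  }
}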

lesv commented 6 years ago

I can do it with the gcloud command (and get the right result) when I run:

gcloud container clusters create $CLUSTER_NAME \
  --zone=$CLUSTER_ZONE \
  --cluster-version=latest \
  --machine-type=n1-standard-4 \
  --enable-autoscaling --min-nodes=1 --max-nodes=10 \
  --enable-autorepair \
  --scopes=service-control,service-management,compute-rw,storage-ro,cloud-platform,logging-write,monitoring-write,pubsub,datastore \
  --num-nodes=3
lesv commented 6 years ago

Ah, I think I understand: we need to fix the Go code.

lesv commented 6 years ago

I ended up switching to the beta provider and using resources directly (and that worked for me):

resource "google_container_cluster" "gke_cluster" {
  name               = "${var.cluster_name}"
  zone               = "${var.gcp_zone}"
  min_master_version = "${var.master_version}"

  master_auth {
    username = "admin"
    password = "${random_string.password.result}"
  }

  addons_config {
    kubernetes_dashboard {
      disabled = false
    }
  }

  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  maintenance_policy {
    daily_maintenance_window {
      start_time = "02:10"
    }
  }

  lifecycle {
    ignore_changes = ["node_pool"]
  }

  node_pool {
    name       = "default-pool"
    node_count = "${var.min_node_count}"

    autoscaling {
      min_node_count = "${var.min_node_count}"
      max_node_count = "${var.max_node_count}"
    }

    management {
      auto_upgrade = true
      auto_repair  = true
    }

    node_config {
      oauth_scopes = "${var.scopes}"

      machine_type = "${var.node_machine_type}"
      disk_size_gb = "${var.node_disk_size}"
      disk_type = "pd-ssd"
    }
  }
}
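For anyone reusing the snippet above: it assumes these input variables plus a random_string resource for the master password (the names come from the config; the types and defaults here are my guesses, not something prescribed by the provider):

variable "cluster_name"      {}
variable "gcp_zone"          {}
variable "master_version"    {}
variable "min_node_count"    { default = 1 }
variable "max_node_count"    { default = 10 }
variable "node_machine_type" { default = "n1-standard-4" }
variable "node_disk_size"    { default = 100 }

variable "scopes" {
  type    = "list"
  default = ["https://www.googleapis.com/auth/cloud-platform"]
}

resource "random_string" "password" {
  length  = 16
  special = true
}

If you declare the beta provider explicitly, I believe you can also pin the resource to it with provider = "google-beta" inside the google_container_cluster block.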
perriea commented 6 years ago

Thank you @lesv, I will look at this in the beta provider to see whether I have missed something in the stable version 👍

nhooyr commented 5 years ago

The standard provider seems to work fine for me with @lesv's solution.