GoogleCloudPlatform / gke-enterprise-mt

This repository hosts the Terraform module that helps set up a GKE cluster and environment based on the Enterprise Multi-Tenancy Best Practices Guide.
https://cloud.google.com/kubernetes-engine/docs/best-practices/enterprise-multitenancy
Apache License 2.0

container-engine-robot account race #2

Open mattcary opened 4 years ago

mattcary commented 4 years ago

The container-engine-robot service account seems to be created very late. On the first run, the IAM bindings for the subnet can't be applied, but on later runs they seem to work. Note that this is not caused by the GKE cluster not existing yet: in the original helmsman create script, the cluster is created after the IAM bindings.

mattcary commented 4 years ago

This is now happening very consistently in the platform_basic test. TODO: create a small example and pass it off to CFT or, more likely, the Google Graphite team.

mattcary commented 4 years ago

Here is a self-contained example. I'm shopping this around to see what the best way to fix or work around this might be.

# This shows a race condition between service enabling on a project and creating
# a binding on the robot account associated with the service. The error is:
#
# Error: Batch "iam-project-mattcary-race-test-265e modifyIamPolicy" for request "Create IAM Members roles/container.developer serviceAccount:service-42573466548@container-engine-robot.iam.gserviceaccount.com for \"project \\\"mattcary-race-test-265e\\\"\"" returned error: Error applying IAM policy for project "mattcary-race-test-265e": Error setting IAM policy for project "mattcary-race-test-265e": googleapi: Error 400: Service account service-42573466548@container-engine-robot.iam.gserviceaccount.com does not exist., badRequest. To debug individual requests, try disabling batching: https://www.terraform.io/docs/providers/google/guides/provider_reference.html#enable_batching

variable "organization_id" {
  type = string
}

variable "billing_account" {
  type = string
}

locals {
  main_name = format("mattcary-race-test-%s", random_id.suffix.hex)
}

resource "random_id" "suffix" {
  byte_length = 2
}

resource "google_project" "main" {
  name            = local.main_name
  project_id      = local.main_name
  org_id          = var.organization_id
  billing_account = var.billing_account
}

resource "google_project_service" "main_services" {
  project = google_project.main.project_id
  service = "container.googleapis.com"
}

resource "google_project_iam_member" "iam-binding" {
  project = google_project.main.project_id
  role    = "roles/container.developer"
  member  = "serviceAccount:service-${google_project.main.number}@container-engine-robot.iam.gserviceaccount.com"
}
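
One possible workaround, sketched here under assumptions and not taken from any fix referenced in this thread: as written, the binding above references only google_project.main, so Terraform is free to apply it before or while container.googleapis.com is being enabled. The sketch below replaces the "iam-binding" resource with one that waits for the service to be enabled and then pauses before binding. The resource names wait_for_gke_robot and iam_binding_after_wait are hypothetical, the time_sleep resource comes from the hashicorp/time provider, and the 60s delay is a guess rather than a documented guarantee.

```hcl
# Hypothetical workaround sketch. Use in place of the "iam-binding"
# resource in the example above. Requires the hashicorp/time provider.

resource "time_sleep" "wait_for_gke_robot" {
  # Start the timer only after the container API has been enabled.
  depends_on = [google_project_service.main_services]

  # Assumed delay to let the robot account appear; tune as needed.
  create_duration = "60s"
}

resource "google_project_iam_member" "iam_binding_after_wait" {
  project = google_project.main.project_id
  role    = "roles/container.developer"
  member  = "serviceAccount:service-${google_project.main.number}@container-engine-robot.iam.gserviceaccount.com"

  # Explicit ordering: bind only after the service is enabled and the
  # wait has elapsed.
  depends_on = [time_sleep.wait_for_gke_robot]
}
```

A fixed sleep is a heuristic and can still lose the race if the account takes longer to appear; it only makes the ordering explicit and the failure less likely.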

mattcary commented 4 years ago

Fixed (I think) in the project-factory module with https://github.com/terraform-google-modules/terraform-google-project-factory/pull/387.

A release of that should go out this week or next. When that happens, I will update the dependency here and see if it fixes the problem.