gocrane / crane

Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is not only to help users manage cloud costs more easily but also to ensure the quality of applications.
https://gocrane.io
Apache License 2.0
1.86k stars, 379 forks

I can deploy Crane to a GKE cluster with the Helm CLI but not via Terraform #802

Open: csharpknife2002 opened this issue 1 year ago

csharpknife2002 commented 1 year ago

Describe the bug

In the same GKE cluster, I can install Crane successfully with the Helm CLI using the following command:

helm install gocrane -n crane-test --create-namespace  crane/crane --set craned.containerArgs.prometheus-address=http://chen-prometheus-server.prometheus.svc.cluster.local:8080  --debug

But it always fails when I use Terraform scripts (I have no problem deploying Prometheus, Grafana, and other apps to the same cluster via Terraform):


data "terraform_remote_state" "gke" {
  backend = "gcs" 
  config = {
    bucket  = "xxx-terraform-states"
    prefix  = "cluster"
  }
}

provider "google" {
  project = data.terraform_remote_state.gke.outputs.project_id
  region  = data.terraform_remote_state.gke.outputs.region
}

data "google_client_config" "default" {}

data "google_container_cluster" "my_cluster" {
  name     = data.terraform_remote_state.gke.outputs.kubernetes_cluster_name
  location = data.terraform_remote_state.gke.outputs.zone
  project = data.terraform_remote_state.gke.outputs.project_id
}

provider "helm" {
  kubernetes {
    host = data.terraform_remote_state.gke.outputs.kubernetes_cluster_host
    token = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(data.google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)
  }
  debug = true
}

provider "kubernetes" {
  host                   = "https://${data.terraform_remote_state.gke.outputs.kubernetes_cluster_host}"
  token                  = "${data.google_client_config.default.access_token}"
  cluster_ca_certificate = "${base64decode(data.google_container_cluster.my_cluster.master_auth.0.cluster_ca_certificate)}"
}

provider "kubectl" {
  load_config_file       = false
  host                   = "https://${data.terraform_remote_state.gke.outputs.kubernetes_cluster_host}"
  token                  = "${data.google_client_config.default.access_token}"
  cluster_ca_certificate = "${base64decode(data.google_container_cluster.my_cluster.master_auth.0.cluster_ca_certificate)}"
}

resource "kubernetes_namespace" "gocrane_ns" {
  metadata {
    name = "crane-system"
  }
}

resource "helm_release" "grafana-gocrane" {
  name  = "grafana-gocrane"
  repository = "https://grafana.github.io/helm-charts"
  chart = "grafana"

  timeout = 120
  cleanup_on_fail = true
  force_update    = false
  namespace       = kubernetes_namespace.gocrane_ns.metadata.0.name
  # version = "6.11.0"

  depends_on = [ kubernetes_namespace.gocrane_ns]

  values = [
    file("${path.module}/grafana_override_values.yaml")
  ]
}

resource "helm_release" "gocrane" {
  name  = "gocrane"
  repository = "https://gocrane.github.io/helm-charts"
  chart = "crane"

  timeout = 300
  cleanup_on_fail = true
  force_update    = false
  namespace       = kubernetes_namespace.gocrane_ns.metadata.0.name

  set {
    name = "craned.containerArgs.prometheus-address"
    value = "http://chen-prometheus-server.prometheus.svc.cluster.local:8080"
  }

  depends_on = [ helm_release.grafana-gocrane ]
}

resource "helm_release" "fadvisor" {
  name  = "fadvisor"
  repository = "https://gocrane.github.io/helm-charts"
  chart = "fadvisor"

  timeout = 120
  cleanup_on_fail = true
  force_update    = false
  namespace       = kubernetes_namespace.gocrane_ns.metadata.0.name

  set {
    name = "craned.containerArgs.prometheus-address"
    value = "http://chen-prometheus-server.prometheus.svc.cluster.local:8080"
  }

  depends_on = [helm_release.gocrane]
}
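For comparison, the `set` block on the `gocrane` release can also be written as a nested `values` map. This sketch is illustrative and not part of the original report; it assumes the Crane chart reads the address from `craned.containerArgs.prometheus-address`, as both the CLI command and the `set` block imply, and it would replace the `set` block rather than sit alongside it. Running `helm get values` against the CLI-installed release and the Terraform-installed release is one way to confirm that both paths pass the same values.

```hcl
# Sketch: the same override as the `set` block, passed through `values` instead.
resource "helm_release" "gocrane" {
  name       = "gocrane"
  repository = "https://gocrane.github.io/helm-charts"
  chart      = "crane"
  namespace  = kubernetes_namespace.gocrane_ns.metadata[0].name
  timeout    = 300

  # Equivalent to: --set craned.containerArgs.prometheus-address=...
  # Each dot in a --set key becomes one level of nesting; the hyphenated
  # leaf key is quoted so it is read as a literal map key.
  values = [
    yamlencode({
      craned = {
        containerArgs = {
          "prometheus-address" = "http://chen-prometheus-server.prometheus.svc.cluster.local:8080"
        }
      }
    })
  ]
}
```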

It always fails with the following output, regardless of how long I set the timeout. (With the CLI, the installation takes only a few seconds.)

Warning: Helm release "gocrane" was created but has a failed status. Use the helm command to investigate the error, correct it, then run Terraform again.

│ 
│   with helm_release.gocrane,
│   on main.tf line 101, in resource "helm_release" "gocrane":
│  101: resource "helm_release" "gocrane" {
│ 
╵
╷
│ Error: timed out waiting for the condition
│ 
│   with helm_release.gocrane,
│   on main.tf line 101, in resource "helm_release" "gocrane":
│  101: resource "helm_release" "gocrane" 
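One possible explanation, which is an assumption and is not confirmed anywhere in this thread: `helm install` without `--wait` returns as soon as the manifests have been applied. The Terraform Helm provider's `helm_release` defaults to `wait = true` and only marks the release successful once all of its resources are ready, waiting up to `timeout` seconds. If one of the Crane workloads never becomes ready, the CLI would still report success, while Terraform would fail with "timed out waiting for the condition" no matter how large `timeout` is. Below is a sketch of the `gocrane` release with waiting disabled, to mirror the CLI default while investigating. It would replace the existing `gocrane` resource rather than be added next to it.

```hcl
# Diagnostic variant of the "gocrane" release (replaces the original resource).
resource "helm_release" "gocrane" {
  name       = "gocrane"
  repository = "https://gocrane.github.io/helm-charts"
  chart      = "crane"
  namespace  = kubernetes_namespace.gocrane_ns.metadata[0].name

  # helm_release defaults to wait = true: the release is marked failed if its
  # resources are not ready within `timeout` seconds.
  # `helm install` without --wait does not wait for readiness, so
  # wait = false mirrors the CLI behavior used in the successful install.
  wait    = false
  timeout = 300

  set {
    name  = "craned.containerArgs.prometheus-address"
    value = "http://chen-prometheus-server.prometheus.svc.cluster.local:8080"
  }
}
```

If the assumption holds, `terraform apply` should then complete, and `kubectl get pods -n crane-system` (followed by `kubectl describe pod` on any pod that is not Ready) should show which workload Terraform was waiting for. Conversely, running the original CLI command with `--wait --timeout 5m` added would be expected to time out as well.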

qmhu commented 1 year ago

Is the chart download being blocked by the network?

clin4 commented 1 year ago

@qmhu, no, the download succeeded. I don't see any difference between using the Helm CLI and Terraform: everything gets installed as expected, except that Terraform fails after a while, and I have no clue what it is failing on (or what it is waiting for).