digitalocean / container-blueprints

DigitalOcean Kubernetes (DOKS) Solution Blueprints
[create-doks-with-terraform-flux] Avoid querying DOKS cluster metadata in the main TF module, via the `digitalocean_kubernetes_cluster` data source #25

Open v-ctiutiu opened 2 years ago

v-ctiutiu commented 2 years ago

Overview

The following combination seems to behave like a poison pill:

data "digitalocean_kubernetes_cluster" "primary" {
  name = var.doks_cluster_name
  depends_on = [
    digitalocean_kubernetes_cluster.primary
  ]
}

When used with the following provider:

provider "kubernetes" {
  host  = data.digitalocean_kubernetes_cluster.primary.endpoint
  token = data.digitalocean_kubernetes_cluster.primary.kube_config[0].token
  cluster_ca_certificate = base64decode(
    data.digitalocean_kubernetes_cluster.primary.kube_config[0].cluster_ca_certificate
  )
}

When you spin up a cluster for the first time, the above combination works. However, subsequent runs of terraform plan fail with:

Error: Get "http://localhost/api/v1/namespaces/flux-system": dial tcp [::1]:80: connect: connection refused
│ 
│   with module.doks_flux_cd.kubernetes_namespace.flux_system,
│   on .terraform/modules/doks_flux_cd/create-doks-with-terraform-flux/main.tf line 52, in resource "kubernetes_namespace" "flux_system":
│   52: resource "kubernetes_namespace" "flux_system" {

My assumption is that this has to do with how Terraform evaluates resources, providers, and data sources. It seems that on subsequent runs, after the DOKS cluster is created, the depends_on condition causes the digitalocean_kubernetes_cluster data source not to be re-evaluated, or not to return valid data. When the kubernetes provider does not receive a valid Kubernetes cluster configuration from the remote, it defaults to localhost, which matches the connection refused error above.

On the other hand, we don't need to look up data using the digitalocean_kubernetes_cluster data source. The digitalocean_kubernetes_cluster resource already exposes everything we need after successful creation.

Proposed Solution

Avoid the lookup via the digitalocean_kubernetes_cluster data source, and rely on the digitalocean_kubernetes_cluster resource instead.
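
A minimal sketch of what this could look like, assuming the cluster resource keeps the name `primary` used in the snippets above. It mirrors the original provider block, with each `data.` reference swapped for the resource itself; it is not necessarily the exact change that ends up in the module.

```hcl
# Configure the kubernetes provider straight from the cluster resource,
# so no separate data source lookup (and no depends_on) is involved.
provider "kubernetes" {
  host  = digitalocean_kubernetes_cluster.primary.endpoint
  token = digitalocean_kubernetes_cluster.primary.kube_config[0].token
  cluster_ca_certificate = base64decode(
    digitalocean_kubernetes_cluster.primary.kube_config[0].cluster_ca_certificate
  )
}
```

With this wiring, the `data "digitalocean_kubernetes_cluster" "primary"` block can be dropped, provided nothing else in the module references it.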

v-ctiutiu commented 2 years ago

Let's keep this open. Changing the node count works as expected now.

However, I am able to reproduce the issue again after the fix. This time, the same thing happens when I change the cluster region or the pool size.

Moreover, the digitalocean provider and Terraform should detect that the DOKS cluster must be recreated, but they don't. I tried everything I could think of, like splitting the main configuration code into submodules, declaring the providers in separate modules, or inheriting them from the root module. Still nothing!

I also followed the official kubernetes example from the digitalocean TF provider repo, and the issue still reproduces.

Interestingly, if I use a random name for the cluster, it behaves as it should. But this looks like a workaround to me. It seems that some users are reporting the same issue on the official repo as well.
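
For reference, one possible reading of the random-name workaround is sketched below, using the `random_id` resource from the hashicorp/random provider. The variable names `doks_cluster_region`, `doks_cluster_version`, and `doks_cluster_pool_node_count`, as well as the resource label `cluster_suffix`, are assumptions made for illustration and may not match the module.

```hcl
# Hypothetical sketch: append a random suffix to the cluster name.
resource "random_id" "cluster_suffix" {
  byte_length = 4
}

resource "digitalocean_kubernetes_cluster" "primary" {
  name    = "${var.doks_cluster_name}-${random_id.cluster_suffix.hex}"
  region  = var.doks_cluster_region  # assumed variable name
  version = var.doks_cluster_version # assumed variable name

  node_pool {
    name       = "${var.doks_cluster_name}-pool"
    size       = var.doks_cluster_pool_size
    node_count = var.doks_cluster_pool_node_count # assumed variable name
  }
}
```

This only changes how the cluster is named, so it is still a workaround rather than a fix for the underlying behavior.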

ramwolken9 commented 2 years ago

Hi @v-ctiutiu, I'm facing the same kind of issue. Can you please help me fix this?

module.doks_flux_cd.github_repository_file.install: Refreshing state... /clusters/do/development/flux-system/gotk-components.yaml
╷
│ Error: serializer for text/html; charset=utf-8 doesn't exist
│
│   with module.doks_flux_cd.kubernetes_namespace.flux_system,
│   on .terraform/modules/doks_flux_cd/create-doks-with-terraform-flux/main.tf line 52, in resource "kubernetes_namespace" "flux_system":
│   52: resource "kubernetes_namespace" "flux_system" {
│
╵

ramwolken9 commented 2 years ago

I got this error while running `terraform plan -out starter_kit_flux_cluster.out` to update doks_cluster_pool_size.

v-ctiutiu commented 2 years ago

Hi @ramwolken9,

I'm not sure if it's the same issue, but can you please share some more details? For example, the Kubernetes version you're using, the Terraform version, and any other relevant information or steps that would help me reproduce the issue first.

Are there any other moving parts in your setup? What I want to know is whether you changed anything else in the TF module itself (like the Flux CD provider version), or whether you changed anything by hand in the Flux CD system configuration on the Kubernetes cluster.

Thanks.

ramwolken9 commented 2 years ago

@v-ctiutiu Thanks! You are correct, the issue was caused by modifying the cluster resource directly from the DOKS dashboard.