fluxcd / terraform-provider-flux

Terraform and OpenTofu provider for bootstrapping Flux
https://registry.terraform.io/providers/fluxcd/flux/latest
Apache License 2.0

Dynamic kubeconfig support #370

Closed scottiepippen840 closed 1 year ago

scottiepippen840 commented 1 year ago

When running the flux_bootstrap_git resource, it appears that the kubeconfig referenced in the config_path parameter is loaded at Terraform initialization time (i.e. before any resources are applied). For example, if you try to bootstrap with:

provider "flux" {
  config_path = "./kubeconfig"
}

resource "flux_bootstrap_git" "this" {
  url = var.flux_repository_url
  path = "kubernetes/clusters/homelab/"
  branch = "flux-test"
  http = {
    username = var.flux_repository_username
    password = var.flux_repository_password
  }
}

and the file ./kubeconfig does not exist, or it exists but does not contain valid credentials for the target cluster, you receive an "Unconfigured clients" error.

This makes the provider unusable in scenarios where the cluster does not exist yet, which defeats its purpose in many cases, since you would want to create the cluster and bootstrap it with the same Terraform code.

Ideally, the Kubernetes client should be created late, only when the flux_bootstrap_git resource is actually applied.

scottiepippen840 commented 1 year ago

I neglected to mention that this is not affected by the Terraform depends_on meta-argument, which is the core of the problem.

  1. I added depends_on to the flux_bootstrap_git resource, pointing at a local_file resource containing the new cluster's kubeconfig.
  2. The flux_bootstrap_git resource correctly executed after all of its dependencies, but it could not find (or did not use) the local_file resource that it depended on.

This is what led me to the conclusion that the kubeconfig is loaded during provider initialization, rather than when the bootstrapping runs. You can confirm this by specifying a pre-existing kubeconfig for an invalid cluster: the provider will complain that it cannot connect before any other resources are applied.
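
For reference, this is roughly the arrangement I tried, simplified down to the relevant pieces (the full configuration is further down in this thread):

resource "local_file" "talos_kubeconfig" {
  content  = talos_cluster_kubeconfig.kubeconfig.kube_config
  filename = "./kubeconfig"
}

provider "flux" {
  config_path = "./kubeconfig"
}

resource "flux_bootstrap_git" "this" {
  url    = var.flux_repository_url
  branch = "flux-test"
  http = {
    username = var.flux_repository_username
    password = var.flux_repository_password
  }

  # Terraform honours this ordering for the resource, but the provider itself
  # was already initialized from ./kubeconfig before local_file was created.
  depends_on = [local_file.talos_kubeconfig]
}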

phillebaba commented 1 year ago

You are probably right about this. Automated tests for this are on my todo list, so this is probably something that broke during development. I will have a look at whether I can get a quick test for this going. The issue right now is that the provider init function is called multiple times, and it is hard to determine which state we are in.

What should happen is that the client is initialized as late as possible.

phillebaba commented 1 year ago

@scottiepippen840 because I was unsure whether I had broken this (there is no test for it), I set up a test which creates a Kind cluster and bootstraps it in the same state.

terraform {
  required_version = ">= 1.1.5"
  required_providers {
    flux = {
      source  = "registry.terraform.io/fluxcd/flux"
      version = "0.0.0-dev"
    }
    kind = {
      source  = "kyma-incubator/kind"
      version = "0.0.11"
    }
  }
}

provider "kind" {}

resource "kind_cluster" "this" {
  name           = "test-cluster"
  wait_for_ready = true
}

provider "flux" {
  host                   = kind_cluster.this.endpoint
  client_certificate     = kind_cluster.this.client_certificate
  client_key             = kind_cluster.this.client_key
  cluster_ca_certificate = kind_cluster.this.cluster_ca_certificate
}

resource "flux_bootstrap_git" "this" {
  url = "https://github.com/${var.username}/fleet-infra"
  http = {
    username = var.username
    password = var.password
  }
}

So my best guess is that the kubeconfig you are referencing may be different or contains unexpected data. Currently the error message is not great because it does not expose the underlying error that caused this; that is something I will work on when we can remove the old data source.

Could you share your Terraform and the error message that you are receiving?

scottiepippen840 commented 1 year ago

Oh cool! I don't see it documented anywhere that you can use the host/client certificate configuration pattern with the flux provider. That will definitely help in cases where those values are available.

Here's what I'm using currently, but I admit the local_file nonsense is solely to try and figure out how to pass the kubeconfig to flux:

resource "talos_cluster_kubeconfig" "kubeconfig" {
  talos_config = talos_client_configuration.talosconfig.talos_config
  endpoint     = [for k, v in var.node_data.masters: v.ip][0]
  node         = [for k, v in var.node_data.masters: v.ip][0]
  depends_on       = [proxmox_vm_qemu.talos-masters, proxmox_vm_qemu.talos-workers]
}

resource "local_file" "talos_kubeconfig" {
    content  = talos_cluster_kubeconfig.kubeconfig.kube_config
    filename = "./kubeconfig"
    depends_on       = [talos_cluster_kubeconfig.kubeconfig]
}

Then my flux configuration:

provider "flux" {
  config_path = "./kubeconfig"
}

resource "time_sleep" "wait_for_cluster_init" {
  create_duration = "300s"

  depends_on = [local_file.talos_kubeconfig, talos_machine_configuration_apply.worker_config_apply, talos_machine_configuration_apply.cp_config_apply, talos_machine_bootstrap.bootstrap]
}

resource "flux_bootstrap_git" "this" {
  url = var.flux_repository_url
  path = "kubernetes/clusters/homelab/"
  branch = "flux-test"
  http = {
    username = var.flux_repository_username
    password = var.flux_repository_password
  }
  depends_on = [time_sleep.wait_for_cluster_init]
}

I can use the kubeconfig to access the cluster fine, and if I re-run Terraform the flux_bootstrap_git resource applies, so it's something about the initialization order with the local_file.

phillebaba commented 1 year ago

So the provider takes the same input as the Kubernetes and Helm Terraform providers. This is by design, because I thought it was good not to reinvent the wheel again. You can check out the provider docs for more info: https://registry.terraform.io/providers/fluxcd/flux/latest/docs

It might help to debug this issue by understanding a bit more about how Terraform works. When you run plan, Terraform parses the HCL and resolves variable and resource references in each block. From this it creates a DAG representing the dependencies between all resources. It also determines whether values are known at plan time or only become known during apply. This is also the reason why circular references are not possible in Terraform.

The issue you have is that the provider configuration does not have a reference to the resource which fetches the kubeconfig. So Terraform determines that the provider has no dependencies and will immediately initialize it, which fails because the file is not present. The challenge is that, as far as I know, you cannot use depends_on in a provider block. So you have two options in this case.

The first solution is to create a reference to the local_file resource in the provider block, which should make sure that the file is created before the Flux provider is initialized. That could look something like this:

resource "talos_cluster_kubeconfig" "kubeconfig" {
  talos_config = talos_client_configuration.talosconfig.talos_config
  endpoint     = [for k, v in var.node_data.masters: v.ip][0]
  node         = [for k, v in var.node_data.masters: v.ip][0]
  depends_on       = [proxmox_vm_qemu.talos-masters, proxmox_vm_qemu.talos-workers]
}

resource "local_file" "talos_kubeconfig" {
    content  = talos_cluster_kubeconfig.kubeconfig.kube_config
    filename = "./kubeconfig"
}

provider "flux" {
  config_path = local_file.talos_kubeconfig.filename
}

resource "flux_bootstrap_git" "this" {
  url = var.flux_repository_url
  path = "kubernetes/clusters/homelab/"
  branch = "flux-test"
  http = {
    username = var.flux_repository_username
    password = var.flux_repository_password
  }
}

Now I am not a fan of local_file, especially because there is no support for temporary directories. My suggestion is that you parse the kubeconfig output from talos and feed it into the provider directly. Check this response for more details on how that could be implemented; your solution may have to be a bit different depending on the authentication method used in the kubeconfig returned by talos. https://github.com/hashicorp/terraform-provider-kubernetes/issues/917#issuecomment-737885191
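
As a rough, untested sketch, assuming the kubeconfig returned by talos embeds client certificate data under the standard kubeconfig keys, you could decode it and feed the values straight into the provider instead of writing a file:

locals {
  # Parse the raw kubeconfig produced by the talos resource.
  talos_kubeconfig = yamldecode(talos_cluster_kubeconfig.kubeconfig.kube_config)
}

provider "flux" {
  host                   = local.talos_kubeconfig.clusters[0].cluster.server
  cluster_ca_certificate = base64decode(local.talos_kubeconfig.clusters[0].cluster["certificate-authority-data"])
  client_certificate     = base64decode(local.talos_kubeconfig.users[0].user["client-certificate-data"])
  client_key             = base64decode(local.talos_kubeconfig.users[0].user["client-key-data"])
}

Because the provider configuration now references the talos resource directly, Terraform will only configure the Flux provider once the cluster credentials actually exist, and nothing needs to be written to disk.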

phillebaba commented 1 year ago

I consider this issue closed as it is a matter of Terraform configuration dependencies, rather than an issue with the provider init.