fluxcd / terraform-provider-flux

Terraform and OpenTofu provider for bootstrapping Flux
https://registry.terraform.io/providers/fluxcd/flux/latest
Apache License 2.0

[Bug]: v1.3.0 breaks scenarios where the kube config doesn't exist until the terraform run #694

Open CaptainRedHat opened 2 months ago

CaptainRedHat commented 2 months ago

Describe the bug

Something has changed in v1.3.0 that adds some sort of validation during provider setup. This breaks scenarios where the kube config doesn't exist during the planning phase but is created during the terraform run.

Example: I run my Terraform to create my Kubernetes cluster in a GitLab pipeline. The flux provider is configured to use a kube config that exists at a specific path. The issue is that this config file doesn't get created until Terraform runs, because I use the rancher2_cluster kube_config attribute to create the file during the run. In v1.3.0, I get an error that says "stat /tmp/.kube/config: no such file or directory". I have no issues with this when running v1.2.3.
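Roughly, the file is created like this during apply (the cluster resource name is illustrative):

resource "local_sensitive_file" "kube_config" {
  # rancher2_cluster exposes the generated kubeconfig via its kube_config
  # attribute, which is only known once the cluster exists during apply.
  content  = rancher2_cluster.this.kube_config
  filename = "/tmp/.kube/config"
}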

Steps to reproduce

  1. Configure the flux provider at v1.3.0 and point it at a kube_config file.
  2. Make sure that kube_config file doesn't exist.
  3. Write some Terraform that creates a kube config file during the run:

     resource "local_sensitive_file" "kube_config" {
       source   = "./demo_kube_config"
       filename = "/tmp/.kube/config"
     }

  4. Run terraform plan.
  5. You should receive an error saying the specified kube_config file does not exist.

Expected behavior

Add an option to skip Kubernetes config validation:

skip_kube_config_validation = true
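For illustration, the flag could sit in the provider's kubernetes block (this attribute does not exist today; it is the proposed addition):

provider "flux" {
  kubernetes = {
    config_path = "/tmp/.kube/config"
    # proposed attribute, not currently implemented by the provider
    skip_kube_config_validation = true
  }
  # git = { ... } unchanged
}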

Screenshots and recordings

No response

Terraform and provider versions

Terraform 1.4, flux provider 1.3.0

Terraform provider configurations

provider "flux" { kubernetes = { config_path = "/tmp/.kube/config" } git = { url = "ssh://git@gitlab.com//fluxcd.git" branch = "master" ssh = { username = "git" private_key = tls_private_key.flux.private_key_pem } } }

flux_bootstrap_git resource

resource "flux_bootstrap_git" "flux_gitlab" { depends_on = [gitlab_deploy_key.flux, rancher2_namespace., tls_private_key.flux, local_sensitive_file.kube_config] path = "cluster" registry = "/fluxcd" interval = "10m0s" }

Flux version

None specified; the default is used.

Additional context

No response

Code of Conduct

Would you like to implement a fix?

None

swade1987 commented 2 months ago

@CaptainRedHat,

I don't recall any validation being implemented, as we reverted the kubeconfig validation I added earlier (see PR #660). This seems like it might be a Terraform race condition rather than an issue specific to this provider.

To address this, try using the depends_on attribute in the flux_bootstrap_git resource to ensure it waits until the kubeconfig is present.
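A minimal sketch of what I mean (resource names adapted to your configuration):

resource "flux_bootstrap_git" "flux_gitlab" {
  # Wait for the kubeconfig file to be written before bootstrapping Flux.
  depends_on = [local_sensitive_file.kube_config]

  path     = "cluster"
  interval = "10m0s"
}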

stefanprodan commented 2 months ago

I don't recall any validation being implemented

We sure did, drift detection for in-cluster objects means we connect to the cluster

CaptainRedHat commented 2 months ago

Well, something changed, as the Terraform has worked flawlessly for the past several months. I only noticed this issue in the last week, since v1.3.0 of the flux provider was released. I set Terraform to use flux provider 1.2.3 and it started working again immediately.

Also, I do not believe this is a race condition: the error occurs during provider setup in the planning phase, and, as you can see from the provided Terraform, I do have a depends_on attribute with local_sensitive_file specified.

Here is the full error that gets output.

flux_bootstrap_git.flux_gitlab: Refreshing state... [id=flux-system]
╷
│ Error: Kubernetes Client
│ 
│   with flux_bootstrap_git.flux_gitlab,
│   on main.tf line 231, in resource "flux_bootstrap_git" "flux_gitlab":
│  231: resource "flux_bootstrap_git" "flux_gitlab" {
│ 
│ stat /tmp/.kube/config: no such file or directory
swade1987 commented 2 months ago

I don't recall any validation being implemented

We sure did, drift detection for in-cluster objects means we connect to the cluster

Good point @stefanprodan

swade1987 commented 2 months ago

@CaptainRedHat,

As @stefanprodan mentioned, we need a valid kubeconfig to perform drift detection. In previous provider releases, our drift detection was not as comprehensive. Implementing logic to bypass this validation doesn't make sense, as it's crucial for state management.

Therefore, I highly recommend using the depends_on attribute to make sure the kubeconfig exists before proceeding.

CaptainRedHat commented 2 months ago

resource "flux_bootstrap_git" "flux_gitlab" { depends_on = [gitlab_deploy_key.flux, rancher2_namespace., tls_private_key.flux, local_sensitive_file.kube_config] path = "cluster" registry = "/fluxcd" interval = "10m0s" }

As stated before, I do that already, but it is failing during terraform plan.

resource "flux_bootstrap_git" "flux_gitlab" { depends_on = [gitlab_deploy_key.flux, rancher2_namespace., tls_private_key.flux, local_sensitive_file.kube_config] path = "cluster" registry = "/fluxcd" interval = "10m0s" }

CaptainRedHat commented 2 months ago

I guess the problem is that I currently have no way to persist the kube config file between the initial terraform apply and any subsequent terraform plans (including a terraform plan -destroy) since terraform is running in a Gitlab pipeline. I can work around this if I absolutely have to, but I would prefer not to, considering this worked fine with previous versions of the flux provider.

stefanprodan commented 2 months ago

I guess the problem is that I currently have no way to persist the kube config file between the initial terraform apply and any subsequent terraform plans (including a terraform plan -destroy) since terraform is running in a Gitlab pipeline.

Why use terraform if you don’t persist the state between runs? I recommend using a shell command in CI to run flux bootstrap instead of this provider.
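For example, something along these lines in a CI job (owner and repository values are placeholders; flux bootstrap gitlab reads a personal access token from the GITLAB_TOKEN environment variable):

# placeholder values; run in a job that has GITLAB_TOKEN set
flux bootstrap gitlab \
  --owner=my-group \
  --repository=my-repo \
  --branch=master \
  --path=cluster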

CaptainRedHat commented 2 months ago

I guess the problem is that I currently have no way to persist the kube config file between the initial terraform apply and any subsequent terraform plans (including a terraform plan -destroy) since terraform is running in a Gitlab pipeline.

Why use terraform if you don’t persist the state between runs? I recommend using a shell command in CI to run flux bootstrap instead of this provider.

The Terraform state is persisted in GitLab's Terraform state management, but any files created on the runner's filesystem are lost. And since the rancher2 provider doesn't output the cluster's certificates or API endpoint, I am forced to use a kube config file for auth.

ekristen commented 2 months ago

@CaptainRedHat I see three options for you right now, arranged easiest to hardest.

  1. Use the artifacts feature of GitLab Runner to persist the kubeconfig between runs; this would let you keep using the config_path attribute.
  2. Use yamldecode directly on the rancher2_cluster kube_config attribute and pass the host, token, and CA bundle to the kubernetes provider that way (see the sketch after this list).
  3. Submit a PR to the rancher2 provider to expose the host, token, and CA bundle as attributes on the cluster resource, so you can pass them into the kubernetes provider.
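A rough sketch of option 2, assuming the cluster resource is named rancher2_cluster.this and the kubeconfig uses token auth, as Rancher-generated kubeconfigs typically do (untested):

locals {
  # kube_config is the raw kubeconfig YAML generated by Rancher.
  kubeconfig = yamldecode(rancher2_cluster.this.kube_config)
}

provider "flux" {
  kubernetes = {
    host  = local.kubeconfig.clusters[0].cluster.server
    token = local.kubeconfig.users[0].user.token
    # certificate-authority-data is base64-encoded PEM in a kubeconfig
    cluster_ca_certificate = base64decode(local.kubeconfig.clusters[0].cluster["certificate-authority-data"])
  }
  # git = { ... } unchanged
}
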
swade1987 commented 2 weeks ago

@CaptainRedHat have you tried any of the proposals @ekristen listed above? If so, please let me know how you got on.