fluxcd / terraform-provider-flux

Terraform and OpenTofu provider for bootstrapping Flux
https://registry.terraform.io/providers/fluxcd/flux/latest
Apache License 2.0

[Bug]: flux provider not inheriting correctly ssh section when flux provider is passed via module #714

Open dempo93 opened 2 weeks ago

dempo93 commented 2 weeks ago

Describe the bug

I am configuring the flux provider with the outputs of a module:

provider "flux" {
  kubernetes = {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
  git = {
    url    = module.setup.ssh_url_to_repo
    branch = module.setup.default_branch
    ssh = {
      username    = module.setup.gitlab_ssh_username
      private_key = module.setup.gitlab_ssh_private_key
    }
  }
}

but this results in

╷
│ Error: Git Client
│ 
│ could not create git client: ssh scheme cannot be used without private key
╵

My problem and my code are the same as in #531. I tested on the latest version (1.3.0 at the time of writing).

Steps to reproduce

Follow the reproduction steps on #531

Expected behavior

flux_bootstrap_git does not error out and is correctly initialized with the module outputs.

Screenshots and recordings

No response

Terraform and provider versions

Terraform v1.8.3
on linux_amd64
+ provider registry.terraform.io/fluxcd/flux v1.3.0

Terraform provider configurations

provider "flux" {
  kubernetes = {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
  git = {
    url    = module.setup.ssh_url_to_repo
    branch = module.setup.default_branch
    ssh = {
      username    = module.setup.gitlab_ssh_username
      private_key = module.setup.gitlab_ssh_private_key
    }
  }
}

flux_bootstrap_git resource

resource "flux_bootstrap_git" "this" {
  namespace        = var.namespace
  components_extra = ["image-reflector-controller", "image-automation-controller"]
}

Flux version

v2.3.0 (default)

Additional context

No response

Would you like to implement a fix?

None

swade1987 commented 2 weeks ago

Hi @dempo93, can you provide the values that are being passed into the provider from the module? Please redact anything sensitive!

dempo93 commented 2 weeks ago

Hi @swade1987 thanks for picking this up. To answer your question, the values are as follows:

  + branch   = "main"
  + url      = "ssh://git@<our_git_url>/platform/state/sres-state.git"
  + username = "git"
  + private_key = <redacted> (but correct private key)
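
For context, here is a minimal sketch of how outputs with these names could be declared inside the setup module. The output names are the ones referenced in the provider block; the expressions are assumptions borrowed from the 0.23.0 configuration in this thread and may not match the actual module code.

```hcl
# Hypothetical outputs.tf for the "setup" module (illustration only).
# Output names match the provider block; the value expressions are assumed.

output "ssh_url_to_repo" {
  # Assumes the same local that the 0.23.0 resource used for its url.
  value = "ssh://${local.formatted_ssh_url}"
}

output "default_branch" {
  value = data.gitlab_project.cluster_state.default_branch
}

output "gitlab_ssh_username" {
  value = "git"
}

output "gitlab_ssh_private_key" {
  # The key is sensitive, so the output has to be marked sensitive as well.
  value     = tls_private_key.flux_sync.private_key_pem
  sensitive = true
}
```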

However, after further investigation, I believe the problem is not the modularization but the fact that we are trying to upgrade the provider from 0.23.0 to 1.3.0, more specifically from this config:

#0.23.0
provider "flux" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}
...

resource "flux_bootstrap_git" "this" {
  namespace        = var.namespace
  branch           = data.gitlab_project.cluster_state.default_branch
  components_extra = ["image-reflector-controller", "image-automation-controller"]
  depends_on       = [gitlab_deploy_key.main]
  url              = "ssh://${local.formatted_ssh_url}"
  ssh = {
    username    = "git"
    private_key = tls_private_key.flux_sync.private_key_pem
  }
}

to this config

#1.3.0
provider "flux" {
  kubernetes = {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
  git = {
    url    = module.setup.ssh_url_to_repo
    branch = module.setup.default_branch
    ssh = {
      username    = module.setup.gitlab_ssh_username
      private_key = module.setup.gitlab_ssh_private_key
    }
  }
}
...
resource "flux_bootstrap_git" "this" {
  namespace        = var.namespace
  components_extra = ["image-reflector-controller", "image-automation-controller"]
}

My suspicion now is that when provider 1.3.0 is used to update the flux_bootstrap_git.this resource that was created with 0.23.0, the update fails with the reported error message. This suspicion is supported by the workaround I found:

DISCLAIMER: This will delete the flux namespace and all the flux-managed HelmReleases in your cluster. The terraform destroy step is also quite troublesome and required manual resource deletion and finalizer removal. This is by no means recommended for any production environment.

  1. git checkout a previous version of your code that uses the provider version that last successfully applied the flux_bootstrap_git resource to your state (in my case 0.23.0)
  2. terraform destroy -target="flux_bootstrap_git.this"
  3. Go back to the latest code changes (in my case the ones with provider 1.3.0)
  4. terraform apply

This succeeds for me and I get flux up and running. However, I cannot use this approach to update flux in production. Is there a safe migration procedure that allows me to move from 0.23.0 to 1.3.0?

swade1987 commented 2 weeks ago

Oh, upgrading from v0 to v1 is not a straightforward process. I haven't done it myself, but there is documentation on the upgrade process: https://github.com/fluxcd/terraform-provider-flux/blob/v1.0.0/docs/guides/migrating-to-resource.md

swade1987 commented 3 days ago

@dempo93 this sounds more like an issue with the upgrade process from v0 to v1. Would that be accurate? If so, please see my comment above.