fluxcd / terraform-provider-flux

Terraform and OpenTofu provider for bootstrapping Flux
https://registry.terraform.io/providers/fluxcd/flux/latest
Apache License 2.0
336 stars 89 forks source link

[Bug]: Provider should not delete namespace if it did not create it #654

Closed dempo93 closed 1 month ago

dempo93 commented 1 month ago

Describe the bug

Our deployment of flux looks like this

resource "kubernetes_namespace" "flux_system" {
  metadata {
    name = var.flux_namespace
  }
  ...<details>
}

resource "flux_bootstrap_git" "this" {
  namespace        = var.flux_namespace
  branch           = data.gitlab_project.cluster_state.default_branch
  components_extra = ["image-reflector-controller", "image-automation-controller"]
  ...<auth details>
}

We recently got an incident because flux decided to delete a namespace it didn't create upon destruction. I think the culprit is here: https://github.com/fluxcd/terraform-provider-flux/blob/main/internal/provider/resource_bootstrap_git.go#L652C77-L652C88

The terraform provider should attempt to delete the namespace only if it created it, not if it was created externally.

flux uninstall cli command provides an option to avoid this behavior: --keep-namespace

Steps to reproduce

As flux was stuck for a different reason those are the steps that I undertook. With the code above,

  1. terraform apply
  2. terraform state rm 'flux_bootstrap_git.this'
  3. terraform apply --target='flux_bootstrap_git.this'

I believe the same could be reproduced by simply

  1. terraform apply
  2. terraform destroy --target='flux_bootstrap_git.this'

Expected behavior

Namespace is not destroyed at any point

Screenshots and recordings

No response

Terraform and provider versions

Terraform v1.7.1
on linux_amd64
+ provider registry.terraform.io/fluxcd/flux v0.23.0

Terraform provider configurations

provider "aws" {

  region = var.region

  default_tags {
    tags = var.default_tags
  }
  assume_role {
    role_arn = "<role>"
  }
}

provider "kubernetes" {
  experiments {
    manifest_resource = true
  }

  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}

provider "flux" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

provider "gitlab" {
  token    = local.gitlab_admin_token
  base_url = var.gitlab_base_url
}

flux_bootstrap_git resource

resource "flux_bootstrap_git" "this" {
  namespace        = var.namespace
  branch           = data.gitlab_project.cluster_state.default_branch
  components_extra = ["image-reflector-controller", "image-automation-controller"]
  depends_on = [gitlab_deploy_key.main]
  url        = "ssh://${local.formatted_ssh_url}"
  ssh = {
    username    = local.username
    private_key = local.key_pem
  }
}

Flux version

0.23.0

Additional context

No response

Code of Conduct

Would you like to implement a fix?

None

swade1987 commented 1 month ago

@dempo93, great spot. Could you please try using the latest version of the provider and report back please.

required_providers {
    flux = {
      source = "fluxcd/flux"
      version = "1.2.3"
    }
  }
stefanprodan commented 1 month ago

The namespace is part of the Flux manifests and that's why TF deletes it. In the Flux CLI we have flux uninstall --keep-namespace.

swade1987 commented 1 month ago

@stefanprodan understood, we are going to need to look at all state and see if their is a namespace there for flux-system and if so set --keep-namespace on the uninstall flow.

ekristen commented 1 month ago

I would just add an option to the bootstrap resource for every CLI option, like --keep-namespace and document it so users can just choose to do that. Otherwise you are going to get into some very sticky situations trying to interpret state and what should be done vs just pushing to the user to decide during configuration.

There are too many ways the namespace could be created and not trackable by the provider and flux to know what to do.

dempo93 commented 3 weeks ago

Sorry, didn't get to update flux and test on the latest version yet. However great you already tackled the issue. Thank you!