fluxcd / terraform-provider-flux

Terraform and OpenTofu provider for bootstrapping Flux
https://registry.terraform.io/providers/fluxcd/flux/latest
Apache License 2.0

[Bug]: flux_bootstrap_git fails to push due to non-fast-forward update #662

Open gionn opened 6 months ago

gionn commented 6 months ago

Describe the bug

We have just completed the migration to v1.2.3 and are using the new flux_bootstrap_git, which works great except during the scheduled environment recreation we have set up every night. In this scenario multiple clusters (~5) bootstrap at the same time, and one terraform apply fails with: failed to push manifests: failed to push to remote: non-fast-forward update: refs/heads/main

Steps to reproduce

  1. Have a single Flux repository that multiple clusters point to
  2. Apply flux_bootstrap_git on multiple clusters concurrently
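
The race behind these steps can be illustrated with a toy model. This is only a sketch: the `remote` type and its `push` method are hypothetical stand-ins for a shared Git branch, not the provider's code or a real Git client. It shows why the second of two writers that started from the same commit gets rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// errNonFastForward mimics the rejection a Git server returns when the pushed
// commit does not descend from the current branch head.
var errNonFastForward = errors.New("non-fast-forward update: refs/heads/main")

// remote is a toy stand-in for a shared Git branch (hypothetical, for illustration).
type remote struct {
	head string
}

// push accepts newCommit only if the caller built it on top of the current head.
func (r *remote) push(parent, newCommit string) error {
	if r.head != parent {
		return errNonFastForward
	}
	r.head = newCommit
	return nil
}

func main() {
	r := &remote{head: "c0"}

	// Two clusters clone the repository while it is at the same commit.
	parentA, parentB := r.head, r.head

	// Cluster A pushes first and moves the branch head.
	fmt.Println("cluster A:", r.push(parentA, "a1"))
	// Cluster B still builds on the old head, so its push is rejected.
	fmt.Println("cluster B:", r.push(parentB, "b1"))
}
```

With about five clusters bootstrapping at once, every writer except the first one to land is in cluster B's position unless it fetches the new head and tries again.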

Expected behavior

A non-fast-forward push can happen at any time in a Git workflow; the provider should be smart enough to perform a configurable number of retries internally before giving up.
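
As a rough illustration of the requested behavior, here is a minimal sketch of a bounded retry loop. All names (`pushWithRetry`, `errNonFastForward`, the `syncAndPush` callback) are hypothetical and are not taken from the provider. It assumes the callback re-fetches the branch and re-applies the manifests on the new head before pushing, since repeating an unchanged push would be rejected again.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNonFastForward is a hypothetical sentinel for the rejected-push case.
var errNonFastForward = errors.New("non-fast-forward update")

// pushWithRetry runs syncAndPush up to maxAttempts times. Only
// non-fast-forward rejections are retried; any other error is returned
// immediately. The wait between attempts grows linearly with the attempt number.
func pushWithRetry(maxAttempts int, backoff time.Duration, syncAndPush func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = syncAndPush()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errNonFastForward) {
			return err
		}
		if attempt < maxAttempts {
			time.Sleep(backoff * time.Duration(attempt))
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// Hypothetical push that is rejected twice before succeeding.
	flaky := func() error {
		calls++
		if calls < 3 {
			return errNonFastForward
		}
		return nil
	}
	err := pushWithRetry(5, 10*time.Millisecond, flaky)
	fmt.Println("attempts:", calls, "error:", err)
}
```

The attempt limit and backoff are the values that would need to be exposed to make the retries configurable.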

Screenshots and recordings

╷
│ Error: Bootstrap run error
│ 
│   with module.flux2[0].flux_bootstrap_git.this,
│   on ../../modules/flux2/main.tf line 1, in resource "flux_bootstrap_git" "this":
│    1: resource "flux_bootstrap_git" "this" {
│ 
│ failed to push manifests: failed to push to remote: non-fast-forward
│ update: refs/heads/main

Terraform and provider versions

Terraform v1.6.5 on linux_amd64

Terraform provider configurations

provider "flux" {
  kubernetes = {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    exec = {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = local.aws_cli_args
      command     = "aws"
    }
  }
  git = {
    url          = "https://github.com/${var.github_org}/${var.flux_github_repository}.git"
    branch       = var.flux_branch_name
    author_name  = "fluxcdbot"
    author_email = "fluxcdbot@users.noreply.github.com"
    http = {
      username = var.github_email
      password = var.github_token
    }
  }
}

flux_bootstrap_git resource

resource "flux_bootstrap_git" "this" {
  path             = "./clusters/${var.cluster_name}"
  components_extra = var.flux_automated ? ["image-reflector-controller", "image-automation-controller"] : []
  network_policy   = false
  version          = "v2.2.3"
  namespace        = kubernetes_namespace.this.metadata[0].name

  disable_secret_creation = true
  secret_name             = kubernetes_secret.flux_git_deploy.metadata[0].name
}

Flux version

v2.2.3

Additional context

No response

Would you like to implement a fix?

No

stefanprodan commented 6 months ago

We have implemented retries for Git operations, but it looks like they are implemented only on Update and Delete: https://github.com/fluxcd/terraform-provider-flux/pull/436

Are you getting the non-fast-forward error for newly provisioned clusters that don't have a directory in Git?

gionn commented 6 months ago

> Are you getting the non-fast-forward error for newly provisioned clusters that don't have a directory in Git?

Oh yes, those clusters get their directory wiped during the destroy process, so on the following creation there is no directory.

Thanks for the prompt reply 🙏🏻

gionn commented 6 months ago

This morning I got a slightly different error message for the failed push, but it looks like it is still the same issue as before.

╷
│ Error: Bootstrap run error
│ 
│   with module.flux2[0].flux_bootstrap_git.this,
│   on ../../modules/flux2/main.tf line 1, in resource "flux_bootstrap_git" "this":
│    1: resource "flux_bootstrap_git" "this" {
│ 
│ failed to push manifests: failed to push to remote: command error on
│ refs/heads/main: cannot lock ref 'refs/heads/main': is at
│ 91b665917e0b80eeeeca1ee8a866cb2a4b7f755e but expected
│ 9e324f9e38778a8bf8ff6bf46cf56c3645ca3bde
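
If both messages do come from the same race, a retry would need to recognize either of them. The following is only a sketch under that assumption: `isRetryablePushError` is a hypothetical helper, the fragments are copied from the two errors reported in this thread, and matching on message text is a simplification that typed errors would replace where available.

```go
package main

import (
	"fmt"
	"strings"
)

// retryablePushFragments lists message fragments copied from the two errors
// reported in this thread. Treating both as retryable is an assumption.
var retryablePushFragments = []string{
	"non-fast-forward update",
	"cannot lock ref",
}

// isRetryablePushError reports whether msg contains one of the known fragments.
func isRetryablePushError(msg string) bool {
	for _, fragment := range retryablePushFragments {
		if strings.Contains(msg, fragment) {
			return true
		}
	}
	return false
}

func main() {
	messages := []string{
		"failed to push to remote: non-fast-forward update: refs/heads/main",
		"failed to push to remote: command error on refs/heads/main: cannot lock ref 'refs/heads/main'",
		"could not clone git repository: authorization failed",
	}
	for _, m := range messages {
		fmt.Printf("%t  %s\n", isRetryablePushError(m), m)
	}
}
```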

swade1987 commented 3 months ago

@gionn, we would need to move all the Go logic from TF to the bootstrap pkg in flux2 and rewrite the TF provider. This is not something the team is considering spending time on this year.

gionn commented 3 months ago

Thanks for the update.

Anyway, I would leave this issue open, as the reported bug is not solved and will impact users on the latest version.

nishasati6oct commented 3 months ago

My TF code (using PAT token for this purpose)

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">=3.50.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.20.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.13.1"
    }
    flux = {
      source  = "fluxcd/flux"
      version = "=1.2.2"
    }
    github = {
      source  = "integrations/github"
      version = ">=5.26.0"
    }
  }
}

resource "flux_bootstrap_git" "this" {
  secret_name      = "kconfig"
  path             = "clusters/aks"
  toleration_keys  = ["CriticalAddonsOnly"]
  version          = "v2.3.0"
}

Getting error:

flux_bootstrap_git.this: Refreshing state... [id=flux-system]

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: Git Client
│
│   with flux_bootstrap_git.this,
│   on main.tf line 65, in resource "flux_bootstrap_git" "this":
│   65: resource "flux_bootstrap_git" "this" {
│
│ could not clone git repository: unable to clone 'https://github.com/XXXX/XXXX.git': authorization failed

With the same change, my team member is able to run the TF plan. I am also able to clone locally with the same user and PAT. Flux version 2.3.0.

How can I find out what is wrong in my TF plan run?

mariot8 commented 1 month ago

I've encountered the same issue:

╷
│ Error: Bootstrap run error
│
│   with flux_bootstrap_git.this,
│   on fluxcd.tf line 63, in resource "flux_bootstrap_git" "this":
│   63: resource "flux_bootstrap_git" "this" {
│
│ failed to push manifests: failed to push to remote: non-fast-forward update: refs/heads/main
╵