fluxcd / terraform-provider-flux

Terraform and OpenTofu provider for bootstrapping Flux
https://registry.terraform.io/providers/fluxcd/flux/latest
Apache License 2.0

[Bug]: flux_bootstrap_git fails to push due to non-fast-forward update #662

Open gionn opened 4 weeks ago

gionn commented 4 weeks ago

Describe the bug

We have just completed the migration to v1.2.3 and are using the new flux_bootstrap_git resource, which works great except during the scheduled environment recreation we have set up to run every night. In this scenario multiple clusters (~5) bootstrap at the same time, and one terraform apply fails with: failed to push manifests: failed to push to remote: non-fast-forward update: refs/heads/main

Steps to reproduce

  1. Have a single Flux repository that multiple clusters point to
  2. Apply flux_bootstrap_git on multiple clusters concurrently

Expected behavior

A non-fast-forward push rejection can happen at any time in a Git workflow, so the provider should perform a configurable number of retries internally before giving up.

Screenshots and recordings

╷
│ Error: Bootstrap run error
│ 
│   with module.flux2[0].flux_bootstrap_git.this,
│   on ../../modules/flux2/main.tf line 1, in resource "flux_bootstrap_git" "this":
│    1: resource "flux_bootstrap_git" "this" {
│ 
│ failed to push manifests: failed to push to remote: non-fast-forward
│ update: refs/heads/main

Terraform and provider versions

Terraform v1.6.5 on linux_amd64

Terraform provider configurations

provider "flux" {
  kubernetes = {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    exec = {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = local.aws_cli_args
      command     = "aws"
    }
  }
  git = {
    url          = "https://github.com/${var.github_org}/${var.flux_github_repository}.git"
    branch       = var.flux_branch_name
    author_name  = "fluxcdbot"
    author_email = "fluxcdbot@users.noreply.github.com"
    http = {
      username = var.github_email
      password = var.github_token
    }
  }
}

flux_bootstrap_git resource

resource "flux_bootstrap_git" "this" {
  path             = "./clusters/${var.cluster_name}"
  components_extra = var.flux_automated ? ["image-reflector-controller", "image-automation-controller"] : []
  network_policy   = false
  version          = "v2.2.3"
  namespace        = kubernetes_namespace.this.metadata[0].name

  disable_secret_creation = true
  secret_name             = kubernetes_secret.flux_git_deploy.metadata[0].name
}

Flux version

v2.2.3

Additional context

No response

Would you like to implement a fix?

No

stefanprodan commented 4 weeks ago

We have implemented retries for Git operations, but it looks like they are implemented only on Update and Delete: https://github.com/fluxcd/terraform-provider-flux/pull/436

Are you getting the non-fast-forward error for newly provisioned clusters that don't have a directory in Git?

gionn commented 4 weeks ago

> Are you getting the non-fast-forward error for newly provisioned clusters that don't have a directory in Git?

Oh yes, those clusters get their directory wiped during the destroy process, so on the following creation there is no directory.

Thanks for the prompt reply 🙏🏻

gionn commented 3 weeks ago

This morning I got a slightly different error message for the failed push, but it looks like it is still the same issue as before.

╷
│ Error: Bootstrap run error
│ 
│   with module.flux2[0].flux_bootstrap_git.this,
│   on ../../modules/flux2/main.tf line 1, in resource "flux_bootstrap_git" "this":
│    1: resource "flux_bootstrap_git" "this" {
│ 
│ failed to push manifests: failed to push to remote: command error on
│ refs/heads/main: cannot lock ref 'refs/heads/main': is at
│ 91b665917e0b80eeeeca1ee8a866cb2a4b7f755e but expected
│ 9e324f9e38778a8bf8ff6bf46cf56c3645ca3bde
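
Both messages reported in this thread point at the branch head having moved between fetch and push, so a retry would presumably need to recognise either one. The Go sketch below shows one way to classify them by substring. The function name `isConcurrentPushError` and the fragment list are hypothetical, taken only from the two errors quoted in this issue, and matching on error text is fragile if the wording changes between Git servers or library versions.

```go
package main

import (
	"fmt"
	"strings"
)

// retryableFragments holds substrings from the two error messages reported
// in this issue. The list is an assumption, not an exhaustive catalogue.
var retryableFragments = []string{
	"non-fast-forward",
	"cannot lock ref",
}

// isConcurrentPushError reports whether msg looks like a push that lost a
// race against another writer on the same branch.
func isConcurrentPushError(msg string) bool {
	for _, fragment := range retryableFragments {
		if strings.Contains(msg, fragment) {
			return true
		}
	}
	return false
}

func main() {
	msgs := []string{
		"failed to push to remote: non-fast-forward update: refs/heads/main",
		"failed to push to remote: command error on refs/heads/main: cannot lock ref 'refs/heads/main'",
		"authentication required",
	}
	for _, m := range msgs {
		fmt.Printf("%t  %s\n", isConcurrentPushError(m), m)
	}
}
```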