flux-iac / tofu-controller

A GitOps OpenTofu and Terraform controller for Flux
https://flux-iac.github.io/tofu-controller/
Apache License 2.0

My terraform resource is unknown and initializing status #1148

Open WangLinX opened 11 months ago

WangLinX commented 11 months ago

I think "detect drifts only mode" requires a terraform plan; otherwise it cannot compare the live resources against the terraform.tfstate file in OSS. The documentation says that in "detect drifts only mode" Terraform plan and apply will not be executed, so how does it perform configuration drift detection?

I created an application on Flamingo. I set approvePlan: disable because I only use it for drift checking. But my "wlx-terraform-share" object has been stuck in the Initializing state ever since. I manually renamed resources in the cloud, but no drift was detected, which makes me suspect a logic problem in this area.

The scenario I want to implement: when I manually modify resources in the cloud, Flamingo or tf-controller detects the configuration drift.

Could you please help me take a look at this issue? I would greatly appreciate it.

TF controller: v0.15.1, Flamingo: v2.8.4, Flux: v2.1.2

apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  annotations:
    reconcile.fluxcd.io/requestedAt: "2023-11-27T16:56:30.513899852+08:00"
  creationTimestamp: "2023-11-27T07:07:21Z"
  finalizers:
  - finalizers.tf.contrib.fluxcd.io
  generation: 4
  labels:
    kustomize.toolkit.fluxcd.io/name: wlx-terraform-share
    kustomize.toolkit.fluxcd.io/namespace: wlx-terraform-share
  name: wlx-terraform-share
  namespace: wlx-terraform-share
  resourceVersion: "1658364114"
  uid: e7ecd57a-c808-4049-a964-bf77e4288c63
spec:
  alwaysCleanupRunnerPod: true
  approvePlan: disable
  backendConfig:
    customConfiguration: |
      backend "oss" {
        bucket              = "atc-cicd"
        prefix              = "terraform/wlx-test-share"
        key                 = "./terraform.tfstate"
        acl                 = "private"
        region              = "cn-beijing"
        encrypt             = "true"
        tablestore_endpoint = "https://atc-cicd.cn-beijing.ots.aliyuncs.com/"
        tablestore_table    = "terraform_remote_backend_lock_table_2879cd4b_abfd_567c_48de_e7c4be64bd02"
      }
  destroyResourcesOnDeletion: false
  disableDriftDetection: false
  force: false
  interval: 5m
  parallelism: 0
  path: ./terraform
  refreshBeforeApply: false
  runnerPodTemplate:
    spec:
      env:
      - name: ALICLOUD_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            key: ALICLOUD_ACCESS_KEY
            name: alicloud-atc-terraform-id-key
      - name: ALICLOUD_SECRET_KEY
        valueFrom:
          secretKeyRef:
            key: ALICLOUD_SECRET_KEY
            name: alicloud-atc-terraform-id-key
      image: ghcr.io/weaveworks/tf-runner:v0.15.1
  runnerTerminationGracePeriodSeconds: 30
  serviceAccountName: tf-runner
  sourceRef:
    kind: GitRepository
    name: wlx-terraform-share
  storeReadablePlan: human
  workspace: default
  writeOutputsToSecret:
    name: wlx-terraform-outputs
status:
  conditions:
  - lastTransitionTime: "2023-11-27T08:56:30Z"
    message: Initializing
    reason: Progressing
    status: Unknown
    type: Ready

tf-controller log

{"level":"info","ts":"2023-11-27T13:17:28.979Z","msg":"before lookup runner: checking ready condition","controller":"terraform","controllerGroup":"infra.contrib.fluxcd
{"level":"info","ts":"2023-11-27T13:17:28.979Z","msg":"trigger namespace tls secret generation","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","c
{"level":"info","ts":"2023-11-27T13:17:28.979Z","logger":"cert-rotation","msg":"TLS already generated for ","namespace":"wlx-terraform-share"}
{"level":"info","ts":"2023-11-27T13:17:28.979Z","msg":"show runner pod state: ","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"
{"level":"info","ts":"2023-11-27T13:17:44.009Z","msg":"runner is running","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraf
{"level":"info","ts":"2023-11-27T13:17:44.009Z","msg":"setting up terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Ter
{"level":"info","ts":"2023-11-27T13:17:44.023Z","msg":"write backend config: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":
{"level":"info","ts":"2023-11-27T13:17:44.023Z","msg":"new terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform"
{"level":"info","ts":"2023-11-27T13:17:44.028Z","msg":"generate vars from tf: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind"
{"level":"info","ts":"2023-11-27T13:17:44.028Z","msg":"generated var files from spec","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerK
{"level":"info","ts":"2023-11-27T13:17:44.028Z","msg":"generate template: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Te
{"level":"info","ts":"2023-11-27T13:17:44.028Z","msg":"generated template","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terra
{"level":"info","ts":"2023-11-27T13:17:55.413Z","msg":"init reply: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform
{"level":"info","ts":"2023-11-27T13:17:55.413Z","msg":"tfexec initialized terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKi
{"level":"info","ts":"2023-11-27T13:17:55.414Z","msg":"workspace select reply: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind
{"level":"info","ts":"2023-11-27T13:17:55.414Z","msg":"approve plan disabled","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Te
{"level":"info","ts":"2023-11-27T13:17:55.464Z","msg":"clean up dir: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terrafo
{"level":"info","ts":"2023-11-27T13:17:55.474Z","msg":"Reconciliation completed. Generation: 4","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","c
{"level":"info","ts":"2023-11-27T13:17:55.474Z","msg":"requeue after interval","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":

mcortinas commented 9 months ago

I see very similar behaviour in a similar environment: GKE v1.24.16-gke.500 + Flux v2.1.0 + tf-controller v0.16.0-rc.3. It was resolved when I deleted the tf-controller pod, so it seems I have to keep applying this workaround.

thejosephstevens commented 6 months ago

Same issue here, GKE 1.28 + Flux 2.2.3 + tofu-controller v0.16.0-rc.4. Rolling the tf-controller pods worked for me to unstick it, but after that I had to manually recover a state lock (I'm using GCS for remote state). My guess is that a pod died non-gracefully. I've seen this a couple of times in the past week and we're not in prod just yet, so if I can be helpful on repros let me know.

hirenko-v commented 4 months ago

We have very similar behavior on EKS with tf-controller v0.16.0-rc.4.

It happens when we add a new Terraform CRD in drift-detection-only mode (approvePlan: disable). The only workaround for us is to set approvePlan: "", wait for a successful reconcile, and then set it back to disable.
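
For reference, a minimal sketch of that toggle as a manifest change. It reuses the resource name and namespace from the original report purely as an illustration, and shows only the field that changes; all other spec fields stay as they are. The assumption that an empty approvePlan makes the controller plan and wait for approval comes from the controller documentation as I understand it, so please verify it against your version before relying on it.

```yaml
# Step 1: temporarily leave drift-detection-only mode so the first
# reconcile can run a plan and the object can leave "Initializing".
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: wlx-terraform-share        # illustrative: name from the original report
  namespace: wlx-terraform-share   # illustrative: namespace from the original report
spec:
  approvePlan: ""                  # was: disable
  # ...remaining spec fields unchanged...

# Step 2: once the reconcile has succeeded, restore the original value:
#   spec:
#     approvePlan: disable
```

Both steps have to go through whatever manages the object (Git in a Flux setup); otherwise the next sync would revert the change.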