flux-iac / tofu-controller

A GitOps OpenTofu and Terraform controller for Flux
https://flux-iac.github.io/tofu-controller/
Apache License 2.0
1.32k stars 139 forks source link

The controller should not perform drift detection if it's triggered by a planning request #890

Open tao-zhang-shell opened 1 year ago

tao-zhang-shell commented 1 year ago

User story As a TF-Controller user, I want to be able to run planning requests in manual approval mode without the controller triggering drift detection, so that my process does not get stuck in a loop or provide misleading feedback.

ACs


Manual approval is enabled, and the initial plan is approved and applied successfully.

Now the terraform code changes a little bit, it requires a small update. But I cannot find a new plan ID. Then how to approve the update after the initial plan is applied? I don't find this in the documentation either.

chanwit commented 1 year ago

Hi @tao-zhang-shell Have you tried tfctl replan to help triggering the re-plan process? Then, you would see the message in the object telling you how to approve the plan, similar to the previous apply process.

btw, we have a new planning system called the Branch Planner which has been released recently. Please feel free to give it a try.

tao-zhang-shell commented 1 year ago

Thank you for your response @chanwit !

I downloaded tfctl, unfortunately, MacOS blocks it because it is not from an identified developer. Even if I allow it anyway, it's still not runnable.

Actually, after the Custom Resource Terraform object is updated (only input vars are changed, no change to the Terraform module), the correct TF update message is shown up in the object. For example:

    - lastTransitionTime: '2023-08-22T15:18:36Z'
      message: >

        Terraform used the selected providers to generate the following
        execution

        plan. Resource actions are indicated with the following symbols:
          ~ update in-place

        Terraform will perform the following actions:

          # module.kafka_topic["test"].confluent_kafka_topic.kafka_topic will be updated in-place
          ~ resource "confluent_kafka_topic" "kafka_topic" {
              ~ config           = {
                  + "cleanup.policy" = "compact"
                  + "segment.ms"     = "604900000"
                }
                id               = "Kafka-id"
                # (2 unchanged attributes hidden)

                # (1 unchanged block hidden)
            }

        Plan: 0 to add, 1 to change, 0 to destroy. 

The plan is good, just want to know how to apply this update. As I said, there is no new plan ID for this update.

tao-zhang-shell commented 1 year ago

Hi @chanwit, I did more tests and found some interesting behaviours. Two scenarios:

Here is my guess. The new plan ID is re-generated only when the referenced TF module has some change. This of course makes sense. But if the values of input variables change, the new plan ID should be re-generated as well. Then it might not be good to use the TF module commit hash to form the plan ID.

chanwit commented 1 year ago

But if the values of input variables change, the new plan ID should be re-generated as well. Then it might not be good to use the TF module commit hash to form the plan ID.

In the current design, a plan ID is computed from a Git commit, yes. Maybe we need to come up with other ways to compute plan ids. For example, adding object generation as suffixes, etc.

tao-zhang-shell commented 1 year ago

Hi @chanwit,

I can apply the example at https://github.com/tf-controller/branch-planner-demo successfully when I manually approve the plan with the plan ID. But when I change the value of subject from default World to new-world and redeploy this Terraform object(via kubectl apply -f ), there is no new plan ID generated. Then I have no way to apply this change.

I also tried Branch Planner today. Apparently, it only watches the changes (PR in this case) of the TF module git repo. It doesn't watch the input variable value change either.

Do you have any plans to fix this? Thanks!

apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: branch-planner-demo
  namespace: flux-system
spec:

  vars:
    - name: subject
      value: new-world
chanwit commented 1 year ago

A workaround would be putting Terraform CR above in a Git repo too. It's a common pattern to bootstrap Flux and having both Terraform CR and Terraform files in the same Git repo (but in different directories).

tao-zhang-shell commented 1 year ago

Hi @chanwit, I tried the workaround moving the Terraform CR to the same git repo as the Terraform module. You can find the forked git repo at https://github.com/tao-zhang-shell/branch-planner-demo.

However, it didn't work either when I pushed a new commit to change the input variable value. You can find the Terraform CR status after the commit is pushed.

I also tried to patch the CR with the new commit SHA approvePlan: plan-main-d55857071e, but it didn't work. So the workaround has the same behaviour as applying the change via kubectl apply -f.

status:
  availableOutputs:
    - hello_world
  conditions:
    - lastTransitionTime: '2023-08-24T12:40:12Z'
      message: >

        Terraform used the selected providers to generate the following
        execution

        plan. Resource actions are indicated with the following symbols:

        -/+ destroy and then create replacement

        Terraform will perform the following actions:

          # random_id.id must be replaced
        -/+ resource "random_id" "id" {
              ~ b64_std     = "lvErSw==" -> (known after apply)
              ~ b64_url     = "lvErSw" -> (known after apply)
              ~ dec         = "2532387659" -> (known after apply)
              ~ hex         = "96f12b4b" -> (known after apply)
              ~ id          = "lvErSw" -> (known after apply)
              ~ keepers     = { # forces replacement
                  ~ "trigger" = "New-world" -> "change-world"
                }
                # (1 unchanged attribute hidden)
            }

        Plan: 1 to add, 0 to change, 1 to destroy.

        Changes to Outputs:
          ~ hello_world = "Hello TF Controller v0.16.0-rc.2, New-world!" -> "Hello TF Controller v0.16.0-rc.2, change-world!"
      reason: DriftDetected
      status: 'False'
      type: Ready
    - lastTransitionTime: '2023-08-24T12:00:54Z'
      message: Plan no changes
      reason: TerraformPlannedNoChanges
      status: 'False'
      type: Plan
    - lastTransitionTime: '2023-08-24T11:58:27Z'
      message: Applied successfully
      reason: TerraformAppliedSucceed
      status: 'True'
      type: Apply
    - lastTransitionTime: '2023-08-24T11:58:27Z'
      message: Outputs available
      reason: TerraformOutputsAvailable
      status: 'True'
      type: Output
  lastAppliedRevision: main@sha1:5cbf6f02c736e008b03ceef03755de1185ee530e
  lastAttemptedRevision: main@sha1:d55857071e4a679c45a5bfa58a3de19a3ee516f0
  lastDriftDetectedAt: '2023-08-24T12:40:12Z'
  lastPlanAt: '2023-08-24T12:00:54Z'
  lastPlannedRevision: main@sha1:d55857071e4a679c45a5bfa58a3de19a3ee516f0
  observedGeneration: 4
  plan:
    lastApplied: plan-main-5cbf6f02c7
chanwit commented 1 year ago

It worked without problem in my setup. Here's the repo I tested: https://github.com/tf-controller/manual-approve-test/commits/main, where you could examine the commit logs. TF-Controller v0.16.0-rc.2 was used in the test.

I'll write down a quick tutorial to use the manual approval mode based on the above setup.

tao-zhang-shell commented 1 year ago

@chanwit I really appreciate your help, but still no luck for me. The following are the TF controller logs, no plan ID like plan-main-xxxx is found. I have seen it in the logs if it's the initial plan/apply.

By the way, I also used the v0.16.0-rc.2 version.

{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":">> Started Generation: 3","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":"getting source","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":"before lookup runner: checking ready condition","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","ready":"&Condition{Type:Ready,Status:False,ObservedGeneration:0,LastTransitionTime:2023-08-24 15:05:17 +0000 UTC,Reason:DriftDetected,Message:\nTerraform used the selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n-/+ destroy and then create replacement\n\nTerraform will perform the following actions:\n\n  # random_id.id must be replaced\n-/+ resource \"random_id\" \"id\" {\n      ~ b64_std     = \"oG0rKA==\" -> (known after apply)\n      ~ b64_url     = \"oG0rKA\" -> (known after apply)\n      ~ dec         = \"2691509032\" -> (known after apply)\n      ~ hex         = \"a06d2b28\" -> (known after apply)\n      ~ id          = \"oG0rKA\" -> (known after apply)\n      ~ keepers     = { # forces replacement\n          ~ \"trigger\" = \"New-world\" -> \"Another-world\"\n        }\n        # (1 unchanged attribute hidden)\n    }\n\nPlan: 1 to add, 0 to change, 1 to destroy.\n\nChanges to Outputs:\n  ~ hello_world = \"Hello TF Controller v0.16.0-rc.2, New-world!\" -> \"Hello TF Controller v0.16.0-rc.2, Another-world!\"\n,}"}
{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":"before lookup runner: updating status","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","ready":"&Condition{Type:Ready,Status:False,ObservedGeneration:0,LastTransitionTime:2023-08-24 15:05:17 +0000 UTC,Reason:DriftDetected,Message:\nTerraform used the selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n-/+ destroy and then create replacement\n\nTerraform will perform the following actions:\n\n  # random_id.id must be replaced\n-/+ resource \"random_id\" \"id\" {\n      ~ b64_std     = \"oG0rKA==\" -> (known after apply)\n      ~ b64_url     = \"oG0rKA\" -> (known after apply)\n      ~ dec         = \"2691509032\" -> (known after apply)\n      ~ hex         = \"a06d2b28\" -> (known after apply)\n      ~ id          = \"oG0rKA\" -> (known after apply)\n      ~ keepers     = { # forces replacement\n          ~ \"trigger\" = \"New-world\" -> \"Another-world\"\n        }\n        # (1 unchanged attribute hidden)\n    }\n\nPlan: 1 to add, 0 to change, 1 to destroy.\n\nChanges to Outputs:\n  ~ hello_world = \"Hello TF Controller v0.16.0-rc.2, New-world!\" -> \"Hello TF Controller v0.16.0-rc.2, Another-world!\"\n,}"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","msg":"before lookup runner: updated status","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","ready":"&Condition{Type:Ready,Status:Unknown,ObservedGeneration:0,LastTransitionTime:2023-08-24 15:05:22.782292929 +0000 UTC m=+3878.063992515,Reason:Progressing,Message:Reconciliation in progress,}"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","msg":"trigger namespace tls secret generation","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","logger":"cert-rotation","msg":"TLS already generated for ","namespace":"flux-system"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","msg":"show runner pod state: ","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","name":"branch-planner-demo","state":"not-found"}
{"level":"info","ts":"2023-08-24T15:05:37.839Z","msg":"runner is running","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.839Z","msg":"setting up terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.853Z","msg":"write backend config: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.853Z","msg":"new terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","workingDir":"/tmp/flux-system-branch-planner-demo"}
{"level":"info","ts":"2023-08-24T15:05:37.857Z","msg":"generate vars from tf: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.858Z","msg":"generated var files from spec","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.858Z","msg":"generate template: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.858Z","msg":"generated template","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"init reply: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"tfexec initialized terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"workspace select reply: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"calling detectDrift ...","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.667Z","msg":"plan for drift: ok found drift: true","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.742Z","msg":"show plan: \nTerraform used the selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n-/+ destroy and then create replacement\n\nTerraform will perform the following actions:\n\n  # random_id.id must be replaced\n-/+ resource \"random_id\" \"id\" {\n      ~ b64_std     = \"oG0rKA==\" -> (known after apply)\n      ~ b64_url     = \"oG0rKA\" -> (known after apply)\n      ~ dec         = \"2691509032\" -> (known after apply)\n      ~ hex         = \"a06d2b28\" -> (known after apply)\n      ~ id          = \"oG0rKA\" -> (known after apply)\n      ~ keepers     = { # forces replacement\n          ~ \"trigger\" = \"New-world\" -> \"Another-world\"\n        }\n        # (1 unchanged attribute hidden)\n    }\n\nPlan: 1 to add, 0 to change, 1 to destroy.\n\nChanges to Outputs:\n  ~ hello_world = \"Hello TF Controller v0.16.0-rc.2, New-world!\" -> \"Hello TF Controller v0.16.0-rc.2, Another-world!\"\n","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"error","ts":"2023-08-24T15:05:39.744Z","msg":"will not force / auto apply detected drift","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","error":"DriftDetected"}
{"level":"info","ts":"2023-08-24T15:05:39.748Z","msg":"clean up dir: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"error","ts":"2023-08-24T15:05:39.754Z","msg":"Drift detected after 16.971997007s, next try in 15s","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","revision":"main@sha1:9c5447cfa09ee401b2e848127d51e43a3be79048","error":"DriftDetected"}
{"level":"info","ts":"2023-08-24T15:05:44.758Z","msg":">> Started Generation: 3","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"b49c534b-4b99-4124-9284-abdb02d1148f","reconciliation-loop-id":"9d84ca3d-fc50-4e56-b70a-be94b8c67c8f","start-time":"2023-08-24T15:05:44.758Z"}
{"level":"info","ts":"2023-08-24T15:05:44.758Z","msg":"getting source","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"b49c534b-4b99-4124-9284-abdb02d1148f","reconciliation-loop-id":"9d84ca3d-fc50-4e56-b70a-be94b8c67c8f","start-time":"2023-08-24T15:05:44.758Z"}
chanwit commented 1 year ago

Found the problem. You used the random ID generator, which caused a drift (reason: DriftDetected). As a result, the controller never reached the planning step of the second round.

    - lastTransitionTime: '2023-08-24T12:40:12Z'
      message: >

        Terraform used the selected providers to generate the following
        execution

        plan. Resource actions are indicated with the following symbols:

        -/+ destroy and then create replacement

        Terraform will perform the following actions:

          # random_id.id must be replaced
        -/+ resource "random_id" "id" {
              ~ b64_std     = "lvErSw==" -> (known after apply)
              ~ b64_url     = "lvErSw" -> (known after apply)
              ~ dec         = "2532387659" -> (known after apply)
              ~ hex         = "96f12b4b" -> (known after apply)
              ~ id          = "lvErSw" -> (known after apply)
              ~ keepers     = { # forces replacement
                  ~ "trigger" = "New-world" -> "change-world"
                }
                # (1 unchanged attribute hidden)
            }

        Plan: 1 to add, 0 to change, 1 to destroy.

        Changes to Outputs:
          ~ hello_world = "Hello TF Controller v0.16.0-rc.2, New-world!" -> "Hello TF Controller v0.16.0-rc.2, change-world!"
      reason: DriftDetected
      status: 'False'
      type: Ready
tao-zhang-shell commented 1 year ago

Do you mean the TF controller doesn't support TF resource random_id? Or something else? I don't quite get your point. @chanwit

Just like the message shows, the controller actually made the correct plan.

chanwit commented 1 year ago

We have the concept of "drift detection" implemented.

When a drift on a live system occurs, it means that a resource is changed by something else, and not by what we described in our desired states.

The use of your random_id demonstrated the same behavior as a live system having drifts.

tao-zhang-shell commented 1 year ago

There is no any other resource than the random_id in my test TF project. As you can understand, nothing can change this random_id from live system.

I know what drift means in Terraform. But there is nothing that can change this random_id, which drifts from the backend state. Actually, before I changed and pushed the variable value to the git repo, the Terraform CR showed no drift at all.

tao-zhang-shell commented 1 year ago

Also one suggestion, can you please add one resource to your test project and try it again? I see currently in your test project, there is no TF resource created. Only one output is defined. That's why I added this random_id resource.

Maye for any resource update, TF controller mistakenly regards it as a drift?

chanwit commented 1 year ago

Yep, I confirmed that it's definitely a bug in the manual approval mode. It's working fine for me with approvePlan: auto, but causing the same error you're facing with the manual approve.

chanwit commented 1 year ago

I narrowed down the problem and will tackle this issue as a bug fix.

chanwit commented 1 year ago

https://github.com/weaveworks/tf-controller/blob/eb4c3adb49f9cd8b2e369b42d65dd516ec20d9fd/controllers/tf_controller_drift_detect.go#L15-L53

A fix can be done by adding a criterion in the above code to:

  1. detect if we are in the manual mode
  2. if we are in the manual mode and the intent is to plan + apply, do not perform drift detection.
tao-zhang-shell commented 1 year ago

Thank you very much for the confirmation @chanwit, I will be looking forward to the bug fix!

tao-zhang-shell commented 1 year ago

@chanwit @LappleApple Do you have any estimation of when this bug will be fixed? Thanks!

yitsushi commented 1 year ago

Related tests were disabled because they were flaky, now I tried to fix and enable them. Two related tests are still disabled^1, please enable them as part of the fix for this issue.

yitsushi commented 1 year ago

As it was flaky and not failing all the time when it was disabled, we can assume it was introduced after we disabled those tests, I'll try to git bisect and find the commit that introduced this bug.

yitsushi commented 1 year ago

git bisect says this commit is the first bad: https://github.com/weaveworks/tf-controller/commit/374232a60fa89067661d47cfafaa1420b3c7d2d6 (but I don't see why would it cause this)

yitsushi commented 1 year ago

After a bit of poking, if I remove all changed from the commit, except the patcher := ... and the patcher.Patch call (so the finalizeStatus does literally nothing, but patches the "expected nothing"), it still causes an issue.

If I keep everything from the commit, but move the patcher := line into the defer function, tests are happy.

I assume it some weird issue how the whole codebase tosses around the &terraform reference and made Patch calls all around the codebase.

gberenice commented 1 year ago

Hey! @tao-zhang-shell could you please confirm that the issue is fixed for you? I'm facing a similar problem on v0.16.0-rc.3 now. UPD: my issue probably relates to another root cause, but still would be great to confirm that this one is fixed.

gberenice commented 1 year ago

@chanwit after updating to v0.16.0-rc.3, I still see the similar behavior that is described in the issue. I use GitRepo source for my Terraform object, and this is a tag for a community module (terraform-aws-ecr v0.38.0). When I change spec.vars for the corresponding Terraform object, there is no plan ID generated. Some related details from the object:

  approvePlan: ""
    message: Plan no changes
    reason: TerraformPlannedNoChanges
  lastAppliedByDriftDetectionAt: "2023-09-20T18:14:33Z"
  lastAppliedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
  lastAttemptedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
  lastDriftDetectedAt: "2023-09-26T15:30:50Z"
  lastHandledReconcileAt: "2023-09-21T16:53:32.303215+03:00"
  lastPlanAt: "2023-09-21T13:54:08Z"
  lastPlannedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
  plan:
    lastApplied: plan-0.38.0-862fc85490

UPD (2 Oct): I observe the same behavior for sourceRef kind OCIRepository.

chanwit commented 1 year ago

@LappleApple @yitsushi

tao-zhang-shell commented 1 year ago

@gberenice I haven't tested the new RC yet.

lasomethingsomething commented 1 year ago

Reopened in light of @gberenice's new comments.

RishuSinghS commented 1 year ago

@chanwit @luizbafilho @yiannistri could you please provide update on this on-going issue for avoiding drift detection when using manual mode?

Does this PR - https://github.com/weaveworks/tf-controller/pull/1076 solve the issue? Could you please review it and merge to main?