aiven / terraform-provider-aiven

Aiven Terraform Provider
https://registry.terraform.io/providers/aiven/aiven/latest/docs
MIT License
126 stars 69 forks source link

Change to aiven_mirrormaker_replication_flow resource affects unspecified optional fields #1632

Closed nickgooding closed 5 months ago

nickgooding commented 6 months ago

What happened?

Updating attribute(s) for a aiven_mirrormaker_replication_flow resource appears to be able to affect unrelated configuration.

Example:

  1. Given the following resource:

    resource "aiven_mirrormaker_replication_flow" "this" {
    project        = "example-project"
    service_name   = "mirrormaker"
    source_cluster = "source-cluster"
    target_cluster = "target-cluster"
    enable         = true
    
    topics = [
    "example-topic",
    ]
    }

Aiven Console view before: Screenshot from 2024-03-05 14-33-58

  1. Make the following change and re-apply

    --- a/main.tf
    +++ b/main.tf
    @@ -7,6 +7,6 @@ resource "aiven_mirrormaker_replication_flow" "this" {
    
    topics = [
    "example-topic",
    +   "another-topic",
    ]
    }

Plan:

$ terraform apply
aiven_mirrormaker_replication_flow.this: Refreshing state... [id=example-project/mirrormaker/source-cluster/target-cluster]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aiven_mirrormaker_replication_flow.this will be updated in-place
  ~ resource "aiven_mirrormaker_replication_flow" "this" {
        id                                  = "example-project/mirrormaker/source-cluster/target-cluster"
      ~ topics                              = [
            "example-topic",
          + "another-topic",
        ]
        # (11 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aiven_mirrormaker_replication_flow.this: Modifying... [id=example-project/mirrormaker/source-cluster/target-cluster]
aiven_mirrormaker_replication_flow.this: Modifications complete after 1s [id=example-project/mirrormaker/source-cluster/target-cluster]

Aiven Console view after: Screenshot from 2024-03-05 14-34-29

Note that "Offset syncs topic location" has changed from "target" to "source" and "Emit heartbeats enabled" has toggled, despite neither being present in the resource or plan.

I haven't been able to reproduce this issue if offset_syncs_topic_location and emit_heartbeats_enabled are explicitly set for the resource. Edit: I haven't been able to reproduce the issue for offset_syncs_topic_location if it is explicitly set for the resource. Unfortunately I've learned the hard way emit_heartbeats_enabled can switch itself back on.

What did you expect to happen?

another-topic is added to the replication flow topic config and no other changes are made

What else do we need to know?

Using v4.14.0 of the provider.

nickgooding commented 6 months ago

Another possibly related issue: it does not appear to be possible to create a aiven_mirrormaker_replication_flow with emit_heartbeats_enabled set to false.

Plan:

$ terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aiven_mirrormaker_replication_flow.this will be created
  + resource "aiven_mirrormaker_replication_flow" "this" {
      + emit_backward_heartbeats_enabled    = false
      + emit_heartbeats_enabled             = false
      + enable                              = true
      + id                                  = (known after apply)
      + project                             = "example-project"
      + replication_policy_class            = "org.apache.kafka.connect.mirror.DefaultReplicationPolicy"
      + service_name                        = "mirrormaker"
      + source_cluster                      = "source-cluster"
      + sync_group_offsets_enabled          = false
      + sync_group_offsets_interval_seconds = 1
      + target_cluster                      = "target-cluster"
      + topics                              = [
          + "example-topic",
        ]
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aiven_mirrormaker_replication_flow.this: Creating...
aiven_mirrormaker_replication_flow.this: Creation complete after 1s [id=example-project/mirrormaker/source-cluster/target-cluster]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

API response:

{
    "replication_flows": [
        {
            "enabled": true,
            "replication_policy_class": "org.apache.kafka.connect.mirror.DefaultReplicationPolicy",
            "source_cluster": "source-cluster",
            "sync_group_offsets_interval_seconds": 1,
            "target_cluster": "target-cluster",
            "topics": [
                "example-topic"
            ]
        }
    ]
}

(emit_heartbeats_enabled is not present)

Aiven console view: Screenshot from 2024-03-05 14-53-45

It's hard to know how to interpret the missing field in the API response and in the console it could just be a UI bug, but it appears that this does result in the enabling of heartbeats

Serpentiel commented 6 months ago

Hey, @nickgooding! 👋

Thank you for raising this issue! Our team will check things out, and we will release a fix shortly.

Serpentiel commented 5 months ago

Looks like this is related to incorrect API serialization as in #1521.

Serpentiel commented 5 months ago

Hey, @nickgooding! 👋

I've identified additional problematic fields, besides emit_heartbeats_enabled, and prepared a fix for them:

offset_syncs_topic_location is a different story. There is a problem with it being an optional field instead of being a required one, but it's actually quite interesting why the change happens.

Do I understand correctly that you also use Aiven Console or some other tool to manage the resource? I would like to remind you that when using IaaC tools such as Terraform, it's important to manage the state of the resource from a single "source of truth", i.e. nothing else should be used to manage the same resource besides the IaaC tool that you're using.

IaaC tools are designed in a way that makes them override the state of the resource if it's different from what the tool expects it to be, here's what exactly happens to the field in question:

While this problem has occurred because the field was incorrectly defined as an optional one, we still recommend not to modify the state of any resource when using an IaaC solution. This is the best practice and a general "rule of thumb" recommendation to follow.

That said, I've prepared PRs that should get merged before a fix can be released:

Sorry for quite a long process on our side. I hope this information helps you understand why changes like that might occur, too 🙂

I will be closing this issue for now, but please let me know if there are any other questions or concerns. We will try to release the fix ASAP, but in case there is an urgent need, you are welcome to try building the provider yourself.

In the meantime, I would recommend starting to manage the offset_syncs_topic_location field via Terraform.

nickgooding commented 5 months ago

Hi, thanks for the fix.

There appears to still be an oversight with offset_syncs_topic_location that means the provider doesn't update the TF state based on the remote resource, so won't correctly update in the case of configuration drift. I think something like the below will fix it, but I haven't had a chance to test it thoroughly.

diff --git a/internal/sdkprovider/service/kafka/mirrormaker_replication_flow.go b/internal/sdkprovider/service/kafka/mirrormaker_replication_flow.go
index 3bc30f73..1e5b9feb 100644
--- a/internal/sdkprovider/service/kafka/mirrormaker_replication_flow.go
+++ b/internal/sdkprovider/service/kafka/mirrormaker_replication_flow.go
@@ -213,6 +213,13 @@ func resourceMirrorMakerReplicationFlowRead(ctx context.Context, d *schema.Resou
                return diag.FromErr(err)
        }

+       if err := d.Set(
+               "offset_syncs_topic_location",
+               replicationFlow.ReplicationFlow.OffsetSyncsTopicLocation,
+       ); err != nil {
+               return diag.FromErr(err)
+       }
+
        return nil
 }
nickgooding commented 5 months ago

IaaC tools are designed in a way that makes them override the state of the resource if it's different from what the tool expects it to be, here's what exactly happens to the field in question: [...]

  • Terraform has an understanding of it as if the field wasn't set at all, because it's optional, thus it ignores the field when forging a request to our API

This is a result of the bug mentioned in my previous comment rather than an inherent limitation of Terraform. As part of generating a plan, the provider should refresh the state, detect that configuration drift has occurred, and then change the field to match configuration explicitly as part of the plan.

Serpentiel commented 5 months ago

Well, yes, that's true. The problem occurred because the field was incorrectly marked as optional instead of being a required one.

Terraform doesn't shows a diff, by design, when an optional field was changed externally, because it's not tracked in the state, and Terraform then is not able to determine if it ever changed, because it was recorded as null, because it's optional and was not set in the manifest, and it's not throwing a diff on any change, because with this logic, any value would then produce a potential change, e.g:

And null appears there because the field is optional and is not set in the manifest; in reality it is a required field, because the flow requires either source or target to be set there, which causes the API to assume lack of this field in the request to the default source value.

https://github.com/aiven/terraform-provider-aiven/pull/1651 should change it to be a required field instead, which should also fix the problem.

TL;DR. This bug is a combination of the API behavior when this field is missing from a request (it assumes it to source), and the way Terraform behaves with optional fields.

nickgooding commented 5 months ago

Terraform doesn't shows a diff, by design, when an optional field was changed externally, because it's not tracked in the state

I think you have either misunderstood my follow-up comment, which has nothing to do with the resource argument being optional, or are mistaken. There is a bug in resourceMirrorMakerReplicationFlowRead even accounting for the changes in your PR.

Using another optional aiven_mirrormaker_replication_flow argument to demonstrate:

  1. Apply the following resource config:

    resource "aiven_mirrormaker_replication_flow" "this" {
    project        = "example-project"
    service_name   = "mirrormaker"
    source_cluster = "source-cluster"
    target_cluster = "target-cluster"
    enable         = true
    
    topics = [
    "example-topic",
    ]
    offset_syncs_topic_location = "source"
    }
  2. Deliberately introduce configuration drift
    FLOW_CONFIG=$(
    curl --silent \
    --url "https://api.aiven.io/v1/project/${PROJECT_NAME}/service/${SERVICE_NAME}/mirrormaker/replication-flows/${SOURCE_CLUSTER}/${TARGET_CLUSTER}" \
    --header "Authorization: Bearer ${AIVEN_API_KEY}" \
    | jq '.replication_flow + {"offset_syncs_topic_location": "target", "topics": ["another-topic"]}  | del(.source_cluster, .target_cluster, .replication_progress)'
    )
    curl --request PUT \
    --url "https://api.aiven.io/v1/project/${PROJECT_NAME}/service/${SERVICE_NAME}/mirrormaker/replication-flows/${SOURCE_CLUSTER}/${TARGET_CLUSTER}" \
    --header "Authorization: Bearer ${AIVEN_API_KEY}" \
    --header 'content-type: application/json' \
    --data "${FLOW_CONFIG}"
  3. Refresh the Terraform state. Note that topics has updated to reflect the configuration drift, but offset_syncs_topic_location has not
    $ terraform refresh
    aiven_mirrormaker_replication_flow.this: Refreshing state... [id=infrastructure-sre-sandbox/mirrormaker-sandbox/kaluza-nonprod/kafka-sandbox]
    $ terraform state show aiven_mirrormaker_replication_flow.this
    # aiven_mirrormaker_replication_flow.this:
    resource "aiven_mirrormaker_replication_flow" "this" {
    emit_backward_heartbeats_enabled    = false
    emit_heartbeats_enabled             = false
    enable                              = true
    id                                  = "example-project/mirrormaker/source-cluster/target-cluster"
    offset_syncs_topic_location         = "source"
    project                             = "example-project"
    replication_policy_class            = "org.apache.kafka.connect.mirror.DefaultReplicationPolicy"
    service_name                        = "mirrormaker"
    source_cluster                      = "source-cluster"
    sync_group_offsets_enabled          = false
    sync_group_offsets_interval_seconds = 1
    target_cluster                      = "target_cluster"
    topics                              = [
        "another-topic",
    ]
    topics_blacklist                    = []
    }

The result of this issue is that a terraform apply will not correct configuration drift in the case where only the offset_syncs_topic_location attribute is affected.

I am happy to go into more detail if this is still unclear.

Serpentiel commented 5 months ago

Hey again, @nickgooding! 👋

Thank you for additional clarification. Yes, as I can see, there was a missing d.Set call, sorry for any confusion earlier and an oversight from my part, I though the diff that you provided was copied from my PR.

You are correct, I've updated the PR to add the missing call. I still am going to keep the field as required, because I believe it makes more sense since this value can't be empty.

Should you have any further questions or any additional information to share, please let me know 🙏