angelabad / terraform-aws-msk-cluster

Terraform module which creates Msk Kafka Cluster on AWS
https://registry.terraform.io/modules/angelabad/msk-cluster/
Apache License 2.0
30 stars 34 forks source link

Unable to upgrade MSK cluster in-place. Cluster configuration is overly strict by pinning to one kafka version. #10

Closed bradonkanyid closed 3 years ago

bradonkanyid commented 3 years ago

On 2020-05-28, AWS updated MSK to allow in-place Kafka version upgrades. This isn't possible with this module because the msk configuration resource has immutable fields that are attempted to be changed, but cannot be changed while the cluster is associated with the msk configuration.

aws_msk_configuration is being set to the specific kafka version that is passed in on initial creation, and it must be recreated when a newer supported Kafka version is passed in.

The kafka_versions list is immutable in an aws_msk_configuration, so destroy/create of the resource is the only option, but this cannot work because the cluster is still associated with the configuration, so it cannot be destroyed. create_before_destroy also doesn't work here because there's no way to orchestrate it in a way where the association of the MKS cluster to the new MSK configuration can complete before the old MSK configuration is destroyed.

I propose: 1) Not setting the kafka_versions variable to a specific version at all, by setting it to [] 2) Some flag/behavior that you can opt into that has the same effect as (1)

nsvijay04b1 commented 3 years ago

I have same issue. MSK in-place update when there is changes in msk configuration completes by creating a new revision of configuration but MSK still pinned to the revision that it was created first time. Please advise.

23:23:33    # module.msk.aws_msk_configuration.this will be updated in-place
23:23:33    ~ resource "aws_msk_configuration" "this" {
23:23:33          arn               = "arn:aws:kafka:eu-west-1:XXXXXXXXXXX:configuration/demonexus-uat-msk-2-6-1/c1307ab1-03ae-4ec5-aeXXXXXXXXX8259d-3"
23:23:33          id                = "arn:aws:kafka:eu-west-1:XXXXXXXXXX:configuration/demonexus-uat-msk-2-6-1/c1307ab1-03ae-4ec5-aeXXXXXXXXXX259d-3"
23:23:33          kafka_versions    = [
23:23:33              "2.6.1",
23:23:33          ]
23:23:33          latest_revision   = 1
23:23:33          name              = "demonexus-uat-msk-2-6-1"
23:23:33        ~ server_properties = <<~EOT
23:23:33              auto.create.topics.enable = true
23:23:33            - default.replication.factor = 2
23:23:33            + default.replication.factor = 3
23:23:33              message.max.bytes = 100000000
23:23:33          EOT
23:23:33      }

It created revision 2 for configuration after this update

Revision number Description Creation time
2 (latest) - May 18, 2021, 11:27:04 PM GMT+3
1 - May 7, 2021, 9:36:39 AM GMT+3

But, my MSK still point to revision1.

Cluster configuration

Cluster configuration ARN

arn:aws:kafka:eu-west-1:XXXXXXXXXXXX:configuration/demonexus-uat-msk-2-6-1/c1307ab1-03ae-4ec5-ae1b-XXXXXXXX-3
Cluster configuration name
demonexus-uat-msk-2-6-1
Configuration description
-
Revision
1
Revision description
nsvijay04b1 commented 3 years ago

I opened a PR-11 for this issue-10 , pending approval #https://github.com/angelabad/terraform-aws-msk-cluster/pull/11

@angelabad @bradonkanyid @victorcmoura @andormarkus

nsvijay04b1 commented 3 years ago

@angelabad @bradonkanyid @victorcmoura @andormarkus
Please approve PR

angelabad commented 3 years ago

Hi, sorry, but I cant reproduce this error, on this module I dont use configurartion versiones, because of this, I recreate the full configuration and delete prior config, In my computer if I change properties I get something like:

 # module.simple.aws_msk_cluster.this will be updated in-place
  ~ resource "aws_msk_cluster" "this" {
        id                       = "arn:aws:kafka:eu-west-2:342631421827:cluster/simple/5c07d788-791a-4239-b92f-0b3a1c47ce41-3"
        tags                     = {}
        # (10 unchanged attributes hidden)

      ~ configuration_info {
          ~ arn      = "arn:aws:kafka:eu-west-2:342631421827:configuration/simple-2940432070742172855/884bfe18-c47b-485b-85e2-82fe948ba7e4-3" -> (known after apply)
            # (1 unchanged attribute hidden)
        }

        # (3 unchanged blocks hidden)
    }

  # module.simple.aws_msk_configuration.this must be replaced
+/- resource "aws_msk_configuration" "this" {
      ~ arn               = "arn:aws:kafka:eu-west-2:342631421827:configuration/simple-2940432070742172855/884bfe18-c47b-485b-85e2-82fe948ba7e4-3" -> (known after apply)
      ~ id                = "arn:aws:kafka:eu-west-2:342631421827:configuration/simple-2940432070742172855/884bfe18-c47b-485b-85e2-82fe948ba7e4-3" -> (known after apply)
      ~ latest_revision   = 1 -> (known after apply)
      ~ name              = "simple-2940432070742172855" -> (known after apply) # forces replacement
      ~ server_properties = "default.replication.factor = 3" -> "default.replication.factor = 2"
        # (1 unchanged attribute hidden)
    }

  # module.simple.random_id.configuration must be replaced
+/- resource "random_id" "configuration" {
      ~ b64_std     = "simple-KM6DY6bA5Lc=" -> (known after apply)
      ~ b64_url     = "simple-KM6DY6bA5Lc" -> (known after apply)
      ~ dec         = "simple-2940432070742172855" -> (known after apply)
      ~ hex         = "simple-28ce8363a6c0e4b7" -> (known after apply)
      ~ id          = "KM6DY6bA5Lc" -> (known after apply)
      ~ keepers     = { # forces replacement
          ~ "server_properties" = "default.replication.factor = 3" -> "default.replication.factor = 2"
        }
        # (2 unchanged attributes hidden)
    }

And it changes the configuration info arn.

Please could you help me to understand where the error is?

Thanks for your info!

nsvijay04b1 commented 3 years ago

Hi, In my case, configuration is in-place upgrade and not re-create with some random number in name. There is fundamentally diff approach here. Just curious if you didn't face issues with this approach in in-place MSK upgrades?

here is the code diff.

image

angelabad commented 3 years ago

Hi @nsvijay04b1, @bradonkanyid I reviewed the code and because of this I create new configuration resource when it changes, because there is a bug on aws provider resource. This bug causes that changes in configuration doesnt trigger change notification, so msk doesn't know about this change. It only knows about it on next apply/refresh, you can see the bug here:

My workaround to avoid this error is to generate a random id everytime the config changes and genereate new one.

When this bug is resolved I will remove my workaround from this module. I will try to research about the provider bug.

Thanks for your contributions!

bradonkanyid commented 3 years ago

I'd request re-opening this issue, as somehow it got taken over by a different issue altogether. My issue wasn't around being able to update configuration in-place, per se, but with upgrading Kafka versions in-place.

Steps to reproduce: 1) Build an MSK cluster w/ version = 2.7.0, using this module 2) Change your version to 2.7.1 3) Try to apply

bradonkanyid commented 3 years ago

Looking over this issue again, I think the right thing to do would be to include the kafka version in the random_id keepers, so that a change to kafka_version would generate a new aws_msk_configuration. This would side-step the fact that an MSK configuration's kafka_version is immutable after creation.

bradonkanyid commented 3 years ago
  # module.msk.module.msk.aws_msk_configuration.this must be replaced
+/- resource "aws_msk_configuration" "this" {
      ~ arn               = "arn:aws:kafka:us-east-1:436196666173:configuration/msk-test-13535995669164465721/7d1eb4a5-f2dd-424e-a6ea-48ad4e71c495-17" -> (known after apply)
      ~ id                = "arn:aws:kafka:us-east-1:436196666173:configuration/msk-test-13535995669164465721/7d1eb4a5-f2dd-424e-a6ea-48ad4e71c495-17" -> (known after apply)
      ~ kafka_versions    = [ # forces replacement
          - "2.7.0",
          + "2.7.1",
        ]
      ~ latest_revision   = 1 -> (known after apply)
        name              = "msk-test-13535995669164465721"
        # (1 unchanged attribute hidden)
    }

Here's the problem. The configuration sees it needs to change, but random_id has no idea. With create_before_destroy = true, it will attempt to create a new configuration with the same name as the existing configuration, and fail.