clowdhaus / terraform-aws-eks-migrate-v19-to-v20

What it says on the tin

ConfigMap can be destroyed after it was created with submodule. #3

Open ajvn opened 8 months ago

ajvn commented 8 months ago

Describe the bug During step 6 of this guide, something like a race condition can occur (I'm not sure it strictly qualifies as one, but it shows some of the characteristics in this context): the content of the aws-auth ConfigMap gets deleted after it was created via the sub-module.

Output will look like this:

module.eks.kubernetes_config_map_v1_data.aws_auth[0]: Destroying... [id=kube-system/aws-auth]
module.eks_aws_auth.kubernetes_config_map_v1_data.aws_auth[0]: Creating...
module.eks_aws_auth.kubernetes_config_map_v1_data.aws_auth[0]: Creation complete after 1s [id=kube-system/aws-auth]
module.eks.kubernetes_config_map_v1_data.aws_auth[0]: Destruction complete after 1s

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

Since both of those resources point to the same ConfigMap, this is problematic. It essentially removes cluster access for all groups except those already in access entries (luckily, AWSAdmin is one of them).
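To check an apply log for this ordering after the fact, something like the following sketch could help. It assumes the log lines are formatted exactly like the ones quoted above (other Terraform versions may differ), and `find_clobbered_ids` is a hypothetical helper name, not part of Terraform or Atlantis.

```python
import re

# Patterns mirror the log lines quoted in this issue; this is an assumption,
# other Terraform versions may format these lines differently.
DESTROYING = re.compile(r"^(?P<addr>\S+): Destroying\.\.\. \[id=(?P<id>[^\]]+)\]")
CREATED = re.compile(r"^(?P<addr>\S+): Creation complete after \S+ \[id=(?P<id>[^\]]+)\]")
DESTROYED = re.compile(r"^(?P<addr>\S+): Destruction complete after \S+")


def find_clobbered_ids(log_text):
    """Return ids that one resource created before another resource,
    pointing at the same id, finished being destroyed."""
    destroying = {}  # resource address -> id it is destroying
    created = {}     # id -> resource address that created it
    clobbered = []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = DESTROYING.match(line)
        if m:
            destroying[m["addr"]] = m["id"]
            continue
        m = CREATED.match(line)
        if m:
            created[m["id"]] = m["addr"]
            continue
        m = DESTROYED.match(line)
        if m:
            # "Destruction complete" lines carry no id, so look it up
            # from the earlier "Destroying..." line for the same address.
            obj_id = destroying.get(m["addr"])
            if obj_id in created and created[obj_id] != m["addr"]:
                clobbered.append(obj_id)
    return clobbered
```

Feeding it the output above should flag `kube-system/aws-auth`; a log where destruction completes before creation should come back empty.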

To restore access, back up the aws-auth ConfigMap in the kube-system namespace, remove bootstrappers, and re-apply it via kubectl, or re-apply via Terraform. If your Terraform access also depends on this config, you'll have to re-apply the backed-up YAML.

I'm not sure if there's anything you can do on the module side, but it would be good to mention that people should take a backup of the aws-auth ConfigMap before starting this procedure.
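For the backup itself, here is a minimal sketch that shells out to `kubectl get configmap aws-auth -n kube-system -o yaml` and writes the result to a file. It assumes `kubectl` is on the PATH and already pointed at the right cluster; `backup_configmap` and the `run` parameter are hypothetical names used only for illustration.

```python
import subprocess
from pathlib import Path


def backup_configmap(path, name="aws-auth", namespace="kube-system",
                     run=subprocess.run):
    """Dump a ConfigMap as YAML into `path` and return the command used.

    `run` is injectable (an assumption of this sketch) so the function
    can be exercised without a live cluster.
    """
    cmd = ["kubectl", "get", "configmap", name, "-n", namespace, "-o", "yaml"]
    # check=True raises if kubectl exits non-zero, so a failed export
    # never leaves behind a file that looks like a valid backup.
    result = run(cmd, check=True, capture_output=True, text=True)
    Path(path).write_text(result.stdout)
    return cmd
```

Calling `backup_configmap("aws-auth-backup.yaml")` before step 6 would leave a file that can later be restored with `kubectl apply -f aws-auth-backup.yaml`. The exported YAML includes server-set metadata such as `resourceVersion`, which may need to be stripped before re-applying.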

To Reproduce Steps to reproduce the behavior: It can happen during step 6. It did not happen in two clusters, where destruction happened before recreation, but it did happen in a third.

Expected behavior I'd expect the old resource to always be removed first, before it is recreated by the sub-module.

Screenshots N/A

Desktop (please complete the following information): N/A

Smartphone (please complete the following information): N/A

Additional context Atlantis with Terraform v1.7.4 is in use, but I assume it can also happen with plain Terraform.