cookiecutter-openedx / cookiecutter-openedx-devops

Open edX Tutor on Kubernetes implemented with Terraform
GNU Affero General Public License v3.0

Invalid for_each argument error on terraform-aws-modules/eks v19.4 #57

Closed Markchau closed 1 year ago

Markchau commented 1 year ago

Describe the bug

During the backend build procedure, I ran into several errors with terragrunt run-all apply. These are the errors we encountered:

│ Error: Invalid for_each argument
│
│ on .terraform/modules/eks/main.tf line 97, in resource "aws_ec2_tag" "cluster_primary_security_group":
│ 97: for_each = { for k, v in merge(var.tags, var.cluster_tags) :
│ 98: k => v if local.create && k != "Name" && var.create_cluster_primary_security_group_tags && v != null
│ 99: }
│ ├────────────────
│ │ local.create is true
│ │ var.cluster_tags is empty map of string
│ │ var.create_cluster_primary_security_group_tags is true
│ │ var.tags is map of string with 26 elements
│
│ The "for_each" map includes keys derived from resource attributes that
│ cannot be determined until apply, and so Terraform cannot determine the
│ full set of keys that will identify the instances of this resource.
│
│ When working with unknown values in for_each, it's better to define the map
│ keys statically in your configuration and place apply-time results only in
│ the map values.
│
│ Alternatively, you could use the -target planning option to first apply
│ only the resources that the for_each value depends on, and then apply a
│ second time to fully converge.

│ Error: reading EKS Cluster (cluster_name): couldn't find resource
│
│ with data.aws_eks_cluster.eks,
│ on providers.tf line 1, in data "aws_eks_cluster" "eks":
│ 1: data "aws_eks_cluster" "eks" {
│

Workflow

I am new to Terraform and Terragrunt, so I was wondering what might be going wrong here. After several attempts with releases v14.1.1 and v15.0.0, I still get the same errors and terragrunt run-all apply fails. The subsequent creation of AWS resources failed with exit status 1 in the .terragrunt-cache for the modules that have the common dependency on Kubernetes.

I looked for a potential solution and found that the error might be coming from the module terraform-aws-eks v19.4.

I also found a closed issue that seems to discuss a similar problem with module version 19.4: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2337

So I am wondering: is the module terraform-aws-eks v19.4 causing the error, and does that lead to the second error, where it fails to find the EKS cluster resource?

Expected behavior

The terragrunt commands should complete without errors and the resources should be created successfully on AWS.

Additional context

I would greatly appreciate any help in resolving this issue. Thank you for your time.

lpm0073 commented 1 year ago

Hi mark. it looks like Terraform is struggling with initializing some of the meta data. Please try this:


cd vpc
terragrunt apply target=module.cookiecutter_meta
cd ..
terragrunt run-all apply

Markchau commented 1 year ago

@lpm0073 Hi Mr. Lawrence, thank you so much for this suggestion. I tried your sequence of commands, but the errors persist. When I cd into the Kubernetes folder and run terragrunt plan, it returns the same error:

│ Error: reading EKS Cluster (the_cluster_name_we_use): couldn't find resource
│
│ with data.aws_eks_cluster.eks,
│ on providers.tf line 1, in data "aws_eks_cluster" "eks":
│ 1: data "aws_eks_cluster" "eks" {
│

Hence, I wonder whether the failed creation of the EKS cluster leaves some context missing in the meta data, so that terragrunt run-all apply fails no matter what. Is there any way I could debug this or find out what is actually going wrong? I have no clue; I tried both the Terraform logs and the Terragrunt debug logs, but they did not provide much helpful information. I am using Terraform v1.5.4 and Terragrunt v0.48.5, installed with Homebrew.

Do you have any idea about this error? I would be grateful if you could help me out. Thank you.

lpm0073 commented 1 year ago

please note the relative path at the top of this screen shot. please navigate to the same relative location on your machine, and let me know if you see a collection of ".state" files that look exactly like these.

Screenshot 2023-08-07 at 7 57 18

lpm0073 commented 1 year ago

also, please try this alternative to my instructions above:

cd eks
terragrunt apply target=module.cookiecutter_meta
terragrunt apply
cd ..

This SHOULD successfully build your EKS cluster.

Markchau commented 1 year ago

@lpm0073 Hi Mr. Lawrence, hope you have a nice day, and thanks for the instructions. The screenshot below shows all the .state files inside the terraform/common/cookiecutter_meta/output folder.

Screenshot 2023-08-07 at 10 05 53 PM

Since I could not find a folder named eks, I think you are referring to the kubernetes folder. When I run the terragrunt apply target=module.cookiecutter_meta command inside that folder, I get another error:

ERRO[0034] …/openedx_devops/terraform/stacks/service/kubernetes/terragrunt.hcl is a dependency of …/openedx_devops/terraform/environments/prod/kubernetes/terragrunt.hcl but detected no outputs. Either the target module has not been applied yet, or the module has no outputs. If this is expected, set the skip_outputs flag to true on the dependency block. 
ERRO[0034] Unable to determine underlying exit code, so Terragrunt will exit with error code 1

The path is quite long, so I replaced its beginning with an ellipsis here; the root folder of the repository is openedx_devops.

So I think I have another issue here, but I don't quite understand how Terraform outputs work because I am still learning. What could be going wrong here? Thank you for taking the time to help me out on this.
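
For readers hitting the same "detected no outputs" message: one way to see whether the stack module named in the error has been applied yet is to look at its outputs. The sketch below is only an illustration and is not part of the repository. `has_outputs` is a hypothetical helper, and it assumes that `terragrunt output -json` prints an empty JSON object (`{}`) for a module that has not been applied.

```shell
# Sketch: decide whether a module has published any outputs yet.
# Reads the JSON printed by `terragrunt output -json` on stdin.
# Assumption: a module with no applied state prints an empty object, {}.
has_outputs() {
  local json
  # Strip all whitespace so "{ }" and "{}\n" are treated the same way.
  json="$(tr -d '[:space:]')"
  # Empty input or an empty object means there is nothing to depend on yet.
  [ -n "$json" ] && [ "$json" != "{}" ]
}

# Hypothetical usage from the repository root (path as quoted in the error):
#   (cd terraform/stacks/service/kubernetes && terragrunt output -json) | has_outputs \
#     && echo "stack outputs found" \
#     || echo "apply the stack module first"
```

If the helper reports no outputs, the first half of the error message ("the target module has not been applied yet") is the more likely explanation.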

lpm0073 commented 1 year ago

my mistake. yes, you guessed the correct path. and the message that you see also makes sense. if you run these modules individually, then you will need to execute the first couple of steps as follows:

cd vpc

# create the VPC
terragrunt apply target=module.cookiecutter_meta
terragrunt apply

# build the EKS cluster
cd ../kubernetes
terragrunt apply target=module.cookiecutter_meta
terragrunt apply

once you get through the initial build you shouldn't run into any further Terraform technicalities.

Markchau commented 1 year ago

Hi Mr. Lawrence, thanks for your friendly instructions. I followed the exact sequence of steps you provided above. terragrunt apply target=module.cookiecutter_meta failed initially, so I think you might have been referring to terragrunt apply -target module.cookiecutter_meta instead. I retried with this command followed by terragrunt apply, and the VPC, subnet, IP and related resources were created successfully.

However, when I try to run terragrunt apply -target module.cookiecutter_meta in the kubernetes folder, the same error occurred again:

ERRO[0034] …/openedx_devops/terraform/stacks/service/kubernetes/terragrunt.hcl is a dependency of …/openedx_devops/terraform/environments/prod/kubernetes/terragrunt.hcl but detected no outputs. Either the target module has not been applied yet, or the module has no outputs. If this is expected, set the skip_outputs flag to true on the dependency block. 
ERRO[0034] Unable to determine underlying exit code, so Terragrunt will exit with error code 1

I can't create a new EKS cluster; the command exits with error code 1 and fails immediately. When I run terragrunt plan -target module.cookiecutter_meta, it passes without errors, but there is a warning:

│ Warning: Resource targeting is in effect
│ 
│ You are creating a plan with the -target option, which means that the
│ result of this plan may not represent all of the changes requested by the
│ current configuration.
│ 
│ The -target option is not for routine use, and is provided only for
│ exceptional situations such as recovering from errors or mistakes, or when
│ Terraform specifically suggests to use it as part of an error message.

So did I get one of the previous steps wrong? Thank you.

lpm0073 commented 1 year ago

did you resolve all of your problems?

Markchau commented 1 year ago

@lpm0073 Hi Mr. Lawrence, I am still trying to figure out what leads to the error I mentioned in my previous comment. When I run terragrunt apply -target module.cookiecutter_meta, I still get the same error:

ERRO[0034] …/openedx_devops/terraform/stacks/service/kubernetes/terragrunt.hcl is a dependency of …/openedx_devops/terraform/environments/prod/kubernetes/terragrunt.hcl but detected no outputs. Either the target module has not been applied yet, or the module has no outputs. If this is expected, set the skip_outputs flag to true on the dependency block. 
ERRO[0034] Unable to determine underlying exit code, so Terragrunt will exit with error code 1

The command exits with status 1 and I can't create a new EKS cluster. All subsequent terragrunt apply and terragrunt run-all apply runs fail, and the two initial errors occur when the creation of the EKS cluster fails:

│ Error: Invalid for_each argument
│
│ on .terraform/modules/eks/main.tf line 97, in resource "aws_ec2_tag" "cluster_primary_security_group":
│ 97: for_each = { for k, v in merge(var.tags, var.cluster_tags) :
│ 98: k => v if local.create && k != "Name" && var.create_cluster_primary_security_group_tags && v != null
│ 99: }
│ ├────────────────
│ │ local.create is true
│ │ var.cluster_tags is empty map of string
│ │ var.create_cluster_primary_security_group_tags is true
│ │ var.tags is map of string with 26 elements
│
│ The "for_each" map includes keys derived from resource attributes that
│ cannot be determined until apply, and so Terraform cannot determine the
│ full set of keys that will identify the instances of this resource.
│
│ When working with unknown values in for_each, it's better to define the map
│ keys statically in your configuration and place apply-time results only in
│ the map values.
│
│ Alternatively, you could use the -target planning option to first apply
│ only the resources that the for_each value depends on, and then apply a
│ second time to fully converge.
│ Error: reading EKS Cluster (cluster_name): couldn't find resource
│
│ with data.aws_eks_cluster.eks,
│ on providers.tf line 1, in data "aws_eks_cluster" "eks":
│ 1: data "aws_eks_cluster" "eks" {
│

If you know what the problem is, would you mind helping me out? Thank you.

lpm0073 commented 1 year ago

Please take note that "…/openedx_devops/terraform/stacks/service/kubernetes/terragrunt.hcl" is referring to the Kubernetes module in the Service Stack. This means that you're running your modules out of sequence. You first need to run modules in the "Stack" layer, and then only afterwards should you attempt to run any of the modules in the "Environment" layer
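
A minimal sketch of that ordering, assuming the directory layout quoted in this thread (terraform/stacks/service for the Stack layer and terraform/environments/prod for the Environment layer). The names `apply_layer` and `build_all` are hypothetical, and the script defaults to a dry run that only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch only: apply the Stack layer before the Environment layer.
# Paths are the relative ones quoted in this thread; adjust to your checkout.
STACK_DIR="terraform/stacks/service"
ENV_DIR="terraform/environments/prod"

# Apply every module under one layer.
# DRY_RUN=1 (the default) prints the command instead of running it.
apply_layer() {
  local dir="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "(cd $dir && terragrunt run-all apply)"
  else
    (cd "$dir" && terragrunt run-all apply)
  fi
}

# The && means the Environment layer is only attempted
# after the Stack layer has finished successfully.
build_all() {
  apply_layer "$STACK_DIR" && apply_layer "$ENV_DIR"
}

build_all
```

Run from the repository root with DRY_RUN=0 to execute the commands for real. Whether a targeted apply of module.cookiecutter_meta is needed first inside an individual module is a separate question that this sketch does not cover.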

Markchau commented 1 year ago

@lpm0073 Hi Mr. Lawrence, thanks for the correction. It seems I misunderstood the order in which the modules should run. That's my mistake, I apologize. So if I understand correctly, I should have run terragrunt run-all init and terragrunt run-all apply in the terraform/stacks/service folder first, made sure everything worked, and then followed the command sequence that you provided above in the terraform/environments/prod folder? Please correct me if I am wrong again. Thanks for your assistance, I really appreciate it.

Markchau commented 1 year ago

@lpm0073 Hi Mr. Lawrence, sorry to disturb, hope you have a nice day, since you mentioned:

You first need to run modules in the "Stack" layer, and then only afterwards should you attempt to run any of the modules in the "Environment" layer

I have run the commands in the following sequence in the stack layer:

cd …/openedx_devops/terraform/stacks/service
terragrunt run-all init

cd vpc
terragrunt apply -target module.cookiecutter_meta
terragrunt apply

cd ../kubernetes
terragrunt apply -target module.cookiecutter_meta
terragrunt apply

However, when I run terragrunt apply in the terraform/stacks/service/kubernetes folder, I come across a new error:

│ Error: metadata.0.name a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
│ 
│   with kubernetes_namespace.namespace-shared,
│   on main.tf line 263, in resource "kubernetes_namespace" "namespace-shared":
│  263:     name = var.namespace
│ 

I found a similar issue, https://github.com/hashicorp/terraform-provider-kubernetes/issues/214, but I don't quite understand what's happening. What could I be doing wrong here? Is the error caused by upper-case letters in the platform name (global_platform_name) that goes into the namespace? If so, how could I update it everywhere at once? Also, may I have a '-' character in the environment_subdomain? Thank you.

lpm0073 commented 1 year ago

what is the exact value of 'var.namespace' in your case? it appears that you're passing illegal character values to this variable. what character set are you using?

Markchau commented 1 year ago

@lpm0073 If I am correct, var.namespace refers to the environment_namespace from openedx_devops/terraform/environments/prod/env.hcl, which is defined as:

environment_namespace = "${local.global_vars.locals.platform_name}-${local.global_vars.locals.platform_region}-${local.environment}"

I set global_platform_name to a capitalised platform name, e.g. "Course". I guess upper case is illegal in this case, am I correct? I also set environment_subdomain=tl-edx. Is it also illegal for a subdomain to include a '-' character? If these are the causes, is there any approach or command that would update, all at once, the namespace or global parameters that were initially set with cookiecutter? Thank you.

Markonick commented 1 year ago

Thanks for creating this issue @Markchau! I am facing the exact same issue, following the steps described to build the Vpc and EKS modules (Kubernetes) individually but getting the same errors as you described. Kinda blocked here now :-(

Markonick commented 1 year ago

@lpm0073 Regarding your comment on sequence order of modules, I think I get what you mean, looking at the terminal logs:

INFO[0011] The stack at /Users/nicolasmarkos/Projects/openedx-devops/terraform/environments/dev will be processed in the following order for command apply:

Group 1

Group 2

Group 3

Group 4

Group 5

Group 6

Might not have solved my problem but at least I'm getting some more understanding. Baby steps :-)

lpm0073 commented 1 year ago

greetings all. i ran through a test of the stack and environment builds just now. confirming that i also ran into the Error: Invalid for_each argument

Following are the steps that i took to mitigate this problem:

cd kubernetes 
terragrunt apply --target module.cookiecutter_meta
terragrunt apply 
cd ..

terragrunt run-all apply

The challenge that Terraform runs into that leads to the "Invalid for_each argument" error is that it is unable to calculate the state transition path due to meta data that has not yet been initialized. I don't really know why this is a problem per se, but, getting around it is simply a matter of issuing a command that will initialize the missing meta data.

hope that helps.

Markchau commented 1 year ago

@lpm0073 Hi Mr. Lawrence. Hope you have a nice upcoming weekend. I understand that the meta data needs to be created first, but I am still facing the same error on var.namespace with terragrunt apply under the terraform/stacks/service/kubernetes folder:

│ Error: metadata.0.name a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
│ 
│   with kubernetes_namespace.namespace-shared,
│   on main.tf line 263, in resource "kubernetes_namespace" "namespace-shared":
│  263:     name = var.namespace
│ 

I set global_platform_name (used as part of the global variable for the namespace) to a capitalised platform name, e.g. "Course". Is upper case illegal in this case? I also set environment_subdomain=tl-edx. Is it also illegal for a subdomain to include a '-' character? Is there any approach or command that would update, all at once, the namespace or global parameters that were initially set with cookiecutter? If there's no other choice, I will create a new repo. Thank you.

lpm0073 commented 1 year ago

@Markchau , the error message above provides you with guidance on the allowed characters, which are:

lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character

and it also provides you with the exact Regex expression that it uses to validate your string, which is:

regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'

So, your problem is the hyphen ('-') character that you're using in your environment_subdomain.
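
A quick way to try candidate values against the regex from the error message is sketched below. `is_rfc1123_subdomain` is a hypothetical helper, and the sample values other than tl-edx and Course are made up for illustration. Going by the regex alone, a hyphen inside a name is accepted while upper-case letters and a leading or trailing hyphen are not, so a capitalised platform name such as "Course" is worth checking as well.

```shell
# Sketch: test a candidate name against the regex quoted in the error message.
# LC_ALL=C keeps the [a-z] range limited to lower-case ASCII letters.
is_rfc1123_subdomain() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq \
    '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# "tl-edx" and "Course" come from this thread; the other values are hypothetical.
for name in "tl-edx" "Course" "course-us-east-1-prod" "-edx"; do
  if is_rfc1123_subdomain "$name"; then
    echo "ok:      $name"
  else
    echo "invalid: $name"
  fi
done
```

Because the namespace is assembled from several variables, it is the final composed string passed as var.namespace that has to match, so it may help to run the check on that full value.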

Markchau commented 1 year ago

@lpm0073 Is there any approach or command that would update the environment_subdomain and all the relevant variables in all the .hcl/.tf files at once? If not, I will need to either update all the files manually or restart the whole process by creating a new repo. I found a Makefile with the cookiecutter command, but I think it's only used for creating the repo. Thank you.

lpm0073 commented 1 year ago

the only variable that you need to modify is environment_subdomain, located in env.hcl. But since you're raising the question, for any future, larger modifications that you might make, you can re-run the cookiecutter, which will regenerate all of the files in the repository.

Markonick commented 1 year ago

From my side @lpm0073 comments on "build stack first then build environment" worked like a charm. I ran into a few snags sure, some of which were related to IAM permissions etc but nothing that was a show-stopper and nothing further related to this specific issue.

lpm0073 commented 1 year ago

Thanks, @Markonick. Closing this issue.