Closed vstthomas closed 8 months ago
Just dredged this out of the terraform logs on another build.
http.response.header.access_control_allow_origin="*" http.response.header.date="Thu, 28 Dec 2023 16:25:01 GMT" http.response.header.x_amzn_requestid=6d86cae6-95a3-4783-a677-40965613616f tf_aws.sdk=aws-sdk-go-v2 tf_mux_provider="*schema.GRPCProviderServer" timestamp=2023-12-28T08:25:01.069-0800
2023-12-28T08:25:01.079-0800 [DEBUG] provider.terraform-provider-aws_v5.25.0_x5: HTTP Response Received:
http.response.body=
| {
| "addon" : {
| "addonName" : "snapshot-controller",
| "clusterName" : "gitops-demo-stage",
| "status" : "DEGRADED",
| "addonVersion" : "v6.3.2-eksbuild.1",
| "health" : {
| "issues" : [ {
| "code" : "InsufficientNumberOfReplicas",
| "message" : "The add-on is unhealthy because it doesn't have the desired number of replicas.",
| "resourceIds" : null
| } ]
| },
| "addonArn" : "arn:aws-us-gov:eks:us-gov-east-1:367652197469:addon/gitops-demo-stage/snapshot-controller/8cc6589c-8e75-42f3-4a39-1f6481dd9616",
| "createdAt" : 1.703780359717E9,
| "modifiedAt" : 1.703780371096E9,
| "serviceAccountRoleArn" : null,
| "tags" : { },
| "publisher" : null,
| "owner" : null,
| "marketplaceInformation" : null,
| "configurationValues" : null
| }
| }
tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/aws-sdk-go-base/v2@v2.0.0-beta.39/logging/tf_logger.go:45 http.response.header.access_control_allow_origin="*" tf_req_id=84fad3e7-cdc2-6efa-8332-c5179fac2de9 http.status_code=200 rpc.service=EKS rpc.system=aws-api http.response.header.access_control_allow_methods="GET,HEAD,PUT,POST,DELETE,OPTIONS" http.response.header.date="Thu, 28 Dec 2023 16:25:01 GMT" http.response.header.x_amzn_requestid=364ff971-d3e3-4fc5-99e5-702ea0ff909c http.response.header.content_type=application/json http.response_content_length=796 http.response.header.access_control_allow_headers="*,Authorization,Date,X-Amz-Date,X-Amz-Security-Token,X-Amz-Target,content-type,x-amz-content-sha256,x-amz-user-agent,x-amzn-platform-id,x-amzn-trace-id" http.response.header.access_control_expose_headers="x-amzn-errortype,x-amzn-errormessage,x-amzn-trace-id,x-amzn-requestid,x-amz-apigw-id,date" tf_resource_type=aws_eks_addon http.duration=206 tf_provider_addr=registry.terraform.io/hashicorp/aws http.response.header.x_amz_apigw_id=QqYmkH7ZulQFmIw= http.response.header.x_amzn_trace_id=Root=1-658da15c-6b1a93965abbb38710b4209e tf_aws.sdk=aws-sdk-go-v2 tf_mux_provider="*schema.GRPCProviderServer" @module=aws aws.region=us-gov-east-1 rpc.method=DescribeAddon timestamp=2023-12-28T08:25:01.079-0800
Let's start with a proper reproduction first.
How would you like to see that?
These are my steps:
tf init
- works as expected

Planning failed. Terraform encountered an error while generating this plan.
│ Error: configuring Terraform AWS Provider: validating provider credentials: retrieving caller identity from STS: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: f6b1d826-c73f-4f9e-a56e-dd84d8166eee, api error InvalidClientTokenId: The security token included in the request is invalid.
│
│ with provider["registry.terraform.io/hashicorp/aws"],
│ on main.tf line 12, in provider "aws":
│ 12: provider "aws" {
However, if I were to adjust the code
FROM:

provider "aws" {
  region = local.region
}

TO:

provider "aws" {
  region = local.region
  alias  = "virginia"
}
The plan works afterwards; I'm not sure why. But the Provider Configuration page says:
You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources (with an exception for resource arguments that are specified directly in the configuration).
So it looks like region = local.region might need to be region = hardCodedRegion. Still, it seems like it shouldn't start working just by adding the alias either 🤷
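A sketch of the documented-safe pattern, assuming the quoted guidance: the region comes from an input variable (known before apply), and the aliased block is opt-in only. The variable name and default below are assumptions for illustration, not from the original config:

```hcl
# Sketch only: variable name and default are assumptions.
variable "region" {
  type    = string
  default = "us-gov-east-1"
}

# Input variables are known before apply, so this is safe per the
# Provider Configuration docs quoted above.
provider "aws" {
  region = var.region
}

# An aliased provider is NOT the default; a resource only uses it when it
# opts in explicitly with `provider = aws.virginia`.
provider "aws" {
  region = var.region
  alias  = "virginia"
}
```

One possible explanation for the observed behavior: once the only provider block is aliased, resources that don't reference aws.virginia fall back to an implicit, empty default provider configuration, so the original credential/region path may no longer be exercised the same way during plan. That is a hypothesis, not a confirmed diagnosis.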
% tf apply -auto-approve
...
module.eks_managed_node_group.aws_eks_node_group.this[0]: Still creating... [5m0s elapsed]
module.eks_managed_node_group.aws_eks_node_group.this[0]: Creation complete after 5m5s [id=reproduction:separate-2024010823044795020000000f]
╷
│ Error: reading IAM Role Managed Policy Attachment (reproduction-20240108225525688100000001:arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly): couldn't find resource
│
│ with aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"],
│ on main.tf line 60, in resource "aws_iam_role_policy_attachment" "this":
│ 60: resource "aws_iam_role_policy_attachment" "this" {
│
╵
╷
│ Error: reading IAM Role Managed Policy Attachment (reproduction-20240108225525688100000001:arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy): couldn't find resource
│
│ with aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"],
│ on main.tf line 60, in resource "aws_iam_role_policy_attachment" "this":
│ 60: resource "aws_iam_role_policy_attachment" "this" {
│
╵
╷
│ Error: reading IAM Role Managed Policy Attachment (reproduction-20240108225525688100000001:arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy): couldn't find resource
│
│ with aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"],
│ on main.tf line 60, in resource "aws_iam_role_policy_attachment" "this":
│ 60: resource "aws_iam_role_policy_attachment" "this" {
In my case it's likely because of the government partition. Fixed that with:
data "aws_partition" "current" {}

locals {
  name   = "reproduction"
  region = "us-east-1"
  part   = data.aws_partition.current.partition
  ...
}

# Then updated the attachments
resource "aws_iam_role_policy_attachment" "this" {
  for_each = { for k, v in toset([
    "arn:${local.part}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:${local.part}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:${local.part}:iam::aws:policy/AmazonEKS_CNI_Policy"
  ]) : k => v }

  policy_arn = each.value
  role       = aws_iam_role.this.name
}
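The partition mismatch behind those "couldn't find resource" errors can be sketched in a few lines. This is a simplified illustration only, not the AWS SDK's actual endpoint-resolution logic:

```python
def partition_for_region(region: str) -> str:
    """Simplified region-to-ARN-partition mapping (illustration only;
    the authoritative mapping lives in the AWS SDK endpoint data)."""
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    if region.startswith("cn-"):
        return "aws-cn"
    return "aws"

# A hard-coded "arn:aws:..." policy ARN can never match an attachment
# that actually lives in the GovCloud partition:
print(partition_for_region("us-gov-east-1"))  # aws-us-gov
print(partition_for_region("us-east-1"))      # aws
```

This is why interpolating data.aws_partition.current.partition into the ARNs, as above, makes the config portable across partitions.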
Not really sure what we're looking for so here's everything 😀
Anyway, it's up and running. What info did you need out of this?
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.
Issue closed due to inactivity.
Not exactly sure what I'm running into here (a timeout, or what?!) but this behavior just started today.
Someone suggested passing the -parallelism=1 parameter to Terraform (a few months ago), but I removed it last week; it caused very slow builds. Now I'm seeing this behavior:

Not sure if the parallelism parameter is even part of this issue 🤷 but it could be a factor.

Partially Installed?
It seems like some of these were started but couldn't complete for some reason:
The Config file
Given the above config, running a subsequent plan outputs:
If I were then to apply these changes, they would build without error in ~60s.

Additional context
If it is the Terraform parallelism parameter, perhaps we could look beyond it to a real solution? Setting this to 1 causes very long builds. If it's something else, I'd like to hear what the maintainers think. TIA
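For context, the flag in question controls how many resource operations Terraform runs concurrently (the default is 10). A sketch of the trade-off, assuming nothing beyond the flag itself:

```
# Default is -parallelism=10. Setting it to 1 serializes every resource
# operation (hence the very slow builds); higher values increase the
# number of concurrent provider API calls.
terraform apply -parallelism=1
```

Lowering parallelism can mask race-like symptoms (such as resources appearing partially installed) without addressing their cause, which may be why it was suggested as a workaround.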