vstthomas closed this issue 7 months ago.
UnrecognizedClientException: The security token included in the request is invalid
From what I remember, this error only comes up when the server doesn't recognize the principal at all. I'm assuming that you are using IRSA, so have you made sure that everything is wired up correctly so that the pod definitely has access to a role? It's surprising to me that you would only be seeing this issue in the Gov account, and I also don't think this issue would be unique to a Gov account versus the standard partition.
it can't delete the nodes that it creates
I responded to the blueprints issue that you referenced. Since that's more of an issue with blueprints, I'd rather continue the conversation over there so that we can see how we can drive their policy to be closer to our official policy. For now, you should be able to change the defaults that are used in that `TerminateInstances` ABAC policy with `karpenter.irsa_tag_key` and `karpenter.irsa_tag_values`.
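A minimal sketch of what that override might look like in Terraform. It assumes, as the plan output further down in this thread shows, that the module builds the condition key as `ec2:ResourceTag/<irsa_tag_key>` and uses the list-valued `irsa_tag_values` as the allowed values. The tag key and the `owned` value are assumptions based on Karpenter tagging the instances it launches with `kubernetes.io/cluster/<cluster-name>` = `owned`; `var.cluster_name` is a placeholder, and the module's other required inputs are omitted.

```hcl
# Hypothetical sketch: the module's required cluster inputs are omitted.
module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.12.0"

  enable_karpenter = true

  karpenter = {
    # Tag name only: the module prepends "ec2:ResourceTag/" itself,
    # so no "aws:ResourceTag/" prefix is added here.
    irsa_tag_key = "kubernetes.io/cluster/${var.cluster_name}"

    # Plural, list-valued input. Assumes Karpenter-launched instances
    # carry kubernetes.io/cluster/<cluster-name> = owned.
    irsa_tag_values = ["owned"]
  }
}
```

With these inputs the rendered `TerminateInstances` statement should contain a `StringLike` condition on `ec2:ResourceTag/kubernetes.io/cluster/<cluster-name>` matching `owned`; reviewing the `terraform plan` diff before applying is a quick way to confirm what the module actually generates.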
For now, you should be able to change the defaults that are used in that TerminateInstances ABAC policy with karpenter.irsa_tag_key and karpenter.irsa_tag_values.
I'm using Terraform. I see references to `irsa_tag_key` and `irsa_tag_values` in the module, but I don't see anything that explains how to set them within the context of the blueprints.
Is there an example/docs you could share?
The solution in https://github.com/aws-ia/terraform-aws-eks-blueprints-addons/issues/339#issuecomment-1883964653 fixes this issue: the log messages about the security token just evaporated.
I added this to the configuration:
```hcl
module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.12.0" # ensure this is updated to the latest/desired version
  ...

  # --------------------------------------------------------------------------------------------------------------------
  # Auto-Scaling
  # karpenter: https://karpenter.sh/docs/getting-started/getting-started-with-karpenter/
  # AWS Samples: https://github.com/aws-samples/karpenter-blueprints/blob/main/cluster/terraform/karpenter.tf
  # --------------------------------------------------------------------------------------------------------------------
  enable_karpenter                           = true
  karpenter_enable_spot_termination          = true
  karpenter_enable_instance_profile_creation = true
  karpenter_node = {
    iam_role_use_name_prefix = false
  }

  # Solution from the above issue
  karpenter = {
    irsa_tag_key   = "aws:ResourceTag/kubernetes.io/cluster/${var.cluster_name}"
    irsa_tag_value = "*"
  }
}
```
The plan added this to the existing policy:
```text
Terraform will perform the following actions:

  # module.eks_blueprints_addons.module.karpenter.aws_iam_policy.this[0] will be updated in-place
  ~ resource "aws_iam_policy" "this" {
        id          = "arn:aws-us-gov:iam::010101010101:policy/karpenter-20240108171158929300000026"
        name        = "karpenter-20240108171158929300000026"
      ~ policy      = jsonencode(
          ~ {
              ~ Statement = [
                    # (5 unchanged elements hidden)
                    {
                        Action   = "eks:DescribeCluster"
                        Effect   = "Allow"
                        Resource = "arn:aws-us-gov:eks:*:010101010101:cluster/gitops-demo-stage"
                    },
                  ~ {
                      ~ Condition = {
                          ~ StringLike = {
                              - "ec2:ResourceTag/Name" = [
                                  - "*karpenter*",
                                  - "*compute.internal",
                                  - "*ec2.internal",
                                ]
                              + "ec2:ResourceTag/aws:ResourceTag/kubernetes.io/cluster/gitops-demo-stage" = [
                                  + "*karpenter*",
                                  + "*compute.internal",
                                  + "*ec2.internal",
                                ]
                            }
                        }
                        # (3 unchanged attributes hidden)
                    },
                    {
                        Action   = [
                            "sqs:ReceiveMessage",
                            "sqs:GetQueueUrl",
                            "sqs:GetQueueAttributes",
                            "sqs:DeleteMessage",
                        ]
                        Effect   = "Allow"
                        Resource = "arn:aws-us-gov:sqs:us-gov-east-1:010101010101:karpenter-gitops-demo-stage"
                    },
                    # (1 unchanged element hidden)
                ]
                # (1 unchanged attribute hidden)
            }
        )
        tags        = {}
        # (6 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```
This left the following resulting policy:
```hcl
# module.eks_blueprints_addons.module.karpenter.aws_iam_policy.this[0]:
resource "aws_iam_policy" "this" {
    arn         = "arn:aws-us-gov:iam::010101010101:policy/karpenter-20240108171158929300000026"
    description = "IAM Policy for karpenter"
    id          = "arn:aws-us-gov:iam::010101010101:policy/karpenter-20240108171158929300000026"
    name        = "karpenter-20240108171158929300000026"
    name_prefix = "karpenter-"
    path        = "/"
    policy      = jsonencode(
        {
            Statement = [
                {
                    Action   = [
                        "ec2:DescribeSubnets",
                        "ec2:DescribeSpotPriceHistory",
                        "ec2:DescribeSecurityGroups",
                        "ec2:DescribeLaunchTemplates",
                        "ec2:DescribeInstances",
                        "ec2:DescribeInstanceTypes",
                        "ec2:DescribeInstanceTypeOfferings",
                        "ec2:DescribeImages",
                        "ec2:DescribeAvailabilityZones",
                    ]
                    Effect   = "Allow"
                    Resource = "*"
                },
                {
                    Action   = [
                        "ec2:RunInstances",
                        "ec2:DeleteLaunchTemplate",
                        "ec2:CreateTags",
                        "ec2:CreateLaunchTemplate",
                        "ec2:CreateFleet",
                    ]
                    Effect   = "Allow"
                    Resource = [
                        "arn:aws-us-gov:ec2:us-gov-east-1::image/*",
                        "arn:aws-us-gov:ec2:us-gov-east-1:010101010101:*",
                    ]
                },
                {
                    Action   = "iam:PassRole"
                    Effect   = "Allow"
                    Resource = "arn:aws-us-gov:iam::010101010101:role/karpenter-gitops-demo-stage"
                },
                {
                    Action   = "pricing:GetProducts"
                    Effect   = "Allow"
                    Resource = "*"
                },
                {
                    Action   = "ssm:GetParameter"
                    Effect   = "Allow"
                    Resource = "arn:aws-us-gov:ssm:us-gov-east-1::parameter/*"
                },
                {
                    Action   = "eks:DescribeCluster"
                    Effect   = "Allow"
                    Resource = "arn:aws-us-gov:eks:*:010101010101:cluster/gitops-demo-stage"
                },
                {
                    Action    = "ec2:TerminateInstances"
                    Condition = {
                        StringLike = {
                            "ec2:ResourceTag/aws:ResourceTag/kubernetes.io/cluster/gitops-demo-stage" = [
                                "*karpenter*",
                                "*compute.internal",
                                "*ec2.internal",
                            ]
                        }
                    }
                    Effect    = "Allow"
                    Resource  = "arn:aws-us-gov:ec2:us-gov-east-1:010101010101:instance/*"
                },
                {
                    Action   = [
                        "sqs:ReceiveMessage",
                        "sqs:GetQueueUrl",
                        "sqs:GetQueueAttributes",
                        "sqs:DeleteMessage",
                    ]
                    Effect   = "Allow"
                    Resource = "arn:aws-us-gov:sqs:us-gov-east-1:010101010101:karpenter-gitops-demo-stage"
                },
                {
                    Action   = [
                        "iam:TagInstanceProfile",
                        "iam:RemoveRoleFromInstanceProfile",
                        "iam:GetInstanceProfile",
                        "iam:DeleteInstanceProfile",
                        "iam:CreateInstanceProfile",
                        "iam:AddRoleToInstanceProfile",
                    ]
                    Effect   = "Allow"
                    Resource = "*"
                },
            ]
            Version   = "2012-10-17"
        }
    )
    policy_id   = "ANPAVLGOHKROSQWJKMUQT"
    tags        = {}
    tags_all    = {}
}
```
We can close this one out. Thank you!
Glad to hear fixing the policy resolved the issue. We're working on getting that fix merged into the EKS Blueprints repo so that fewer users hit this in the future.
This one has regressed; I just noticed another sighting today:
{"level":"ERROR","time":"2024-01-30T23:03:19.803Z","logger":"controller.pricing","message":"retreiving on-demand pricing data, UnrecognizedClientException: The security token included in the request is invalid\n\tstatus code: 400, request id: 6c87f8cb-5104-40de-9063-39184170652f; UnrecognizedClientException: The security token included in the request is invalid\n\tstatus code: 400, request id: 413339af-7dad-4baf-a48f-ae5799bb87a7","commit":"1072d3b"}
Reproduction: https://github.com/VivSoftOrg/reproduction/tree/karpenter-iam
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
bump
Same issue here in GovCloud.
I'm getting the same issue as well, in AWS GovCloud with Karpenter v0.36.0. There are no GovCloud endpoints for the pricing API, so I am assuming that's why there is an error. To get around this error, we set `settings.isolatedVPC: true` in the Helm chart.
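For anyone deploying Karpenter through the EKS Blueprints addons module, one way that Helm value might be passed is sketched below. This assumes the module forwards `karpenter.values` to the underlying Helm release and that your chart version exposes the key as `settings.isolatedVPC` (as the v0.36.0 chart mentioned above does); the chart's values documentation describes it as also disabling lookups against the AWS pricing endpoint. The module's other required inputs are omitted.

```hcl
# Sketch: assumes the add-on module passes karpenter.values through to the Helm release.
module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"
  # Required cluster inputs omitted.

  enable_karpenter = true

  karpenter = {
    # yamlencode renders this object as:
    #   settings:
    #     isolatedVPC: true
    values = [
      yamlencode({
        settings = {
          isolatedVPC = true
        }
      })
    ]
  }
}
```

Keep in mind that `isolatedVPC` is a broader switch than a pricing toggle: it tells Karpenter to assume that AWS services without a VPC endpoint are unreachable.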
{"level":"ERROR","time":"","logger":"controller.pricing","message":"updating pricing, retreiving on-demand pricing data, UnrecognizedClientException: The security token included in the request is invalid\n\tstatus code: 400, request id: xxxxxxxxxxx; UnrecognizedClientException: The security token included in the request is invalid\n\tstatus code: 400, request id: xxxxxxxxxx","commit":"6b868db"}
There are no GovCloud endpoints for the pricing API, so I am assuming that's why there is an error
When we call the pricing API, we go to the `us-east-1` endpoint, which also contains pricing information for GovCloud. Assuming that the principal and role you are using here have permission to make the cross-region call to `us-east-1`, I don't believe you should be running into this issue. By default, Karpenter's policy does not scope down the region from which the principal can call the pricing API.
Because GovCloud is in a different partition, it's not possible to create an IAM role with cross-region permissions to `us-east-1`.
https://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/partitions.html
You cannot use IAM credentials from one partition to interact with resources in a different partition.
Chiming in to note the same issue for our GovCloud installation.
I think the "baked-in" static pricing data is probably good enough for our use case, and it avoids the complexity of setting up IAM user credentials in a commercial partition to make the getPricing call dynamically. However, even when settling for static pricing, there does not seem to be a way to configure Karpenter to skip the getPricing calls, which leaves our logs cluttered with `ERROR: ... updating pricing, retreiving on-demand pricing data, UnrecognizedClientException` messages.
Would it be possible to add a configuration value for skipping the dynamic pricing updates? Based on the comments above, `settings.isolatedVPC` seems to accomplish this indirectly, but it would be nice to have more direct control over just the pricing update calls.
Description
Observed Behavior:
I saw this bug, which looks similar, and added a question at the end. There was no reply, so I'm revisiting this issue:
I've Terraformed a VPC/EKS cluster using AWS modules. Everything works as expected.
After that, Karpenter was deployed via the EKS Blueprints addons module. Then a NodePool was configured.
Using the same Terraform/process/automation:
The security token included in the request is invalid (seems like a provider issue):
{"level":"ERROR","time":"2023-12-29T01:32:02.192Z","logger":"controller.pricing","message":"retreiving on-demand pricing data, UnrecognizedClientException: The security token included in the request is invalid\n\tstatus code: 400, request id: 5d038231-2332-4aef-b888-dbe037b60dc2; UnrecognizedClientException: The security token included in the request is invalid\n\tstatus code: 400, request id: a0b548d6-4833-4883-84ac-84db45806e7b","commit":"1072d3b"}
Expected Behavior:
Karpenter works the same in GovCloud as it does in a commercial account.
Reproduction Steps (Please include YAML):
Government Partition
The token presented to some service is rejected, per the log message above.
I'm not quite sure which service is rejecting the token but I'm willing to work towards the solution.
Versions:
Chart Version:
Kubernetes Version (`kubectl version`):