Closed WesleyBuck closed 2 years ago
Hi @WesleyBuck,
Good morning.
Could you please confirm the value of the environment variable `AWS_WEB_IDENTITY_TOKEN_FILE` in your EKS pod? I tested the EKS setup as part of another, unrelated issue (https://github.com/aws/aws-sdk-net/issues/1856), and the scenario you mention works fine.
Thanks, Ashish
Hi @ashishdhingra,
Confirmed: the token is contained within the file. As per the description: "Then we confirmed that we have installed AWS CLI tool inside the same pod that contains .NET SDK code, and were able to successfully execute SQS related commands, this confirmed that the credentials token and required environment variables were successfully injected into the pod, which rules out that this is an issue from the cluster side."
I was not able to reproduce the error the user reported, but got a different error:
The `Dockerfile` might need to be tweaked to correct the path. Thereafter, create the following files:
**eks-cluster-create.yaml**
```YAML
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eks-issue1920
  region: us-east-2
  version: "1.20"

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      # service account name is missing from the original comment
      namespace: eks-issue1920-ns
      labels: {aws-usage: "application"}
    attachPolicyARNs:
    # policy ARN list entries are missing from the original comment

nodeGroups:
  # node group name and remaining settings are missing from the original comment
  - tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/cluster-13: "owned"
    desiredCapacity: 1
```
**eks-manifest.yaml**
```YAML
apiVersion: apps/v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      Process: eks-issue1920
    creationTimestamp: null
    labels:
      app: eks-issue1920
    name: eks-issue1920
  spec:
    type: LoadBalancer
    ports:
    - name: "5999"
      port: 5999
      targetPort: 5999
    selector:
      app: eks-issue1920
  status:
    loadBalancer: {}
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: eks-issue1920
    namespace: eks-issue1920-ns
  spec:
    selector:
      matchLabels:
        app: eks-issue1920
    replicas: 1
    template:
      metadata:
        labels:
          app: eks-issue1920
      spec:
        serviceAccountName: eks-issue1920-sqs
        containers:
        - name: eks-issue1920
          image: <<accountid>>.dkr.ecr.us-east-2.amazonaws.com/eksissue1920:latest
          ports:
          imagePullPolicy: Always #IfNotPresent
        restartPolicy: Always
kind: List
metadata: {}
```
NOTE: Replace `<<accountid>>` in the above `eks-manifest.yaml` with your own account ID. Note that `image` should be set to the ECR image created by the `docker push` commands below.
aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin <<accountid>>.dkr.ecr.us-east-2.amazonaws.com
docker build -t eksissue1920 .
docker tag eksissue1920:latest <<accountid>>.dkr.ecr.us-east-2.amazonaws.com/eksissue1920:latest
docker push <<accountid>>.dkr.ecr.us-east-2.amazonaws.com/eksissue1920:latest
Run `eksctl create cluster --config-file=./eks-cluster-create.yaml` to create the EKS cluster. This uses the configuration in the YAML file to create the EKS cluster, IAM OIDC provider, required service account(s) and node group(s).
Run `kubectl apply -f ./eks-manifest.yaml` to create the deployment (to delete the deployment, you may use the `kubectl delete -f ./eks-manifest.yaml` command).
Once the `eks-issue1920` workload is created in the cluster, click on it and take note of the pod name. You may examine the pod status by clicking on it (it should be in the Running state). You may also list the pods from the command line using `kubectl get pods --namespace eks-issue1920-ns -o wide`.
To open a shell in the pod, run `kubectl exec --stdin --tty --namespace eks-issue1920-ns <<podname>> -- /bin/bash`. Replace `eks-issue1920-ns` with your own namespace if it is set differently in the YAML file, and replace `<<podname>>` with the name of the pod.
Run `kubectl logs <<podname>> --namespace eks-issue1920-ns` to get the logs (replace `<<podname>>` with the name of the pod). Since our console application is the entry point and writes its logs to the console, this gives the trace output.
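The pod lookup and log retrieval in the steps above can also be scripted. The following is a minimal sketch, assuming the namespace `eks-issue1920-ns` and the label `app=eks-issue1920` from `eks-manifest.yaml`; the function names `first_pod_name` and `pod_logs` are hypothetical helpers, not part of `kubectl`.

```shell
#!/bin/sh
# Namespace and label are taken from eks-manifest.yaml; adjust if yours differ.
NAMESPACE="eks-issue1920-ns"
APP_LABEL="app=eks-issue1920"

# Hypothetical helper: print the name of the first pod matching the label.
first_pod_name() {
    kubectl get pods --namespace "$NAMESPACE" --selector "$APP_LABEL" \
        --output jsonpath='{.items[0].metadata.name}'
}

# Hypothetical helper: print the logs of that pod.
pod_logs() {
    kubectl logs "$(first_pod_name)" --namespace "$NAMESPACE"
}
```

After sourcing the file, calling `pod_logs` should print the same trace output as the manual `kubectl logs` step, without copying the pod name by hand.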
Log Output:
info: RX.Workers.WhatsAppConsumer[0]
WhatsAppConsumer Loaded!!
info: AWSSDK[0]
Found AWS options in IConfiguration
info: AWSSDK[0]
Found credentials using the AWS SDK's default credential search
info: Microsoft.Hosting.Lifetime[0]
Now listening on: http://[::]:80
info: Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
Content root path: /app
fail: RX.Services.QueueService[0]
Creating queue: afs1-b-036928772765-npr-bb-bbdbnk-sqs-send-to-whatsapp
fail: RX.Services.QueueService[0]
Failed to create queue: afs1-b-036928772765-npr-bb-bbdbnk-sqs-send-to-whatsapp
fail: RX.Services.QueueService[0]
Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
at Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken)
at Amazon.Runtime.Internal.HttpHandler`1.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.Unmarshaller.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.SQS.Internal.ValidationResponseHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
Run `kubectl exec --stdin --tty --namespace eks-issue1920-ns eks-issue1920-76b58df489-r64gb -- /bin/bash` (replace `eks-issue1920-76b58df489-r64gb` with your pod name) and execute the `env` command to examine the environment variables:
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_SERVICE_PORT=443
HOSTNAME=eks-issue1920-76b58df489-r64gb
AWS_DEFAULT_REGION=us-east-2
ASPNETCORE_URLS=http://+:80
AWS_REGION=us-east-2
PWD=/app
AWS_ROLE_ARN=arn:aws:iam::<<accountid>>:role/eksctl-eks-issue1920-addon-iamserviceaccount-Role1-596F68EAW9BF
HOME=/root
KUBERNETES_PORT_443_TCP=tcp://10.100.0.1:443
TERM=xterm
SHLVL=1
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
KUBERNETES_PORT_443_TCP_PROTO=tcp
DOTNET_RUNNING_IN_CONTAINER=true
KUBERNETES_PORT_443_TCP_ADDR=10.100.0.1
KUBERNETES_SERVICE_HOST=10.100.0.1
KUBERNETES_PORT=tcp://10.100.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/env
Confirmed that the role pointed to by the `AWS_ROLE_ARN` environment variable had `arn:aws:iam::aws:policy/AmazonSQSFullAccess` attached.
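The same check can be made from the command line. This is a minimal sketch, assuming the AWS CLI is installed and the caller is allowed to call `iam:ListAttachedRolePolicies`; the function names `role_name_from_arn` and `list_attached_policies` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: extract the role name (the text after the last "/")
# from an IAM role ARN such as the value of AWS_ROLE_ARN.
role_name_from_arn() {
    printf '%s\n' "${1##*/}"
}

# Hypothetical helper: list the managed policy ARNs attached to the role
# named in AWS_ROLE_ARN. Requires the AWS CLI and suitable IAM permissions.
list_attached_policies() {
    aws iam list-attached-role-policies \
        --role-name "$(role_name_from_arn "$AWS_ROLE_ARN")" \
        --query 'AttachedPolicies[].PolicyArn' \
        --output text
}
```

With `AWS_ROLE_ARN` set as in the `env` output above, `list_attached_policies` should include `arn:aws:iam::aws:policy/AmazonSQSFullAccess` if the policy is attached.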
TODO: Maybe modify the client code to log a verbose error by walking the `InnerException`(s) in `IAmazonSQSExtension.CreateQueue`.
Hi @WesleyBuck, on reinvestigating this issue, I used an example that creates an SQS queue and did not encounter the same error.
Reproduction Steps:
Console Logs:
Your new message queue:
Queue: https://sqs.us-east-2.amazonaws.com/<<ACCOUNT-ID>>/<<QUEUE-NAME>>
QueueArn: arn:aws:sqs:us-east-2:<<ACCOUNT-ID>>:<<QUEUE-NAME>>
ApproximateNumberOfMessages: 0
ApproximateNumberOfMessagesNotVisible: 0
ApproximateNumberOfMessagesDelayed: 0
CreatedTimestamp: 1635287070
LastModifiedTimestamp: 1635287070
VisibilityTimeout: 30
MaximumMessageSize: 262144
MessageRetentionPeriod: 345600
DelaySeconds: 0
ReceiveMessageWaitTimeSeconds: 0
Could you also please check your EKS parameters against the `.yaml` markup in https://github.com/aws/aws-sdk-net/issues/1920#issuecomment-949037559 to see if something is missing there?
This issue has not received a response in 5 days. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.
I'm trying to get a .NET Core app to work with EKS's new support for IAM Roles for Service Accounts (IMDSv2 enabled with the required OpenID Connect). I've followed these instructions.
This app reads from an SQS queue and was previously working with IMDSv1, without the container annotation for OpenID Connect. AWSSDK has been updated to the latest stable version, which is newer than the minimum supported version specified here.
My understanding is that a token, which is a Kubernetes secret, is mounted and its path is stored in the environment variable `AWS_WEB_IDENTITY_TOKEN_FILE`. I can confirm that both the environment variable and the mount exist when I describe the Kubernetes pod. According to the docs, the credential chain is meant to check whether this token exists first. However, I don't think that is happening. From the logs, the initial request to retrieve the queue URL fails with "The security token included in the request is invalid".
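The environment variable and token mount described above can be sanity-checked from a shell inside the pod. The following is a minimal sketch, not part of the SDK; the function name `check_irsa_env` is hypothetical. It only verifies that `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` are set and that the token file is readable and non-empty. It does not validate the token or call AWS.

```shell
#!/bin/sh
# Hypothetical helper: check the inputs a pod should receive for
# IAM Roles for Service Accounts. Inspects only the environment and the file.
check_irsa_env() {
    if [ -z "${AWS_ROLE_ARN:-}" ]; then
        echo "AWS_ROLE_ARN is not set"
        return 1
    fi
    if [ -z "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
        echo "AWS_WEB_IDENTITY_TOKEN_FILE is not set"
        return 1
    fi
    if [ ! -r "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
        echo "token file is not readable: $AWS_WEB_IDENTITY_TOKEN_FILE"
        return 1
    fi
    if [ ! -s "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
        echo "token file is empty: $AWS_WEB_IDENTITY_TOKEN_FILE"
        return 1
    fi
    echo "role ARN and web identity token file are present"
}
```

A passing check only shows that the inputs are in place; it says nothing about whether the SDK's credential chain actually picks them up, which is the behavior in question here.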
We logged a ticket with AWS Enterprise Support while implementing IRSA in our .NET SDK pod, but our application was still unable to manage the SQS service.
During the session, Enterprise Support reviewed our IAM role and confirmed that the required permissions and trust policy are correctly applied.
We then installed the AWS CLI inside the same pod that contains the .NET SDK code and were able to successfully execute SQS-related commands. This confirmed that the credentials token and required environment variables were successfully injected into the pod, which rules out an issue on the cluster side.
Expected Behavior
AWSSDK should be able to use the provided token to access the required resources.
Current Behavior
My application logs have the following error:
RX.Services.QueueService[0] Failed to get queue url: afs1-b-036928772765-npr-bb-bbdbnk-sqs-send-to-whatsapp
RX.Services.QueueService[0] Amazon.SQS.AmazonSQSException: The security token included in the request is invalid
---> Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
at Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken)
at Amazon.Runtime.Internal.HttpHandler`1.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.Unmarshaller.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.SQS.Internal.ValidationResponseHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
--- End of inner exception stack trace ---
at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleExceptionStream(IRequestContext requestContext, IWebResponseData httpErrorResponse, HttpErrorResponseException exception, Stream responseStream)
at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleExceptionAsync(IExecutionContext executionContext, HttpErrorResponseException exception)
at Amazon.Runtime.Internal.ExceptionHandler`1.HandleAsync(IExecutionContext executionContext, Exception exception)
at Amazon.Runtime.Internal.ErrorHandler.ProcessExceptionAsync(IExecutionContext executionContext, Exception exception)
at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
at Onboarding.Utility.IAmazonSQSExtension.CreateQueueIfQueueDoesNotExist(IAmazonSQS sqs, ILogger logger, String queueName, CancellationToken cancellationToken, Int32 delaySeconds) in C:\Projects\AWS RX\RX\Utility\IAmazonSQSExtension.cs:line 34
Environment
AWSSDK.Extensions.NETCore.Setup: 3.7.1
AWSSDK.SecurityToken: 3.7.1.62
AWSSDK.SQS: 3.7.1.15
.NET Core SDK: 3.1.402
Running in amazonlinux2 on EKS 1.20
Please find example solution (AWS RX.zip) used to reproduce the issue.