aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0

Amazon.SQS.AmazonSQSException: The security token included in the request is invalid. #1920

Closed WesleyBuck closed 2 years ago

WesleyBuck commented 3 years ago

I'm trying to get a .NET Core app to work with EKS's new support for IAM Roles for Service Accounts (IMDSv2 enabled, with the required OpenID Connect provider). I've followed these instructions.

This app reads from an SQS queue and was previously working with IMDSv1, without the container annotation for OpenID Connect. AWSSDK has been updated to the latest stable release, which is newer than the minimum supported version specified here.

My understanding is that a token which is a Kubernetes secret is mounted and the path is stored as the environment variable AWS_WEB_IDENTITY_TOKEN_FILE. I can confirm that both the environment variable and mount exist when I describe the Kubernetes pod. According to the docs, the credential chain is meant to check if this token exists first. However, I don't think that is happening. From the logs, the initial request to retrieve the queue URL fails with "The security token included in the request is invalid".
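As a rough way to verify this from inside the pod, the sketch below checks that both variables injected for IAM Roles for Service Accounts are set and that the token file is readable and non-empty. The function name check_irsa_env is made up for illustration; only the two environment variable names are real.

```shell
# Hypothetical helper: report whether the IRSA inputs the SDK looks for are present.
check_irsa_env() {
  if [ -z "${AWS_ROLE_ARN:-}" ]; then
    echo "AWS_ROLE_ARN is not set" >&2
    return 1
  fi
  if [ -z "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    echo "AWS_WEB_IDENTITY_TOKEN_FILE is not set" >&2
    return 1
  fi
  # -r: readable by the current user, -s: exists and is not empty
  if [ ! -r "$AWS_WEB_IDENTITY_TOKEN_FILE" ] || [ ! -s "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
    echo "token file is missing, unreadable, or empty: $AWS_WEB_IDENTITY_TOKEN_FILE" >&2
    return 1
  fi
  echo "role:  $AWS_ROLE_ARN"
  echo "token: $AWS_WEB_IDENTITY_TOKEN_FILE"
}
```

It is worth running this as the same user the application runs as, since a token file that root can read may not be readable by a non-root container user.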

We logged a ticket with AWS Enterprise Support while implementing IRSA in our .NET SDK pod, but our application was still unable to access the SQS service.

During the session, Enterprise Support reviewed our IAM role and confirmed that the required permissions and trust policy are correctly applied.

We then installed the AWS CLI inside the same pod that contains the .NET SDK code and were able to execute SQS commands successfully. This confirms that the credentials token and the required environment variables were injected into the pod, which rules out an issue on the cluster side.
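One way to make the CLI check more specific is to look at which identity the CLI actually resolved, not only whether an SQS command succeeded. The sketch below assumes the AWS CLI is on the PATH inside the pod; the helper names show_cli_identity and cli_uses_irsa_role are hypothetical.

```shell
# Print the ARN of the identity the AWS CLI resolves in this environment.
show_cli_identity() {
  aws sts get-caller-identity --query Arn --output text
}

# Hypothetical helper: succeed only if the resolved identity is an
# assumed-role session for the role named in AWS_ROLE_ARN.
cli_uses_irsa_role() {
  # Role name is the part after the last "/" in the role ARN.
  role_name=${AWS_ROLE_ARN##*/}
  arn=$(show_cli_identity) || return 1
  case "$arn" in
    *":assumed-role/$role_name/"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

If cli_uses_irsa_role fails while plain SQS commands still work, the CLI is probably running under a different identity than the one the service account is meant to provide.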

Expected Behavior

AWSSDK should be able to use the provided token to access the required resources.

Current Behavior

My application logs have the following error:

    RX.Services.QueueService[0] Failed to get queue url: afs1-b-036928772765-npr-bb-bbdbnk-sqs-send-to-whatsapp
    RX.Services.QueueService[0] Amazon.SQS.AmazonSQSException: The security token included in the request is invalid
     ---> Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
       at Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken)
       at Amazon.Runtime.Internal.HttpHandler`1.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.Unmarshaller.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.SQS.Internal.ValidationResponseHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
       --- End of inner exception stack trace ---
       at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleExceptionStream(IRequestContext requestContext, IWebResponseData httpErrorResponse, HttpErrorResponseException exception, Stream responseStream)
       at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleExceptionAsync(IExecutionContext executionContext, HttpErrorResponseException exception)
       at Amazon.Runtime.Internal.ExceptionHandler`1.HandleAsync(IExecutionContext executionContext, Exception exception)
       at Amazon.Runtime.Internal.ErrorHandler.ProcessExceptionAsync(IExecutionContext executionContext, Exception exception)
       at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
       at Onboarding.Utility.IAmazonSQSExtension.CreateQueueIfQueueDoesNotExist(IAmazonSQS sqs, ILogger logger, String queueName, CancellationToken cancellationToken, Int32 delaySeconds) in C:\Projects\AWS RX\RX\Utility\IAmazonSQSExtension.cs:line 34

Environment

  • AWSSDK.Extensions.NETCore.Setup: 3.7.1
  • AWSSDK.SecurityToken: 3.7.1.62
  • AWSSDK.SQS: 3.7.1.15
  • .NET Core SDK: 3.1.402

Running in amazonlinux2 on EKS 1.20

Please find attached the example solution (AWS RX.zip) used to reproduce the issue.

ashishdhingra commented 3 years ago

Hi @WesleyBuck,

Good morning.

Could you please confirm the value of the environment variable AWS_WEB_IDENTITY_TOKEN_FILE in your EKS pod? I tested the EKS setup as part of another, unrelated issue, https://github.com/aws/aws-sdk-net/issues/1856, and the scenario described here works fine.

Thanks, Ashish

WesleyBuck commented 2 years ago

Hi @ashishdhingra,

Confirmed, the token is contained within the file. As per the description: "Then we confirmed that we have installed AWS CLI tool inside the same pod that contains .NET SDK code, and were able to successfully execute SQS related commands, this confirmed that the credentials token and required environment variables were successfully injected into the pod, which rules out that this is an issue from the cluster side."

ashishdhingra commented 2 years ago

I was not able to reproduce the error the user reported, but got a different error:

  1. Use the sample code from the customer's AWS RX.zip (the Dockerfile might need to be tweaked to correct the path). Thereafter, create the following files: eks-cluster-create.yaml

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig

    metadata:
      name: eks-issue1920
      region: us-east-2
      version: "1.20"

    iam:
      withOIDC: true
      serviceAccounts:

    nodeGroups:

  1. Create the ECR repository from AWS Console. Take note of the Push commands.
  2. From the project root directory, execute the push commands one by one:
      • aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin <<accountid>>.dkr.ecr.us-east-2.amazonaws.com
      • docker build -t eksissue1920 .
      • docker tag eksissue1920:latest <<accountid>>.dkr.ecr.us-east-2.amazonaws.com/eksissue1920:latest
      • docker push <<accountid>>.dkr.ecr.us-east-2.amazonaws.com/eksissue1920:latest
  3. Execute the command eksctl create cluster --config-file=./eks-cluster-create.yaml to create the EKS cluster. This uses the configuration in the YAML file to create the EKS cluster, the IAM OIDC provider, the required service account(s) and the node group(s).
  4. Execute kubectl apply -f ./eks-manifest.yaml (to delete the deployment, you may use the kubectl delete -f ./eks-manifest.yaml command).
  5. After the eks-issue1920 workload is created in the cluster, click on it and take note of the Pod name. You may examine the pod status by clicking on it (it should be in the Running state). You may also list the pods from the command line using kubectl get pods --namespace eks-issue1920-ns -o wide.
  6. (Optional) You may connect to an interactive session using kubectl exec --stdin --tty --namespace eks-issue1920-ns <<podname>> -- /bin/bash. Replace eks-issue1920-ns with your own namespace if it is set differently in the YAML file, and replace <<podname>> with the name of the pod.
  7. Execute the command kubectl logs <<podname>> --namespace eks-issue1920-ns to get the logs (replace <<podname>> with the name of the pod). Since our console application is the entry point and writes its logs to the console, this gives the trace output. Log Output:
    info: RX.Workers.WhatsAppConsumer[0]
      WhatsAppConsumer Loaded!!
    info: AWSSDK[0]
      Found AWS options in IConfiguration
    info: AWSSDK[0]
      Found credentials using the AWS SDK's default credential search
    info: Microsoft.Hosting.Lifetime[0]
      Now listening on: http://[::]:80
    info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
    info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
    info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app
    fail: RX.Services.QueueService[0]
      Creating queue: afs1-b-036928772765-npr-bb-bbdbnk-sqs-send-to-whatsapp
    fail: RX.Services.QueueService[0]
      Failed to create queue: afs1-b-036928772765-npr-bb-bbdbnk-sqs-send-to-whatsapp
    fail: RX.Services.QueueService[0]
      Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
         at Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken)
         at Amazon.Runtime.Internal.HttpHandler`1.InvokeAsync[T](IExecutionContext executionContext)
         at Amazon.Runtime.Internal.Unmarshaller.InvokeAsync[T](IExecutionContext executionContext)
         at Amazon.SQS.Internal.ValidationResponseHandler.InvokeAsync[T](IExecutionContext executionContext)
         at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
  8. Connect to the pod using the command kubectl exec --stdin --tty --namespace eks-issue1920-ns eks-issue1920-76b58df489-r64gb -- /bin/bash (replace eks-issue1920-76b58df489-r64gb with the pod name) and execute the env command to examine the environment variables:
    KUBERNETES_SERVICE_PORT_HTTPS=443
    KUBERNETES_SERVICE_PORT=443
    HOSTNAME=eks-issue1920-76b58df489-r64gb
    AWS_DEFAULT_REGION=us-east-2
    ASPNETCORE_URLS=http://+:80
    AWS_REGION=us-east-2
    PWD=/app
    AWS_ROLE_ARN=arn:aws:iam::<<accountid>>:role/eksctl-eks-issue1920-addon-iamserviceaccount-Role1-596F68EAW9BF
    HOME=/root
    KUBERNETES_PORT_443_TCP=tcp://10.100.0.1:443
    TERM=xterm
    SHLVL=1
    AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
    KUBERNETES_PORT_443_TCP_PROTO=tcp
    DOTNET_RUNNING_IN_CONTAINER=true
    KUBERNETES_PORT_443_TCP_ADDR=10.100.0.1
    KUBERNETES_SERVICE_HOST=10.100.0.1
    KUBERNETES_PORT=tcp://10.100.0.1:443
    KUBERNETES_PORT_443_TCP_PORT=443
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    _=/usr/bin/env

    Confirmed that the role pointed to by the AWS_ROLE_ARN environment variable had the arn:aws:iam::aws:policy/AmazonSQSFullAccess policy attached.
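Besides the attached policy, the role's trust policy is evaluated against claims carried in the web identity token (typically the issuer, the sub claim naming the namespace and service account, and the audience). As a minimal sketch for inspecting those claims, the helper below decodes the payload segment of a JWT. The name jwt_claims is hypothetical, and the function only decodes for inspection; it does not verify the signature.

```shell
# Hypothetical helper: print the JSON claims (second segment) of a JWT.
# This only decodes the payload; it does not validate the token.
jwt_claims() {
  # Take the payload segment and map base64url characters back to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage inside the pod, assuming the token path shown in the env output:
# jwt_claims "$(cat "$AWS_WEB_IDENTITY_TOKEN_FILE")"
```

Comparing the decoded sub and aud values with the conditions in the role's trust policy is a quick way to spot a namespace or service account name mismatch.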

TODO: Maybe modify the client code to log a verbose error by walking the InnerException(s) in IAmazonSQSExtension.CreateQueue.

aaoswal commented 2 years ago

Hi @WesleyBuck, on reinvestigating this issue, I used an example for creating an SQS queue and did not encounter the same error.

Reproduction Steps:

  1. Create a C# console application and replace Program.cs with the code in the example above.
  2. Follow all the instructions listed in the previous comment.

Console Logs:

Your new message queue:
Queue: https://sqs.us-east-2.amazonaws.com/<<ACCOUNT-ID>>/<<QUEUE-NAME>>
    QueueArn: arn:aws:sqs:us-east-2:<<ACCOUNT-ID>>:<<QUEUE-NAME>>
    ApproximateNumberOfMessages: 0
    ApproximateNumberOfMessagesNotVisible: 0
    ApproximateNumberOfMessagesDelayed: 0
    CreatedTimestamp: 1635287070
    LastModifiedTimestamp: 1635287070
    VisibilityTimeout: 30
    MaximumMessageSize: 262144
    MessageRetentionPeriod: 345600
    DelaySeconds: 0
    ReceiveMessageWaitTimeSeconds: 0

Could you also please check the EKS parameters using the .yaml markup https://github.com/aws/aws-sdk-net/issues/1920#issuecomment-949037559 to see if something is missing there?

github-actions[bot] commented 2 years ago

This issue has not received a response in 5 days. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.