aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0
3.13k stars 580 forks

InvalidIdentityToken : Token file expired, refresh token #3052

Open deweve opened 3 years ago

deweve commented 3 years ago

Describe the bug

In a long-lived application such as a worker or an API, the app eventually stops being authenticated and receives the error InvalidIdentityToken:

{
  "message": "Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements",
  "path": [
    "createAffectation"
  ],
  "stack": "InvalidIdentityToken: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\n
at deserializeAws_queryAssumeRoleWithWebIdentityCommandError (/app/node_modules/@aws-sdk/client-sts/dist/cjs/protocols/Aws_query.js:363:41)\n
at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n
at async /app/node_modules/@aws-sdk/middleware-serde/dist/cjs/deserializerMiddleware.js:6:20\n
at async StandardRetryStrategy.retry (/app/node_modules/@aws-sdk/middleware-retry/dist/cjs/StandardRetryStrategy.js:51:46)\n
at async /app/node_modules/@aws-sdk/middleware-logger/dist/cjs/loggerMiddleware.js:6:22\n
at async /app/node_modules/@aws-sdk/client-sts/dist/cjs/defaultStsRoleAssumers.js:70:33\n
at async SignatureV4.signRequest (/app/node_modules/@aws-sdk/client-sqs/node_modules/@aws-sdk/signature-v4/dist/cjs/SignatureV4.js:84:29)\n
at async /app/node_modules/@aws-sdk/client-sqs/node_modules/@aws-sdk/middleware-signing/dist/cjs/middleware.js:14:22\n    
at async StandardRetryStrategy.retry (/app/node_modules/@aws-sdk/client-sqs/node_modules/@aws-sdk/middleware-retry/dist/cjs/StandardRetryStrategy.js:51:46)\n
at async /app/node_modules/@aws-sdk/middleware-sdk-sqs/dist/cjs/send-message.js:6:18\n

Your environment

SDK version number

"@aws-sdk/client-s3": "^3.18.0"
"@aws-sdk/client-sqs": "^3.23.0"
"@aws-sdk/client-sts": "^3.18.0"

Is the issue in the browser/Node.js/ReactNative?

Node.js

Details of the browser/Node.js/ReactNative version

14.15.4

Steps to reproduce


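import { SQS, SQSClientConfig } from "@aws-sdk/client-sqs";
import { getDefaultRoleAssumerWithWebIdentity } from "@aws-sdk/client-sts";
import { fromTokenFile } from "@aws-sdk/credential-provider-web-identity";

// `env` is the application's own configuration object (not shown here).
function getConfigFromEnv() {
  return {
    // Exchange the web identity token file for temporary role credentials.
    credentials: fromTokenFile({
      webIdentityTokenFile: env.aws.tokenFile,
      roleArn: env.aws.roleArn,
      roleSessionName: env.aws.sessionName,
      durationSeconds: env.aws.sessionDuration,
      roleAssumerWithWebIdentity: getDefaultRoleAssumerWithWebIdentity(),
    }),
  };
}

export abstract class SQSBaseClient {
  public sqs: SQS;
  public queueUrl: string;
  private initPromise: Promise<void> | undefined;
  protected fifoQueue: boolean;
  public messageSizeLimit: number;

  protected constructor(public queueName: string, config?: SQSClientConfig) {
    this.sqs = new SQS({
      ...config,
      ...getConfigFromEnv(),
      region: env.aws.region,
      apiVersion: "2012-11-05",
    });
    this.fifoQueue = queueName.endsWith(".fifo"); // FIFO queues always end in .fifo by AWS rules
  }
  // ... (rest of the class not shown in the report)
}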

This is our abstraction over SQS; we do not want to handle authentication directly.

Observed behavior

Our backend runs in an EKS cluster. Kubernetes injects a token into the pod so that it can assume a role, but this token expires. The token on the machine is replaced before the old token expires.

After a long period, the SDK is no longer able to authenticate to AWS using the old token to assume the role.

Expected behavior

The token file is re-read once the previous token has expired. I cannot find any documentation on how to do this in the SDK v3 for Node.js.

In v2 I can do it: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/WebIdentityCredentials.html

How can we change the credentials in the SDK?
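
For readers hitting the same problem, here is a minimal sketch of the behaviour asked for above: a credential provider that re-reads the token file whenever the cached credentials are close to expiring. It is not the SDK's implementation. `refreshingTokenFileProvider`, `TokenExchange`, and the five-minute refresh window are names and values assumed for illustration, and the `exchange` callback stands in for whatever calls STS AssumeRoleWithWebIdentity.

```typescript
import { readFile } from "node:fs/promises";

// Shape of temporary credentials, declared locally so the sketch has no SDK dependency.
interface TemporaryCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
  expiration?: Date;
}

// Hypothetical callback that trades a web identity token for credentials,
// for example by calling STS AssumeRoleWithWebIdentity.
type TokenExchange = (webIdentityToken: string) => Promise<TemporaryCredentials>;

// Returns a provider that re-reads the token file on every refresh, so a token
// rotated on disk by Kubernetes is picked up without rebuilding the client.
function refreshingTokenFileProvider(
  tokenFile: string,
  exchange: TokenExchange,
  refreshWindowMs = 5 * 60 * 1000,
  now: () => number = Date.now,
): () => Promise<TemporaryCredentials> {
  let cached: TemporaryCredentials | undefined;
  let inFlight: Promise<TemporaryCredentials> | undefined;

  return async () => {
    // Reuse cached credentials while they are comfortably inside their lifetime.
    // Credentials without an expiration are refreshed on every call.
    if (cached?.expiration && cached.expiration.getTime() - now() > refreshWindowMs) {
      return cached;
    }
    // Share a single refresh between concurrent callers.
    if (!inFlight) {
      inFlight = (async () => {
        try {
          const token = (await readFile(tokenFile, "utf8")).trim();
          cached = await exchange(token);
          return cached;
        } finally {
          inFlight = undefined;
        }
      })();
    }
    return inFlight;
  };
}
```

The returned value is a zero-argument async function, the same shape as the provider that `fromTokenFile` returns, so in principle it could be passed as the `credentials` option of a client.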

AllanZhengYP commented 3 years ago

Hi @deweve, can you confirm whether the issue persists on the latest version, 3.42.0?

deweve commented 3 years ago

Hi, I will try tomorrow. To test this easily, I have to decrease the duration of the token injected by Kubernetes.

deweve commented 3 years ago

I'm trying to find a good way to reproduce the error, but I do not understand how it appears. I tried using an expired OIDC token, but that does not work. I realised that I am using the package import { fromTokenFile } from "@aws-sdk/credential-provider-web-identity"; and I do not know whether that changes anything.

deweve commented 2 years ago

@AllanZhengYP Even with the latest SDK version I still have the problem.

I discovered that the SDK does not try to re-read the token file to regenerate an STSClient. https://github.com/aws/aws-sdk-js-v3/blob/e0025cddbba244a41ddf1fd1adb761142e15c22d/clients/client-sts/src/defaultStsRoleAssumers.ts#L85

And because we are no longer authenticated to STS, we cannot regenerate credentials.

For now I'll just do this:

function getConfigFromEnv() {
  return {
    credentials: fromTokenFile({
      webIdentityTokenFile: env.aws.tokenFile,
      roleArn: env.aws.roleArn,
      roleSessionName: env.aws.sessionName,
      durationSeconds: env.aws.sessionDuration,
      roleAssumerWithWebIdentity: getDefaultRoleAssumerWithWebIdentity(),
    }),
  };
}

export abstract class SQSBaseClient {
  public sqs: SQSClient;
  public queueUrl: string;
  private initPromise: Promise<void> | undefined;
  protected fifoQueue: boolean;
  public messageSizeLimit: number;

  protected constructor(public queueName: string, config?: SQSClientConfig) {
    this.sqs = new SQSClient({
      ...config,
      ...getConfigFromEnv(),
      region: env.aws.region,
      apiVersion: "2012-11-05",
    });
    this.fifoQueue = queueName.endsWith(".fifo"); // FIFO queues always end in .fifo by AWS rules
    // Workaround: rebuild the client every 30 minutes so that it resolves fresh credentials.
    setInterval(() => {
      this.sqs = new SQSClient({
        ...config,
        ...getConfigFromEnv(),
        region: env.aws.region,
        apiVersion: "2012-11-05",
      });
      log.info("Update credentials");
    }, 1000 * 60 * 30);
  }
  // ... (rest of the class not shown in the comment)
}

The best solution would be to create a function for roleAssumerWithWebIdentity:

tleef commented 2 years ago

I have a similar setup, running an API container in an EKS cluster, and am experiencing the same issue.

const config: DynamoDBClientConfig = {};

config.credentials = fromTokenFile({
    roleAssumerWithWebIdentity: getDefaultRoleAssumerWithWebIdentity(),
});

const client = new DynamoDBClient(config)

I'm not passing in the role or the token file because the SDK is reading them from the environment variables by default.

While debugging this issue, I logged out the contents of the token file and decoded it. In my case, the JWT was not expired. I had recently deployed a fresh pod and since the token was issued at the same time, it had almost 24 hrs left.

I found that updating to the latest SDK version(s) didn't help me either

"@aws-sdk/client-dynamodb": "^3.44.0",
"@aws-sdk/client-sts": "^3.43.0",
"@aws-sdk/credential-provider-web-identity": "^3.41.0",

Rolling back to a known good container didn't help either so it doesn't appear to be code-related.

Edit: In my case, it appears this was caused by an outage. Trying to fetch https://oidc.eks.us-east-1.amazonaws.com/id/XXXXXXXXX/.well-known/openid-configuration was giving me a mix of 504 and 500 responses. As soon as the OIDC provider came back, everything started working again.

I will say that it is a little sus that the identity token was still able to be issued and assigned to the pod while the OIDC provider was down 🤔 It's possible that the OIDC provider was only partially down. I hope the private signing key is not being shared to delegate the issuance of identity tokens.

deweve commented 2 years ago

I've realised that the SDK does not use the environment variables AWS_REGION or AWS_DEFAULT_REGION to assume the role.

https://github.com/aws/aws-sdk-js-v3/blob/e0025cddbba244a41ddf1fd1adb761142e15c22d/clients/client-sts/src/defaultStsRoleAssumers.ts#L17

I do not think it is closely related, but I'll try.

deweve commented 2 years ago

@tleef Hi, can you tell me how you checked that you were receiving errors from EKS? In your case it may just be the global AWS problem that happened yesterday.

tleef commented 2 years ago

I'm pretty sure it was related to the outage yesterday, if not it was a major coincidence.

The first thing I did was print the contents of the identity token file. Inside the file, you should see a JWT. You can use the jwt.io debugger to decode the token. That should let you see the issuer, iss, and expiration time, exp.
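
The decoding step can also be done locally instead of pasting the token into a website. A minimal sketch, assuming Node.js; `decodeJwtPayload` and `TokenClaims` are names made up for this example, and the function only decodes the payload without verifying the signature.

```typescript
// Claims relevant to this thread; real tokens carry more fields.
interface TokenClaims {
  iss?: string; // issuer URL
  exp?: number; // expiration, in seconds since the Unix epoch
  iat?: number; // issued-at, in seconds since the Unix epoch
  [claim: string]: unknown;
}

// Decodes the payload (the middle segment) of a JWT WITHOUT verifying its signature.
// Good enough for inspecting a token, never for trusting one.
function decodeJwtPayload(jwt: string): TokenClaims {
  const segments = jwt.trim().split(".");
  if (segments.length !== 3) {
    throw new Error("Expected a JWT with three dot-separated segments");
  }
  // JWT segments use the URL-safe base64 alphabet without padding.
  const base64 = segments[1].replace(/-/g, "+").replace(/_/g, "/");
  return JSON.parse(Buffer.from(base64, "base64").toString("utf8")) as TokenClaims;
}
```

Calling it on the contents of the file named by AWS_WEB_IDENTITY_TOKEN_FILE gives the `iss` and `exp` values mentioned above.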

I double-checked that the token wasn't expired and then tried to visit /.well-known/openid-configuration of my issuer in the browser. This endpoint is a public endpoint that exists on all OIDC servers and it is used, in part, to discover where the public keys can be found to verify a JWT issued by that server. In my case, when visiting that endpoint, I was getting 504s and sometimes 500s.
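
The same check can be scripted. The sketch below assumes Node.js 18 or later for the global `fetch`; `discoveryUrl` and `checkIssuer` are hypothetical helper names. It requests the discovery document and then the key set that the document's `jwks_uri` field points to, and returns the HTTP status of each.

```typescript
// Builds the OIDC discovery URL for an issuer (the token's `iss` claim).
function discoveryUrl(issuer: string): string {
  return issuer.replace(/\/+$/, "") + "/.well-known/openid-configuration";
}

// Fetches the discovery document, then the key set it advertises via `jwks_uri`.
// A 5xx status on either request would be consistent with the
// "Couldn't retrieve verification key" error.
async function checkIssuer(
  issuer: string,
): Promise<{ discoveryStatus: number; jwksStatus?: number }> {
  const discovery = await fetch(discoveryUrl(issuer));
  if (!discovery.ok) {
    return { discoveryStatus: discovery.status };
  }
  const { jwks_uri } = (await discovery.json()) as { jwks_uri?: string };
  if (!jwks_uri) {
    return { discoveryStatus: discovery.status };
  }
  const jwks = await fetch(jwks_uri);
  return { discoveryStatus: discovery.status, jwksStatus: jwks.status };
}
```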

Once I saw the 504s/500s coming from my issuer, I felt that explained the original error, Couldn't retrieve verification key from your identity provider, well enough. Knowing that AWS was experiencing outages, I decided to wait and see if it would recover, and in my case it did.

carlosescura commented 2 years ago

We are still experiencing the same issues days after the outage, but this time they are just sporadic on some EC2 nodes. Anyone still having this issue?

We're on us-east-1

dev-rowbot commented 2 years ago

We are experiencing the same issue as @tleef: we have a Node.js application deployed to Kubernetes with an attached Service Account linked to a role that grants access to DynamoDB. The container environment variables are:

AWS_ROLE_ARN=arn:aws:iam::41xxxxxx:role/eksctl-eks-cluster-name-addon-iamserviceaccou-Role1-XXXXXX
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_DEFAULT_REGION=eu-west-1
AWS_REGION=eu-west-1

We deployed the application 7 hours ago and it worked fine initially, but now there are these errors being thrown:

ExpiredTokenException: The security token included in the request is expired
    at deserializeAws_json1_0UpdateItemCommandError (/app/common/temp/node_modules/.pnpm/@aws-sdk+client-dynamodb@3.48.0/node_modules/@aws-sdk/client-dynamodb/dist-cjs/protocols/Aws_json1_0.js:4014:41)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async /app/common/temp/node_modules/.pnpm/@aws-sdk+middleware-serde@3.47.2/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
    at async /app/common/temp/node_modules/.pnpm/@aws-sdk+middleware-signing@3.47.2/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:11:20
    at async StandardRetryStrategy.retry (/app/common/temp/node_modules/.pnpm/@aws-sdk+middleware-retry@3.47.2/node_modules/@aws-sdk/middleware-retry/dist-cjs/StandardRetryStrategy.js:51:46)
    at async /app/common/temp/node_modules/.pnpm/@aws-sdk+middleware-logger@3.47.2/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:6:22

The interesting part is that I grabbed the contents of the file at AWS_WEB_IDENTITY_TOKEN_FILE, decoded it, and can see that it has not expired yet:

{
  "aud": [
    "sts.amazonaws.com"
  ],
  "exp": 1643195886,
  "iat": 1643109486,
  "iss": "https://oidc.eks.eu-west-1.amazonaws.com/id/112XXXXXX",
  "kubernetes.io": {
    "namespace": "default",
    "pod": {
      "name": "my-pod-name",
      "uid": "8211a0d9-c0db-4f2a-b9df-a2e7955d634d"
    },
    "serviceaccount": {
      "name": "dynamodb-service-account-name",
      "uid": "75de7cd8-eff1-9b71-b46d-7a6e090301e7"
    }
  },
  "nbf": 1643109486,
  "sub": "system:serviceaccount:default:dynamodb-service-account-name"
}
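
As a worked check of the claims above: `exp` and `iat` are seconds since the Unix epoch, and here they are exactly 24 hours apart. The sketch below does the arithmetic; `tokenWindow` is a hypothetical helper, and the "now" value is an assumption of roughly seven hours after `iat`, matching the deployment described above. Note that this is the lifetime of the identity token. The temporary credentials that STS returns in exchange for it carry their own, usually shorter, expiration, which may be what ExpiredTokenException refers to.

```typescript
// Summarises the validity window described by a token's iat/exp claims.
function tokenWindow(claims: { iat: number; exp: number }, nowMs: number = Date.now()) {
  return {
    issuedAt: new Date(claims.iat * 1000).toISOString(),
    expiresAt: new Date(claims.exp * 1000).toISOString(),
    lifetimeHours: (claims.exp - claims.iat) / 3600,
    expired: nowMs >= claims.exp * 1000,
  };
}

// Values copied from the decoded token; "now" is assumed to be seven hours after iat.
const result = tokenWindow(
  { iat: 1643109486, exp: 1643195886 },
  Date.parse("2022-01-25T18:18:06Z"),
);
console.log(result);
// issuedAt:      2022-01-25T11:18:06.000Z
// expiresAt:     2022-01-26T11:18:06.000Z
// lifetimeHours: 24
// expired:       false
```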

Packages used

        "@aws-sdk/client-sts": "~3.45.0",
        "@aws-sdk/credential-provider-node": "~3.45.0",
        "@aws-sdk/client-dynamodb": "~3.48.0",

This is a serious issue for us at the moment, any ideas why this would be failing?

dev-rowbot commented 2 years ago

In case it helps somebody else: to get around this issue, we now check the credentials' expiry every time we interact with DynamoDB:

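// Resolve the credentials the client is currently using.
const credentials = await this.dynamoDbClient.config.credentials();
if (credentials && credentials.expiration && credentials.expiration > new Date()) {
    // Still valid: keep using the existing client.
    return this.dynamoDbClient;
} else {
    // Missing or expired: rebuild the client so that it resolves fresh credentials.
    this.dynamoDbClient.destroy();
    const config = await this.getConfiguration();
    this.dynamoDbClient = new DynamoDBClient(config);
    return this.dynamoDbClient;
}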

It seems like a strange way to do it, since the token should be valid for the entire day, but this has resolved our issue.
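
A variant of the check above with a safety margin, so that the client is rebuilt slightly before the credentials expire rather than after. This is a sketch under assumptions: `needsRefresh`, `freshClient`, and the five-minute margin are illustrative, and the client is modelled only as something with a `destroy()` method.

```typescript
interface Expiring {
  expiration?: Date;
}

// True when credentials are missing, carry no expiration, or expire within marginMs.
// Treating a missing expiration as stale mirrors the workaround above.
function needsRefresh(
  credentials: Expiring | undefined,
  nowMs: number,
  marginMs = 5 * 60 * 1000,
): boolean {
  const expiresAt = credentials?.expiration?.getTime();
  return expiresAt === undefined || expiresAt - nowMs <= marginMs;
}

// Returns the current client if its credentials are still fresh; otherwise
// destroys it and returns a newly built one.
async function freshClient<C extends { destroy(): void }>(
  current: C,
  readCredentials: (client: C) => Promise<Expiring | undefined>,
  rebuild: () => Promise<C>,
  nowMs: number = Date.now(),
): Promise<C> {
  if (!needsRefresh(await readCredentials(current), nowMs)) {
    return current;
  }
  current.destroy(); // release resources held by the stale client
  return rebuild();
}
```

With a DynamoDB client, `readCredentials` would be something like `(c) => c.config.credentials()`, as in the snippet above.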

tiagoernst commented 2 years ago

Any news regarding this issue ?

danyonedwards commented 1 year ago

We are seeing the same issue with the TranslateClient. Just wondering if there is any news on this issue?

yenfryherrerafeliz commented 1 year ago

Hi all, we are currently working on reproducing this issue in order to identify the reported problem. I will get back to you all as soon as possible.

Thanks!