aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0
3.03k stars 569 forks source link

AbortSignal Timeout is only used for the actual request, not all middleware calls #5810

Open Llois41 opened 6 months ago

Llois41 commented 6 months ago

Checkboxes for prior research

Describe the bug

After upgrading the @aws-sdk/client-kms package from 3.478.0 to 3.511.0 we started seeing STS calls to the wrong region as stated in https://github.com/aws/aws-sdk-js-v3/issues/5799. We call the send()-Method like following:

// kmsClient is a global singleton created without any specified option
await kmsClient.send(command, { abortSignal: AbortSignal.timeout(1_000) });

We were surprised that the configured one second abortSignal timeout was not honored for calls to STS endpoints (which were dramatically slower due to https://github.com/aws/aws-sdk-js-v3/issues/5799).

SDK version number

@aws-sdk/client-kms@3.511.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v18.19.1

Reproduction Steps

run following snippet

const { EncryptCommand, KMSClient } = await import('@aws-sdk/client-kms');
const {default: https} = await import('node:https');

const original = https.request;
https.request = function (options, callback) {
  console.log(Date.now(), 'starting request', options.method, options.host, options.path);
  return original(options, (error, result) => {
    console.log(Date.now(), 'request done', options.method, options.host, options.path);
    callback(error, result);
  });
};

const kmsClient = new KMSClient({});
await kmsClient.send(new EncryptCommand({ KeyId: 'KEY_ID', Plaintext: new TextEncoder().encode('foo') }), { abortSignal: AbortSignal.timeout(1) });

Observed Behavior

A request to STS is made but does not honor the AbortSignal timeout (1ms). Instead it takes 64ms (finishes the call to STS and then applies the AbortSignal timeout):

1708340691530 starting request POST sts.eu-central-1.amazonaws.com /
1708340691594 request done POST sts.eu-central-1.amazonaws.com /
Uncaught Error [AbortError]: Request aborted
    at /app/node_modules/@smithy/node-http-handler/dist-cjs/node-http-handler.js:72:36
    at new Promise (<anonymous>)
    at NodeHttpHandler.handle (/app/node_modules/@smithy/node-http-handler/dist-cjs/node-http-handler.js:57:16)
    at /app/node_modules/@aws-sdk/client-kms/dist-cjs/commands/EncryptCommand.js:43:58
    at /app/node_modules/@smithy/middleware-serde/dist-cjs/deserializerMiddleware.js:5:32
    at /app/node_modules/@aws-sdk/middleware-signing/dist-cjs/awsAuthMiddleware.js:30:26
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /app/node_modules/@smithy/middleware-retry/dist-cjs/retryMiddleware.js:31:46
    at async /app/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:7:26 {
  '$metadata': { attempts: 1, totalRetryDelay: 0 }
}

Expected Behavior

KMSClient should use the provided AbortSignal timeout for all operations.

Possible Solution

No response

Additional Information/Context

No response

RanVaknin commented 6 months ago

Hi @Llois41 ,

I'm not able to reproduce the reported behavior. When running your code I see the request being aborted immediately:

$ node sample.mjs
/Users/rvaknin/test_folder/5810/node_modules/@smithy/node-http-handler/dist-cjs/index.js:249
        const abortError = new Error("Request aborted");
                           ^

Error [AbortError]: Request aborted
    at /Users/rvaknin/test_folder/5810/node_modules/@smithy/node-http-handler/dist-cjs/index.js:249:28
    at new Promise (<anonymous>)
    at _NodeHttpHandler.handle (/Users/rvaknin/test_folder/5810/node_modules/@smithy/node-http-handler/dist-cjs/index.js:234:12)
    at async /Users/rvaknin/test_folder/5810/node_modules/@smithy/middleware-serde/dist-cjs/index.js:33:24
    at async /Users/rvaknin/test_folder/5810/node_modules/@aws-sdk/middleware-signing/dist-cjs/index.js:192:18
    at async /Users/rvaknin/test_folder/5810/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at async /Users/rvaknin/test_folder/5810/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:33:22
    at async file:///Users/rvaknin/test_folder/5810/sample.mjs:16:1 {
  '$metadata': { attempts: 1, totalRetryDelay: 0 }
}

Node.js v18.13.0
$ npm ls
5810@1.0.0 /Users/rvaknin/test_folder/5810
└── @aws-sdk/client-kms@3.511.0

I'm not quite sure why you are seeing this behavior. Any chance you can create a small git repo and share it was so I could clone and further look into it?

Thanks, Ran~

Llois41 commented 6 months ago

@RanVaknin Hi, thanks for your answer. Did you check that the STS call before the KMS call really was made? We observed that there is some sort of caching or disable mechanism preventing it from happening. Also I do not see the log that the STS request started in your output. Also we used Node 18.19.1. Could you please check your Node version and if the STS call really was made?

RanVaknin commented 6 months ago

Hi @Llois41 ,

Thanks for re-iterating this. After re-reading this I think I understand the problem. The STS call mentioned here is the one implicitly made by the SDK to resolve credentials. From what I can see, the abort controller provided to the command, is not passed down to the implicit STS client that resolves credentials.

I'll discuss this with the team to see if this can be added.

Thanks again, Ran~

kuhe commented 6 months ago

@Llois41 the fix for correcting the default STS region was released in v3.515.0 of all SDK clients including KMS.

If you can take up the fix, do you still need AbortSignal access to the inner STS client?

If explicitly using a credential provider, an inner client config can also be passed that sets a requestTimeout on the inner client's requestHandler.

Llois41 commented 6 months ago

@kuhe Yes we would still appreciate the fix since we do not explicitly configure a credential provider, STS seems to be the default. It would make sense to pass the AbortSignal down since always something can go wrong (connection issues can have so many reasons) which would cause the KMS call to hang until connection timeouts occur.

kuhe commented 6 months ago

Using https://github.com/aws/aws-sdk-js-v3/releases/tag/v3.524.0, that is version v3.524.0 of the client, you can set a timeout on the STS request like this:

import { KmsClient } from '@aws-sdk/client-kms';
import { fromNodeProviderChain } from '@aws-sdk/credential-providers';

const client = new KmsClient({
  credentials: fromNodeProviderChain({
    clientConfig: {
      requestHandler: {
        requestTimeout: 1_000,
        connectionTimeout: 1_000,
      }
    }
  });
});

The clientConfig field of any credential provider will be passed to the inner STS client if it is used to fetch credentials.

Llois41 commented 6 months ago

@kuhe Thanks for the code example.

Is there any description of this behaviour in the docs (i.e. the differnece between requestTimeout and connectionTimeout? Reading https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/introduction/#abort-controller I would expect that the middleware stack also respects the AbortSignal. It sounds super counter-intuitive to have three different timeouts for one operation. We want the whole KMS call (including all middleware calls) to take a specific amount of time. We don't care about the underlying sometimes optional stuff going on. That would also expect us to keep track of all middleware functions because they may do something asynchronous which could take some time without regarding the given AbortSignal.

We would appreciate if the API would allow only one mechanism. :)