aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

MIGRATION ISSUE: Slower requests on V3 #5747

Closed (fabiokr closed this issue 2 weeks ago)

fabiokr commented 7 months ago

Pre-Migration Checklist

Which JavaScript Runtime is this issue in?

Node.js (includes AWS Lambda)

AWS Lambda Usage

Describe the Migration Issue

While migrating an application from SDK v2 to v3, I noticed that our test suite times have doubled. We use LocalStack for local development. We also tested the application in a production environment with a Lambda in a VPC and noticed the same behavior: each request apparently takes longer to finish in v3, which increased our Lambdas' runtime duration.

Code Comparison

V2 Example

// package.json
{
  "name": "test",
  "version": "1.0.0",
  "main": "index.js",
  "license": "N/A",
  "private": true,
  "dependencies": {
    "aws-sdk": "^2.1539.0"
  }
}
// index.js
const DynamoDB = require('aws-sdk/clients/dynamodb');

const dynamodb = new DynamoDB({
  region: "us-east-1",
  endpoint: "http://localhost:4566",
  accessKeyId: 'localstack',
  secretAccessKey: 'localstack',
});

async function run() {
  for(let i = 0; i < 100; i++) {
    await dynamodb.putItem({
      TableName: "products",
      Item: {
        id: {
          S: `${i + 100}`
        },
        name: {
          S: `product-${i + 100}`
        }
      }
    }).promise();
  }
}

run().then(() => { console.log("DONE") }).catch((e) => { console.error(e) });

V3 example

// package.json
{
  "name": "test",
  "version": "1.0.0",
  "main": "index.js",
  "license": "N/A",
  "private": true,
  "dependencies": {
    "@aws-sdk/client-dynamodb": "^3.496.0"
  }
}
// index.js
const { DynamoDB } = require('@aws-sdk/client-dynamodb');
const { NodeHttpHandler } = require('@smithy/node-http-handler');
const { Agent: HttpAgent } = require("node:http");
const { Agent: HttpsAgent } = require("node:https");

const dynamodb = new DynamoDB({
  region: "us-east-1",
  endpoint: "http://localhost:4566",
  credentials: {
    accessKeyId: 'localstack',
    secretAccessKey: 'localstack',
  }
});

async function run() {
  for(let i = 0; i < 100; i++) {
    await dynamodb.putItem({
      TableName: "products",
      Item: {
        id: {
          S: `${i + 100}`
        },
        name: {
          S: `product-${i + 100}`
        }
      }
    });
  }
}

run().then(() => { console.log("DONE") }).catch((e) => { console.error(e) });

Observed Differences/Errors

Running both examples on my local machine, this is what I get:

# V2
time node index.js
DONE
node index.js  0,31s user 0,07s system 25% cpu 1,494 total
# V3
time node index.js
DONE
node index.js  0,57s user 0,10s system 9% cpu 6,668 total

Additional Context

No response

RanVaknin commented 7 months ago

Hi @fabiokr ,

Are you bundling your application and uploading to lambda? Or are you using the Lambda provided SDK?

Thanks, Ran~

fabiokr commented 7 months ago

Hi, I am bundling my application and uploading to Lambda.

RanVaknin commented 2 weeks ago

Hi @fabiokr ,

Sorry for the late response. The time command you used measures the wall-clock time of the whole process, not the time spent in the SDK. That figure includes the SDK's request build time plus the time the runtime and the OS networking layer take to establish the connection, traverse your VPC, reach LocalStack, and return the response. It is not a reliable way to measure SDK performance, because the SDK is only a small component of the larger network round trip.

Using your code, I ran the SDK in isolation (not in Lambda), as a plain local program, to strip out any infrastructure layers that add to the round-trip time:

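// v3:
const { DynamoDBClient, PutItemCommand } = require('@aws-sdk/client-dynamodb');

// Assumption: the region is not stated in the thread. Credentials are
// resolved from the default provider chain.
const client = new DynamoDBClient({ region: 'us-east-1' });

async function run() {
  let totalDuration = 0n;
  for (let i = 0; i < 100; i++) {
    const command = new PutItemCommand({
      TableName: 'products',
      Item: {
        id: { S: `${i + 100}` },
        name: { S: `product-${i + 100}` }
      }
    });

    // Time only the send() call for this request.
    const startTime = process.hrtime.bigint();
    try {
      await client.send(command);
    } catch (error) {
      console.error(`Error with request ${i}:`, error);
      continue; // failed requests are excluded from the total
    }
    const endTime = process.hrtime.bigint();
    const duration = endTime - startTime;
    totalDuration += duration;
  }
  // Convert accumulated nanoseconds to whole milliseconds.
  const totalMilliseconds = Number(totalDuration / 1000000n);
  console.log(`Uploaded 100 items in ${totalMilliseconds}ms`);
}

run().catch(error => console.error(error));
// Uploaded 100 items in 8728ms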
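
// v2:
const DynamoDB = require('aws-sdk/clients/dynamodb');

// Assumption: the region is not stated in the thread. Credentials are
// resolved from the default provider chain.
const dynamoDB = new DynamoDB({ region: 'us-east-1' });

async function run() {
  let totalDuration = 0n;
  for (let i = 0; i < 100; i++) {
    const request = dynamoDB.putItem({
      TableName: 'products',
      Item: {
        id: { S: `${i + 100}` },
        name: { S: `product-${i + 100}` }
      }
    });

    // The request is only sent when promise() is called, so time that call.
    const startTime = process.hrtime.bigint();
    await request.promise();
    const endTime = process.hrtime.bigint();
    const duration = endTime - startTime;
    totalDuration += duration;
  }
  // Convert accumulated nanoseconds to whole milliseconds.
  const totalMilliseconds = Number(totalDuration / 1000000n);
  console.log(`Uploaded 100 items in ${totalMilliseconds}ms`);
}

run().catch(error => console.error(error));

// Uploaded 100 items in 26976ms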

As you can see, using code similar to yours, the v2 SDK is about three times slower than v3 (26976ms versus 8728ms). In this case I'm just running it from a local Node.js application, so there are no Lambda cold or warm start times involved, no VPC, and no LocalStack DynamoDB clone. Instead, I ran both snippets on the same architecture and Node.js version, with the same networking stack, against the actual DynamoDB service. So while this test does not accurately measure the SDK's performance, we can at least say that v2 was slower to round-trip requests here.
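A running total hides how the time is distributed across requests. As a rough sketch (the helper names timed and summarize are hypothetical and not part of either SDK), the per-request durations from a loop like the ones above could be kept and summarized, so that a slow first request can be told apart from the steady state:

```javascript
// Sketch: keep each request's duration instead of only a running total.
// Durations are BigInt nanoseconds, i.e. differences between two
// process.hrtime.bigint() readings.

// Time one async call and append its duration to the given list.
async function timed(durationsNs, fn) {
  const start = process.hrtime.bigint();
  try {
    return await fn();
  } finally {
    // Recorded for failed calls too, so errors remain visible in the numbers.
    durationsNs.push(process.hrtime.bigint() - start);
  }
}

// Summarize the recorded durations in milliseconds.
function summarize(durationsNs) {
  if (durationsNs.length === 0) throw new Error('no durations recorded');
  const ms = durationsNs.map((d) => Number(d) / 1e6);
  const sorted = [...ms].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    count: ms.length,
    firstMs: ms[0], // first request, in call order
    medianMs: median,
    maxMs: sorted[sorted.length - 1],
    totalMs: ms.reduce((sum, value) => sum + value, 0),
  };
}
```

In the v3 loop this might be used as await timed(durations, () => client.send(command)), followed by console.log(summarize(durations)) once the loop finishes.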

If you want to be accurate in your assessment of performance, you need to isolate the SDK's request build, signing, serialization, and deserialization times from the time the request spends round-tripping to and from your Lambda. Both v2 and v3 have methods you can "hook into" to measure things at different stages of request building: v2 uses request handlers and v3 uses the middleware stack.
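A minimal sketch of what such hooks could look like follows. It is not taken from the SDK documentation or from the benchmark mentioned in this thread. The helper names (addV3Timing, traceV2Request) and the middleware names are hypothetical, and the helpers do not import either SDK; they expect a v3 client or a v2 request object created by the caller. The sketch assumes that context.commandName is set by the generated v3 clients.

```javascript
// Sketch only: timing hooks for both SDK generations.

const elapsedMs = (start) => Number(process.hrtime.bigint() - start) / 1e6;

// v3: add one middleware per step. Each entry covers the rest of that step
// plus everything downstream of it, network included, so the difference
// between two entries approximates the time spent between those steps.
const V3_STEPS = ['initialize', 'serialize', 'build', 'finalizeRequest', 'deserialize'];

function addV3Timing(client, timings = []) {
  for (const step of V3_STEPS) {
    client.middlewareStack.add(
      (next, context) => async (args) => {
        const start = process.hrtime.bigint();
        try {
          return await next(args);
        } finally {
          timings.push({ step, command: context.commandName, ms: elapsedMs(start) });
        }
      },
      // Hypothetical middleware names; they must be unique within the stack.
      { step, name: `timing-${step}`, priority: 'low' }
    );
  }
  return timings;
}

// v2: record how long after the trace started each request event fired.
// With retries, some events can fire more than once.
const V2_EVENTS = ['validate', 'build', 'sign', 'send', 'httpHeaders', 'httpDone', 'complete'];

function traceV2Request(request, marks = []) {
  const start = process.hrtime.bigint();
  for (const event of V2_EVENTS) {
    request.on(event, () => marks.push({ event, ms: elapsedMs(start) }));
  }
  return marks;
}
```

With real clients this might be used as const timings = addV3Timing(client) before calling client.send(command) in v3, and as const marks = traceV2Request(request) before calling request.promise() in v2. The marks show when each event's listener ran, which is probably after the SDK's own listeners for that event, so they are best read as approximate stage boundaries.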

We have an entire benchmarking blog post that showcases how v3 performs better on Lambda.

I'm not sure how your tests are implemented, but the difference might come from Node.js version differences between your two Lambdas (if there are any), or from implementation details that relied on v2 behavior that differs in v3.

Without more concrete performance profiling, this is not really actionable.

Thanks again, Ran~

fabiokr commented 2 weeks ago

@RanVaknin Thanks for the detailed response. I tried reproducing this again with the latest version and they now perform quite similarly. I'll close this ticket.

github-actions[bot] commented 4 days ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.