The size difference in the following packages has increased our Lambda deployment package size to over the hard limit of 262144000 bytes:
@mymattcarroll Although there was a bump in size, I was seeing it on the order of kilobytes, not megabytes. I limited my repro to one package, though. Were you close to the 250 MB limit prior? Do you have bundle-size numbers for 3.575 and for 3.577 or later? If you can drop some numbers in here, that may help determine whether it is the same cause or a different one.
You should mark `fast-xml-parser` as external if you don't need an XML-based client.
Another 10 kB, which we, the SDK team, should fix, is the inclusion of `tslib` in 3.576.0, which can be removed.
After these fixes the difference should only be 134.6kb -> 140.3kb
Test code:

```sh
npm init -y
npm i @aws-sdk/client-dynamodb@3.577.0 esbuild --save-exact

echo "import { DynamoDBClient, ListTablesCommand } from \"@aws-sdk/client-dynamodb\";\
\
export { DynamoDBClient, ListTablesCommand };" > index.ts

npx esbuild --bundle index.ts --outfile=dist/bundle.js --minify --analyze --format=esm --main-fields=module,main --platform=node \
  --external:@aws-sdk/client-sts \
  --external:@aws-sdk/client-sso* \
  --external:@aws-sdk/credential-prov* \
  --external:@aws-sdk/token-providers \
  --external:fast-xml-parser

npm i @aws-sdk/client-dynamodb@3.575.0 --save-exact

npx esbuild --bundle index.ts --outfile=dist/bundle.js --minify --analyze --format=esm --main-fields=module,main --platform=node \
  --external:@aws-sdk/client-sts \
  --external:@aws-sdk/client-sso* \
  --external:@aws-sdk/credential-prov* \
  --external:@aws-sdk/token-providers \
  --external:fast-xml-parser
```
When we publish the fix in #6149, you shouldn't have to do anything different anymore. Taking the dist-es (ESM, tree-shakeable) code for bundling should avoid importing both `tslib` and `fast-xml-parser`.
Turns out our problem was `package-lock.json` containing duplicate versions of the AWS SDK packages. Regenerating our `package-lock.json` fixed our issue. Credit to @divporter for teaching me about the `npm dedupe` command.
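As an aside, one quick way to spot such duplicates before regenerating the lockfile is to scan it directly. Here's a rough, illustrative script (it assumes an npm v2/v3 `package-lock.json` with a top-level `packages` map; the file name and package scopes are just examples):

```js
// find-sdk-dupes.mjs — list @aws-sdk/@smithy packages resolved at more than one version.
import { readFileSync } from "node:fs";

const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));
const versions = new Map();

for (const [path, meta] of Object.entries(lock.packages ?? {})) {
  // Match the last node_modules segment, e.g. "node_modules/@smithy/core".
  const match = path.match(/node_modules\/((?:@aws-sdk|@smithy)\/[^/]+)$/);
  if (!match || !meta.version) continue;
  const name = match[1];
  if (!versions.has(name)) versions.set(name, new Set());
  versions.get(name).add(meta.version);
}

for (const [name, found] of versions) {
  if (found.size > 1) console.log(`${name}: ${[...found].join(", ")}`);
}
```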
I just tried 3.588. I'm seeing better coldstart times, but still 50K in bloat. One thing I noticed was that `@smithy/core` was bundled as CJS instead of ESM. Here's the breakdown of where the size increase is coming from, if it helps:

Delta 3.575 -> 3.588: @smithy +56.1 kB
@siddsriv is making the same change in `@smithy/core`: https://github.com/smithy-lang/smithy-typescript/pull/1297
Numbers for 3.590 are much more in line with 3.575. I think we can close now. Thanks for all your help here!
aws sdk | node version | avg(@initDuration) (ms) | bundleSize (bytes) | # runs |
---|---|---|---|---|
3.575.0 | v20.13.1 | 246.79 | 316588 | 3 |
3.577.0 | v20.13.1 | 270.0433 | 463023 | 3 |
3.590.0 | v20.13.1 | 241.88 | 318138 | 1 |
Actually, hold on closing here. Although I saw better performance and size on the repro, I still saw poor performance when I updated my prod code to 3.590.0. I had to manually set `@smithy/core` to 2.2.0 to get proper bundle size and still wasn't seeing the same performance. Let me dig in a bit.
3.592 updates the `@smithy/core` issue above and fixes the bundle size issue I was having in my prod code (not the repro).
Here's some data from my production API. The best performance was from 3.568 for me; I'm seeing about a 50 ms coldstart delta still, and I'm still digging into it.
@awsSdk | node | runtime | avg(initDuration) (ms) | avg(secrets loading) (ms) | max(bundleSize) (bytes) |
---|---|---|---|---|---|
3.590.0 | v20.13.1 | 2024-05-13T12:21:54.000Z | 294.255 | 136.25 | 1122060 |
3.592.0 | v20.13.1 | 2024-05-13T12:21:54.000Z | 299.41 | 134.875 | 840024 |
3.568.0 | v20.13.1 | 2024-05-13T12:21:54.000Z | 243.1 | 136 | 837877 |
I've updated the repro to include more clients: `client-sts`, `client-dynamodb`, `lib-dynamodb`, and `client-iam`. The performance difference between 3.575 and 3.592 is negligible, but you can see the 50 ms difference between 3.568 and 3.592 clearly here.
aws sdk | node version | avg(@initDuration) (ms) | bundleSize (bytes) | # runs |
---|---|---|---|---|
3.568.0 | v20.13.1 | 213.5387 | 496644 | 8 |
3.575.0 | v20.13.1 | 269.3038 | 504269 | 8 |
3.592.0 | v20.13.1 | 265.4675 | 504954 | 8 |
50 ms on these numbers is 25% slower; I'm not sure I agree with the negligible part.
It would be great to understand what the trade-off for adding this was, and whether there's anything at all we can do to reverse it besides pinning to 3.575.
Hey @dreamorosi, the negligible is between 3.575 and 3.592. However, I then compared 3.568 to 3.592 and saw the 50ms difference when I replicated what I saw in my production code. I agree that 50ms is a big regression and you should be able to replicate my results with the code I've currently checked into the repo. I looked at a few things to see if I could identify the source and I found some tiny things (shared-ini-loader and one other package I'm blanking on) but nothing that explains the 50 ms. The source of the latency no longer seems to be related to code size because I tried minification and didn't see much difference.
Alright, here are some more benchmarks with my repro. It looks like the performance bombed in 3.575. I'd recommend taking a close look at https://github.com/aws/aws-sdk-js-v3/pull/6088
aws sdk | node version | avg(@initDuration) (ms) | bundleSize (bytes) | # runs |
---|---|---|---|---|
3.568.0 | v20.13.1 | 206.165 | 211662 | 2 |
3.569.0 | v20.13.1 | 211.84 | 211837 | 2 |
3.572.0 | v20.13.1 | 214.2233 | 211837 | 3 |
3.574.0 | v20.13.1 | 206.9233 | 211837 | 3 |
3.575.0 | v20.13.1 | 269.46 | 214626 | 2 |
3.592.0 | v20.13.1 | 270.725 | 214940 | 2 |
Could you create a repro outside of Lambda that isolates the issue?
I have run the CDK example and the initDuration for v3.592.0 is in the 210-220 range. If there is an issue with the SDK it should be demonstrable in Node.js outside of Lambda.
Add a `performance.now()` mark to the top of the handler file and then to the top of the handler function. This should correspond to the cold start initialization duration. I did this, and no difference in init duration was shown. I also looked at the diff between the two bundled files and there wasn't anything that would take 50 ms.
I'll see what I can do. Just to confirm, you are comparing 3.574 to 3.592 and initializing multiple clients (like the repro) and you don't see much difference?
Yes, I used the provided repository and ran both those versions many times using the provided execution command examples in the readme.
I also tested the generated bundles themselves with additional timing reports, edited by hand, but those consisted merely of:

```js
const A = performance.now(); // top of file
// ...
const B = performance.now(); // top of handler function
// ...
console.log("init time", B - A); // before end of handler function
```
The contributing factors to initialization time in AWS Lambda are not entirely related to the user code, and fluctuate over time due to fleet provisioning. I don't know the details, so I'd like to remove that variance.
Not sure if I'm running into the same issue, but I saw something similar when upgrading to a newer SDK version and found this issue. The size of my Lambda function increased from 3.8 MB to 6.1 MB. I also use the built-in CDK `NodejsFunction`, which bundles with esbuild (CJS, not minifying the code).

When analyzing the bundle, it seems that `@smithy/...` packages are now part of every AWS SDK package and responsible for the majority of the size.
Bundle analysis, AWS SDK 3.556.0 (screenshot omitted).
Bundle analysis, AWS SDK 3.592.0 (screenshot omitted).
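For what it's worth, here's a minimal sketch of switching the `NodejsFunction` bundling to minified ESM so the SDK's tree-shakeable `dist-es` entry points are used (construct ids, entry path, and runtime are assumptions, not taken from the comment above):

```ts
import { Stack, StackProps } from "aws-cdk-lib";
import { Runtime } from "aws-cdk-lib/aws-lambda";
import { NodejsFunction, OutputFormat } from "aws-cdk-lib/aws-lambda-nodejs";
import { Construct } from "constructs";

export class ApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    new NodejsFunction(this, "Handler", {
      entry: "src/handler.ts",        // hypothetical entry point
      runtime: Runtime.NODEJS_20_X,
      bundling: {
        minify: true,                  // the comment above bundles unminified CJS
        format: OutputFormat.ESM,      // prefer the SDK's ESM (dist-es) builds
        mainFields: ["module", "main"],
      },
    });
  }
}
```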
@jaapvanblaaderen It could potentially be related. I was seeing something similar with ESM modules until they made a change to the packaging.
@kuhe WRT the `performance.now()` benchmarking: I haven't been able to isolate this from Lambda yet, but I do have some new performance benchmarks where I've wrapped the imports outside of the handler with `performance.now()`. The `@initTime` in the timings below shows that, but I haven't yet updated the repro code since I'm OOTO today. What is interesting is that `@initDuration` and `@initTime` are consistently better for 3.574, and the percentage difference between the two SDKs is consistent for both `performance.now()` and `@initDuration`. I did take a look at the diff between the actual bundled code and it wasn't much between 3.574 and 3.575. I'll see if I can isolate exactly what code is causing this performance regression.
@awsSdk | @node | avg(initDuration) (ms) | max(@bundleSize) (bytes) | runs | avg(@initTime) (ms) | min(@initTime) (ms) | max(@initTime) (ms) | pct(@initTime, 90) (ms) |
---|---|---|---|---|---|---|---|---|
3.574.0 | v20.13.1 | 213.1845 | 211917 | 11 | 7.0253 | 6.646 | 8.3802 | 7.2597 |
3.575.0 | v20.13.1 | 264.9182 | 214706 | 11 | 8.1337 | 7.5145 | 10.1601 | 8.8453 |
I've narrowed it down to the exact line that got added in 3.575.

If I patch that line from:

```js
export const keepAliveSupport = {
  supported: Boolean(typeof Request !== "undefined" && "keepalive" in new Request("https://[::1]")),
};
```

to:

```js
export const keepAliveSupport = {
  supported: true,
};
```
Then the 50 ms latency hit disappears in 3.575. Wild guess: maybe DNS is getting involved and there is a 50 ms timeout?
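For what it's worth, this particular probe can also be timed in plain Node.js outside Lambda; a tiny illustrative script (file name arbitrary, requires Node 18+ where `Request` is global):

```js
// probe.mjs — time the keepAliveSupport feature probe in isolation: node probe.mjs
const t0 = performance.now();
const supported = Boolean(typeof Request !== "undefined" && "keepalive" in new Request("https://[::1]"));
const t1 = performance.now();
console.log(`keepAliveSupport probe: ${(t1 - t0).toFixed(2)} ms (supported=${supported})`);
```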
Hi @perpil. How did you measure/narrow it down to that specific line? This is only relevant in Node 16, as Node 18+ has the `Request` object at global scope, so `keepAlive` will be `true`. This change was also added 9 months ago. Test here:

```js
export const handler = async (event) => {
  const response = {
    statusCode: 200,
    body: JSON.stringify({
      "keepAlive": Boolean(typeof Request !== "undefined" && "keepalive" in new Request("https://[::1]"))
    }),
  };
  return response;
};
```
You also need to test `initDuration` + `duration` in order to evaluate performance. Code outside the handler contributes to `initDuration` and code inside contributes to `duration`.
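A minimal illustration of that split (client choice and handler body are arbitrary):

```js
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";

// Module scope: evaluated once per cold start and reported as initDuration.
const client = new DynamoDBClient({});

export const handler = async (event) => {
  // Handler scope: runs on every invocation and is reported as duration.
  return { statusCode: 200, body: "ok" };
};
```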
Thanks @perpil, that's a great help finding the most likely location of the problem. I will investigate how to fix it.
@richarddd the smithy dependencies were upgraded in 3.575, so this new code got pulled in. When you bundle using the repro further up in the thread, you'll see these lines in the generated `index.mjs` output which weren't in 3.574:

```js
// node_modules/@smithy/fetch-http-handler/dist-es/fetch-http-handler.js
var keepAliveSupport = {
  supported: Boolean(typeof Request !== "undefined" && "keepalive" in new Request("https://[::1]"))
};
```
You'll also note there are no references to that variable in the generated output, so it isn't necessary in the Node environment. When I remove that code, or hardcode it to true, I no longer see the 50 ms hit to `@initDuration` (it never showed up in `@duration` because it is loaded outside of the handler). If you want to do an accurate repro with your code, you should move the code outside the handler, compare it to not using the `Request` at all, and look at the resulting `@initDuration`. As I understand it, `fetch-http-handler` is only supposed to be used in browser environments and `node-http-handler` is meant to be used when running in Lambda.
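For completeness, a client can be pinned to the Node handler explicitly; a small sketch assuming `@smithy/node-http-handler` is installed (in Node.js the SDK already defaults to this handler, so this is illustrative only, and it does not by itself tree-shake `fetch-http-handler` out of the bundle):

```js
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { NodeHttpHandler } from "@smithy/node-http-handler";

// Explicitly pass the Node handler; the timeout value here is arbitrary.
const client = new DynamoDBClient({
  requestHandler: new NodeHttpHandler({ connectionTimeout: 1000 }),
});
```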
@kuhe Thanks for looking into it. I think there are a couple of ways of solving it, so reach out if you need help. One thing you mentioned was to try using `performance.now()` to wrap the code outside the handler to do the benchmark. The issue with this approach is that when it is bundled, some of the code in `index.mjs` is actually executed before any of the imports, so you miss some of the latency. In the resulting bundle you get something like this, so `@initDuration` is what you should be using for an accurate benchmark:

```js
var keepAliveSupport = ...
var start = performance.now();
// ...other code
var end = performance.now();
```
If you update your install lockfile to pick up `@smithy/fetch-http-handler@3.0.3`, it should no longer perform the `Request` init in the module scope.
The reason `fetch-http-handler` appears now is that recently we added preliminary support for `fetch` in Node.js, i.e. using `FetchHttpHandler` in Node.js. The stream collector mixin now brings in some cross-over functionality from the `fetch-http-handler` package.
@perpil sorry for the misunderstanding; I can now reproduce the issue. Great finding! The reason for this regression is that when the `Request` class is used, it's actually lazily loaded and pulls in a lot of dependencies (undici etc.) that add these 40-50 ms to cold starts. @kuhe a reproduction can be done by simply doing:

```js
const a = Request;

export const handler = async (event) => {
  const response = {
    statusCode: 200,
    body: JSON.stringify("slow"),
  };
  return response;
};
```

Comment out the first line and we're back to normal. `Request` further pulls in more lazily evaluated classes such as `URL`/`Stream` etc. when instantiated.
Source: https://github.com/nodejs/node/blob/f2381bc7cb120e43bb97ffeccd19dfdc946c8b2c/lib/internal/bootstrap/web/exposed-window-or-worker.js#L86 (basically a property that does a `require`).
@kuhe I can confirm SDK 3.598 with `@smithy/fetch-http-handler` 3.0.3 no longer has the coldstart issue. The `Request` code is treeshaken from my bundle now that it isn't in the module scope. Thanks so much! 🚀
Describe the bug
I'm seeing a larger-than-normal regression in coldstart performance and tree-shaken bundle size after 3.575. The chart (not reproduced here) shows cold start latency and esbuild-built bundle size by SDK version. It's a small sample size (3 runs), but I'm seeing +24 ms in coldstart and +146K in bundle size.
SDK version number
@aws-sdk/client-dynamodb@3.577.0
Which JavaScript Runtime is this issue in?
Node.js
Details of the browser/Node.js/ReactNative version
v20.13.1
Reproduction Steps
Read the README in this Repro
Observed Behavior
Increased coldstart times and bundle size.
Expected Behavior
Minimal impact to coldstart and bundle size on tree-shaken code.
Possible Solution
Looking at the bundle analyzer, the biggest size impacts are from the `@smithy/*` packages, but it also seems like `fast-xml-parser` appeared too; I haven't dug much beyond that.

Additional Information/Context
In my app that I upgraded, I saw higher impacts to coldstart performance (>90 ms) between 3.563 and 3.577. I was using more clients however: `client-sts`, `client-dynamodb`, `client-iam`, and `lib-dynamodb`. To keep things simple, I did the repro with just `client-dynamodb`.
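For context, the entry file for that multi-client variant of the repro presumably looks something like the following sketch (the specific commands exported are illustrative guesses, not copied from the repro):

```ts
// index.ts — illustrative multi-client entry point for bundling.
import { STSClient, GetCallerIdentityCommand } from "@aws-sdk/client-sts";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";
import { IAMClient, ListRolesCommand } from "@aws-sdk/client-iam";

export const sts = new STSClient({});
export const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
export const iam = new IAMClient({});
export { GetCallerIdentityCommand, GetCommand, ListRolesCommand };
```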