Closed alfaproject closed 4 months ago
Linting started running out of memory and, again, became more than 2 to 3 times slower
Are you linting the SDK's distributed code? The amount of runtime code in the user application would not have changed, so the time your linter spends passing over your source code should not change.
A lot of our lambdas started timing out outright
The v3 SDK does not inherently increase the connection time to DynamoDB from 50ms to 1000ms, since both use the node:https module.
How many clients do you initialize per handler execution, and how many requests are made per client before it is garbage-collected? You may want to set the node:https agent max sockets value higher and/or reduce the number of client initializations. Example.
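As a rough sketch of that advice (the `memoize` helper and the agent settings below are illustrative, not an official API; with v3 you would pass the agent to `NodeHttpHandler` as shown later in this thread):

```typescript
import { Agent } from "node:https";

// One keep-alive agent shared by every client in this container; raising
// maxSockets lets more concurrent requests reuse warm TLS connections
// instead of paying the connection setup cost each time.
export const keepAliveAgent = new Agent({ keepAlive: true, maxSockets: 200 });

// Generic memoizer: the factory runs at most once per container, so a
// handler that calls the getter on every invocation still shares a single
// client (and therefore a single connection pool).
export function memoize<T>(factory: () => T): () => T {
  let cached: T | undefined;
  let initialized = false;
  return () => {
    if (!initialized) {
      cached = factory();
      initialized = true;
    }
    return cached as T;
  };
}

// Usage sketch (client construction shown for illustration):
// const getDynamoDb = memoize(() =>
//   new DynamoDBClient({
//     requestHandler: new NodeHttpHandler({ httpsAgent: keepAliveAgent }),
//   })
// );
```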
If any streams are opened by requests, ensure they are closed by reading the data or discarding it. For example, if you make a request to S3::getObject, the response body stream must be read or closed.
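As a minimal sketch of "read it or close it", using a plain Node stream in place of a real GetObject response body (in v3 the response `Body` is a stream; consuming it, e.g. via `Body.transformToString()`, or destroying it releases the underlying socket):

```typescript
import { Readable } from "node:stream";

// Fully consume a response-body stream so its socket can be returned
// to the agent's connection pool.
async function drainToString(stream: Readable): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString("utf8");
}

async function demo(): Promise<string> {
  // Stand-in for an S3 GetObject response body.
  const body = Readable.from(["hello, ", "world"]);
  return drainToString(body);
  // If the data is not needed, free the connection instead:
  // body.destroy();
}
```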
TS just struggles massively to do anything productive
If you are using Jest against TS source code, ensure it is using Babel's transpile-only (no type-check) mode rather than ts-jest.
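For reference, a Jest configuration along these lines might look like this (a sketch, assuming babel-jest and @babel/preset-typescript are installed; verify the preset names against your setup):

```typescript
// jest.config.ts — transpile TypeScript with Babel (no type-checking)
// instead of ts-jest, which type-checks every file it transforms.
import type { Config } from "jest";

const config: Config = {
  transform: {
    "^.+\\.tsx?$": [
      "babel-jest",
      { presets: ["@babel/preset-env", "@babel/preset-typescript"] },
    ],
  },
};

export default config;
```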
Try out skipLibCheck in your tsconfig.
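For reference, a minimal tsconfig fragment (skipLibCheck is a standard TypeScript compiler option; it skips type-checking of declaration files, which is where most of the @aws-sdk type surface lives):

```json
{
  "compilerOptions": {
    "skipLibCheck": true
  }
}
```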
Please create a sample repository that demonstrates the issue(s). In my testing, TS compilation time, bundling size, and runtime performance are better in v3. Here is my latest test: https://github.com/kuhe/aws-sdk-js-sample.
I get what you are trying to say with that sample, and don't get me wrong, but that's far from a production-like setup unless you use Lambda for fun. We already follow all the best practices for everything you mentioned, and we use SWC for Jest because esbuild wasn't cutting it either, so imagine.
Perhaps it's better to look at a simple example from some real code that we are allowed to share. We have a VERY THIN layer over the AWS SDK, mostly because we wanted a smooth transition and the ability to easily revert if needed, which it was.
Simple DynamoDB Client factory using AWS SDK v2:
import { awsRegion } from '@tma/node-env';
import DynamoDb, { DocumentClient } from 'aws-sdk/clients/dynamodb';
import { keepAliveAgent } from '../keep-alive-agent';

let client: DocumentClient | undefined;

export function dynamoDbClient() {
  if (!client) {
    const dynamodbClient = new DynamoDb({
      httpOptions: {
        agent: keepAliveAgent,
        connectTimeout: 50,
        timeout: 5000,
      },
      region: awsRegion(),
      retryDelayOptions: {
        base: 10,
      },
    });
    client = new DocumentClient({ service: dynamodbClient });
  }
  return client;
}
That's battle-tested: we make billions of DynamoDB connections per month using that factory, and connections VERY RARELY time out.
The exact same thing using AWS SDK v3:
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';

let client: DynamoDBDocumentClient | undefined;

export function dynamoDbClient() {
  if (!client) {
    const dynamodbClient = new DynamoDBClient({
      requestHandler: new NodeHttpHandler({
        connectionTimeout: 1000,
        requestTimeout: 5000,
      }),
    });
    client = DynamoDBDocumentClient.from(dynamodbClient, {
      marshallOptions: {
        convertClassInstanceToMap: true,
        removeUndefinedValues: true,
      },
    });
  }
  return client;
}
VERY RARELY does a connection not time out. Mind you, this is just an example, but it's as good as any; maybe there's an obvious mistake, but we pretty much followed the migration guide and tried to be careful.
What version of JS SDK v2 were you using?
For your JSv3 code, have you experimented with setting the maxSockets and maxAttempts configuration like so?
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';
import https from 'https';

let client: DynamoDBDocumentClient | undefined;

export function dynamoDbClient() {
  if (!client) {
    const dynamodbClient = new DynamoDBClient({
      maxAttempts: 6, // default is 3
      requestHandler: new NodeHttpHandler({
        connectionTimeout: 1000,
        requestTimeout: 5000,
        httpsAgent: new https.Agent({
          maxSockets: 400, // default is 50; the practical maximum depends on your environment
        }),
      }),
    });
    client = DynamoDBDocumentClient.from(dynamodbClient, {
      marshallOptions: {
        convertClassInstanceToMap: true,
        removeUndefinedValues: true,
      },
    });
  }
  return client;
}
Adjust the numbers based on your application's usage pattern.
Hi @alfaproject ,
I tested your code with 10,000 concurrent writes from Lambda, and also with fewer writes of very large, deeply nested items, and was not able to reproduce the connection timeouts you are describing. (Instead I was hitting throughput limit or Lambda size exceptions, which were resolved by raising each limit.)
In order to further investigate this and help you on your path to upgrading to v3, we would like to set up a call with you to discuss the problem in detail.
Setting up a meeting with the SDK team can be done by opening a support ticket through the AWS console. To speed things up you can provide a link to this issue and the Technical Account Manager should be able to quickly reach out to us in order to schedule some time.
For the meeting to be fruitful, we will need some sort of reproducible code. I know you mentioned you can't share any specifics about your actual code, but the example you provided did not reproduce the reported issue. Since the issue you were describing is not reproducible, the SDK team cannot identify the problem and take action on fixing it. Once we have that more robust repro code, we could make better use of the time in the meeting and come prepared with observations and suggestions.
Thanks, Ran~
This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to avoid the issue from automatically closing.
Yes, there is still a problem, but it took us more than 6 months to make the necessary code changes for the migration, so our budget for it is depleted and our roadmap is quite full until the end of the year. I'd still like to find a way, in my own out-of-work free time, to put together a minimal repro, or to get more budget for this upgrade from work. Either way, it's still a problem.
I finished migrating a large service from v2 to v3 yesterday, and we are experiencing several memory overflow and timeout issues in Lambda functions that we never had before. It's hard to say precisely what caused them, but it's related to SDK v3. It seems to me that many of the v3 clients' default initialization values differ from what v2 used.
Hi @renanwilliam,
Can you please open a separate issue, share the actual error message you are seeing and your entire lambda code so we can help you effectively?
Thanks, Ran~
I have investigated our performance issue and found that it is unrelated to SDK v3. To summarize the investigation:
Let's play a game: guess when we migrated from node16/aws-sdk to node18/@aws-sdk.
Hi @renanwilliam ,
I'm happy to hear that this was not due to the SDK.
@andreujuanc, without further details about your specific performance implications it's hard to take action. Please create a separate issue highlighting the differences, the environment from which you are running it, your code, and any other information that would enable us to help you.
Since no reproduction code was given, and the other concerns seem to have been addressed, I'm going to close this issue. If this persists for anyone else, please open a separate issue and provide as much detail as possible so we can better assist you.
All the best, Ran~
Since no reproduction code was given, and the other concerns seem to have been addressed, I'm going to close this issue. If this persists for anyone else, please open a separate issue and provide as much detail as possible so we can better assist you.
We literally just swapped libraries from v2 to v3 and updated from node 16 to 18.
Have you been able to check my comment? It's related to Node 18 memory handling and the "--max-old-space-size" parameter.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
Checkboxes for prior research
Describe the bug
After being pestered for months that we need to update to the latest SDK version, we finally did the work.
Things we noticed right away:
tl;dr I really don't understand why you are pushing people to move from v2 to v3. It's simply not usable. We have 12 billion lambda invocations per month atm, and from our initial tests, if we went to production with this new SDK, we'd have to start paying at least 4 times more. Not to mention the horrible developer experience, since TS just struggles massively to do anything productive.
Sadly, I can't share any code from our company but trust me on this. We know what we are doing, it's the exact same code base, just a different SDK, we double checked every single thing that we could. I will use this report for our account manager as well.
SDK version number
@aws-sdk/client-apigatewaymanagementapi@3.377.0, @aws-sdk/client-cloudwatch@3.377.0, @aws-sdk/client-dynamodb@3.377.0, @aws-sdk/client-glue@3.377.0, @aws-sdk/client-kinesis@3.377.0, @aws-sdk/client-lambda@3.377.0, @aws-sdk/client-personalize-events@3.377.0, @aws-sdk/client-personalize-runtime@3.377.0, @aws-sdk/client-quicksight@3.377.0, @aws-sdk/client-rds-data@3.377.0, @aws-sdk/client-s3@3.377.0, @aws-sdk/client-ses@3.377.0, @aws-sdk/client-sns@3.377.0, @aws-sdk/client-sqs@3.377.0, @aws-sdk/client-sts@3.377.0, @aws-sdk/credential-providers@3.377.0, @aws-sdk/lib-dynamodb@3.377.0, @aws-sdk/lib-storage@3.377.0, @aws-sdk/s3-request-presigner@3.377.0, @aws-sdk/types@3.370.0, @aws-sdk/util-dynamodb@3.377.0, @aws-sdk/util-utf8@3.374.0
Which JavaScript Runtime is this issue in?
Node.js
Details of the browser/Node.js/ReactNative version
v16.18.1
Reproduction Steps
Migrate any production-ready software platform from v2 to v3. We use a monorepo, so all of our 400 lambdas are in the same repository in different 'micro-services' (CF stacks).
Observed Behavior
Developer experience is horrible. Everything takes 5 times as long: linting, testing, building, anything where TypeScript has to import from the thousands of files in the @aws-sdk folders.
We could live with the above if the production side of things, after building everything into a small package for Node to run, had performance improvements, but no, it's just as bad. Somehow even the bundled version uses more memory and CPU resources in our lambdas, which basically threw away all of our fine-tuning (memory and timeout configuration for each lambda had to be increased across the board).
Expected Behavior
Same developer experience. Any medium or large application should have a linting and testing experience similar to the v2 experience.
More importantly, same or better runtime experience or else what's the point? Having to increase our lambdas memory and timeout configuration to be able to have the same user-facing experience as before is honestly not going to happen. We are not going to increase our lambda costs overnight just because AWS wants us to update to a framework that simply isn't ready, sorry.
Possible Solution
For developer experience, you could do what most other big SDKs do and concatenate the TS declaration files into a single index.d.ts for each package.
But for runtime performance, I actually have no idea what you did that made it so bad compared with v2. You will have to investigate.
Additional Information/Context
I can share anything else that isn't private information from our company and I'm happy to have a video call with any systems architect from AWS via our account manager or otherwise.