Open robert-hanuschke opened 2 years ago
Your v2 repro code isn't calling .promise() on the s3Client.getObject(...) calls, so your Promise.all(...) isn't actually waiting for anything. You can confirm this by hovering result and seeing what TypeScript thinks the type is.
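For reference, a minimal sketch (not the original repro code) of what the corrected v2 call looks like, assuming the repro maps over a list of keys:

```ts
import AWS from "aws-sdk";

const s3Client = new AWS.S3();

// Without .promise(), keys.map(...) yields AWS.Request objects and Promise.all
// resolves immediately; with it, the responses are actually awaited.
async function getAllObjects(bucket: string, keys: string[]) {
  return Promise.all(
    keys.map((Key) => s3Client.getObject({ Bucket: bucket, Key }).promise())
  );
}
```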
Thanks a lot for the heads up @Roryl-c , I missed that one. I corrected it in an edit above so as not to send anyone down the wrong path. Correcting the code there gets some improvement - we're now still looking at a 12 times worse performance on V3 though.
New representative output (including a console.log of the first element of the result to verify correct operation):
V2:
aws-sdk-v2$ time node dist/index.js <bucket_name>
2022-04-26T04:14:04.309Z
2022-04-26T04:14:05.326Z
159
{
AcceptRanges: 'bytes',
LastModified: 2022-04-25T12:48:57.000Z,
ContentLength: 0,
ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
ContentType: 'binary/octet-stream',
Metadata: {},
Body: <Buffer >
}
real 0m2,557s
user 0m0,764s
sys 0m0,061s
V3:
aws-sdk-v3$ time node dist/index.js <bucket_name>
2022-04-26T04:14:23.927Z
2022-04-26T04:14:25.164Z
159
{
'$metadata': {
httpStatusCode: 200,
requestId: undefined,
extendedRequestId: 'oaX1SPWbVP+SbotZ5EtrxulYZEMyVU8+JtrrjpoGHi1pXXQsxWvBqjy8V2uQ99SkpmxI1wS1+cE=',
cfId: undefined,
attempts: 1,
totalRetryDelay: 0
},
AcceptRanges: 'bytes',
Body: <ref *1> IncomingMessage {
_readableState: ReadableState {
objectMode: false,
highWaterMark: 16384,
buffer: BufferList { head: null, tail: null, length: 0 },
length: 0,
pipes: [],
flowing: null,
ended: true,
endEmitted: false,
reading: false,
sync: true,
needReadable: false,
emittedReadable: false,
readableListening: false,
resumeScheduled: false,
errorEmitted: false,
emitClose: true,
autoDestroy: false,
destroyed: false,
errored: null,
closed: false,
closeEmitted: false,
defaultEncoding: 'utf8',
awaitDrainWriters: null,
multiAwaitDrain: false,
readingMore: true,
dataEmitted: false,
decoder: null,
encoding: null,
[Symbol(kPaused)]: null
},
_events: [Object: null prototype] { end: [Array] },
_eventsCount: 1,
_maxListeners: undefined,
socket: TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
secureConnecting: false,
_SNICallback: null,
servername: '<bucket_name>.s3.us-east-1.amazonaws.com',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object: null prototype],
_eventsCount: 9,
connecting: false,
_hadError: false,
_parent: null,
_host: '<bucket_name>.s3.us-east-1.amazonaws.com',
_readableState: [ReadableState],
_maxListeners: undefined,
_writableState: [WritableState],
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: null,
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [ClientRequest],
timeout: 0,
write: [Function: writeAfterFIN],
[Symbol(res)]: null,
[Symbol(verified)]: true,
[Symbol(pendingSession)]: null,
[Symbol(async_id_symbol)]: 48,
[Symbol(kHandle)]: null,
[Symbol(kSetNoDelay)]: false,
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBuffer)]: null,
[Symbol(kBufferCb)]: null,
[Symbol(kBufferGen)]: null,
[Symbol(kCapture)]: false,
[Symbol(kBytesRead)]: 34826,
[Symbol(kBytesWritten)]: 3387,
[Symbol(connect-options)]: [Object],
[Symbol(RequestTimeout)]: undefined
},
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
headers: {
'x-amz-id-2': 'oaX1SPWbVP+SbotZ5EtrxulYZEMyVU8+JtrrjpoGHi1pXXQsxWvBqjy8V2uQ99SkpmxI1wS1+cE=',
'x-amz-request-id': '8YZSWGXFWP7F92XJ',
date: 'Tue, 26 Apr 2022 04:14:26 GMT',
'last-modified': 'Mon, 25 Apr 2022 12:48:57 GMT',
etag: '"d41d8cd98f00b204e9800998ecf8427e"',
'accept-ranges': 'bytes',
'content-type': 'binary/octet-stream',
server: 'AmazonS3',
'content-length': '0'
},
rawHeaders: [
'x-amz-id-2',
'oaX1SPWbVP+SbotZ5EtrxulYZEMyVU8+JtrrjpoGHi1pXXQsxWvBqjy8V2uQ99SkpmxI1wS1+cE=',
'x-amz-request-id',
'8YZSWGXFWP7F92XJ',
'Date',
'Tue, 26 Apr 2022 04:14:26 GMT',
'Last-Modified',
'Mon, 25 Apr 2022 12:48:57 GMT',
'ETag',
'"d41d8cd98f00b204e9800998ecf8427e"',
'Accept-Ranges',
'bytes',
'Content-Type',
'binary/octet-stream',
'Server',
'AmazonS3',
'Content-Length',
'0'
],
trailers: {},
rawTrailers: [],
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 200,
statusMessage: 'OK',
client: TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
secureConnecting: false,
_SNICallback: null,
servername: '<bucket_name>.s3.us-east-1.amazonaws.com',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object: null prototype],
_eventsCount: 9,
connecting: false,
_hadError: false,
_parent: null,
_host: '<bucket_name>.s3.us-east-1.amazonaws.com',
_readableState: [ReadableState],
_maxListeners: undefined,
_writableState: [WritableState],
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: null,
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [ClientRequest],
timeout: 0,
write: [Function: writeAfterFIN],
[Symbol(res)]: null,
[Symbol(verified)]: true,
[Symbol(pendingSession)]: null,
[Symbol(async_id_symbol)]: 48,
[Symbol(kHandle)]: null,
[Symbol(kSetNoDelay)]: false,
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBuffer)]: null,
[Symbol(kBufferCb)]: null,
[Symbol(kBufferGen)]: null,
[Symbol(kCapture)]: false,
[Symbol(kBytesRead)]: 34826,
[Symbol(kBytesWritten)]: 3387,
[Symbol(connect-options)]: [Object],
[Symbol(RequestTimeout)]: undefined
},
_consuming: false,
_dumped: false,
req: ClientRequest {
_events: [Object: null prototype],
_eventsCount: 4,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: true,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: true,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: false,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
_contentLength: 0,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
socket: [TLSSocket],
_header: '<REDACTED>',
_keepAliveTimeout: 0,
_onPendingData: [Function: noopPendingOutput],
agent: [Agent],
socketPath: undefined,
method: 'GET',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
path: '/027f19cca4a7119da720?x-id=GetObject',
_ended: false,
res: [Circular *1],
aborted: false,
timeoutCb: [Function: emitRequestTimeout],
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: true,
host: '<bucket_name>.s3.us-east-1.amazonaws.com',
protocol: 'https:',
[Symbol(kCapture)]: false,
[Symbol(kNeedDrain)]: false,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype]
},
[Symbol(kCapture)]: false,
[Symbol(RequestTimeout)]: undefined
},
BucketKeyEnabled: undefined,
CacheControl: undefined,
ChecksumCRC32: undefined,
ChecksumCRC32C: undefined,
ChecksumSHA1: undefined,
ChecksumSHA256: undefined,
ContentDisposition: undefined,
ContentEncoding: undefined,
ContentLanguage: undefined,
ContentLength: 0,
ContentRange: undefined,
ContentType: 'binary/octet-stream',
DeleteMarker: undefined,
ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
Expiration: undefined,
Expires: undefined,
LastModified: 2022-04-25T12:48:57.000Z,
Metadata: {},
MissingMeta: undefined,
ObjectLockLegalHoldStatus: undefined,
ObjectLockMode: undefined,
ObjectLockRetainUntilDate: undefined,
PartsCount: undefined,
ReplicationStatus: undefined,
RequestCharged: undefined,
Restore: undefined,
SSECustomerAlgorithm: undefined,
SSECustomerKeyMD5: undefined,
SSEKMSKeyId: undefined,
ServerSideEncryption: undefined,
StorageClass: undefined,
TagCount: undefined,
VersionId: undefined,
WebsiteRedirectLocation: undefined
}
real 0m28,922s
user 0m1,799s
sys 0m0,213s
I also crosschecked the Lambda code I mentioned; thankfully I didn't miss the .promise() in the V2 implementation there. After reverting to V2 just for that specific functionality (the rest of the Lambda still runs on V3), it works again.
@robert-hanuschke Hi! Thanks for bringing this to our attention.
I was able to reproduce your findings observing these 2 scenarios:
Running the code for 87 items in S3 bucket:
V3:
Time to instantiate s3 client: 136.60179090499878ms
Time to list objects in bucket: 654.9991250038147ms
87 subsequent GetObject calls: 7387.941290855408ms
V2:
Time to instantiate s3 client: 79.35779213905334ms
Time to list objects in bucket: 646.8728330135345ms
87 subsequent GetObject calls: 861.3942501544952ms
Running the code with 1 item in S3 bucket:
V3:
Time to instantiate s3 client: 138.31212520599365ms
Time to list objects in bucket: 498.3790419101715ms
1 subsequent GetObject calls: 251.10545825958252 ms
V2:
Time to instantiate s3 client: 99.40116715431213ms
Time to list objects in bucket: 535.6090829372406ms
1 subsequent GetObject calls: 524.4567511081696ms
It appears that issuing many subsequent GetObject calls causes performance issues in V3; however, a single read or a small number of reads is faster in V3.
@trivikr, based on your suggestion I inspected the CPU usage of the provided example and found that the bottleneck for reading multiple items from the bucket is in fact the subsequent GetObjectCommand calls, more specifically:
@aws-sdk/node-http-handler/dist-cjs/node-http-handler.js 49:28
I'm forwarding it to the dev team for further investigation.
Hi @robert-hanuschke ,
I recognize that these kinds of edge cases can be frustrating; however, we prioritize fixes and features based on community engagement and feedback. At this time there is no plan to address this issue. If anything changes I will certainly let you know.
Many thanks!
Hi Ran,
thanks for reopening. I was worried it would be completely forgotten if closed. I totally understand it having lower priority if it's an edge case.
For the time being I can live with the workaround of keeping this call on the old SDK, and I hope this issue helps people in the same situation who stumble across it.
I have witnessed something similar. Retrieving many keys by building an array of promises, where each promise reuses the same client and sends a GetObjectCommand instance, executes in what seems like sequence instead of "in parallel" like we'd expect from Promise.all.
What I can add, though, is that when I iterate through the stream and convert it to a buffer inside the same promise, the promises are properly executed in what seems like parallel.
A minimal code example here:
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { Readable } from 'stream';
import { BUCKET, KEYS } from './keys';
export function streamToBuffer(stream: Readable): Promise<Buffer> {
return new Promise((resolve, reject) => {
const chunks: Uint8Array[] = [];
stream.on('data', (chunk) => chunks.push(chunk));
stream.on('error', reject);
stream.on('end', () => resolve(Buffer.concat(chunks)));
});
}
export async function getObjectsFast(client: S3Client, bucket: string, keys: string[]): Promise<any[]> {
return Promise.all(keys.map(key => getObjectFast(client, bucket, key)));
}
async function getObjectFast(client: S3Client, bucket: string, key: string): Promise<any> {
const command = new GetObjectCommand({ Bucket: bucket, Key: key });
const result = await client.send(command);
const buffer = await streamToBuffer(result.Body as Readable);
return buffer;
}
async function getObjectSlow(client: S3Client, bucket: string, key: string): Promise<any> {
const command = new GetObjectCommand({ Bucket: bucket, Key: key });
const result = await client.send(command);
return result;
}
export async function getObjectsSlow(client: S3Client, bucket: string, keys: string[]): Promise<any[]> {
return Promise.all(keys.map(key => getObjectSlow(client, bucket, key)));
}
async function run() {
const client = new S3Client({ region: 'ap-southeast-2' });
console.log(KEYS.length); // 75
const t1 = new Date();
await getObjectsFast(client, BUCKET, KEYS);
console.log('took1:', new Date().valueOf() - t1.valueOf()); // 260
const t2 = new Date();
await getObjectsSlow(client, BUCKET, KEYS);
console.log('took2:', new Date().valueOf() - t2.valueOf()); //6200
}
void run();
Version 3.121.0 of the lib, Node v14.17.6
I recently ran into a very similar situation, where doing a lot of GetObjectCommands in rapid succession would slow down tremendously and eventually lock up the process. Our current workaround is to make a new client instance for each GetObjectCommand and destroy it as soon as the read is done:
// Before
const client = new S3Client();
async function getFileStream() {
const response = await client.send(new GetObjectCommand({ Bucket, Key }));
return response.Body;
}
// After
async function getFileStream() {
const client = new S3Client();
const response = await client.send(new GetObjectCommand({ Bucket, Key }));
response.Body.on('end', () => client.destroy()); // note: the body stream emits 'end'/'close', not 'finished'
return response.Body;
}
I thought that the issue before might've been caused by the default keepAlive behavior of the SDK, but explicitly disabling that and lowering the maxSockets didn't seem to resolve the problem fully 🤔
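For context, a sketch (my own illustration, not this comment's code) of what explicitly disabling keepAlive and lowering maxSockets looks like on a v3 client via a custom request handler; the values are arbitrary:

```ts
import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@aws-sdk/node-http-handler";
import { Agent } from "https";

// Illustrative values only: this shows where the knobs live, not a recommendation.
const client = new S3Client({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({ keepAlive: false, maxSockets: 25 }),
  }),
});
```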
@rijkvanzanten
I was able to reproduce this issue recently and found a solution. The problem, as far as I can see, comes from two areas:
Once the HTTP Agent sockets are all in use, the S3Clients will hang. Your "new client per request" approach will work, but you pay a big performance penalty for the huge resource allocation on every request, and you lose the TLS session cache and HTTP keep-alive.
Here's a setting I find works in my environment. You can tune the maxSockets and socketTimeout to your environment.
import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@aws-sdk/node-http-handler";
import { Agent } from "https";

const s3 = new S3Client({
  // Use a custom request handler so that we can adjust the HTTPS Agent and
  // socket behavior.
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({
      maxSockets: 500,
      // keepAlive is a default from AWS SDK. We want to preserve this for
      // performance reasons.
      keepAlive: true,
      keepAliveMsecs: 1000,
    }),
    socketTimeout: 5000,
  }),
});
Great solution @askldjd! Thanks for sharing 😄 Very curious to see what changed between v2 and v3 in the handling of those sockets which now causes the socket to hang rather than clean up.
That's a good question. If I am reading correctly, V2 defaults the timeout to 2 minutes. However, V3 defaults the timeout to zero.
That might be the root cause.
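For comparison, a sketch (an assumption on my part, not taken from the comment) of making that v2 default visible by setting it explicitly; v2's httpOptions.timeout is the 2-minute value referred to above:

```ts
import AWS from "aws-sdk";

// v2: httpOptions.timeout defaults to 120000 ms (2 minutes); spelling it out
// here only makes the default explicit.
const s3v2 = new AWS.S3({ httpOptions: { timeout: 120000 } });
```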
Hi everyone,
Thank you for continuing to comment and provide information. @samjarman Thank you for the repro code, and @askldjd thanks for the workaround. I was able to confirm the described behavior (finally) and noticed that the performance increased with the suggested agent config. We are looking into this with priority.
Thank you all for your help! Ran~
Node profiler output comparing v2 and v3
We are seeing a similar problem and I don't think this is much of an edge case.
We are pretty much just piping the files from s3 to the client
const commandResult = await client.send(command);
commandResult.Body.pipe(response);
And if there is a file that takes around a minute to download (slower devices / slower internet), and more and more downloads are started from the server on different clients, we see the downloads go slower and slower until they stop. This happens even with the workaround.
I may be misunderstanding how to do this, but a similar flow worked great from v2.
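Not a fix for the slowdown itself, but a sketch (assuming an Express-style response object, which the comment does not specify) of piping with stream.pipeline so an aborted or failed download destroys the S3 body stream and releases its socket instead of leaving it occupied:

```ts
import { pipeline } from "stream/promises";
import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import type { Readable } from "stream";
import type { Response } from "express"; // assumed framework

const client = new S3Client({});

async function sendObject(res: Response, Bucket: string, Key: string) {
  const { Body } = await client.send(new GetObjectCommand({ Bucket, Key }));
  // pipeline rejects and destroys both streams if either side errors or aborts.
  await pipeline(Body as Readable, res);
}
```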
https://github.com/aws/aws-sdk-js-v3/issues/3560#issuecomment-1484140333 the https.Agent configuration above is not exactly a workaround; it is the correct and recommended way to increase the socket pool.
I will submit a change that begins the socket timeout countdown before the socket event as requested in https://github.com/aws/aws-sdk-js-v3/issues/3722 that may be related to @ToyboxZach 's problem.
@kuhe @RanVaknin The workaround is not working for parallel getObject + upload requests: after around 100 items fetched/uploaded, the connection hangs (concurrency 20-50, S3 client initialized on Lambda start). Basically it works correctly once every ~20 times.
A single client for both download and upload is not really working either;
the only option that somehow works consistently is to create 2 separate clients for the actual upload and download:
import {NodeHttpHandler} from '@smithy/node-http-handler';
import {Agent} from 'node:https';
const s3download = new S3({
region: process.env.AWS_REGION,
logger: console,
forcePathStyle: true,
requestHandler: new NodeHttpHandler({
httpsAgent: new Agent({
timeout: 5000,
maxSockets: 1000, // default 50
// keepAlive is a default from AWS SDK. We want to preserve this for
// performance reasons.
keepAlive: true,
keepAliveMsecs: 1000, // default unset,
}),
connectionTimeout: 1000,
requestTimeout: 5000, // default 0
}),
});
const s3upload = new S3({
region: process.env.AWS_REGION,
logger: console,
forcePathStyle: true,
requestHandler: new NodeHttpHandler({
httpsAgent: new Agent({
maxSockets: 1000, // default 50
keepAlive: false,
}),
connectionTimeout: 1000,
requestTimeout: 5000, // default 0
}),
});
And I'm fairly sure that this will also fail if enough files are supplied.
Any updates on this issue? We are experiencing it on our side too. Below is a side-by-side comparison of the V2 and V3 request breakdowns made by the Datadog tracer. We applied all the socket suggestions from above. To explain the images below: V3 requests consist of the weird HTTP EC2META requests plus the actual S3 request; V2 has only the actual S3 request.
V2
V3
@1nstinct you provided a flame graph, but you should provide a minimal reproduction with code. Are you instantiating a client per request?
@kuhe yes, on every request I instantiate a new client and destroy it afterwards.
function read() {
return [
(req, res, next) => {
const modelId = 'model_id';
const model = req.params[modelId];
const client = new AWS.S3({
"region": "us-west-2",
"useAccelerateEndpoint": true,
"signatureVersion": "v4"
});
const begin = Date.now();
req.s3Params = {
Bucket: model.bucket,
Key: `${model.key}/${req.params[0]}`,
ResponseContentType: "application/octet-stream",
};
client.getObject(req.s3Params)
.createReadStream()
.on('error', err => {
if (err.statusCode === 404) {
return next(boom.notFound());
}
// TODO: Better error handling
next(err);
})
.on('end', () => {
// eslint-disable-next-line
if (DEBUG) console.log(`============== V2 file ${req.params[0]} download time: ${Date.now() - begin} ms`);
})
.pipe(res);
}
];
}
function readV3() {
return [
async (req, res, next) => {
try {
const modelId = 'model_id';
const model = req.params[modelId];
let instanceCredentials;
if (process.env.NODE_ENV === 'localhost') {
instanceCredentials = fromIni({
maxRetries: 5, // The maximum number of times any HTTP connections should be retried
timeout: 100, // The connection timeout (in milliseconds) to apply to any remote requests
});
} else {
instanceCredentials = fromInstanceMetadata({
maxRetries: 5, // The maximum number of times any HTTP connections should be retried
timeout: 100, // The connection timeout (in milliseconds) to apply to any remote requests
});
}
const serviceConfig = {
region: "us-west-2",
useAccelerateEndpoint: true,
// Use a custom request handler so that we can adjust the HTTPS Agent and
// socket behavior.
requestHandler: new NodeHttpHandler({
httpsAgent: new https.Agent({
maxSockets: 500, // avoid hanging because of all sockets are in use
// keepAlive is a default from AWS SDK. We want to preserve this for
// performance reasons.
keepAlive: true,
keepAliveMsecs: 1000,
}),
socketTimeout: 5000,
}),
credentials: instanceCredentials
};
const begin = Date.now();
req.s3Params = {
Bucket: model.bucket,
Key: `${model.key}/${req.params[0]}`,
ResponseContentType: "application/octet-stream",
};
const command = new GetObjectCommand(req.s3Params);
// new client per request
const client = new S3Client(serviceConfig);
const response = await client.send(command);
res.status(response.Body.statusCode);
response.Body.on('end', () => {
client.destroy();
// eslint-disable-next-line
if (DEBUG) console.log(`============== V3 file ${req.params[0]} download time: ${Date.now() - begin} ms`);
});
response.Body.pipe(res);
} catch (err) {
if (err.name === 'NoSuchKey') {
return next(boom.notFound());
}
// TODO: Better error handling
return next(boom.badRequest(err.message));
}
}
];
}
"dependencies": {
"@aws-sdk/client-s3": "^3.485.0",
"@aws-sdk/credential-providers": "^3.485.0",
"@aws-sdk/node-http-handler": "^3.374.0",
}
node -v v18.16.1
Single file download speed results. No multi-threaded downloads.
Avoid instantiating the client on every request. Doing so causes the credentials to be fetched, in this case from the EC2 metadata service.
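A minimal sketch of that suggestion: create the client once at module scope and reuse it, so credentials are resolved and cached across requests (the function name is illustrative):

```ts
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

// Created once per process, not per request.
const client = new S3Client({ region: "us-west-2" });

export async function getObjectStream(Bucket: string, Key: string) {
  const { Body } = await client.send(new GetObjectCommand({ Bucket, Key }));
  return Body;
}
```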
@kuhe the performance of V3 is close to V2 now. But we can't use this solution, because we started getting a "socket hang up" error, I believe because the V3 client is a global variable now - we do not initialize it on every request.
looks like it's similar to what this guy reported https://github.com/aws/aws-sdk-js-v3/issues/5561#issuecomment-1850924931
I tried to use these socket configs:
{
"region": "us-west-2",
"maxSockets": 5000,
"keepAliveMsecs": 5000,
"socketTimeout": 120000,
"useAccelerateEndpoint": true
}
{
"region": "us-west-2",
"maxSockets": 500,
"keepAliveMsecs": 1000,
"socketTimeout": 5000,
"useAccelerateEndpoint": true
}
with every config we are getting the error.
I found this issue via a comment on this StackOverflow post, although as far as I can tell the issues are slightly different: https://stackoverflow.com/questions/73256172/nodejs-amazon-aws-sdk-s3-client-stops-working-intermittently
Should I open a new issue for this? I'm not the OP of the SO post, but I've noticed an issue as described there, so I wonder if somebody with more insight can tell me if it's related. Every few days, an instance of S3Client just stops working - calling await client.send(new GetObjectCommand({ ... })) does not throw an exception and does not return; it hangs indefinitely. It took me a while to narrow the issue down to this cause.
Edit: I found this issue, which seems to indicate that yes, it is related.
Been tracking this ticket and just wanted to throw in our experience. We have a node service running in Fargate on ECS that essentially acts as a proxy to images in S3 buckets. It works with streams, reading from the S3 bucket, applying some image transformations and returning the transformed images by writing to a stream (an ExpressJS response). It handles many concurrent requests.
We attempted an upgrade to the AWS SDK v3 and saw a memory leak as well as hanging responses. We'd see the node process completely get hung up and become non-responsive. Log output would stop, and the task would then be terminated due to failed health checks since it couldn't respond.
Everything works perfectly fine with the v2 SDK, so we had to revert. It's worth noting that when using the v3 SDK the issues didn't show up immediately, it would take a few days or more before we'd have a task get hung up and terminated due to being non-responsive. We are only creating one instance of the client. https://github.com/aws/aws-sdk-js-v3/issues/4596 seems to be related to the issues discussed here and some are saying that the config in https://github.com/aws/aws-sdk-js-v3/issues/3560#issuecomment-1484140333 fixes the issue for them.
When upgrading to v3 do we still need to apply the changes from https://github.com/aws/aws-sdk-js-v3/issues/3560#issuecomment-1484140333 or should it be fine without it? If so, any idea why this would be required in v3 and was not in v2?
We are theorizing here without more specific reproduction examples, but one thing to keep in mind is that in the v2 SDK, the S3 objects are buffered by default when you request them (reading the stream).
In v3, the S3::getObject request only opens a stream. You must consume the stream or garbage collect the client or its request handler to keep the connections open to new traffic. One example, reading the stream into a string:
// v2
const get = await s3.getObject({ ... }).promise(); // this buffers (consumes) the stream already.
// v3
const get = await s3.getObject({ ... }); // object .Body has unconsumed stream.
const str = await get.Body.transformToString(); // consumes the stream.
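As a small follow-on, a sketch (assuming Node.js, where Body carries the SDK stream mixin) of consuming the body as raw bytes instead of a string:

```ts
import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

async function getObjectBytes(Bucket: string, Key: string): Promise<Uint8Array> {
  const { Body } = await s3.send(new GetObjectCommand({ Bucket, Key }));
  // Consuming the stream here is what frees the underlying socket for reuse.
  return Body!.transformToByteArray();
}
```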
So I managed to fix all my issues.
Now V2 and V3 performance is similar
The solution provided in here https://github.com/aws/aws-sdk-js-v3/issues/3560#issuecomment-1484140333 has worked great for us. I'm a little hazy on the details (it's been a minute), but IIRC it effectively "reverts" the v3 http agent settings back to aws-sdk v2's settings, which solved the problems for us:
import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler";
import { Agent as HttpAgent } from "node:http";
import { Agent as HttpsAgent } from "node:https";
const connectionTimeout = 5000;
const socketTimeout = 120000;
const maxSockets = 500;
const keepAlive = true;
const client = new S3Client({
requestHandler: new NodeHttpHandler({
connectionTimeout,
socketTimeout,
httpAgent: new HttpAgent({ maxSockets, keepAlive }),
httpsAgent: new HttpsAgent({ maxSockets, keepAlive }),
}),
});
Our problem was that bad requests/operations would never time out, which, combined with the low default maxSockets, meant that at a certain point all sockets were in use by requests that had long since timed out or died, which in turn made our endpoints "hang" and become unresponsive.
👋
When downloading hundreds of files from S3 using the v3 SDK, we experience this issue on Lambda, which SILENTLY stops working after around 150 files most of the time. Sometimes it works; it depends on the gods of the network.
There are no warning or error logs, even inside the SDK when providing a logger, so we investigated for quite a long time before finding this issue and the possible workarounds.
Is it possible to have a log or an event or something to know when the SDK is stuck because of a lack of available sockets?
I don't think that downloading 200 files to transform them using streams is an edge case, and this issue deserves an improvement that would help troubleshoot it without reading hundreds of web pages on SO or GitHub.
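On the logging question: newer versions of the request handler can emit the "socket usage at capacity" warning quoted in a later comment. A sketch (assuming a recent @smithy/node-http-handler; the values are illustrative) of tuning when that warning fires:

```ts
import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler";
import { Agent } from "node:https";

const client = new S3Client({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({ keepAlive: true, maxSockets: 200 }),
    // Warn sooner when requests queue up waiting for a free socket.
    socketAcquisitionWarningTimeout: 3000,
  }),
});
```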
@clakech did you try using higher values for maxSockets?
@rijkvanzanten socketTimeout is actually deprecated, you should use requestTimeout instead.
I hit the same problem with GetObjectCommand in AWS Lambda. When a client sends a request, a warning like this may occur:
WARN @smithy/node-http-handler:WARN socket usage at capacity=50 and 131 (any number over 50) additional requests are enqueued. See Node Configuring MaxSockets or increase socketAcquisitionWarningTimeout=(millis) in the NodeHttpHandler config.
I employed Lambda to automate the process of handling image uploads to S3 bucket A. The process involves listening to events in bucket A and triggering Lambda functions. The main steps in the Lambda function implementation are as follows:
One significant challenge encountered was the need for recursive calls within the same bucket. When a user uploads an image to S3, triggering Lambda to process and store the compressed image back into the same bucket can lead to infinite recursion. To address this, I implemented a tagging mechanism to skip compressed processing. However, this approach resulted in suboptimal function call costs and performance.
To mitigate the challenges and enhance performance, I made the following optimizations:
1. Set keepAlive to false to resolve certain issues:
const agentOptions = {
keepAlive: false,
maxSockets: 1000, // default 50
};
const httpAgent = new http.Agent(agentOptions);
const httpsAgent = new https.Agent(agentOptions);
const config = {
region,
credentials: {
accessKeyId,
secretAccessKey,
},
forcePathStyle: true,
requestHandler: new NodeHttpHandler({
httpAgent: httpAgent,
httpsAgent: httpsAgent,
}),
connectionTimeout: 1000,
requestTimeout: 5000, // default 0
};
2. Consumed the response stream:
const response = await client.send(getObjectCommand);
const imgData = await response.Body.transformToByteArray(); // consumes the stream
3. Properly destroyed the client when the operation is completed:
```javascript
await client.send(PutCommand);
client.destroy();
```
By implementing these optimizations, I successfully resolved the recursion issue and improved the overall function duration and performance. Future considerations will include separating the triggering and compression processes after bucket separation.
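Purely as an illustration of the tagging guard described above (the tag key and value are assumptions; the comment does not name them), a sketch of skipping objects that were already compressed:

```ts
import {
  GetObjectTaggingCommand,
  PutObjectTaggingCommand,
  S3Client,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Hypothetical tag: "processed" = "true".
async function alreadyProcessed(Bucket: string, Key: string): Promise<boolean> {
  const { TagSet } = await s3.send(new GetObjectTaggingCommand({ Bucket, Key }));
  return (TagSet ?? []).some((t) => t.Key === "processed" && t.Value === "true");
}

async function markProcessed(Bucket: string, Key: string): Promise<void> {
  await s3.send(
    new PutObjectTaggingCommand({
      Bucket,
      Key,
      Tagging: { TagSet: [{ Key: "processed", Value: "true" }] },
    })
  );
}
```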
The problem is still there but in a different shape. Right now we receive a bunch of:
Socket timed out without establishing a connection within 1000 ms
The problem is that retry does not really do anything and just fails the actual request. Does anyone have any ideas for a workaround?
We are theorizing here without more specific reproduction examples, but one thing to keep in mind is that in the v2 SDK, the S3 objects are buffered by default when you request them (reading the stream).
In v3, the S3::getObject request only opens a stream. You must consume the stream or garbage collect the client or its request handler to keep the connections open to new traffic. One example, reading the stream into a string:
// v2
const get = await s3.getObject({ ... }).promise(); // this buffers (consumes) the stream already.
// v3
const get = await s3.getObject({ ... }); // object .Body has unconsumed stream.
const str = await get.Body.transformToString(); // consumes the stream.
I use "transformToByteArray" ,
is the same with "transformToString"?
const str = await get.Body.transformToByteArray(); // consumes the stream.
@cduyzh , It is the same in the sense that both transformToString and transformToByteArray consume the stream.
There are multiple workarounds and suggestions on the thread and @kuhe has implemented a warning informing users of socket usage.
Multiple people have commented about the suggestions mentioned on this thread being helpful in mitigating this issue.
If you are experiencing this same problem, please review the suggestions on this thread, in particular making sure the response Body stream is consumed, e.g. via transformToString() (now documented here).
Thanks, Ran~
We were receiving an exit with code 0 (related: https://github.com/aws/aws-sdk-js-v3/issues/4332, https://github.com/aws/aws-sdk-js-v3/issues/4029 ?) after about 50 requests. Thought it might be useful to share my experience below:
Increasing maxSockets increased the number of calls we could make to s3.send() (GetObjectCommand) before the script prematurely exited, but did not fix the issue.
Setting requestTimeout: 10000 has allowed us to seemingly work around the problem entirely, and now it does not exit prematurely.
Scripts exiting mid-execution with no error thrown seems like a critical issue. It was incredibly time-consuming to debug.
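For reference, a sketch of the requestTimeout setting described above (assuming @smithy/node-http-handler, which accepts requestTimeout in place of the deprecated socketTimeout):

```ts
import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler";

const client = new S3Client({
  requestHandler: new NodeHttpHandler({
    // Fail hung requests after 10 s instead of letting the process exit silently.
    requestTimeout: 10000,
  }),
});
```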
For what it's worth, in Bun v1.1.25 this bug doesn't occur, and that's part of what makes @aws-sdk/client-s3 faster in Bun (compared to Node):
❯ bun run.mjs
cpu: 13th Gen Intel(R) Core(TM) i9-13900
runtime: bun 1.1.25 (x64-linux)
benchmark time (avg) (min … max) p75 p99 p999
------------------------------------------------------------ -----------------------------
1 KB x 3 s3 upload 203 ms/iter (195 ms … 213 ms) 211 ms 213 ms 213 ms
1 KB x 3 s3 download 182 ms/iter (181 ms … 185 ms) 183 ms 185 ms 185 ms
❯ node run.mjs
cpu: 13th Gen Intel(R) Core(TM) i9-13900
runtime: node v22.5.1 (x64-linux)
benchmark time (avg) (min … max) p75 p99 p999
------------------------------------------------------------ -----------------------------
1 KB x 3 s3 upload 361 ms/iter (358 ms … 365 ms) 365 ms 365 ms 365 ms
1 KB x 3 s3 download 525 ms/iter (518 ms … 537 ms) 534 ms 537 ms 537 ms
We are running a Lambda with node18 and have recently run into this issue as soon as we hit more than three or four concurrent Lambdas. Then all of a sudden one of the Lambda instances just silently stops as soon as we try to use GetObjectCommand.
As we had declared the S3 client in global scope and the Lambda was "hot", we tried the workaround of setting timeouts and tuning the connection, but we still got the same issue under heavy load reading from S3. So it seems the connection tuning also hits the ceiling at some point, and then we have the same behavior of a silent death...
Can anyone tell me how to get some kind of logging event, or better yet get the Lambda to throw, when this happens?
We invoke the Lambda through SQS and use dead-letter queues, but as the S3 SDK just happily commits suicide on the Lambda there's no backout to the DLQ either, and the SQS message is consumed, so we are completely in the dark as to when this happens!
I've opened two separate support tickets with AWS, where they closed the first one blaming our code, but the second is still open, so I'm happy I found this issue, which confirms this is an issue with the SDK and not with our code...
I've now updated our code to re-instantiate the client in every function and "destroy" it once we are done, but that comes with a substantial performance hit; it seems to be the only reliable workaround though...
Describe the bug
Executing many parallel S3 GetObjectCommand calls is extremely slow in direct comparison to v2 of the SDK at best, and is suspected of breaking Lambda executions at worst.
Your environment
SDK version number
@aws-sdk/client-s3@3.76.0
Is the issue in the browser/Node.js/ReactNative?
Node.js
Details of the browser/Node.js/ReactNative version
node -v: v14.18.3
Steps to reproduce
I created a bucket containing 159 files (0 byte, size does not seem to be a factor). Then I created the same functionality of getting those files in parallel in minimal scripts for both v3 and v2 of the AWS sdk.
TS Code for v3:
TS Code for v2:
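(The original TS code blocks are not reproduced here. Purely as an illustration, a hypothetical reconstruction of the v3 repro's shape based on the description below: bucket name from argv, timestamps around the list call, then Promise.all over GetObjectCommand. The v2 variant would have the same shape, using s3.listObjects(...).promise() and s3.getObject(...).promise().)

```ts
// Hypothetical reconstruction, not the author's original repro code.
import { GetObjectCommand, ListObjectsCommand, S3Client } from "@aws-sdk/client-s3";

async function main() {
  const Bucket = process.argv[2];
  const client = new S3Client({});

  console.log(new Date()); // timestamp before listing
  const listed = await client.send(new ListObjectsCommand({ Bucket }));
  console.log(new Date()); // timestamp after listing

  const keys = (listed.Contents ?? []).map((o) => o.Key!);
  console.log(keys.length);

  const results = await Promise.all(
    keys.map((Key) => client.send(new GetObjectCommand({ Bucket, Key })))
  );
  console.log(results[0]);
}

main();
```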
Observed behavior
After transpiling, I executed both versions multiple times via time node dist/index.js <bucket-name>. There is a huge gap in the execution time between them. I added logs of the timestamp before and after the listObjects command to verify that command isn't the actual issue.
Representative outputs for the difference in execution time I experienced across all runs consistently:
v2
v3
On my machine, it's "just" 20 times slower. A lambda function I have which does a similar thing (albeit with more files - around 1100) now, after migration from v2 to v3, just returns "null" at that point in the execution, even though that is not an available return value in any of the execution paths of the code. No error message was logged that I could provide, unfortunately.
Expected behavior
Similar speed to v2 of the SDK in general, and not silently ending a Lambda execution.
Screenshots
Additional context