aws / aws-sdk-js

AWS SDK for JavaScript in the browser and Node.js (In Maintenance Mode, End-of-Life on 09/08/2025). The AWS SDK for JavaScript v3 in the browser and Node.js is available here: https://github.com/aws/aws-sdk-js-v3
https://aws.amazon.com/developer/language/javascript/
Apache License 2.0
7.6k stars 1.55k forks

S3 ManagedUpload is too slow without listObjectV2 call #4557

Closed hhvys closed 7 months ago

hhvys commented 11 months ago

Describe the bug

Sample code:

const s3 = new S3({});
const stream = klaw(staticAssetDirPath);

const uploadQueue: Array<AsyncFunction<void, Error>> = [];
stream.on('data', ({ path, stats }) => {
  if (!stats.isFile()) {
    return;
  }
  objectsToUploadCount += 1;

  uploadQueue.push(
    asyncify(async () => {
      const key = createFileKeyForStorage(staticAssetDirPath, path, awsConfig.bucketPrefix);

      try {
        const upload = new S3.ManagedUpload({
          service: s3,
          params: {
            Bucket: awsConfig.bucketName,
            Key: key,
            Body: fs.createReadStream(path),
            CacheControl: 'public, max-age=31536000, immutable, no-transform',
            ContentType: mime.getType(path) ?? 'application/octet-stream',
          },
        });

        const start = Date.now();
        await upload.promise();
        const timeSpent = (Date.now() - start) / 1000;
        timeSpentArray.push({ key, timeSpent });
      } catch (ex) {
        console.error(ex);
        process.exit(1);
      }
    })
  );
});

await streamToPromise(stream as Readable);
const promisifiedParallelLimit = util.promisify(parallelLimit);
await promisifiedParallelLimit(uploadQueue, MAX_CONCURRENT_REQUESTS);

I am uploading around 20,000 files to an S3 bucket with this code, 20 at a time. Uploading all of them takes around 10x longer than with the code below.

const s3 = new S3({});

// Adding just this call cuts the total upload time to roughly 10% of the original
await s3
    .listObjectsV2({
      Bucket: bucketName,
      Prefix: bucketPrefix,
      MaxKeys: 1,
    })
    .promise();

const stream = klaw(staticAssetDirPath);

const uploadQueue: Array<AsyncFunction<void, Error>> = [];
stream.on('data', ({ path, stats }) => {
  ... // same code as above
});

... // same code as above

Expected Behavior

The upload time should not depend on whether the listObjectsV2 API is called first.

Current Behavior

Upload time changes drastically if I invoke listObjectsV2 before using ManagedUpload to upload the files.
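To put numbers on "drastically", the per-file timings that the sample code pushes to timeSpentArray could be summarized once per run and compared. This is only a sketch: summarizeTimings and the two interfaces are hypothetical helpers, not part of the SDK, and the entry shape is assumed from the sample above.

```typescript
// Shape of the entries pushed to timeSpentArray in the sample code (assumed).
interface UploadTiming {
  key: string;
  timeSpent: number; // seconds
}

interface TimingSummary {
  count: number;
  total: number; // sum of per-file times, not wall-clock time
  mean: number;
  p95: number;
  max: number;
}

// Hypothetical helper: summarizes per-file upload times for one run.
function summarizeTimings(timings: UploadTiming[]): TimingSummary {
  if (timings.length === 0) {
    return { count: 0, total: 0, mean: 0, p95: 0, max: 0 };
  }
  // Copy and sort ascending so percentiles can be read by index.
  const sorted = timings.map((t) => t.timeSpent).sort((a, b) => a - b);
  const total = sorted.reduce((sum, t) => sum + t, 0);
  // Nearest-rank 95th percentile.
  const p95Index = Math.min(sorted.length - 1, Math.ceil(sorted.length * 0.95) - 1);
  return {
    count: sorted.length,
    total,
    mean: total / sorted.length,
    p95: sorted[p95Index],
    max: sorted[sorted.length - 1],
  };
}
```

Because uploads run concurrently, total is the sum of individual upload times rather than elapsed wall-clock time, so the mean and p95 are probably the more useful figures to compare between the run with the listObjectsV2 call and the run without it.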

Reproduction Steps

Provided in the Bug Description

Possible Solution

No response

Additional Information/Context

No response

SDK version used

2.1138.0

Environment details (OS name and version, etc.)

Rocky 8 Linux, Node 18 using Docker

RanVaknin commented 8 months ago

Hi @hhvys ,

The behavior you described of calling a list operation prior to making an upload to make it go faster is not a behavior I have ever heard of, especially when you are not making use of any results in the list operation.

I see that you have some logging there. Can you provide more logs showing when requests are sent and responses are received, both with and without the list call?
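One way to collect such logs, sketched under assumptions: the v2 SDK accepts a `logger` configuration option, which can be any object with a log() or write() method, and as far as I know it emits one line per completed request including status, elapsed time, and retry count. TimestampedLogger and sdkLogger below are hypothetical names, and the wiring into the S3 client is shown only as a comment.

```typescript
// Hypothetical helper: prefixes every logged line with an ISO timestamp and
// keeps the lines in memory so the two runs can be compared afterwards.
class TimestampedLogger {
  readonly lines: string[] = [];

  // Matches the console-like log() shape that the v2 `logger` option accepts.
  log(...messages: unknown[]): void {
    const line = `${new Date().toISOString()} ${messages.map(String).join(' ')}`;
    this.lines.push(line);
    console.log(line);
  }
}

const sdkLogger = new TimestampedLogger();

// Assumed wiring, following the sample code in the report:
//   const s3 = new S3({ logger: sdkLogger });
```

Running the upload once with the listObjectsV2 call and once without, and attaching both sets of lines, would show whether the extra time is spent in individual requests or before the first request is sent.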

Thanks, Ran~

github-actions[bot] commented 8 months ago

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.