aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

The 'total' property is always undefined when uploading files larger than 5MB #4413

Open ti2sugiyama opened 1 year ago

ti2sugiyama commented 1 year ago

Describe the bug

When uploading a large file as a stream via the Upload module in a web app, the 'total' property of the httpUploadProgress event object is always undefined.

SDK version number

@aws-sdk/lib-storage@3.264.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v18.13.0

Reproduction Steps

  1. setup Node.js project
  2. install @aws-sdk/client-s3 and @aws-sdk/lib-storage
  3. prepare upload file (larger than 5 MB)
  4. run the following program
  5. observe that ev.total is always undefined

const fs = require("fs");
const stream = require("stream");
const S3Client = require("@aws-sdk/client-s3").S3Client;
const Upload = require("@aws-sdk/lib-storage").Upload;

const s3 = new S3Client({
  // your s3 settings
});

async function main() {
  // Use a file larger than 5MB.
  const fileStream = fs.createReadStream("/path/to/your/over5mb/file.dat");
  const passThrough = new stream.PassThrough();
  // Pipe PassThrough to remove fileStream.path
  // to emulate browser file upload.
  fileStream.pipe(passThrough);

  const upload = new Upload({
    client: s3,
    params: {
      Bucket: "your-bucket", // your bucket name
      Key: "your-key", // your object key
      Body: passThrough,
    },
  });

  upload.on("httpUploadProgress", function (ev) {
    // ev.total always undefined
    console.log(ev);
  });

  await upload.done();
}

main();

By debugging the code, I found that this.totalBytes (the source of 'total') is never assigned anywhere except in the constructor and in the single-chunk upload path.

Observed Behavior

The httpUploadProgress event object's 'total' property is always undefined. (In the sample code, the console output of ev.total remains undefined to the end.)

Expected Behavior

The correct total file size should eventually be set (in the sample code, the final console output of ev.total should be the correct total file size).

Possible Solution

No response

Additional Information/Context

aws-sdk v2 was recalculating this.totalBytes on readable event https://github.com/aws/aws-sdk-js/blob/b168eaab7aa04f8fc300b38bf67f9c26bf02c28a/lib/s3/managed_upload.js#L188

yenfryherrerafeliz commented 1 year ago

Hi @ti2sugiyama, thanks for opening this issue. I can confirm the reported behavior. I will investigate this further in order to define further steps on this.

Thanks!

yenfryherrerafeliz commented 1 year ago

@ti2sugiyama what you say in the comment below is true:

By debugging the code, I found that this.totalBytes (the source of 'total') is never assigned anywhere except in the constructor and in the single-chunk upload path.

I feel this issue just happens when the body given as parameter does not have any of the properties evaluated in the conditions here. As of right now I will mark this issue to be reviewed so it can be further addressed.

Steps to reproduce:

const client = new S3Client({
  region: process.env.TEST_REGION,
});
const pathToFile = "./test-file.txt";
const data = "#".repeat(1024 * 1024 * 10);
fs.writeFileSync(pathToFile, data);
const fileStream = fs.createReadStream(pathToFile);
const passThrough = new stream.PassThrough();
fileStream.pipe(passThrough);
const upload = new Upload({
  client: client,
  params: {
    Bucket: process.env.TEST_BUCKET,
    Key: process.env.TEST_KEY,
    Body: passThrough,
  },
});
upload.on("httpUploadProgress", (progress) => {
  console.log("Progress: ", progress);
});
await upload.done();

Thanks!

devnomic commented 7 months ago

Same issue. It's been a year, hopefully fixed soon.

jirimoravcik commented 5 months ago

Facing the same issue here. @yenfryherrerafeliz suggested this approach in https://github.com/aws/aws-sdk-js-v3/discussions/5165, maybe there should be a note that it doesn't work for files larger than 5MB?

Also, is using max of all ev.loaded instead of ev.total a reasonable workaround for now? Thanks!
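Until the SDK fixes this, the total can be carried outside the SDK, since the caller usually knows the stream's length (e.g. from fs.statSync for a file on disk, or File.size in a browser). A minimal workaround sketch along the lines suggested above (`makeProgressTracker` is a hypothetical helper, not SDK API):

```javascript
// Workaround sketch: supply the total yourself, and fall back to the
// largest `loaded` value seen so far when no size is known at all.
function makeProgressTracker(knownTotalBytes) {
  let maxLoaded = 0;
  return (ev) => {
    maxLoaded = Math.max(maxLoaded, ev.loaded ?? 0);
    return {
      loaded: maxLoaded,
      // Prefer the SDK's total, then the caller-provided size,
      // then the running maximum as a last resort.
      total: ev.total ?? knownTotalBytes ?? maxLoaded,
    };
  };
}
```

Usage would look like `const track = makeProgressTracker(fs.statSync(path).size);` followed by `upload.on("httpUploadProgress", (ev) => console.log(track(ev)));`. Note that the max-of-loaded fallback only converges on the true total once the last byte has been sent, so it cannot drive an accurate percentage mid-upload.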

beala commented 4 months ago

Any update on this?

aagenesds22 commented 3 months ago

Hello!

I managed to check every data type regarded as "accepted" within the Upload class: string | Uint8Array | Buffer | Readable | ReadableStream | Blob.

The findings:

I feel this issue just happens when the body given as parameter does not have any of the properties evaluated in the conditions here. As of right now I will mark this issue to be reviewed so it can be further addressed.

This is partially true, because the issue is strongly related to the Buffer conversion step, here and here. If the conversion to Buffer were performed before totalBytes is defined in the initialization step here, it would work.

On a multipart upload, the __notifyProgress consumes the this.totalBytes value defined on initialization. This can be referenced here. And there you go: undefined as the value. The byteLength function is not intended to support streams, but works fine using just Buffer, Uint8Array or String.

I believe that a good solution can be achieved by strengthening the types first, replacing the "any" here, which is in fact the real cause of the bug: a weak type check allowing multiple incompatible inputs to this function. Matching the type with this type definition should help. Secondly, additional code has to be added inside that function to handle streams at initialization. These strategies: 1 and 2 might be reused to that end.
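The shape of that failure can be illustrated with a simplified stand-in for the length probe (an approximation of the logic described above, not the actual lib-storage source; `probeByteLength` is a hypothetical name):

```javascript
// Approximation of the length probing described above: bodies that
// expose byteLength/size (or are strings) yield a number; a PassThrough
// stream exposes none of them, so the probe returns undefined, and
// `total` stays undefined for the whole upload.
function probeByteLength(body) {
  if (body == null) return 0;
  if (typeof body === "string") return Buffer.byteLength(body);
  if (typeof body.byteLength === "number") return body.byteLength; // Buffer, Uint8Array
  if (typeof body.size === "number") return body.size; // Blob
  return undefined; // streams fall through to here
}
```

This also explains why piping through a PassThrough reproduces the bug while some other bodies do not: the wrapper strips every property the probe looks for.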

I hope this helps towards a definitive solution 👍

zshzbh commented 2 months ago

Hey all,

I am able to replicate the issue when working with a file of 11 megabytes in size. The relevant package versions being utilized are:

"@aws-sdk/client-s3": "^3.614.0"
"@aws-sdk/lib-storage": "^3.614.0" (also reproduced with @aws-sdk/lib-storage@3.264.0)

The observed result during the file upload process is as follows:

Progress:  {
  loaded: 2154984,
  total: undefined,
  part: 3,
  Key: 'imag_older_version',
  Bucket: 'new-bucket-maggie-ma'
}
Progress:  {
  loaded: 7397864,
  total: undefined,
  part: 1,
  Key: 'imag_older_version',
  Bucket: 'new-bucket-maggie-ma'
}
Progress:  {
  loaded: 12640744,
  total: undefined,
  part: 2,
  Key: 'imag_older_version',
  Bucket: 'new-bucket-maggie-ma'
}

Please note that sensitive information, such as bucket names, has been replaced with placeholders for privacy and security purposes.

Steps to reproduce:

  1. Create an empty node project: npm init -y
  2. Add the dependencies: npm i @aws-sdk/client-s3 @aws-sdk/lib-storage
  3. Then, create an empty file, e.g. src/index.js
  4. Add a file larger than 5 MB and run the code:

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import * as fs from "fs";
import * as stream from "stream";

const client = new S3Client({
  // your region
});
const fileStream = fs.createReadStream("./your_file_name");
const passThrough = new stream.PassThrough();
fileStream.pipe(passThrough);
const upload = new Upload({
  client: client,
  params: {
    Bucket: "your_bucket_name",
    Key: "your_key",
    Body: passThrough,
  },
});
upload.on("httpUploadProgress", (progress) => {
  console.log("Progress: ", progress);
});

const result = await upload.done();
console.log("Upload done", result);

@aagenesds22 Regarding adding strict type suggestion - I understand your point. Since this is not a public API, strict typing may not be a necessity. Additionally, introducing dependencies on platform-specific types, especially those from the global namespace, can potentially lead to issues and complications.

I appreciate you taking the time to clarify this point and provide additional context. I have brought this up to the AWS SDK JS team, it's queued now.

Thanks! Maggie