aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0
2.98k stars 560 forks source link

Lib-storage upload blob as multipart broken by recent pull #5949

Closed JackFromMoo closed 3 months ago

JackFromMoo commented 3 months ago

Checkboxes for prior research

Describe the bug

Attempting to pass a Blob such as a File as the options.params.body of an Upload will result in an error if the file is large enough to trigger a multipart upload. This issue will impacts some webpack users building their apps for browser.

The error is: TypeError: "list" argument must be an Array of Buffers This bug was introduced in pull request #5078

The root of the error is from the node-libs-browser implementation of Node's buffer, node-libs-browser. This library uses https://github.com/feross/buffer for its implementation of buffer.

When lib/lib-storage/src/chunks/getChunkStream.ts calls Buffer.concat(currentBuffer.chunks) on line 27, it produces the error. This is because currentBuffer.chunks is populated by lib/lib-storage/src/chunks/getDataReadableStream.ts, which was changed in the pull request. getDataReadableStream.ts used to only produce Buffers, but the pull request allowed it to also return values that are instances of Uint8Array. The browser implementation of Buffer.concat() explicitly checks each element of the given Array using Buffer.isBuffer(element), which will return false if "element" is a Uint8Array.

SDK version number

@aws-sdk/lib-storage@v3.532.0+

Which JavaScript Runtime is this issue in?

Browser

Details of the browser/Node.js/ReactNative version

Browser is Chrome Version 122.0.6261.139 (Official Build) extended (64-bit)

Reproduction Steps

Not sure how to provide a snippet since my code's a Vue app built using Webpack. Note that you probably will not be able to reproduce this in Node since the base Node Buffer.concat() will probably accept Uint8Arrays as elements. I assume this is how the pull request was accepted.

Simply use lib-storage, create a new Upload(), and have the options.params.body be a File object such as that from an HTML

import { Upload } from "@aws-sdk/lib-storage";
import { S3Client } from "@aws-sdk/client-s3";

const file = // your File here
const awsInfo = {
          credentials: {
            accessKeyId: /*bucket access key*/,
            secretAccessKey: /*bucket secret access key*/,
            expiration: /*bucket key expiration*/,
            sessionToken: /*session token*/
          },
          awsRegion: /*bucket region*/,
          s3Bucket: /*bucket name*/,
          path: /*bucket path*/
}

const parallelUploads3 = new Upload({
        client: new S3Client({
            region: awsInfo.awsRegion,
            credentials: awsInfo.credentials
          }),
        params: {
            Key: awsInfo.path + fileName,
            Body: file,
            Bucket: awsInfo.s3Bucket
        },

        tags: [
          /*...*/
        ], // optional tags
        queueSize: 4, // optional concurrency configuration. Jack tested queueSize and partSize, these were optimal locally
        partSize: 1024 * 1024 * 5, // optional size of each part, in bytes, at least 5MB
        leavePartsOnError: false, // optional manually handle dropped parts
});

return await parallelUploads3.done()

Observed Behavior

The error: TypeError: "list" argument must be an Array of Buffers at c.concat (index.js:399:13) at getChunkStream.js:9:72 at f (getChunkStream.js:2:1) at Generator.<anonymous> (getChunkStream.js:2:1) at Generator.next (getChunkStream.js:2:1) at n (getChunkStream.js:2:1) at getChunkStream.js:2:1

Browser implementation of Buffer.concat:


  if (!isArray(list)) {
    throw new TypeError('"list" argument must be an Array of Buffers')
  }

  if (list.length === 0) {
    return Buffer.alloc(0)
  }

  var i
  if (length === undefined) {
    length = 0
    for (i = 0; i < list.length; ++i) {
      length += list[i].length
    }
  }

  var buffer = Buffer.allocUnsafe(length)
  var pos = 0
  for (i = 0; i < list.length; ++i) {
    var buf = list[i]
    if (!Buffer.isBuffer(buf)) {
      throw new TypeError('"list" argument must be an Array of Buffers')
    }
    buf.copy(buffer, pos)
    pos += buf.length
  }
  return buffer
}```

The "list" fed into Buffer.concat is created by:
1. getting the File's .stream() (inherited from Blob). Type = ReadableStream
2. getting the ReadableStream's reader using .getReader(). Type = ReadableStreamDefaultReader
3. repeatedly calling .read() on the ReadableStreamDefaultReader to produce a field called **value**
4. If **value** is a Buffer (Buffer.isBuffer(**value**) === true) or **value** is  a Uint8Array (**value** instanceof Uint8Array), **value** is added to an Array **AS-IS** (THIS IS THE SOURCE OF THE PROBLEM)
5. If **value** is any other type, getDataReadableStream attempts to coerce it into a Buffer using Buffer.from(value)

Buffer.concat() (as implemented above) complains that the elements of the Array are Uint8Arrays and not Buffers. This is the problem.

### Expected Behavior

The expected behavior is for the upload to complete without error (**value** is coerced into a Buffer as it was prior to #5078, upload succeeds rather than failing immediately).

### Possible Solution

I haven't looked into the implementation of #5078 enough to provide a recommendation that wouldn't break it, but simply reverting the changes it introduced to getReadableDataStream.ts would resolve the problem. It may cause other problems with the rest of the code in that PR, it was fairly large and I don't have time to look over it.

Maybe if @kuhe takes a look at this issue he can determine whether this would cause problems. I imagine it will nullify much of the performance benefits of the PR :(

### Additional Information/Context

_No response_
harfangk commented 3 months ago

I'm having this exact same issue, with exact same setting of Vue.js with Webpack. After some digging, I found out that it's caused by using ancient version of https://github.com/feross/buffer.

aws-sdk-js-v3 requires buffer version 5.6.0, which can properly handle UIntArray8, so it's not to blame.

The code you pasted in the issue is identical to what's found in mine, which is 4.9.2 released in 2019, so it's the same version or even older.

For me, the root cause was the old nuxt version, and the chain is like this:

nuxt: ^2.16.3 @nuxt/webpack: ^2.16.3 webpack: ^4.46.0 node-libs-browser:^2.2.1 buffer: ^4.3.0

JackFromMoo commented 3 months ago

Thanks @harfangk, you're absolutely right that that's where the problem lies. I've been learning a ton about webpack and dependencies as I've been troubleshooting this. Hopefully the closed issue will help people with the same problem in the future.

github-actions[bot] commented 3 months ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.