aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

prefer duck-type checks over instanceof - Upload doesn't support polyfills/compat shims for Readable/ReadableStream #6153

Open jedwards1211 opened 4 weeks ago

jedwards1211 commented 4 weeks ago


Describe the bug

If I pass a polyfill for Readable or ReadableStream as the Body in Upload's params, the request fails with: Body Data is unsupported format, expected data to be one of: string | Uint8Array | Buffer | Readable | ReadableStream | Blob.

This is very confusing when working with a library like archiver, since it uses the readable-stream userland implementation of Node's Readable. Their docs don't make this clear at all (that's their bad), so I was baffled about what was wrong until I dug into their code. And it would certainly be better if archiver used Node's own streams when possible.

But @aws-sdk could tolerate cases like this instead of erroring out, and that would make things go more smoothly for users in a variety of situations.

SDK version number

@aws-sdk/lib-storage 3.588.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.10.0

Reproduction Steps

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import archiver from "archiver";

(async () => {
  const archive = archiver("zip");

  archive.append("this is a test", { name: "test.txt" });
  archive.finalize();
  await new Upload({
    client: new S3Client(),
    params: {
      Bucket: "test",
      Key: "test",
      Body: archive,
    },
  }).done();
})();

Observed Behavior

Running the code outputs the following error:

Error: Body Data is unsupported format, expected data to be one of: string | Uint8Array | Buffer | Readable | ReadableStream | Blob;.
    at getChunk (/Users/andy/gh/aws-sdk-v3-upload-stream-issue/node_modules/.pnpm/@aws-sdk+lib-storage@3.588.0_@aws-sdk+client-s3@3.588.0/node_modules/@aws-sdk/lib-storage/dist-cjs/index.js:167:9)
    at _Upload.__doMultipartUpload (/Users/andy/gh/aws-sdk-v3-upload-stream-issue/node_modules/.pnpm/@aws-sdk+lib-storage@3.588.0_@aws-sdk+client-s3@3.588.0/node_modules/@aws-sdk/lib-storage/dist-cjs/index.js:383:24)
    at _Upload.done (/Users/andy/gh/aws-sdk-v3-upload-stream-issue/node_modules/.pnpm/@aws-sdk+lib-storage@3.588.0_@aws-sdk+client-s3@3.588.0/node_modules/@aws-sdk/lib-storage/dist-cjs/index.js:213:37)

Expected Behavior

The upload would succeed (assuming I had permission on this bucket).

Possible Solution

Upload could use duck-typing checks to determine whether an object looks like a Readable or ReadableStream, instead of relying on instanceof checks.
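As a rough sketch of what that could look like (these helper names are hypothetical, not existing SDK code):

import type { Readable } from "stream";

// Hypothetical helper (not SDK code): accept any object that behaves like a
// Node Readable, including userland implementations such as readable-stream.
function isReadableLike(x: any): x is Readable {
  return (
    x instanceof Object &&
    typeof x.pipe === "function" &&
    typeof x.on === "function" &&
    x.readable !== false
  );
}

// Likewise for web streams: any object exposing getReader() is treated as a
// ReadableStream, which also covers polyfills and cross-realm instances.
function isReadableStreamLike(x: any): x is ReadableStream {
  return x instanceof Object && typeof x.getReader === "function";
}

Checks along these lines would accept Node's own streams as well as userland implementations like the one archiver uses.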

Additional Information/Context

I'm able to work around this by doing archive.pipe(new PassThrough()), since that returns a bona fide Node.js Readable instance.
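For reference, here's the reproduction from above with that workaround applied:

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { PassThrough } from "stream";
import archiver from "archiver";

const archive = archiver("zip");
archive.append("this is a test", { name: "test.txt" });
archive.finalize();

await new Upload({
  client: new S3Client(),
  params: {
    Bucket: "test",
    Key: "test",
    // Piping through a PassThrough yields a genuine Node.js stream.Readable,
    // which passes lib-storage's instanceof check.
    Body: archive.pipe(new PassThrough()),
  },
}).done();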

aBurmeseDev commented 3 weeks ago

Hi @jedwards1211 - thanks for reaching out.

This was brought up for team discussion, and our understanding is that this happens because archiver doesn't produce the standard ReadableStream type that's expected. We'd suggest reaching out to archiver to use the stream types Node defines, as we unfortunately won't be making changes on our end. Let us know if you have further questions.

Best, John

jedwards1211 commented 3 weeks ago

I mainly wanted to help other library users avoid having an annoying debugging session with archiver or other libraries that use polyfills.

I've talked to archiver about, at the very least, explaining this clearly in their docs. But people don't always read docs carefully, and if archiver tried to fix this with code changes, it would probably break other use cases. On the other hand, @aws-sdk/lib-storage could broaden its compatibility for everyone without any breaking change.

I am disappointed the team doesn't think this would be a worthwhile change.

aBurmeseDev commented 3 weeks ago

Thanks for your response. I totally understand the frustration, and that you're trying to help other users. However, since the type returned by archiver isn't standard in Node.js, we can't make an exception and make the change.

I also understand there's a workaround using PassThrough, from browsing the issues on https://github.com/archiverjs/node-archiver/issues, and hopefully it will get documented as you requested. Closing the issue for now.

jedwards1211 commented 3 weeks ago

@aBurmeseDev I thought I'd share some more context, FWIW. I'm not assuming y'all don't know about things like this; I'm sharing it just in case.

Weird environments

Last year I briefly worked at an SDK codegen company on supporting a wider variety of JavaScript environments: Node, Deno, Bun, Vercel Edge, Cloudflare Workers, Jest node, Jest jsdom, react-native, etc.

We ended up being unable to support certain environments with checks like body instanceof Blob, and had to resort to checks like this:

function isBlob(x: any): x is Blob {
  return x instanceof Object && typeof x.size === 'number' &&
    typeof x.type === 'string' && typeof x.arrayBuffer === 'function' &&
    typeof x.slice === 'function'
}

Jest still has a particularly onerous problem with instanceof; even instanceof Array fails in some cases: https://github.com/jestjs/jest/issues/2549.

And as expected, @aws-sdk has these exact compatibility problems with Jest: https://github.com/aws/aws-sdk-js-v3/issues/4273

Cross-realm compatibility

Some objects, for example Uint8Array, can be transferred across realms in JavaScript: in the browser, between windows; in Node, between vm contexts. These objects, despite being legitimate standard APIs, don't work with instanceof once they cross a realm boundary.

For example, a Uint8Array from another realm will fail an instanceof check. And I do see an instanceof Uint8Array check in @aws-sdk/lib-storage, so this is another issue waiting to be filed, though it's a rare case.
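To make this concrete, here's a small demonstration using Node's built-in vm module:

import vm from "vm";

// A Uint8Array built in a fresh vm context is a perfectly valid Uint8Array,
// but its prototype chain points at that realm's constructor, so instanceof
// against this realm's Uint8Array fails.
const foreign = vm.runInNewContext("new Uint8Array([1, 2, 3])");
console.log(foreign instanceof Uint8Array); // false

// Array.isArray, by contrast, is realm-agnostic.
console.log(Array.isArray(vm.runInNewContext("[]"))); // true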

Functions like Array.isArray were created to help with cross-realm support:

Array.isArray() checks if the passed value is an Array. It does not check the value's prototype chain, nor does it rely on the Array constructor it is attached to. It returns true for any value that was created using the array literal syntax or the Array constructor. This makes it safe to use with cross-realm objects, where the identity of the Array constructor is different and would therefore cause instanceof Array to fail.

Has the AWS SDK team considered whether you want the SDKs to be generally cross-realm compatible? If so, until the day a Uint8Array.isUint8Array function exists, you would have to implement a userland check for it.

Unfortunately, userland code that tests the identity of cross-realm objects can't prevent all false positives. But that's not necessarily so bad: an object that happens to have all the expected properties of Uint8Array (or some other standard API) was most likely intended to work like one, so a userland isUint8Array check, like my isBlob example above, is arguably low-risk.
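For what it's worth, one low-risk shape for such a check (a sketch, not a concrete proposal for the SDK) leans on Object.prototype.toString, which reads the object's own Symbol.toStringTag and therefore works across realms:

// Sketch: Object.prototype.toString reports "[object Uint8Array]" for typed
// arrays from any realm. A plain object could still spoof this via
// Symbol.toStringTag, which is exactly the false-positive tradeoff above.
function isUint8Array(x: unknown): x is Uint8Array {
  return Object.prototype.toString.call(x) === "[object Uint8Array]";
}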

And even though the issue with archiver isn't a cross-realm compatibility issue per se, implementing cross-realm support with userland functions would happen to solve it as well.

jedwards1211 commented 2 weeks ago

@aBurmeseDev I just noticed that there's already duck typing support for Blob in the code... https://github.com/aws/aws-sdk-js-v3/blame/main/lib/lib-storage/src/chunker.ts#L26

  if (typeof (data as any).stream === "function") {
    // approximate support for Blobs.
    return getChunkStream<ReadableStream>((data as any).stream(), partSize, getDataReadableStream);
  }

kuhe commented 2 weeks ago

converting this to a feature request to prefer duck-type checks globally