aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.

S3.GetObject no longer returns the result as a string #1877

Closed igilham closed 1 year ago

igilham commented 3 years ago

Describe the bug
I'm using the GetObjectCommand with an S3Client to pull a file down from S3. In v2 of the SDK I can write response.Body.toString('utf-8') to turn the response into a string. In v3 of the SDK response.Body is a complex object that does not seem to expose the result of reading from the socket.

It's not clear if the SDK's current behaviour is intentional, but the change in behaviour since v2 is significant and undocumented.

SDK version number 3.1.0

Is the issue in the browser/Node.js/ReactNative? Node.js

Details of the browser/Node.js/ReactNative version v12.18.0

To Reproduce (observed behavior)

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

export async function getFile() {
  const client = new S3Client({ region: 'eu-west-1' });
  const cmd = new GetObjectCommand({
    Bucket: 'my-bucket',
    Key: '/readme.txt',
  });
  const data = await client.send(cmd);

  console.log(data.Body.toString('utf-8'));
}

Expected behavior
It should print the text of the file.

Additional context

data.Body is a complex object with circular references. Object.keys(data.Body) returns the following:

[
  "_readableState",
  "readable",
  "_events",
  "_eventsCount",
  "_maxListeners",
  "socket",
  "connection",
  "httpVersionMajor",
  "httpVersionMinor",
  "httpVersion",
  "complete",
  "headers",
  "rawHeaders",
  "trailers",
  "rawTrailers",
  "aborted",
  "upgrade",
  "url",
  "method",
  "statusCode",
  "statusMessage",
  "client",
  "_consuming",
  "_dumped",
  "req"
]
gailmiller commented 2 years ago

@sfwhite said:

It's still not noted in the docs that the latter two shapes are from DOM and only apply in browsers. I agree it's frustrating that helper methods for common scenarios aren't included with the client.

We've run into similar issues so often we've taken to wrapping all SDK clients in our own classes to extend their operations to handle situations like this. S3 shouldn't be one of the services we have to do this for. With such a large library of offerings, I know it's hard to keep up with everything, but S3 is hands down one of the most used resources Amazon has available, so it should be receiving the most attention when it comes to DX. Bad experiences on the most common use cases definitely sour the impression of your products, and lower the likelihood of developer evangelization.

Here's to hoping 2021 closes out with a packaged implementation for this scenario.

I wholeheartedly agree. I have wrappers now as well for exactly this reason.

I have had to read and reread and search for this thread (it doesn't google as easily as you might think).

This issue has had me converting 90% of my code to V3, then having to use V2 for S3 and any other randomly-converted-to-streams-but-they-don't-explain-that APIs, because the documentation is atrocious. I still do not understand what is written in the AWS S3 docs, and I have read this thread entirely (so at least I understand what is going on - thank you so much everyone!!). Simply saying "the Body data" is, well... useless. Add to that, if that work is in Lambda, that's ME paying for cycles I didn't have to pay for before, and if you are trying to do near real-time work, e.g. for CCaaS (Amazon Connect), it's getting even SLOWER. I find this whole situation really disappointing.

Will there be a lib_S3 like there is for dynamo which has all these helper functions in it? I hope there will be, and soon.

DanielSmith commented 2 years ago

AWS team: it is just about November 2021, and, still, compared to the SDK v2, reading in a simple JSON object from S3 is way too convoluted. If this is your idea of a private joke, it's not funny. Fix it already.

ffxsam commented 2 years ago

I'm generally a bit concerned about this SDK being maintained over time. There are very serious issues like this one that have been open since May with no resolution.

zbagley commented 2 years ago

@all following this issue:

IMPORTANT UPDATE

I have received news that Bezos officially defunded the AWS core services. Do not fret! He's reallocated these funds to lawyers' fees. These fees will be used to prevent SpaceX from progressing. Please, rest assured that the lack of basic, simple progress on these core AWS libraries is clearly being put to good use!

Hope this update is found useful and promising to all engineers that rely on AWS S3.

BrianM0330 commented 2 years ago

It's ridiculous that people have to resort to this GitHub thread to parse their data. Why does a company like Amazon have such shitty and obscure documentation?

For any lost souls trying to parse simple JSON data using GetObjectCommand and don't want to mess with filestreams, readers, or buffers, https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-793028742 worked like a charm on my NodeJS/Express server.

I'm sure there's a lot of really good suggestions in this thread too, hope that there's a solution for this sometime soon.

montreux commented 2 years ago

I couldn't get the stream.on solution to work under React. I kept getting the error 'stream.on is not a function'. It turns out that in this environment AWS returns a ReadableStream, not a Readable. I ended up having to write my own converter to handle ReadableStream and work under TypeScript with eslint. No extra packages need to be installed.

I thought I'd written my last do-while loop a decade ago, but it turned out I still needed one here. I couldn't get an iterator solution to work, and I refused to do the tail recursion I'd seen in the getReader() examples, which wouldn't compile under TypeScript anyway.

async function readableStreamToString(stream: ReadableStream): Promise<string> {
  const chunks: Buffer[] = [];

  const reader = stream.getReader();

  let moreData = true;
  do {
    // eslint-disable-next-line no-await-in-loop
    const { done, value } = await reader.read();
    if (done) {
      moreData = false;
    } else {
      chunks.push(Buffer.from(value as Uint8Array));
    }
  } while (moreData);

  return Buffer.concat(chunks).toString('utf-8');
}

And I call it like this:

async function loadJsonFileFromS3(bucket: string, key: string): Promise<[]> {
  const s3Response = await s3Client.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  if (!s3Response.Body) {
    const errorMessage = `${key} returned undefined`;
    throw new Error(errorMessage);
  }

  const fileContents = await readableStreamToString(
    s3Response.Body as ReadableStream
  );
  const contentsAsJson = JSON.parse(fileContents);
  return contentsAsJson;
}
windbeneathyourwings commented 2 years ago

I agree. I love AWS but finding this thread took a bit of research…

fishcharlie commented 2 years ago

Guess it's back to AWS SDK v2.

Can we please get an official comment from someone at AWS about this? Getting an object from S3 should NOT be this difficult...

tranvansang commented 2 years ago

TypeScript version for Node (not for browser).

import {GetObjectCommand, S3Client} from '@aws-sdk/client-s3'
import type {Readable} from 'stream'

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
const response = await s3Client
    .send(new GetObjectCommand({
        Key: '<key>',
        Bucket: '<bucket>',
    }))
const stream = response.Body as Readable
// if you are using node version < 17.5.0
return new Promise<Buffer>((resolve, reject) => {
    const chunks: Buffer[] = []
    stream.on('data', chunk => chunks.push(chunk))
    stream.once('end', () => resolve(Buffer.concat(chunks)))
    stream.once('error', reject)
})

// if you are using node version >= 17.5.0
return Buffer.concat(await stream.toArray())

For browser usage, the reasons behind this making-things-complicated decision in SDK v3, and an explanation of the type casting, see my blog post.

maximelafarie commented 2 years ago

If it helps someone who doesn't want to use a library for converting streams to buffers, here's the custom function I'm using:

import { Stream } from 'stream';

export async function stream2buffer(stream: Stream): Promise<Buffer> {

    return new Promise<Buffer>((resolve, reject) => {

        const _buf = Array<any>();

        stream.on('data', chunk => _buf.push(chunk));
        stream.on('end', () => resolve(Buffer.concat(_buf)));
        stream.on('error', err => reject(`error converting stream - ${err}`));

    });
}

Moreover, it seems to be a little bit faster than the lib.

Then you can use it like that:

const data = await this.client.getObject({
  Key: path.replace(/^\//g, ''),
  Bucket: this.bucket
});

const file_stream = data.Body;
let content_buffer: Buffer | null = null;

if (file_stream instanceof Readable) {
  content_buffer = await stream2buffer(file_stream); // Here's the buffer
} else {
  throw new Error('Unknown object stream type.');
}
...
ricardobeat commented 2 years ago

Also ended up here after some time wasted trying to figure out the API. A year later, and there is still no proper documentation on docs.aws.amazon.com.

Will be reverting to v2. Despite the monolithic design, the API is simpler and smaller, and it only pulls in 10 dependencies.

park-brian commented 2 years ago

You can also use async iterators to consume readable streams if you're at nodejs 11.14.0 or above: https://nodejs.org/api/stream.html#readablesymbolasynciterator

const s3Response = await s3Client.send(
  new GetObjectCommand({
    Bucket: bucket,
    Key: key,
  })
);
let s3ResponseBody = "";
for await (const chunk of s3Response.Body) {
  s3ResponseBody += chunk;
}
// const result = JSON.parse(s3ResponseBody)
esemeniuc commented 2 years ago

A shorter version is:

const body: stream.Readable|undefined = s3result.Body as stream.Readable|undefined;
if (!body) return;
const payload = await body.read();
const output = Buffer.from(payload).toString();
joshribakoff-sm commented 2 years ago

The fact that people are using a TypeScript type assertion in many of these examples stems from the fact that AWS has typed this thing poorly. Why is Body even a union type? That is basically AWS telling us "we don't know the shape of the response we're going to give you, only that it will be one of these three things, and you need to write code to check which of these we gave you". I think someone at AWS should look into making this an intersection type instead of a union type.

Why should I have to check the return type? Shouldn't the library know what the return type is already?


  if (res.Body && res.Body instanceof Readable) {
    const payload = await stream2buffer(res.Body);
    console.log(payload.toString('utf-8'));
  }

It seems to be the case that the returned object can be used as any of the three interfaces (intersection type), and it does not seem to be the case that it can only be used as one of the interfaces (as the types would suggest with a union type).

If a union type is in fact correct, it begs the question under what scenario I should expect the library to not return this interface to me. This is not documented. Why not just add an example to your docs that writes a "hello world" string, reads it back, and outputs it, at the very least, using TypeScript so you dogfood your own types (if you did this you'd realize your current types are "not optimal").

Because AWS has this as a union type, people's editors will suggest .toString() only (assuming the user has not discriminated the union). However, this just prints "[object]", which is pretty poor DX, especially considering toString() used to work in v2 of the API from the sounds of it. Perhaps consider structuring your types so that the "trigger suggest" command in VSCode guides the user into a working "hello world" command without all of this fuss.
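
For what it's worth, here is a minimal sketch of actually discriminating the union at runtime instead of casting (bodyToString is a made-up helper name; the guards assume whichever globals your runtime actually provides):

import type { Readable } from 'stream';

// Sketch: narrow the Body union explicitly instead of blindly casting.
async function bodyToString(body: unknown): Promise<string> {
  if (typeof Blob !== 'undefined' && body instanceof Blob) {
    // browser Blob
    return body.text();
  }
  if (typeof ReadableStream !== 'undefined' && body instanceof ReadableStream) {
    // browser web stream: let Response collect it
    return new Response(body).text();
  }
  // otherwise assume a Node.js Readable, which is an async iterable
  const chunks: Buffer[] = [];
  for await (const chunk of body as Readable) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('utf-8');
}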

MontoyaAndres commented 2 years ago

Reading the answers above, in my case, to resize an image to a specific size and do other little things with sharp, I just needed to remove the .toString method from the streamToString function.

const streamToString = (stream) => {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on("data", (chunk) => chunks.push(chunk));
    stream.on("error", reject);
    // I removed the .toString here
    stream.on("end", () => resolve(Buffer.concat(chunks)));
  });
};

And everything works great:

 const getObjectCommand = new GetObjectCommand({
    Bucket: bucket,
    Key: key,
 });
 const getObjectResponse = await s3Client.send(getObjectCommand);
 const body = await streamToString(getObjectResponse.Body);

 const image = sharp(body);
 const resizedImage = await image.resize({ ... }).toBuffer();
hustlerman commented 2 years ago

Bumping because of how much time I wasted trying to fix this issue...

AWS provides a migration guide that is clearly not thorough enough to simplify the process: https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/migrating-to-v3.html

If there's going to be a migration guide at all, details like this need to be covered.

sean-hernon commented 2 years ago

Why should everyone have to copy and paste the same boilerplate everywhere? Have you guys not heard of encapsulation? If you are making a new version that you want everyone to use, then it must be better, not worse. I've never seen something so obtuse.

ffxsam commented 2 years ago

Although the SDK is sometimes a little painful to use, I think it's worth pointing out that (from what I can gather) the JS SDK is modeled to be as close to the REST API as possible, for the sake of consistency. So the requests & responses are pretty much exactly the same.

If you find something to be too repetitive within the JS SDK, write your own wrapper for it. Since I interact with S3 more than any other part of AWS, I've created my own S3Object class (~400 lines of code) that makes it super easy for me to save S3 objects to /tmp, save data to S3, get a stream or buffer, get metadata, delete an object or multiple objects, list objects, and more.
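
A heavily trimmed sketch of the idea, in case it helps anyone (the real class has far more methods and error handling; the names here are just illustrative):

import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import type { Readable } from 'stream';
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

class S3Object {
  constructor(
    private client: S3Client,
    private bucket: string,
    private key: string
  ) {}

  // Fetch the object body as a Node Readable stream.
  async getStream(): Promise<Readable> {
    const { Body } = await this.client.send(
      new GetObjectCommand({ Bucket: this.bucket, Key: this.key })
    );
    if (!Body) throw new Error(`s3://${this.bucket}/${this.key} has no body`);
    return Body as Readable;
  }

  // Buffer the whole object in memory.
  async getBuffer(): Promise<Buffer> {
    const chunks: Buffer[] = [];
    for await (const chunk of await this.getStream()) {
      chunks.push(Buffer.from(chunk));
    }
    return Buffer.concat(chunks);
  }

  // Stream the object straight to a local file, e.g. /tmp in Lambda.
  async saveTo(path: string): Promise<void> {
    await pipeline(await this.getStream(), createWriteStream(path));
  }
}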

Having said all that, I do wish some things were done differently, for example:

await lambda.send(
  new InvokeCommand({
    InvocationType: 'Event',
    FunctionName: 'myFunc',
    Payload: Buffer.from(JSON.stringify(payload)), // <-- why can't we just send a string instead of a Buffer??
  })
);

My problem with the imposed Buffer type is that it puts extra work on the developer, which is not a good developer experience. If it must be a buffer, then let us pass a string, and the SDK can transform that internally. DX (developer experience) is a really important aspect of APIs and tooling that is often overlooked.
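
In the meantime I just hide the Buffer dance behind a tiny helper, something like this (invokeAsync is a made-up name, not part of the SDK):

import { InvokeCommand, LambdaClient } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

// Hypothetical helper: accept any JSON-serializable payload and do the Buffer wrapping internally.
async function invokeAsync(functionName: string, payload: unknown) {
  return lambda.send(
    new InvokeCommand({
      InvocationType: 'Event',
      FunctionName: functionName,
      Payload: Buffer.from(JSON.stringify(payload)),
    })
  );
}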

But overall, I do prefer v3 over v2 SDK, mostly for its modularity. It results in much smaller builds than v2.

sean-hernon commented 2 years ago

Of course we can all write our own wrappers and functions, but then we are all repeating the same work and re-inventing the same wheel everywhere and we will all have to change it again when there is another API change. That's why in my mind it makes sense to centralise very common tasks in the library itself. We're not talking about something esoteric, here.

Imagine a 5-liner copied and pasted everywhere (such things tend to be posted on discussions such as this) and then someone realises there is a bug in it. It's easier to maintain things in one place.

Even if a bunch of different people decide to publish their solutions (i.e. as libraries), you end up with n solutions and potentially many bugs, so there is no shared benefit across the board when one is fixed, as there would be if we all subscribed to one central solution.

Also, PutObject lets you send a string, so the API is not symmetrical.

I agree about the build sizes, though, it's why I decided to use v3.

nitedani commented 2 years ago

The returned Body is of type internal.Readable | ReadableStream | Blob. Is there a way to know the environment and have the types only for that environment? For example, in Node it wouldn't be a Blob. If that isn't possible, could the Body have a common interface in both browser and Node environments?
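
For now, the best workaround I can see is a small assertion helper so the Node-only cast lives in one place (just a sketch, not an SDK feature):

import type { Readable } from 'stream';
import type { GetObjectCommandOutput } from '@aws-sdk/client-s3';

// Sketch: in a Node-only codebase, assert once that Body is a Node Readable.
function nodeBody(output: GetObjectCommandOutput): Readable {
  const body = output.Body;
  if (!body || typeof (body as Readable).pipe !== 'function') {
    throw new Error('Expected the S3 Body to be a Node.js Readable');
  }
  return body as Readable;
}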

Cloudmancermedia commented 2 years ago

I'm also very confused about how to read S3 Body responses with SDK v3. The SDK documentation for GetObjectCommand does not describe how to do it, and the SDK examples are also missing it (awsdocs/aws-doc-sdk-examples#1677).

I would ask the AWS SDK team to include in the SDK a simple way to read S3 Body responses. We don't want to re-implement complicated event handlers and helper functions for this simple purpose every time we use GetObject in a project.

In v2 we could just say something like JSON.parse(response.Body?.toString()). Please make it as simple in v3. Stream-based processing is also important, but it should be only an alternative for the simple case for parsing small JSON objects.

For reference, I was able to do this in Node.js by utilizing node-fetch. I would like something like this to be included in the AWS SDK.

npm install node-fetch
npm install --save-dev @types/node-fetch
import { Response } from 'node-fetch'

const response = new Response(s3Response.Body)
const data = await response.json()

I have been trying to find a solution that works for days...this answer works perfectly for parsing a small JSON object retrieved from S3. THANK YOU!!!

I will note that if you want to use "require()" as opposed to "import" for Response (like if using inside of a lambda), you can if you just use v2 of node-fetch instead of v3.

rui-ktei commented 2 years ago

I can't believe a commercial SDK like AWS's would provide such useless interfaces.

Seriously? Are we in the year 2022 or 2000?

internal.Readable | ReadableStream | Blob

Please showcase how your AWS dev teams use such an interface without writing quite a few lines of helper functions. Why would you assume SDK customers want to use such an interface?? You can of course provide a low-level interface, but good interface design 101 already tells us: "make it simple for your users". Interface design is not about you - the author; it's about customers. Go back and check the SDKs of other platforms such as .NET and Java -- see how they work with customers. No one really wants to fiddle around with internal.Readable | ReadableStream | Blob, no matter how accurate it is in Computer Science. You may think it's a technically precise and perfect response type -- it doesn't matter, because it's useless to customers.

How hard is it to implement something more useful like below??

const response = await GetS3Object();
// don't want to play with response.Body? Fine, use helper functions:
const str = await response.asString()
const bytes = await response.asBytes()
const theDataTypeMostCustomersWant = await response.asMostOfYouWant()

P.S. In case anyone simply wants to download a text object:

import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import type { Readable } from "stream";

export async function getObjectAsString(
  bucket: string,
  key: string
): Promise<string> {
  const client = new S3Client({})
  const response = await client.send(
    new GetObjectCommand({
      Bucket: bucket,
      Key: key
    })
  )

  // The code like below should really be provided as nice interfaces by the SDK itself.
  return new Promise((resolve, reject) => {
    if (!response.Body) {
      reject("No Body on response.");
    } else {
      const chunks: Uint8Array[] = [];
      const bodyStream = response.Body as Readable;
      bodyStream.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
      bodyStream.on("error", reject);
      bodyStream.on("end", () =>
        resolve(Buffer.concat(chunks).toString("utf-8"))
      );
    }
  });
}
timrobinson33 commented 2 years ago

This is confusing as hell. Even if you're not going to fix it right away, just put a massive note in the documentation that it does NOT return a Blob if you are using Node.js, regardless of what the TypeScript declaration says.

That would at least allow people to figure out what's wrong without having to spend an hour reading through the whole of this thread.

lucasveronesi commented 2 years ago

After some time of research (this is confusing for sure), I finally solved my problem. It's not exactly the same problem, but it's kinda similar: I was trying to use SelectObjectContent from AWS S3 and had some problems reading the resulting readable stream. I hope this helps someone who has the same problem as mine :)

const { S3Client, SelectObjectContentCommand } = require("@aws-sdk/client-s3");
const utilUtf8 = require('@aws-sdk/util-utf8-node');

var params = {
  Bucket: bucket, /* required */
  Expression: caso, /* required */
  ExpressionType: 'SQL', /* required */
  InputSerialization: { /* required */
    CompressionType: 'NONE',
    JSON: {
      Type: 'DOCUMENT'
    }
  },
  Key: key, /* required */
  OutputSerialization: { /* required */
    JSON: {
      RecordDelimiter: ','
    }
  },
};

const command = new SelectObjectContentCommand(params);

const data = await s3.send(command);

const eventStream = data.Payload;

let s3ResponseBody;

for await (const chunk of eventStream) {
  if(typeof(chunk.Records) != 'undefined'){
    s3ResponseBody = 
      JSON.parse(
        `[${utilUtf8.toUtf8(chunk.Records.Payload).slice(0, -1)}]`
      )
    ;
  }
}

return s3ResponseBody;
ptoussai commented 2 years ago

Starting from Node 16.7, you can simply use the utility consumer functions:

import consumers from 'stream/consumers'

const { Body: stream } = await s3.getObject({
  Bucket: bucket,
  Key: key
})
const objectText = await consumers.text(stream)

Edit: Added await to consumers.text result. Thanks @AHaydar.

ffxsam commented 2 years ago

@ptoussai That's awesome, thanks! Hard to keep tabs on what's new in Node.js.

AHaydar commented 2 years ago

Starting from Node 16.7, you can simply use the utility consumer functions:

import consumers from 'stream/consumers'

const { Body: stream } = await s3.getObject({
  Bucket: bucket,
  Key: key
})
const objectText = consumers.text(stream)

That's awesome. Thanks for sharing. Please note that consumers.text returns a promise. So we'd need to await it as well 🙌
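
In other words, the last line of the snippet above becomes:

const objectText = await consumers.text(stream)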

jbingham17 commented 2 years ago

Starting from Node 16.7, you can simply use the utility consumer functions:

import consumers from 'stream/consumers'

const { Body: stream } = await s3.getObject({
  Bucket: bucket,
  Key: key
})
const objectText = consumers.text(stream)

This answer makes a lot of sense but for some reason the import is failing for me here. The line:

import consumers from 'stream/consumers'

fails with the error:

Module not found: Error: Can't resolve 'stream/consumers'

This feels like such a dumb error, but I can't find a way around it. I've tried all of the following to no avail:

npm install stream

This works, but my error does not go away

npm install "stream/consumers"

This fails since the module does not exist

npm install stream-consumers

This works, but it's a different package so my error still doesn't go away.

The documentation on the node.js website is not helpful here since they import it in a totally different way:


import {
  arrayBuffer,
  blob,
  buffer,
  json,
  text,
} from 'node:stream/consumers';

Could someone please tell me the correct way to install and import the 'stream/consumers' package. Thank you so much for the help!

kldavis4 commented 2 years ago

Are you sure you are using the correct version of node (node --version)?

This works for me: const consumers = require('stream/consumers') using 16.13 (and throws module not found for 14.x)

jbingham17 commented 2 years ago

Are you sure you are using the correct version of node (node --version)?

This works for me: const consumers = require('stream/consumers') using 16.13 (and throws module not found for 14.x)

Using 18.5 and it's not working

chrisbednarski commented 2 years ago

I'm also seeing a TypeScript compilation error when using this via "aws-sdk": "^2.1175.0" on Node 14.x. Isn't ReadableStream a Node 16+ only interface?

node_modules/@aws-sdk/types/dist-types/serde.d.ts:58:33 - error TS2304: Cannot find name 'ReadableStream'.

58     transformToWebStream: () => ReadableStream;
                                   ~~~~~~~~~~~~~~
Found 1 error in node_modules/@aws-sdk/types/dist-types/serde.d.ts:58
fishcharlie commented 2 years ago

@chrisbednarski This entire repository is for AWS SDK v3. Not v2. I'd suggest posting your issue at the v2 repository: https://github.com/aws/aws-sdk-js.

Levi-Huynh commented 2 years ago

@AllanZhengYP Hi Allan, is there an ETA for when this potential solution will be merged/available? https://github.com/aws/aws-sdk-js-v3/pull/3795

thanks!

jonah-nestrick-funimation commented 2 years ago

Starting from Node 16.7, you can simply use the utility consumer functions:

import consumers from 'stream/consumers'

const { Body: stream } = await s3.getObject({
  Bucket: bucket,
  Key: key
})
const objectText = consumers.text(stream)

This answer makes a lot of sense but for some reason the import is failing for me here. The line:

import consumers from 'stream/consumers'

fails with the error:

Module not found: Error: Can't resolve 'stream/consumers'

This feels like such a dumb error, but I can't find a way around it. I've tried all of the following to no avail:

npm install stream

This works, but my error does not go away

npm install "stream/consumers"

This fails since the module does not exist

npm install stream-consumers

This works, but it's a different package so my error still doesn't go away.

The documentation on the node.js website is not helpful here since they import it in a totally different way:

import {
  arrayBuffer,
  blob,
  buffer,
  json,
  text,
} from 'node:stream/consumers';

Could someone please tell me the correct way to install and import the 'stream/consumers' package. Thank you so much for the help!

Have you tried import * as consumers from 'stream/consumers'? Depending on how your imports are set up, you may need to "Import the entire module into a single variable". Using it in your code will behave the same.
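
For example, something along these lines (bucket and key are placeholders; assumes Node 16.7+ and ESM):

import * as consumers from 'stream/consumers';
import type { Readable } from 'stream';
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

const s3 = new S3Client({});
const { Body } = await s3.send(
  new GetObjectCommand({ Bucket: 'my-bucket', Key: 'my-key' })
);
// consumers.text accepts a Node Readable and resolves to its contents as a string
const objectText = await consumers.text(Body as Readable);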

pankti-0627 commented 2 years ago

After some time of research (this is confusing for sure), I finally solved my problem. It's not exactly the same problem, but it's kinda similar: I was trying to use SelectObjectContent from AWS S3 and had some problems reading the resulting readable stream. I hope this helps someone who has the same problem as mine :)

const { S3Client, SelectObjectContentCommand } = require("@aws-sdk/client-s3");
const utilUtf8 = require('@aws-sdk/util-utf8-node');

var params = {
  Bucket: bucket, /* required */
  Expression: caso, /* required */
  ExpressionType: 'SQL', /* required */
  InputSerialization: { /* required */
    CompressionType: 'NONE',
    JSON: {
      Type: 'DOCUMENT'
    }
  },
  Key: key, /* required */
  OutputSerialization: { /* required */
    JSON: {
      RecordDelimiter: ','
    }
  },
};

const command = new SelectObjectContentCommand(params);

const data = await s3.send(command);

const eventStream = data.Payload;

let s3ResponseBody;

for await (const chunk of eventStream) {
  if(typeof(chunk.Records) != 'undefined'){
    s3ResponseBody = 
      JSON.parse(
        `[${utilUtf8.toUtf8(chunk.Records.Payload).slice(0, -1)}]`
      )
    ;
  }
}

return s3ResponseBody;

I am using this exact same code to read a CSV file from an S3 bucket but am getting this error (React Native): source[Symbol.asyncIterator] is not a function. (In 'source[Symbol.asyncIterator]()', 'source[Symbol.asyncIterator]' is undefined).

Payload is coming back as a blank array. The same data is fetched fine with Python boto3 (S3 query selector - select object content).

haljarrett commented 2 years ago

Poking around in the SDK a bit, it looks like there are some stream consumers already available for browser and node environments, respectively:

These methods appear to already be used in some of the protocol implementations, like https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-s3/src/protocols/Aws_restXml.ts#L12024

That being said, @AllanZhengYP 's change will be a nice usability improvement when it makes it in. Keep up the good work Allan and the SDK team! Can you comment on if these above interfaces are stable enough to depend on?

samchungy commented 1 year ago

So it looks like going by the latest PR: https://github.com/aws/aws-sdk-js-v3/pull/3977/files

Edit: See below post for official answer

The recommended way to do this is now:

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { sdkStreamMixin } from '@aws-sdk/util-stream-node';

const s3Client = new S3Client({});
const { Body } = await s3Client.send(
  new GetObjectCommand({
    Bucket: 'your-bucket',
    Key: 'your-key',
  }),
);
const objectString = await sdkStreamMixin(Body).transformToString(); // this throws if Body is undefined
kuhe commented 1 year ago

This is now documented in the root readme with an example: https://github.com/kuhe/aws-sdk-js-v3/tree/main#streams

You do not need to import sdkStreamMixin explicitly. As of that version, it is applied to stream objects in command outputs.

import { S3 } from "@aws-sdk/client-s3";

const client = new S3({});

const getObjectResult = await client.getObject({
  Bucket: "...",
  Key: "...",
});

// env-specific stream with added mixin methods.
const bodyStream = getObjectResult.Body; 

// one-time transform.
const bodyAsString = await bodyStream.transformToString();

// throws an error on 2nd call, stream cannot be rewound.
const __error__ = await bodyStream.transformToString();
sfwhite commented 1 year ago

So it looks like going by the latest PR: https://github.com/aws/aws-sdk-js-v3/pull/3977/files

The recommended way to do this is now:

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { sdkStreamMixin } from '@aws-sdk/util-stream-node';

const s3Client = new S3Client({});
const { Body } = await s3Client.send(
  new GetObjectCommand({
    Bucket: 'your-bucket',
    Key: 'your-key',
  }),
);
const objectString = await sdkStreamMixin(Body).transformToString(); // this throws if Body is undefined

Took two solid years, but hey, we have an official solution..

zbagley commented 1 year ago

@sfwhite Thanks for the heads up on the throw. @kuhe Glad to see this in, and it would probably be useful for the docs to note that (in general) this should be wrapped in a try { ... } catch { ... } for most use cases.
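
Something along these lines, for example (just a sketch; bucket and key are placeholders):

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

const s3Client = new S3Client({});

try {
  const { Body } = await s3Client.send(
    new GetObjectCommand({ Bucket: 'your-bucket', Key: 'your-key' })
  );
  // calling transformToString on an undefined Body, or consuming the
  // stream a second time, throws and lands in the catch below
  const objectString = await Body!.transformToString();
  console.log(objectString);
} catch (err) {
  // handle a missing object / already-consumed stream / network error here
  console.error('Failed to read object', err);
}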

samchungy commented 1 year ago

This is now documented in the root readme with an example: https://github.com/kuhe/aws-sdk-js-v3/tree/main#streams

You do not need to import sdkStreamMixin explicitly. As of that version, it is applied to stream objects in command outputs.

import { S3 } from "@aws-sdk/client-s3";

const client = new S3({});

const getObjectResult = await client.getObject({
  Bucket: "...",
  Key: "...",
});

// env-specific stream with added mixin methods.
const bodyStream = getObjectResult.Body; 

// one-time transform.
const bodyAsString = await bodyStream.transformToString();

// throws an error on 2nd call, stream cannot be rewound.
const __error__ = await bodyStream.transformToString();

Looks like we need to check for Body being undefined though, or else we get Object is possibly 'undefined'. ts(2532). So:

import { S3 } from "@aws-sdk/client-s3";

const client = new S3({});

const getObjectResult = await client.getObject({
  Bucket: "...",
  Key: "...",
});

if (!getObjectResult.Body) {
  // handle not found
  throw new Error("Object not found");
}

// env-specific stream with added mixin methods.
const bodyStream = getObjectResult.Body; 

// one-time transform.
const bodyAsString = await bodyStream.transformToString();

// throws an error on 2nd call, stream cannot be rewound.
const __error__ = await bodyStream.transformToString();
acommodari commented 1 year ago

With Node 18 introducing the Web Streams API, will this affect S3 download streams in any way? To my knowledge, if you were working in Node you could assume the Body was always going to be a Readable. Will it now also support ReadableStream?

ryanblock commented 1 year ago

Hey @trivikr! Any updates or official word from your side on this? Haven't heard from you in this thread for over a year and a half, and it's especially relevant with Lambda nodejs18.x now out with SDK v3. Thanks! 💕

ShivamJoker commented 1 year ago

@kuhe any idea how I can get this working with sharp? It's not taking the string, buffer, or stream.

misantronic commented 1 year ago

@kuhe any idea how I can get this working with sharp? It's not taking the string, buffer, or stream.

this works for me with sharp:

async function streamToBuffer(stream: Readable): Promise<Buffer> {
    return await new Promise((resolve, reject) => {
        const chunks: Uint8Array[] = [];
        stream.on('data', (chunk) => chunks.push(chunk));
        stream.on('error', reject);
        stream.on('end', () => resolve(Buffer.concat(chunks)));
    });
}

const resp = await s3.send(new GetObjectCommand({ Bucket, Key }));

if (resp.Body) {
    resp.Body = (await streamToBuffer(resp.Body as Readable)) as any;
}
ShivamJoker commented 1 year ago

Can't we just use the stream instead of converting it?

misantronic commented 1 year ago

Can't we just use the stream instead of converting it?

I was trying to - no success. If you find a way, keep me posted.

ShivamJoker commented 1 year ago

Okay, so after spending a few hours I got it right. This way we can pipe our S3 response body into sharp and later use .toBuffer() to push it to the bucket.

  const getObj = new GetObjectCommand({
    Bucket,
    Key: objectKey,
  });

  const s3ImgRes = await s3Client.send(getObj);

  const sharpImg = sharp().resize({ width: 500 }).toFormat("webp");

  // pipe the body to sharp img
  s3ImgRes.Body.pipe(sharpImg);

  const putObj = new PutObjectCommand({
    Bucket,
    Key: `converted/${objectKey.replace(/[a-zA-Z]+$/, "webp")}`,
    Body: await sharpImg.toBuffer(),
  });

  await s3Client.send(putObj);

But AWS team, please, please update your docs. I know there is a lot to update, but as a developer it's just such a struggle to use AWS services because of insufficient docs.

adcreare commented 1 year ago

Here is an example of how to download an object from S3 and write it to disk as a file, while keeping it as a stream. This example is TypeScript targeting Node.

It seems silly to me that if we're going to all this trouble of having a stream coming from AWS, we then convert it to a buffer or string just to write it to disk.

I also agree with the sentiments expressed by others in this thread. It is crazy that getObject has become such a complicated operation in the V3 SDK compared with the V2 SDK and is going to trip many people up for years to come.

import type { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import fs from 'node:fs'
import { GetObjectCommand, S3 } from '@aws-sdk/client-s3';

async function downloadFile() {
  const s3 = new S3({});
  const s3Result = await s3.send(new GetObjectCommand({ Bucket: sourceBucket, Key: sourceKey }));
  if (!s3Result.Body) {
    throw new Error('received empty body from S3');
  }
  await pipeline(s3Result.Body as Readable, fs.createWriteStream('/tmp/filedownload.zip'));
}
rdt712 commented 1 year ago

Found an easy solution using transformToString if you want to parse a JSON file from S3.

import { S3, GetObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3({});

const getObjectParams = {
  Bucket: 'my-bucket',
  Key: 'my-object',
};
const getObjectCommand = new GetObjectCommand(getObjectParams);
const s3Object = await s3.send(getObjectCommand);

const dataStr = await s3Object.Body?.transformToString();

let data;
if (dataStr) {
  data = JSON.parse(dataStr);
}