aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

input/outputFilterSensitiveLog logs entire inputs and outputs of S3 object commands #6101

Closed carera closed 1 month ago

carera commented 1 month ago

Describe the bug

When, for example, an S3Client is passed a logger, it logs absolutely everything, including input and output data. If I upload a 20 GB file to S3, the entire file is also printed out via the logger.

SDK version number

@aws-sdk/client-s3@3.576.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.12.1

Reproduction Steps

Consider the following code:

import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3';
import pino from 'pino';

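// `region` and `endpoint` are assumed to be defined elsewhere.
const s3Config = {
  region,
  endpoint,
  // The SDK passes every command's input and output to this logger.
  logger: pino({ name: 'logger' }),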
  forcePathStyle: true,
};
const client = new S3Client(s3Config);
client.send(
  new PutObjectCommand({
    Bucket: 'large-steps',
    Key: 'key',
    Body: new Uint8Array(Buffer.from('hello')),
  })
);

Observed Behavior

When run, it prints out the contents of the body:

..."name":"logger","clientName":"S3Client","commandName":"PutObjectCommand","input":{"Bucket":"large-steps","Key":"key","Body":{"0":104,"1":101,"2":108,"3":108,"4":111}}...

Expected Behavior

I would expect something with "filterSensitive" in its name to filter out sensitive data, such as the input data for PutObjectCommand.

Possible Solution

Most of the filter implementations do little with the input/output Body contents. For example, the one for PutObject is:

export const PutObjectRequestFilterSensitiveLog = (obj: PutObjectRequest): any => ({
  ...obj,
  ...(obj.SSECustomerKey && { SSECustomerKey: SENSITIVE_STRING }),
  ...(obj.SSEKMSKeyId && { SSEKMSKeyId: SENSITIVE_STRING }),
  ...(obj.SSEKMSEncryptionContext && { SSEKMSEncryptionContext: SENSITIVE_STRING }),
});

It doesn't pay any attention to the input/output Body contents, which IMHO should be scrubbed out:

export const PutObjectRequestFilterSensitiveLog = (obj: PutObjectRequest): any => ({
  ...obj,
  Body: SENSITIVE_STRING,
  ...(obj.SSECustomerKey && { SSECustomerKey: SENSITIVE_STRING }),
  ...(obj.SSEKMSKeyId && { SSEKMSKeyId: SENSITIVE_STRING }),
  ...(obj.SSEKMSEncryptionContext && { SSEKMSEncryptionContext: SENSITIVE_STRING }),
});
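
A self-contained sketch of the proposed change, for trying it outside the SDK. The `PutObjectRequestLike` type and the local `SENSITIVE_STRING` constant are stand-ins introduced here (the real ones live in the SDK), and redacting `Body` only when it is present is an assumption that mirrors the conditional style of the other fields:

```typescript
// Stand-in for the SDK's constant; the exact value is an assumption.
const SENSITIVE_STRING = "***SensitiveInformation***";

// Hypothetical, trimmed-down shape of PutObjectRequest, for illustration only.
interface PutObjectRequestLike {
  Bucket?: string;
  Key?: string;
  Body?: unknown;
  SSECustomerKey?: string;
  SSEKMSKeyId?: string;
  SSEKMSEncryptionContext?: string;
}

// Same structure as the proposal above, but Body is replaced only if it was set.
const putObjectRequestFilterSensitiveLog = (
  obj: PutObjectRequestLike
): Record<string, unknown> => ({
  ...obj,
  ...(obj.Body !== undefined ? { Body: SENSITIVE_STRING } : {}),
  ...(obj.SSECustomerKey && { SSECustomerKey: SENSITIVE_STRING }),
  ...(obj.SSEKMSKeyId && { SSEKMSKeyId: SENSITIVE_STRING }),
  ...(obj.SSEKMSEncryptionContext && { SSEKMSEncryptionContext: SENSITIVE_STRING }),
});

// The request from the reproduction steps, run through the filter.
const filtered = putObjectRequestFilterSensitiveLog({
  Bucket: "large-steps",
  Key: "key",
  Body: new Uint8Array([104, 101, 108, 108, 111]),
});
console.log(JSON.stringify(filtered));

// A request without a Body does not gain one.
const withoutBody = putObjectRequestFilterSensitiveLog({ Bucket: "large-steps", Key: "key" });
console.log(JSON.stringify(withoutBody));
```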

Additional Information/Context

Is this intended? I feel like this is not a very good behaviour for production usage, for two reasons:

kuhe commented 1 month ago

This is not specific to the JavaScript SDK since the service (S3) has not marked the object bodies as sensitive.

aBurmeseDev commented 1 month ago

Hi @carera - thanks for reaching out. As mentioned above, S3's service models carry various traits, one of which is called sensitive and marks a field as sensitive. In this particular case, S3 has not marked the object body as sensitive, which is why you're seeing this output. From the SDK's perspective, we're simply logging the fields passed by the service models. Hope that makes sense.
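
To illustrate the point about traits, here is a rough sketch of how a log filter could be derived from the list of members a model marks as sensitive. This is not the SDK's actual code generator: `makeFilterSensitiveLog`, the `Shape` type, and the placeholder value are hypothetical, and the member list is taken from the filter quoted earlier in this issue. It only shows why a member without the trait, such as Body, passes through untouched:

```typescript
// Placeholder written to the log instead of the real value (value is an assumption).
const SENSITIVE_STRING = "***SensitiveInformation***";

type Shape = Record<string, unknown>;

// Hypothetical helper: builds a filter from the member names that carry the sensitive trait.
function makeFilterSensitiveLog(sensitiveMembers: readonly string[]) {
  return (obj: Shape): Shape => {
    const out: Shape = { ...obj };
    for (const member of sensitiveMembers) {
      // Only members that are actually set get replaced.
      if (out[member] !== undefined && out[member] !== null) {
        out[member] = SENSITIVE_STRING;
      }
    }
    return out;
  };
}

// Body is not in the list, so the filter leaves it as-is.
const putObjectFilter = makeFilterSensitiveLog([
  "SSECustomerKey",
  "SSEKMSKeyId",
  "SSEKMSEncryptionContext",
]);

const logged = putObjectFilter({
  Bucket: "large-steps",
  Key: "key",
  Body: "hello",
  SSECustomerKey: "secret",
});
console.log(logged);
```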

Please reach out to S3 directly through AWS Support, or I could submit an internal request on your behalf if you'd like. Best, John

carera commented 1 month ago

Hey, thank you for your reply.

I understand your argument that the object body is not officially marked as sensitive data, and I could perhaps agree with it.

How about the argument that S3 objects can get ridiculously large (S3 is, after all, used for storing large objects)? Logging entire object bodies feels wrong. Not only does this inflate logs massively, it effectively turns our logging solution into a second S3 object store, since entire object bodies are streamed both to S3 and to whatever consumes the logs.
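
One possible application-side workaround is to wrap the logger so that bodies are summarized before anything is written. The sketch below assumes that log entries have the shape seen in the observed output above (an object with `input` and possibly `output` fields); `MinimalLogger`, `summarizeBody`, `redactBodies`, and `withBodyRedaction` are hypothetical names, not SDK APIs:

```typescript
// Minimal logger shape used by this sketch; the SDK's own Logger interface may differ.
interface MinimalLogger {
  debug(...content: unknown[]): void;
  info(...content: unknown[]): void;
  warn(...content: unknown[]): void;
  error(...content: unknown[]): void;
}

type LogFn = (...content: unknown[]) => void;

// Describe a body by type and size so that its contents never reach the log.
function summarizeBody(body: unknown): string {
  if (typeof body === "string") return `[Body omitted: string, ${body.length} chars]`;
  if (body instanceof Uint8Array) return `[Body omitted: ${body.byteLength} bytes]`;
  return "[Body omitted]"; // streams, blobs and anything else
}

// Return a shallow copy of a log entry with input.Body and output.Body summarized.
function redactBodies(entry: unknown): unknown {
  if (typeof entry !== "object" || entry === null) return entry;
  const copy: Record<string, unknown> = { ...(entry as Record<string, unknown>) };
  for (const field of ["input", "output"]) {
    const value = copy[field];
    if (typeof value === "object" && value !== null && "Body" in value) {
      const shape = value as Record<string, unknown>;
      copy[field] = { ...shape, Body: summarizeBody(shape.Body) };
    }
  }
  return copy;
}

// Wrap a logger so every argument is passed through redactBodies first.
function withBodyRedaction(logger: MinimalLogger): MinimalLogger {
  const wrap = (fn: LogFn): LogFn => (...content) => fn(...content.map(redactBodies));
  return {
    debug: wrap((...c) => logger.debug(...c)),
    info: wrap((...c) => logger.info(...c)),
    warn: wrap((...c) => logger.warn(...c)),
    error: wrap((...c) => logger.error(...c)),
  };
}

// Demo with an in-memory logger that records what would have been logged.
const seen: unknown[] = [];
const record: LogFn = (...c) => {
  seen.push(...c);
};
const safeLogger = withBodyRedaction({ debug: record, info: record, warn: record, error: record });

safeLogger.info({
  clientName: "S3Client",
  commandName: "PutObjectCommand",
  input: { Bucket: "large-steps", Key: "key", Body: new Uint8Array([104, 101, 108, 108, 111]) },
});
console.log(JSON.stringify(seen[0]));
```

With the reproduction code above, this would presumably be wired in as `logger: withBodyRedaction(pino({ name: 'logger' }))`, though that combination is untested here. The wrapper copies the entry instead of mutating it, so the request object handed to the SDK is left unchanged.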

I'm interested in your opinion on this, @kuhe, @aBurmeseDev. Thank you!

aBurmeseDev commented 1 month ago

I appreciate your response and agree with your statements. In this case, however, the SDK doesn't have control over marking object bodies as sensitive; S3 does. The same goes for our other language SDKs: they will output the same thing, because each SDK simply logs the fields passed down from the upstream models.

Hope that makes sense!

github-actions[bot] commented 2 weeks ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.