aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0
3.04k stars 569 forks source link

[S3] Impossible to set SSECustomerKey with binary AES256 key #5651

Closed tmccombs closed 7 months ago

tmccombs commented 8 months ago

Checkboxes for prior research

Describe the bug

This is yet another reiteration of https://github.com/aws/aws-sdk-js-v3/issues/4736. Because I'm pretty sure something is not working as intend here, and cannot figure out how to convert a UInt8Array to a string that is a valid key for SSECustomerKey in the S3 APIs.

Basically, if I have a UInt8Array of 32 random bytes for an AES256 key, there doesn't seem to be any way to pass that to SSECustomerKey that complies with the tyepscript types for CopyObjectCommandInput, PutObjectCommandInput, GetObjectCommandInput, etc.

SDK version number

@aws-sdk/package-name@version, ...

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

18.18.0

Reproduction Steps

import {S3Client, PutObjectCommand} from '@aws-sdk/client-s3';
import * as crypto from 'crypto';
import {promisify} from 'util';

const randomBytes = promisify(crypto.randomBytes);

const client = new S3Client();
key = await randomBytes(32);
const response = await client.send(new PutObjectCommnd({Bucket: 'mybucket', Body: 'This is a test', SSECustomerKey: key.toString(), SSECustomerAlgorithm: 'AES256});

Observed Behavior

The request fails with a 400, with the error "InvalidArgument: The secret key was invalid for the specified algorithm."

This happens regardless of the encoding used. I have tried calling toString on the key with the following encodings:

All with the same result.

Expected Behavior

It should be possible to use an arbitrary, randomly generated key for the SSE customer key, even if it isn't a valid UTF-8 string.

Possible Solution

As @DavidZbarsky-at pointed out in the original issue. It loos like the code actually handles being passed in a UInt8Array directly, and indeed if I change the reproduction steps to pass in the random bytes directly (SSECustomerKey: key), then the command succeeds.

So probably just accept that type?

Another possible solution would be to not have the middleware base64 encode the SSECustomerKey value, and let the user do the base64 encoding themselves. However that would not be backwards compatible, since it would break any usage that depended on the library doing the base64 encoding for you.

Maybe it could do some heuristics to look at the length of the string, and if it looks like it is already base64 encoded, then just use it as is?

Additional Information/Context

From my inspection of looking at the code at https://github.com/aws/aws-sdk-js-v3/blob/f5c19c0c314ea1e02f067c169c8d6e7dacb733ac/packages/middleware-ssec/src/index.ts#L36C6-L53C8

I believe what is happening is:

If the key is a string, then we encode it using "utf-8", then base64 encode that to a string that we use in the actual request.

However, if you have a key that is not a valid utf8 string, which is common if it is randomly generated, then it isn't possible to convert it as a string that will encode using the utf8 encoding back to the original bytes. If you encode it using "utf8", then any invalid sequences will be transcoded as U+FFFD. If you use the binary or latin1 encoding, then the utf8 encoding will corrupt any characters that are multi-byte in utf-8 (i.e. non-ascii characters).

Regarding this comment in the previous thread:

this is actually not considered as a bug. The type expected for SSECustomerKey is a string and needs to be provided as so. This type is actually defined by the service itself and the SDK just follows its directives regarding typing.

I think that would be true if the library wasn't doing any pre-processing before passing the value on to the S3 service. However, this library is doing preprocessing. Namely base64 encoding the key. Since base64 encoding transforms binary data into a string, I think that the type of this parameter should accept a UInt8Array.

If there is some way to encode a UInt8Array of random bytes into a string that would work with this, I'd love to hear about it. Although, that still has the issue that it unnecessarily goes from a byte array, to a string, and then back to a byte array.

One final note: The SDK documentation for SSECustomerKey makes no mention that the library performs this base64 encoding. So, since the expected type was a string, and the documentation for the http API said that the value needed to be base64 encoded, I assumed that I was responsible for base64 encoding the key. So, either way that documentation should probably be updated to explain what format the key is expected to be in, and whether the base64 encoding is the responsibility of the caller or the library.

RanVaknin commented 8 months ago

Hi @tmccombs ,

Thanks for reaching out and raising this issue. I don't have enough context to know why this function accepts binary data and implicitly converts it to base64 when modeled as string. This might have been added as a customization or maybe something that existed in v2 and we tried to maintain the feature parity by copying the same functionality, but Im not sure 100% sure.

I agree with you on needing to either fix the type or the doc, the problem is both are controlled upstream. We cannot change the type since that will be a backwards incompatible change, and we cannot change the doc because it is the doc string for the API operation itself and will not have SDK specific information.

I think the way we will tackle this is either adding documentation to our developer guide, or better, if possible to change the middleware code that base64 this so it wont try to base64 strings which are already in that format.

I will discuss it with the team and see what we can do.

Thanks, Ran~

tmccombs commented 8 months ago

or maybe something that existed in v2 and we tried to maintain the feature parity

This did exist in v2. And in v2 it tooke either a steing or a Uint8Array, and the documentation specifically stated that it performed base64 encoding as a convenience.

If it weren't for backwards compatibility, I'd say that the upstream documentation should say that it expects a base64 encoded string, and not do the base64encoding automatically.

Is there some way that the type could be overridden from the upstream value?

RanVaknin commented 7 months ago

Hi @tmccombs ,

Can you please install the latest version of the SDK, after today's release? probably in a few hours? I pushed a fix for this. This should now be working both with supplying binary directly, and with supplying the binary in a base64 format.

Thanks, Ran~

tmccombs commented 7 months ago

It looks like that works.

github-actions[bot] commented 7 months ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.