aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0

S3Exception with null information for object redirect location with non-ASCII character #1828

Open garretwilson opened 4 years ago

garretwilson commented 4 years ago

There are various related issues; here are the two main ones. I note issues in bold in the description below.

I'm using software.amazon.awssdk:bom:2.13.8. I want to have an S3 object new-ā with content, and an object old-ā to serve as a redirect to new-ā.

I do a PUT for new-ā. From the SDK debug messages, I can see that this got encoded automatically (encodedPath) to /my-bucket/new-%C4%81. I infer from this message that S3 will automatically encode S3 object keys as needed for URIs. If so, that is great! But there is no documentation that I can find that explicitly explains the encoding S3 will perform on object key names. The official object key documentation doesn't really address it. It mentions that some keys "likely need to be URL encoded", but that's not helpful and probably not even totally correct. (URL-encoding uses + for spaces, while URI-encoding, the more appropriate encoding, uses %20 for spaces.) And what does "likely" mean? If I pre-encode my key names, will S3 then re-encode them, that is, will it re-encode the % signs that I use for encoding, resulting in duplicate (and erroneous) encoding? There is absolutely no guidance on any of this, and AWS leaves us in the dark with conflicting information.
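To illustrate the distinction drawn above between form-style URL encoding and URI path encoding, here is a minimal sketch using only the Java standard library. The class and method names are hypothetical, and the output shows what `java.net.URLEncoder` and `java.net.URI` do locally; it says nothing about what S3 or the SDK does with a key.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of the AWS SDK.
final class KeyEncodingDemo {

    // HTML form encoding (application/x-www-form-urlencoded): a space becomes '+'.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URI path encoding: a space becomes %20, and toASCIIString() turns
    // non-ASCII characters into percent-encoded UTF-8 bytes.
    static String uriPathEncode(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String key = "new \u0101"; // "new ā" with a space, to show both differences
        System.out.println(formEncode(key));          // new+%C4%81
        System.out.println(uriPathEncode("/" + key)); // /new%20%C4%81
        // This constructor always quotes '%', so an already encoded path
        // gets encoded a second time.
        System.out.println(uriPathEncode("/new-%C4%81")); // /new-%25C4%2581
    }
}
```

The last line shows the double-encoding risk in general terms: an encoder that treats its input as raw text will escape the `%` signs of a pre-encoded name.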

After the PUT to new-ā succeeds, I try to PUT a zero-byte object old-ā, setting a Website-Redirect-Location (via the SDK) of /new-ā. (Note that the documentation for object redirects is not clear that redirect destination keys need to be in absolute path form, i.e. starting with a slash, but that is a separate issue.) In this case the PUT fails, but I don't know why. Here is the stack trace.

Caused by: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 403, Request ID: null)
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:106)
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:84)
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:42)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:94)
        at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$4(BaseClientHandler.java:214)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:77)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:39)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:34)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:189)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:121)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:147)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:101)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
        at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:7376)

Note that the null in the exception is completely unhelpful in telling me what went wrong.

I am guessing that S3 did not like the redirect target of /new-ā. If so, why didn't S3 encode the redirect target S3 key automatically, just like it encodes the S3 key itself? And of course, the encoding procedure for the object redirect target key is not documented either.

So there are several problems here, exacerbated by missing documentation, and obscured by exceptions containing null instead of useful information.

Perhaps I'm doing something wrong in the code, but if so the SDK is not helping me to know what it is.

I hope this list of problems can help you make the SDK better, and I would really appreciate some improvements. Moreover I'll bet I'm not the only developer who would be very thankful for some updated documentation! Thank you.

garretwilson commented 4 years ago

I am guessing that S3 did not like the redirect target of /new-ā.

That turned out to be a correct guess. I changed the program to encode the /new-ā redirect path (resulting in /new-%C4%81), and the above error did not occur; the PUT was successful. In addition the Website-Redirect-Location metadata in the console shows /new-%C4%81.
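A minimal sketch of that workaround, assuming the redirect path is percent-encoded as UTF-8 before it is handed to the SDK. The helper class and method names are hypothetical, and the SDK call appears only as a comment because it needs the SDK on the classpath and real credentials.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the AWS SDK.
final class RedirectLocationEncoder {

    // RFC 3986 unreserved characters, which never need percent-encoding.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes every UTF-8 byte that is not an unreserved character,
    // leaving '/' untouched so the path structure is preserved.
    static String encodePath(String path) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (c == '/' || UNRESERVED.indexOf(c) >= 0) {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String location = encodePath("/new-\u0101"); // "/new-ā"
        System.out.println(location); // /new-%C4%81

        // With the SDK available, the encoded value would be passed along
        // these lines (not compiled here; bucket and key are placeholders):
        //
        // s3.putObject(PutObjectRequest.builder()
        //         .bucket("my-bucket")
        //         .key("old-\u0101")
        //         .websiteRedirectLocation(location)
        //         .build(),
        //     RequestBody.empty());
    }
}
```

Note that this helper also escapes `%`, so it should be applied only to a raw path, never to one that is already encoded.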

I did a CloudFront distribution of the S3 website, and it looks like the encoded form correctly redirects to the /new-ā key, so this is very good news.

To summarize the issues in a single sentence: S3 automatically encodes key names but not object redirect key names, but there is no documentation that explains this inconsistency, and moreover the SDK exception contains null instead of any explanation when the redirect path contains an unsupported, unencoded character.

(I can guess why there is a discrepancy between the auto-encoding of the key and the redirect location. Even though the documentation indicates that the object redirect location can contain an S3 object key, it is in reality just the actual content of the HTTP Location header which will be sent back unchanged, so it's not really semantically an object key, and S3 doesn't do anything special with it; the communication happens at the HTTP web server level before it gets to S3, I imagine. In any case, this is undocumented, and what I've written here is many times over anything I could find on the web. It would be nice if AWS could provide us some in-depth, authoritative documentation on this!)

Matthew-Han commented 2 years ago

I'm using software.amazon.awssdk:aws-sdk-java:2.17.139. That's the latest version, but it still has this bug. The bug occurs when I try to add Chinese characters to object metadata.

yasminetalby commented 1 year ago

Hello @Matthew-Han ,

Apologies for the long silence on this issue. Unfortunately, there is a known service limitation regarding the use of non-US-ASCII characters.

We currently have open issues regarding this in both the AWS Java SDK v1 and AWS Java SDK v2 repositories. These issues are marked as service-api, which means that they are caused by service limitations.

The current documented recommendation from the service team regarding this behavior is:


To avoid issues around the presentation of these metadata values, you should conform to using US-ASCII characters when using REST and UTF-8 when using SOAP or browser-based uploads via POST.

When using non US-ASCII characters in your metadata values, the provided unicode string is examined for non US-ASCII characters. Values of such headers are character decoded as per RFC 2047 before storing and encoded as per RFC 2047 to make them mail-safe before returning. If the string contains only US-ASCII characters, it is presented as is.


See the "Using Metadata" S3 documentation.

If your use case is different from the one referred to in the issue above, please let me know.

Best, Yasmine

garretwilson commented 1 year ago

When using non US-ASCII characters in your metadata values, …

This confused me at first, but are you saying that @Matthew-Han was reporting a different issue? If so, that should be placed in a separate ticket.

What is the status of this ticket as originally reported and described?

MatiMonaco commented 1 year ago

Hello @yasminetalby. I'm trying to copy an object and, while doing so, add new metadata using the metadata directive REPLACE.

The documentation (https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html) says:

When using non US-ASCII characters in your metadata values, the provided unicode string is examined for non US-ASCII characters. Values of such headers are character decoded as per RFC 2047 before storing and encoded as per RFC 2047 to make them mail-safe before returning. If the string contains only US-ASCII characters, it is presented as is.

Reading that, I understand that if I send, for example, x-amz-meta-test='?UTF-8?B?w4RNw4Raw5XDkSBTMw==?=', S3 should decode it and save 'ÄMÄZÕÑ S3' in the file metadata. Maybe I'm misunderstanding it.
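For reference, an RFC 2047 encoded word has the form `=?charset?encoding?encoded-text?=`, starting with `=?`. The following is a minimal sketch of producing and reading the Base64 ("B") variant with the Java standard library. The class and method names are hypothetical, it ignores the RFC's 75-character limit per encoded word, and it does not show how S3 treats such a header value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of the AWS SDK.
final class Rfc2047Demo {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Wraps the UTF-8 bytes of the text, Base64-encoded, in an encoded word.
    static String encodeWord(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverses encodeWord; accepts only UTF-8 "B" encoded words.
    static String decodeWord(String word) {
        if (word.length() < PREFIX.length() + SUFFIX.length()
                || !word.startsWith(PREFIX) || !word.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Not a UTF-8 B-encoded word: " + word);
        }
        String b64 = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "\u00C4M\u00C4Z\u00D5\u00D1 S3"; // "ÄMÄZÕÑ S3"
        String encoded = encodeWord(value);
        System.out.println(encoded); // =?UTF-8?B?w4RNw4Raw5XDkSBTMw==?=
        System.out.println(decodeWord(encoded).equals(value)); // true
    }
}
```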

Also, the documentation example shows:

PUT /Key HTTP/1.1
Host: awsexamplebucket1.s3.amazonaws.com
x-amz-meta-nonascii: ÄMÄZÕÑ S3

But if I run, for example:

aws s3api copy-object --copy-source src-bucket/src-key --key dest-key --bucket dest-bucket --metadata-directive REPLACE --metadata test='ÄMÄZÕÑ S3'

I get the following error:

Non ascii characters found in S3 metadata for key "test", value: "ÄMÄZÕÑ S3".
S3 metadata can only contain ASCII characters.
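
A client-side check similar in spirit to the one in that error message can be sketched with the Java standard library. This is only an illustration with hypothetical names; it is not the check the AWS CLI or the SDK performs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the AWS SDK or CLI.
final class MetadataAsciiCheck {

    // Returns true only if every character of the value is US-ASCII.
    static boolean isUsAscii(String value) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        System.out.println(isUsAscii("AMAZON S3"));                      // true
        System.out.println(isUsAscii("\u00C4M\u00C4Z\u00D5\u00D1 S3")); // false ("ÄMÄZÕÑ S3")
    }
}
```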