aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2

503s from `S3Client.headObject()` are not correctly identified as throttled exceptions #5414

Open drewschleit opened 3 months ago

drewschleit commented 3 months ago


Describe the bug

When a call to S3Client.headObject() fails with a 503 Slow Down error, I observe that for the resulting exception, S3Exception.isThrottlingException() returns false. For 503 failures with other APIs such as .getObject(), it returns true.

The isThrottlingException() method is used as part of the retry strategy: when set to LEGACY mode, throttling exceptions do not consume from the token bucket. The impact of this bug is that, even when setting a high number of retries to persistently retry throttled exceptions (with appropriate backoff settings, of course), I still see frequent failures after only a few retries due to token bucket exhaustion.
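To illustrate the impact, here is a minimal, self-contained sketch of a token bucket where throttling errors retry for free and all other errors pay a fixed cost. It does not use the SDK's classes. The class name, the bucket size of 500, and the retry cost of 5 are assumptions chosen for illustration; check the SDK source for the actual values.

```java
class TokenBucketSketch {
    // Assumed values for illustration only, not read from the SDK.
    static final int BUCKET_SIZE = 500;
    static final int RETRY_COST = 5;

    /**
     * Counts how many retries are granted out of maxRetries attempts when
     * every failure is classified the same way. Throttling failures cost
     * nothing; other failures drain RETRY_COST tokens each.
     */
    static int retriesGranted(int maxRetries, boolean classifiedAsThrottling) {
        int tokens = BUCKET_SIZE;
        int granted = 0;
        for (int i = 0; i < maxRetries; i++) {
            int cost = classifiedAsThrottling ? 0 : RETRY_COST;
            if (tokens < cost) {
                break; // bucket exhausted: no further retries are allowed
            }
            tokens -= cost;
            granted++;
        }
        return granted;
    }

    public static void main(String[] args) {
        // Same failure, same retry limit; only the classification differs.
        System.out.println(retriesGranted(1000, true));
        System.out.println(retriesGranted(1000, false));
    }
}
```

Under these assumed numbers, a 503 that is misclassified as non-throttling stops being retried once the bucket is drained, no matter how high the retry limit is set. With many concurrent requests drawing from one client's bucket, that point is reached much sooner.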

In my particular use case, I'm running an Apache Iceberg workload that issues a large number of headObject() requests, and the job fails due to retry exhaustion even though I configured a high maximum number of retries. I imagine other big data workloads that use this API extensively could see the same behavior.

Expected Behavior

When a call to S3Client.headObject() fails with a 503 Slow Down error, I expect that S3Exception.isThrottlingException() returns true.

Current Behavior

When a call to S3Client.headObject() fails with a 503 Slow Down error, I observe that S3Exception.isThrottlingException() returns false.

Reproduction Steps

I have only reproduced this in my at-scale application, which makes a large number of requests to S3.

After enabling wire logging, I observe that S3's raw response is as follows. I imagine that the issue can be reproduced by mocking this response.

24/07/19 21:37:04 DEBUG wire: http-outgoing-2925 << "HTTP/1.1 503 Slow Down[\r][\n]"
24/07/19 21:37:04 DEBUG wire: http-outgoing-2925 << "x-amz-request-id: DNJ0YBW4S9H9X8DP[\r][\n]"
24/07/19 21:37:04 DEBUG wire: http-outgoing-2925 << "x-amz-id-2: vkQITUSJv6LxBRzJkgy+5stqWmlS7+L/dW41DhlDmXStNpxtWBO+WRKYDSWhXo/C5YmDYmm0AaX/Cc532WgTWaaM4DB7d36a[\r][\n]"
24/07/19 21:37:04 DEBUG wire: http-outgoing-2925 << "Content-Type: application/xml[\r][\n]"
24/07/19 21:37:04 DEBUG wire: http-outgoing-2925 << "Date: Fri, 19 Jul 2024 21:37:03 GMT[\r][\n]"
24/07/19 21:37:04 DEBUG wire: http-outgoing-2925 << "Server: AmazonS3[\r][\n]"
24/07/19 21:37:04 DEBUG wire: http-outgoing-2925 << "Connection: close[\r][\n]"

Note that there is no XML body provided.

Here's what logging the exception looks like.

Code

LOG.error(
    "Got service exception. Is throttling? {}. Error details: {}.",
    e.isThrottlingException(),
    e.awsErrorDetails(),
    e);

Log output

Got service exception. Is throttling? false. Error details: AwsErrorDetails(serviceName=S3). 
software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 503, Request ID: DNJ9ZP64AW5HP5ZT, Extended Request ID: uBunIlZ0ytEiYNOyt7KND7OOpngDTjsSrYKveakQTyxO80MX0sHVOxLuu6jnbBSQlUq53yUkCiKPuknvXvOAW4ewXKhquD4P)

Here, notice that

  1. isThrottlingException() returned false
  2. AwsErrorDetails.errorCode wasn't printed, so it must be null
  3. The exception message (the first field of the exception text) is null; the message is also pulled from the error XML.

Possible Solution

For HEAD requests, S3 does not provide an error XML. I speculate that this is the problem. From a cursory reading of the code, it appears that the data source for AwsServiceException.isThrottlingException() is awsErrorDetails.errorCode(), and this field is derived from the error XML in AwsXmlErrorUnmarshaller.unmarshall(). To solve this problem, the implementation of AwsServiceException.isThrottlingException() would need to look at the HTTP status code when the error code was not provided.
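A minimal sketch of that fallback, written as standalone Java rather than against the SDK's classes. The class name, method name, and the set of error codes are hypothetical and only illustrative; treating every code-less 503 as throttling is also an assumption, since a 503 can mean a general service outage.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ThrottlingFallbackSketch {
    // Illustrative subset of throttling error codes, assumed for this sketch.
    static final Set<String> THROTTLING_CODES = new HashSet<>(
            Arrays.asList("SlowDown", "Throttling", "ThrottlingException"));

    /**
     * Classifies an error as throttling using the error code when present,
     * and falls back to the HTTP status code when no error code was parsed
     * (for example, a HEAD response that carries no error XML).
     */
    static boolean isThrottling(String errorCode, int statusCode) {
        if (statusCode == 429) {
            return true; // Too Many Requests
        }
        if (errorCode != null) {
            return THROTTLING_CODES.contains(errorCode);
        }
        // No error code available: assume a 503 means "Slow Down".
        return statusCode == 503;
    }

    public static void main(String[] args) {
        System.out.println(isThrottling(null, 503));       // bodiless HEAD failure
        System.out.println(isThrottling("SlowDown", 503)); // GET failure with error XML
        System.out.println(isThrottling(null, 404));       // missing object
    }
}
```

Limiting the status-code fallback to the case where the error code is null keeps the existing classification unchanged for responses that do include an error XML.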

Additional Information/Context

No response

AWS Java SDK version used

2.22.12

JDK version used

8

Operating System and version

EMR Serverless

debora-ito commented 3 months ago

Moving to the Java SDK 2.x repo.