aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.21k stars 853 forks source link

NPE in RetryOnErrorCodeCondition.shouldRetry() #4600

Open steveloughran opened 1 year ago

steveloughran commented 1 year ago

Describe the bug

NPE in error handling code of RetryOnErrorCodeCondition.shouldRetry(); in bundle-2.20.128.jar


Caused by: java.lang.NullPointerException
    at software.amazon.awssdk.awscore.retry.conditions.RetryOnErrorCodeCondition.shouldRetry(RetryOnErrorCodeCondition.java:45) ~[bundle-2.20.128.jar:?]
    at software.amazon.awssdk.core.retry.conditions.OrRetryCondition.lambda$shouldRetry$0(OrRetryCondition.java:46) ~[bundle-2.20.128.jar:?]
    at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90) ~[?:1.8.0_382]

looking at source its caught AwsServiceException but the error details are (somehow) null and there's no handling of that

        if (ex instanceof AwsServiceException) {
            AwsServiceException exception = (AwsServiceException) ex;

            return retryableErrorCodes.contains(exception.awsErrorDetails().errorCode()); /** here*/
        }

Expected Behavior

A call of S3AFileSystem.isDirectory() to return true if there's a directory at the path, that is: does LIST path or HEAD path+"/" return something.

Current Behavior

Caused by: java.lang.NullPointerException
    at software.amazon.awssdk.awscore.retry.conditions.RetryOnErrorCodeCondition.shouldRetry(RetryOnErrorCodeCondition.java:45)
    at software.amazon.awssdk.core.retry.conditions.OrRetryCondition.lambda$shouldRetry$0(OrRetryCondition.java:46)
    at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
    at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:516)
    at software.amazon.awssdk.core.retry.conditions.OrRetryCondition.shouldRetry(OrRetryCondition.java:46)
    at software.amazon.awssdk.core.retry.conditions.AndRetryCondition.lambda$shouldRetry$0(AndRetryCondition.java:47)
    at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
    at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:521)
    at software.amazon.awssdk.core.retry.conditions.AndRetryCondition.shouldRetry(AndRetryCondition.java:47)
    at software.amazon.awssdk.core.retry.conditions.AndRetryCondition.lambda$shouldRetry$0(AndRetryCondition.java:47)
    at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90)
    at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
    at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:521)
    at software.amazon.awssdk.core.retry.conditions.AndRetryCondition.shouldRetry(AndRetryCondition.java:47)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyAllowsRetry(RetryableStageHelper.java:109)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:66)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:196)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
    at software.amazon.awssdk.services.s3.DefaultS3Client.headObject(DefaultS3Client.java:5445)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$8(S3AFileSystem.java:2758)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:431)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2746)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2726)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3857)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3785)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$24(S3AFileSystem.java:3762)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2571)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2590)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3760)
    at org.apache.hadoop.hive.metastore.Warehouse.isDir(Warehouse.java:853)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$8.run(HiveMetaStore.java:1449)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$8.run(HiveMetaStore.java:1447)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1447)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1575)
    ... 41 more

Reproduction Steps

see stack trace; this is hadoop trunk up to HADOOP-18908. Improve S3A region handling; but with some internals on the execution chain (s3a auditing)

Possible Solution

look for a null errorDetails() field before calling errorCode() in it; if null it is non-retryable.

 return retryableErrorCodes.contains(exception.awsErrorDetails().errorCode());

Additional Information/Context

there is some underlying deployment quirk triggering this. fix the NPE and we'll be able to find out what

AWS Java SDK version used

2.20.128

JDK version used

OpenJDK Runtime Environment (build 1.8.0_382-b05)

Operating System and version

linux

debora-ito commented 1 year ago

Is it possible that retryableErrorCodes is null, and not errorDetails?

steveloughran commented 1 year ago

maybe both; all we saw was the failure from the missing error code