aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0

SdkClientException is thrown periodically when using InstanceProfileCredentialsProvider to access S3 #3939

Closed tsuyoshizawa closed 1 year ago

tsuyoshizawa commented 1 year ago

Describe the bug

The system I am working on handles S3 access using IAM roles. This access processing is done using the InstanceProfileCredentialsProvider in the AWS Java SDK.

The following stack trace started appearing around the end of January 2023. Curiously, about once a week, at roughly the same time, all application servers log the same error.

Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(profilesAndSectionsMap=[{***=Profile(name=***, properties=[role_arn, credential_source])}, {}])), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): Either the environment variable AWS_WEB_IDENTITY_TOKEN_FILE or the javaproperty aws.webIdentityTokenFile must be set., ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(profilesAndSectionsMap=[{***=Profile(name=***, properties=[role_arn, credential_source])}, {}])): Profile file contained no credentials for profile 'default': ProfileFile(profilesAndSectionsMap=[{***=Profile(name=***, properties=[role_arn, credential_source])}, {}]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.]
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
        at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:117)
        at software.amazon.awssdk.auth.credentials.internal.LazyAwsCredentialsProvider.resolveCredentials(LazyAwsCredentialsProvider.java:45)
        at software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.resolveCredentials(DefaultCredentialsProvider.java:126)
        at software.amazon.awssdk.core.internal.util.MetricUtils.measureDuration(MetricUtils.java:50)
        at software.amazon.awssdk.awscore.internal.authcontext.AwsCredentialsAuthorizationStrategy.resolveCredentials(AwsCredentialsAuthorizationStrategy.java:100)
        at software.amazon.awssdk.awscore.internal.authcontext.AwsCredentialsAuthorizationStrategy.addCredentialsToExecutionAttributes(AwsCredentialsAuthorizationStrategy.java:77)
        at software.amazon.awssdk.services.s3.internal.signing.DefaultS3Presigner.invokeInterceptorsAndCreateExecutionContext(DefaultS3Presigner.java:366)
        at software.amazon.awssdk.services.s3.internal.signing.DefaultS3Presigner.presign(DefaultS3Presigner.java:308)
        at software.amazon.awssdk.services.s3.internal.signing.DefaultS3Presigner.presignGetObject(DefaultS3Presigner.java:230)
        ...

The error occurs at the point where S3Presigner.create() is called and presignGetObject is executed.

I am not currently aware of any changes made to the infrastructure environment around AWS prior to this issue.

Expected Behavior

Continuous access to S3 without a single credentials loading error.

Current Behavior

Get an error reading credentials only once a week.

Reproduction Steps

No reproducible steps.

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

AWS SDK 2.20.47

JDK version used

OpenJDK Runtime Environment Corretto-11.0.18.10.1 (build 11.0.18+10-LTS)

Operating System and version

Amazon Linux release 2 (Karoo)

debora-ito commented 1 year ago

If you were expecting InstanceProfileCredentialsProvider to be picked up, this is the relevant error message:

InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.

Since the error is intermittent (once a week), and based on previous issue reports, this is probably caused by latency in the IMDS endpoint. You can enable the SDK client-side metrics to obtain more insights on the duration of the credential fetching step over time.

To set expectations: EC2 instance credentials require communication with the IMDS endpoint to obtain the temporary session token. This process is affected by connectivity and latency issues, so credential fetching errors can occur.
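
For readers who want to check IMDS latency directly from an instance, here is a minimal sketch of the two-step IMDSv2 exchange using only the JDK 11 HTTP client. The class name ImdsLatencyProbe, the 2-second timeout, and the 60-second token TTL are illustrative assumptions, not values from this thread. The endpoint paths and header names follow the public EC2 IMDSv2 documentation. The SDK's own credential fetching does more than this, so treat the result as a rough probe only.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ImdsLatencyProbe {
    // Link-local IMDS address; only reachable from inside an EC2 instance.
    static final String IMDS = "http://169.254.169.254";

    // Step 1 of IMDSv2: a PUT that asks for a short-lived session token.
    static HttpRequest tokenRequest(Duration timeout) {
        return HttpRequest.newBuilder(URI.create(IMDS + "/latest/api/token"))
                .timeout(timeout)
                .header("X-aws-ec2-metadata-token-ttl-seconds", "60")
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Step 2: a GET for the attached role name, presenting the session token.
    static HttpRequest roleNameRequest(String token, Duration timeout) {
        return HttpRequest.newBuilder(
                        URI.create(IMDS + "/latest/meta-data/iam/security-credentials/"))
                .timeout(timeout)
                .header("X-aws-ec2-metadata-token", token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(2); // arbitrary value for this sketch
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        long start = System.nanoTime();
        try {
            String token = client.send(tokenRequest(timeout),
                    HttpResponse.BodyHandlers.ofString()).body();
            HttpResponse<String> role = client.send(roleNameRequest(token, timeout),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println("status=" + role.statusCode());
        } catch (Exception e) {
            // Off EC2, or when IMDS is slow, the probe ends up here.
            System.out.println("IMDS call failed: " + e);
        } finally {
            System.out.println("elapsedMs=" + (System.nanoTime() - start) / 1_000_000);
        }
    }
}
```

Running this on a schedule and logging elapsedMs would show whether the weekly failures line up with slow or failed IMDS responses.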

tsuyoshizawa commented 1 year ago

@debora-ito Thank you for confirming what the issue is.

I looked into the latency issue and the advice you gave me and found the following similar problem and solution. https://medium.com/expedia-group-tech/service-slow-to-retrieve-aws-credentials-ebc02a38e95b

It seems to be a known issue, as described in that blog. I should have looked into this GitHub Issue more.

The sample you provided for collecting metrics adds a MetricPublisher through the overrideConfiguration method on S3Client. However, the S3Presigner class does not seem to have an API for adding a MetricPublisher.

If I want to measure in the same way as the sample, is there a similar API for S3Presigner?

I hope this will be a reference for improving the AWS SDK Java library.

debora-ito commented 1 year ago

However, the S3Presigner class does not seem to have an API to add a similar MetricPublisher?

That's because S3Presigner simply generates a signed request. The SDK does not control how that request will be sent to the service, so there's no way to track the duration of the steps in the request lifecycle. I apologize, I should have noticed you were using S3Presigner before I suggested the client metrics.

Let us know if you have any other question.

tsuyoshizawa commented 1 year ago

Thank you. I will try to measure the latency slowdown on my end.

If AWS provides a solution for the slow IMDS latency, please let me know. Until then, this issue can be closed.
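
Since S3Presigner has no MetricPublisher hook, one way to measure on the application side is to wrap the presign call in a small timing helper. The sketch below uses only the JDK. TimedCall, timed, and simulatedPresign are hypothetical names, and simulatedPresign stands in for a real presigner.presignGetObject(request) call. The duration is reported in a finally block, so it is recorded even when credential loading throws.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

public class TimedCall {
    // Runs the call and reports its wall-clock duration in milliseconds,
    // including when the call fails with an exception.
    static <T> T timed(Supplier<T> call, LongConsumer onElapsedMillis) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            onElapsedMillis.accept((System.nanoTime() - start) / 1_000_000);
        }
    }

    // Hypothetical stand-in for presigner.presignGetObject(request).
    static String simulatedPresign() {
        try {
            Thread.sleep(20); // pretend credential resolution takes a moment
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "https://example.invalid/presigned-url";
    }

    public static void main(String[] args) {
        String url = timed(TimedCall::simulatedPresign,
                ms -> System.out.println("presign took " + ms + " ms"));
        System.out.println(url);
    }
}
```

In a real service the callback would feed a histogram or a log line, which makes it possible to see whether presign latency climbs before the weekly failures.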

tsuyoshizawa commented 1 year ago

I am leaving this as a note because I may have solved the problem. As mentioned in the comments above, I am using S3Presigner for my project.

I specified the same CredentialsProvider and Region that I use for the upload process, and the timeout that used to occur only once a week for all instances no longer occurs.

Before

S3Presigner.create()

After

S3Presigner
  .builder()
  .credentialsProvider(awsCredentialsProviderChain)
  .region(awsRegionProviderChain.getRegion)
  .build()

I am skeptical because both ultimately use the CredentialsProvider generated by InstanceProfileCredentialsProvider.create().
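
One possible, unconfirmed explanation for the difference is caching. A provider that caches credentials only helps if the same instance is reused across calls. The sketch below is a generic JDK-only illustration of that effect, not the SDK's actual implementation. CachingProvider and both counting methods are hypothetical. Whether this accounts for the behavior reported here is not established by this thread.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SharedProviderSketch {
    // Hypothetical stand-in for a credentials provider: it keeps one value
    // and only calls the remote fetch again after the value has expired.
    static final class CachingProvider implements Supplier<String> {
        private final Supplier<String> remoteFetch;
        private final Duration ttl;
        private String cached;
        private Instant expiresAt = Instant.MIN;

        CachingProvider(Supplier<String> remoteFetch, Duration ttl) {
            this.remoteFetch = remoteFetch;
            this.ttl = ttl;
        }

        @Override
        public synchronized String get() {
            Instant now = Instant.now();
            if (cached == null || !now.isBefore(expiresAt)) {
                cached = remoteFetch.get();
                expiresAt = now.plus(ttl);
            }
            return cached;
        }
    }

    // A fresh provider per call starts with an empty cache every time.
    static int fetchesWithNewProviderPerCall(int calls) {
        AtomicInteger fetches = new AtomicInteger();
        for (int i = 0; i < calls; i++) {
            new CachingProvider(() -> "creds-" + fetches.incrementAndGet(),
                    Duration.ofHours(1)).get();
        }
        return fetches.get();
    }

    // One shared provider fetches once and then serves from its cache.
    static int fetchesWithSharedProvider(int calls) {
        AtomicInteger fetches = new AtomicInteger();
        CachingProvider shared = new CachingProvider(
                () -> "creds-" + fetches.incrementAndGet(), Duration.ofHours(1));
        for (int i = 0; i < calls; i++) {
            shared.get();
        }
        return fetches.get();
    }

    public static void main(String[] args) {
        System.out.println("new provider per call: " + fetchesWithNewProviderPerCall(5));
        System.out.println("shared provider: " + fetchesWithSharedProvider(5));
    }
}
```

With five calls, the per-call variant performs five remote fetches and the shared variant performs one. Fewer remote fetches means less exposure to a slow metadata endpoint.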

hendisantika commented 6 days ago

I am using it like this:

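import lombok.AllArgsConstructor;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.S3Client;

// AwsProperties is the application's own configuration class (not shown).
@Configuration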
@AllArgsConstructor
public class AmazonS3Config {

    private final AwsProperties awsProperties;

    @Bean("awsCredentials")
    public AwsCredentialsProvider awsCredentials() {
        return DefaultCredentialsProvider.create();
    }

    @Bean("s3AsyncClient")
    public S3AsyncClient s3AsyncClient(@Qualifier("awsCredentials") AwsCredentialsProvider awsCredentials) {
        return S3AsyncClient
                .builder()
                .credentialsProvider(awsCredentials)
                .region(Region.of(awsProperties.getS3().getRegion()))
                .build();
    }

    @Bean("s3Client")
    public S3Client s3Client(@Qualifier("awsCredentials") AwsCredentialsProvider awsCredentials) {
        return S3Client
                .builder()
                .credentialsProvider(awsCredentials)
                .region(Region.of(awsProperties.getS3().getRegion()))
                .build();
    }
}

It works on my local machine, but after deploying to a VM, the same error as above occurs.

I am using Spring Boot 3.3.5 and AWS SDK S3 2.x.x.

Any suggestions?

Thanks