awslabs / aws-java-nio-spi-for-s3

A Java NIO.2 service provider for Amazon S3
Apache License 2.0

DetermineBucketLocation method called even when region is defined #443

Closed UQIQLP5 closed 1 day ago

UQIQLP5 commented 3 weeks ago

Currently, we are facing a problem in the AWS environment. Even though the aws.region is configured to eu-central-1, the S3ClientProvider always tries to first call determineBucketLocation. For this call, the DEFAULT_CLIENT is used.

    private static final S3AsyncClient DEFAULT_CLIENT = S3AsyncClient.builder()
            .endpointOverride(URI.create("https://s3.us-east-1.amazonaws.com"))
            .crossRegionAccessEnabled(true)
            .region(Region.US_EAST_1)
            .build();

Unfortunately, in our case, the us-east endpoint is not reachable from our cluster, resulting in a timeout exception.

    java.util.concurrent.ExecutionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: connection timed out after 2000 ms: xxx.s3.us-east-1.amazonaws.com
        at software.amazon.nio.spi.s3.S3ClientProvider.generateClient(S3ClientProvider.java:85)
        at software.amazon.nio.spi.s3.S3FileSystem.client(S3FileSystem.java:415)
        at software.amazon.nio.spi.s3.S3FileSystemProvider.getCompletableFutureForHead(S3FileSystemProvider.java:653)
        at software.amazon.nio.spi.s3.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:607)
        at java.base/java.nio.file.Files.exists(Files.java:2440)

If I try to call "DetermineBucketLocation" with a manually created client configured for eu-central-1, it works as expected. In general, it is not clear to me why there is even a need to call DetermineBucketLocation when the region is already defined.

Are we doing something wrong, or is this a bug?

markjschreiber commented 3 weeks ago

The problem is that S3 buckets predate AWS regions, so neither the bucket ARN nor the S3 URI tells you which region a bucket is in. Cross-region access to buckets works fine, so in theory we don't need to know where a bucket is, but all AWS requests must be signed following the SigV4 spec, which requires that the region in the signature matches the region of the bucket endpoint. S3 has an API called GetBucketLocation that can report the region of a bucket, but all calls to GetBucketLocation must be directed to us-east-1. The library also uses some other tricks to determine a bucket's location based on response headers, but it tries the API call above first.
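One documented quirk of GetBucketLocation is that its LocationConstraint value is empty for buckets in us-east-1, and the legacy "EU" value is an alias for eu-west-1. A minimal helper sketching that mapping (the class and method names here are illustrative, not part of this library):

```java
// Sketch: translate a raw GetBucketLocation LocationConstraint value into a
// usable region name. The empty value for us-east-1 and the legacy "EU"
// alias are documented quirks of the S3 GetBucketLocation API.
public final class BucketLocation {

    static String toRegion(String locationConstraint) {
        if (locationConstraint == null || locationConstraint.isEmpty()) {
            return "us-east-1";   // buckets in us-east-1 report no constraint
        }
        if ("EU".equals(locationConstraint)) {
            return "eu-west-1";   // legacy alias kept for backward compatibility
        }
        return locationConstraint; // modern buckets report their region directly
    }

    public static void main(String[] args) {
        System.out.println(toRegion(""));             // us-east-1
        System.out.println(toRegion("eu-central-1")); // eu-central-1
    }
}
```

Any caller consuming the raw API response has to special-case these values, which is part of why HeadBucket is now the preferred way to discover a bucket's region.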

In your case it seems the library doesn't gracefully recover from the failed attempt to get the location from the GetBucketLocation API.

Another complication: the GetBucketLocation API appears to be deprecated in favor of the HeadBucket API, but it is still supported for backward compatibility.

For this library we probably need to reconsider the strategy used to determine bucket locations, and likely make it configurable. In reality you usually know where your bucket is, except in the case of public buckets or buckets shared by another account that you don't own, so a "just use my region" configuration would not be unreasonable. So, not entirely a bug, but certainly something we can see if we can improve.

As a workaround, you might want to allow-list https://s3.us-east-1.amazonaws.com, at least for the purposes of calling the GetBucketLocation API. I'm not sure if your cluster's access controls can be that granular, though. A second workaround would be to fork the library and change the endpoint of the default client; you may also have to remove the logic that calls the GetBucketLocation API.

hubfruser commented 2 weeks ago

I got a similar error and wonder if it was caused by this bug. My app was deployed in a region other than us-east-1 and had the environment variable AWS_REGION set to the app's region. My S3 bucket was created in, and confirmed to exist in, the same region as my app. When my app ran, I got a connection error indicating that the NIO file operation tried to connect to "xxxx.s3.us-east-1.amazonaws.com", where "xxxx" is my S3 bucket name. My sample code is below; I am using version 1.2.4 of this library.

        String s3BucketName = "xxx"; // some bucket name which exists and is accessible by the service running the code
        URI s3BucketURI = createS3BucketURI(s3BucketName);
        S3FileSystemProvider provider = new S3FileSystemProvider();
        FileSystem fileSystem = provider.newFileSystem(s3BucketURI);
        Path filePath = fileSystem.getPath(someBucketFilePathName);
        Writer writer = Files.newBufferedWriter(filePath);

The error was thrown at the Files.newBufferedWriter line above.

    Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connect to xxxx.s3.us-east-1.amazonaws.com:443 failed: connect timed out
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
        at software.amazon.awssdk.services.s3.DefaultS3Client.getBucketLocation(DefaultS3Client.java:4005)
        at software.amazon.awssdk.services.s3.S3Client.getBucketLocation(S3Client.java:6383)
        at software.amazon.nio.spi.s3.S3ClientProvider.generateAsyncClient(S3ClientProvider.java:250)
        at software.amazon.nio.spi.s3.S3ClientProvider.generateAsyncClient(S3ClientProvider.java:165)
        at software.amazon.nio.spi.s3.S3FileSystem.client(S3FileSystem.java:195)
        at software.amazon.nio.spi.s3.S3FileSystemProvider.newByteChannel(S3FileSystemProvider.java:321)
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
        at java.nio.file.Files.newOutputStream(Files.java:216)

I am not using the S3 client directly, so I am not sure how UQIQLP5 could manually invoke "DetermineBucketLocation" (which seems not to be a public method). And Mark, I don't quite follow your workaround 1, since my bucket is not in us-east-1 yet the library somehow thought it was, as indicated in the error above. Could you point me to an example with more details for the two workarounds, or other suggestions for my code usage? Thanks!

markjschreiber commented 2 weeks ago

This could well be the same issue. BTW, what version of the library are you using?

Work around 1. would require figuring out what part of your network setup might be responsible for blocking the access to the us-east-1 endpoint.

Work around 2 (currently) would require forking the code, removing the region location check and then recompiling (not impossible but perhaps not simple).

Longer term I am thinking we might want to change the way that the library tries to resolve the bucket region, maybe by first assuming it will be in the same region and then if that doesn't work falling back to trying other approaches.
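The fallback order described above can be sketched as a small strategy with the network calls abstracted behind functional interfaces. All names here are hypothetical illustrations, not this library's API; the probe stands in for something like a HeadBucket call in the configured region, and the lookup for a GetBucketLocation call via us-east-1:

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed resolution order: trust the configured
// region first, and fall back to a location lookup only when the probe fails.
// The probes are injected so the strategy itself is testable offline.
public final class RegionResolver {

    /**
     * @param configuredRegion region from aws.region / AWS_REGION, may be null
     * @param probe            returns true if the bucket answers in that region
     *                         (e.g. a HeadBucket call)
     * @param locationLookup   slower fallback, e.g. GetBucketLocation via us-east-1
     */
    static String resolve(String configuredRegion,
                          Function<String, Boolean> probe,
                          Supplier<Optional<String>> locationLookup) {
        if (configuredRegion != null && probe.apply(configuredRegion)) {
            return configuredRegion;      // common case: bucket is in our region
        }
        return locationLookup.get()       // fallback: ask S3 where the bucket is
                .orElse("us-east-1");     // last resort default
    }

    public static void main(String[] args) {
        // Bucket answers in the configured region: no lookup call is needed.
        System.out.println(resolve("eu-central-1", region -> true, Optional::empty));
    }
}
```

With this ordering, a cluster that cannot reach the us-east-1 endpoint never pays for the lookup as long as the bucket really is in the configured region.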

hubfruser commented 2 weeks ago

> This could well be the same issue. BTW, what version of the library are you using?
>
> Work around 1. would require figuring out what part of your network setup might be responsible for blocking the access to the us-east-1 endpoint.
>
> Work around 2 (currently) would require forking the code, removing the region location check and then recompiling (not impossible but perhaps not simple).
>
> Longer term I am thinking we might want to change the way that the library tries to resolve the bucket region, maybe by first assuming it will be in the same region and then if that doesn't work falling back to trying other approaches.

I am using version 1.2.4 (I am not able to use 2.0.x because I have to stay on Java 8 for a while). The "xxxx" in the error was my S3 bucket name: Connect to xxxx.s3.us-east-1.amazonaws.com:443 failed: connect timed out.

For workaround 1, even if my cluster network were configured to allow access to us-east-1.amazonaws.com, there is no S3 bucket named "xxxx" in that region, so I think this error (or another, like "cannot find S3 bucket") would still occur for the invalid bucket.

It seems I cannot do workaround 2, which requires forking the code (regular users like me probably don't want to bother with this; we have been hoping that declaring a Maven dependency would do the work, instead of maintaining our own version of the library).

And a question: is it correct to configure the environment variable AWS_REGION? (The README says "aws.region" for the system property, but environment variables are normally upper case.) Since my app is deployed on AWS in the same region as my S3 bucket, and the app's service role has permission to access the bucket, I don't think I need to configure aws.accessKey or aws.secretAccessKey. And I don't need "s3.spi.read.fragment-number" or fragment size, assuming the default values should work.
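For what it's worth, the AWS SDK's default region provider chain consults the aws.region system property before the AWS_REGION environment variable. A rough stand-in for that lookup order (a simplification of the SDK's DefaultAwsRegionProviderChain, not the real implementation):

```java
// Rough stand-in for the SDK's region lookup order: the aws.region system
// property is consulted before the AWS_REGION environment variable. This is
// a simplification, not the SDK's actual implementation, which also falls
// back to the shared config file and instance metadata.
public final class RegionConfig {

    static String resolveRegion() {
        String prop = System.getProperty("aws.region");
        if (prop != null && !prop.isEmpty()) {
            return prop;                  // JVM flag: -Daws.region=eu-central-1
        }
        String env = System.getenv("AWS_REGION");
        if (env != null && !env.isEmpty()) {
            return env;                   // e.g. exported in the container env
        }
        return null;                      // fall through to profile/IMDS lookups
    }

    public static void main(String[] args) {
        System.setProperty("aws.region", "eu-central-1");
        System.out.println(resolveRegion()); // eu-central-1
    }
}
```

So either the AWS_REGION environment variable or the -Daws.region JVM flag should be picked up by the SDK; setting both is harmless but redundant.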

Update: I have now configured both "AWS_REGION" and "aws.region" as environment variables for the service running my app in AWS. Neither value is us-east-1, but the error is still the same: the library tries to reach the xxxx bucket at us-east-1.amazonaws.com even though my bucket is not in us-east-1.

Now I feel this issue may not be an enhancement but a blocker for me (unless what I am seeing is different from the issue in the original post). Workarounds 1 and 2 do not seem workable for me. This issue only occurs for my app deployed in AWS (no credentials are configured; the AWS service running my app has read/write permission on my S3 bucket). My locally running app did not have this issue (it was able to access my bucket in a non-us-east-1 region using access key and secret key credentials). Please let me know if Mark or anyone else has input or ideas.

Thanks!

markjschreiber commented 3 days ago

@hubfruser, @UQIQLP5 I think this should now be fixed in main for version 2.x.

Let me know if it is still an issue.

UQIQLP5 commented 2 days ago

It seems it really helped, thank you! I also noticed that you changed the timeout length; I am not sure if that was intentional. When I first tried it in AWS, I was getting connection timeouts on that request. Once I reverted that change, it worked. It might have been caused by some unrelated problem in my setup, so I would have to test again to confirm whether this is a problem.

markjschreiber commented 2 days ago

Thanks. Can you tell me which timeout you want changed? I can fix that, or you could make a pull request.

markjschreiber commented 2 days ago

I think I found the location and have patched main. Let me know if this works.