Closed — kadwanev closed this issue 4 months ago
It's unfortunate that the workaround doesn't work in Hadoop environments, but we don't have plans to fix VPCE access in the v1 S3 client. We are closing stale v1 issues before going into Maintenance Mode, so if this issue is still relevant in v2, please open a new issue in the v2 repo.
This issue is now closed.
Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.
Describe the feature
When configuring an S3 VPCE interface endpoint and setting ForceGlobalBucketAccess to true, the endpoint region gets set to "vpce".
Reference: https://github.com/aws/aws-sdk-java/issues/2992
The workaround proposed in that issue is to create multiple clients, but this is not possible when accessing S3 from a Hadoop environment such as Spark.
Hadoop uses ForceGlobalBucketAccess in its AWS S3 FileSystem: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java#L288
So even if the endpoint region is configured explicitly, it gets changed to "vpce" and the following error occurs:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; the region 'vpce' is wrong; expecting 'us-east-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed;
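For reference, a minimal sketch of the kind of v1 client configuration that triggers this; the VPCE endpoint hostname and bucket name below are placeholders, not values from this report:

```java
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class VpceRepro {
    public static void main(String[] args) {
        // Placeholder VPCE interface endpoint DNS name; substitute your own.
        String vpceEndpoint =
                "https://bucket.vpce-0123456789abcdef0-example.s3.us-east-1.vpce.amazonaws.com";

        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                // Explicitly pin the signing region to us-east-1 alongside the VPCE endpoint.
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration(vpceEndpoint, "us-east-1"))
                // With global bucket access enabled, the signing region is re-derived
                // from the endpoint hostname and ends up as "vpce".
                .withForceGlobalBucketAccessEnabled(true)
                .build();

        // Requests then fail with AuthorizationHeaderMalformed ("the region 'vpce' is wrong").
        s3.listObjectsV2("my-example-bucket");
    }
}
```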
Use Case
Use VPCE endpoint to access S3 bucket through Spark.
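For illustration, a minimal sketch of how the VPCE endpoint is typically wired in on the Hadoop/S3A side that Spark uses; the endpoint DNS name and bucket are placeholders, and fs.s3a.endpoint is the standard S3A endpoint-override property:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AOverVpce {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point S3A at the VPCE interface endpoint (placeholder DNS name).
        conf.set("fs.s3a.endpoint",
                "https://bucket.vpce-0123456789abcdef0-example.s3.us-east-1.vpce.amazonaws.com");

        // Even with the endpoint configured explicitly, DefaultS3ClientFactory enables
        // ForceGlobalBucketAccess, so the v1 client signs with region "vpce" and the
        // request is rejected with AuthorizationHeaderMalformed.
        Path bucketRoot = new Path("s3a://my-example-bucket/");
        FileSystem fs = FileSystem.get(bucketRoot.toUri(), conf);
        fs.listStatus(bucketRoot);
    }
}
```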
Proposed Solution
No response
Other Information
No response
Acknowledgements
AWS Java SDK version used
1.12.42
JDK version used
11.0.19
Operating System and version
RHEL 7.9