Describe the bug
This bug was reported here. I am reopening it as a new issue at the request of @akuzin1.
To Reproduce
Create buckets in multiple regions in your S3 account (e.g. us-west-1, us-west-2). Then create a new Athena Data Source for a MySQL RDS instance in a private VPC, ensuring a VPC endpoint for S3 has been enabled. Even with correct security groups and subnet IDs, the Lambda will fail because it calls listBuckets, which attempts to list buckets that are not reachable through your current region's VPC endpoint for S3.
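For reference, the account-wide S3 call in question can be exercised directly with the AWS SDK for Java v1 (the same SDK the connector uses, per the stack trace below). This is only a minimal sketch; the region string and class name are illustrative assumptions:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.Bucket;

public class ListBucketsRepro {
    public static void main(String[] args) {
        // Client built in the Lambda's region; inside the private VPC this
        // request is routed through the S3 VPC endpoint.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion("us-west-2") // hypothetical region, for illustration
                .build();

        // ListBuckets is account-wide: it returns buckets from every region,
        // which is what the connector attempts during spill-bucket verification.
        for (Bucket bucket : s3.listBuckets()) {
            System.out.println(bucket.getName());
        }
    }
}
```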
Expected behavior
Able to run a basic select * from mysql_data_source without errors in the connector when your S3 account has buckets in multiple regions. The code should be updated to not call listBuckets and instead interact only with the configured spill-bucket.
Screenshots / Exceptions / Errors
GENERIC_USER_ERROR: Encountered an exception[java.lang.RuntimeException] from your LambdaFunction[...] executed in context[retrieving meta-data] with message[Error while checking bucket ownership for ...]
This query ran against the "..." database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: ...
Connector Details (please complete the following information):
Version: 2023.49.2
Name: mysql
Athena Query IDs [if applicable]
Additional context
From the previously closed issue, two questions remain unanswered:
1. Can the docs be clarified for spill-bucket? Do we enter the bucket name or a URL? I used the bucket name - was that wrong?
2. Can this connector be updated to never call listBuckets, since there is no way to avoid buckets in other regions (outside of the VPC endpoint)? Instead, could it make requests involving only the spill-bucket? For instance, could only getBucket be used (see the sketch below)?
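To make the ask concrete, a bucket-scoped check could look roughly like the sketch below. This is only an illustration under the assumption that a HeadBucket-style call (e.g. AmazonS3.headBucket) is an acceptable substitute; the SpillBucketCheck class, verifySpillBucket method, and bucket name are hypothetical:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.HeadBucketRequest;

public class SpillBucketCheck {

    // Sketch: verify access to the configured spill bucket without
    // enumerating every bucket in the account (no listBuckets call).
    static void verifySpillBucket(AmazonS3 s3, String spillBucket) {
        try {
            // HeadBucket only touches the bucket the connector actually spills to.
            s3.headBucket(new HeadBucketRequest(spillBucket));
        } catch (AmazonS3Exception e) {
            throw new RuntimeException("Unable to access spill bucket " + spillBucket
                    + " (status " + e.getStatusCode() + ")", e);
        }
    }

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        verifySpillBucket(s3, "my-spill-bucket"); // hypothetical bucket name
    }
}
```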
CloudWatch logs:
2024-01-14 22:07:29 ... WARN CompositeHandler:116 - handleRequest: Completed with an exception.
java.lang.RuntimeException: Error while checking bucket ownership for ...
at com.amazonaws.athena.connector.lambda.domain.spill.SpillLocationVerifier.updateBucketState(SpillLocationVerifier.java:100) ~[task/:?]
at com.amazonaws.athena.connector.lambda.domain.spill.SpillLocationVerifier.checkBucketAuthZ(SpillLocationVerifier.java:74) ~[task/:?]
at com.amazonaws.athena.connector.lambda.handlers.MetadataHandler.doHandleRequest(MetadataHandler.java:288) ~[task/:?]
at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:144) ~[task/:?]
at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:112) [task/:?]
at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:925) [aws-lambda-java-runtime-0.2.0.jar:?]
at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:268) [aws-lambda-java-runtime-0.2.0.jar:?]
at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:207) [aws-lambda-java-runtime-0.2.0.jar:?]
at lambdainternal.AWSLambda.main(AWSLambda.java:196) [aws-lambda-java-runtime-0.2.0.jar:?]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: ...=; Proxy: null)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561) ~[task/:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541) ~[task/:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5520) ~[task/:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5467) ~[task/:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5461) ~[task/:?]
at com.amazonaws.services.s3.AmazonS3Client.listBuckets(AmazonS3Client.java:1056) ~[task/:?]
at com.amazonaws.services.s3.AmazonS3Client.listBuckets(AmazonS3Client.java:1062) ~[task/:?]
at com.amazonaws.athena.connector.lambda.domain.spill.SpillLocationVerifier.updateBucketState(SpillLocationVerifier.java:88) ~[task/:?]
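For context, the frames above (listBuckets at SpillLocationVerifier.java:88, rethrown at :100) suggest the verifier enumerates every bucket in the account and then looks for the spill bucket by name. The sketch below is a rough reconstruction inferred from the stack trace, not the connector's actual source:

```java
import java.util.List;
import java.util.stream.Collectors;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.Bucket;

// Rough reconstruction of the ownership check implied by the stack trace.
class SpillLocationVerifierSketch {
    private final AmazonS3 amazonS3;

    SpillLocationVerifierSketch(AmazonS3 amazonS3) {
        this.amazonS3 = amazonS3;
    }

    void updateBucketState(String spillBucket) {
        List<String> names;
        try {
            // Account-wide call: this is the request that fails with AccessDenied
            // when routed through the regional S3 VPC endpoint.
            names = amazonS3.listBuckets().stream()
                    .map(Bucket::getName)
                    .collect(Collectors.toList());
        } catch (Exception e) {
            // Matches the wrapped message seen in the log above.
            throw new RuntimeException("Error while checking bucket ownership for " + spillBucket, e);
        }
        if (!names.contains(spillBucket)) {
            throw new RuntimeException("Spill bucket not found in this account: " + spillBucket);
        }
    }
}
```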