awslabs / aws-athena-query-federation

The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own data sources and code.
Apache License 2.0
560 stars 297 forks

[BUG] Error while checking bucket ownership with connector athena-federation-jdbc to mysql #1702

Open evbo opened 10 months ago

evbo commented 10 months ago

Describe the bug This bug was reported here. I am reopening it as a new issue at the request of @akuzin1.

To Reproduce Create buckets in multiple regions in your S3 account (e.g. us-west-1, us-west-2). Then create a new Athena Data Source for a MySQL RDS instance in a private VPC, ensuring a VPC endpoint for S3 has been enabled. Even with correct security groups and subnet IDs, the Lambda fails: its listBuckets call tries to list buckets outside the region served by your VPC endpoint for S3.

Expected behavior A basic select * from mysql_data_source should run without connector errors even if your S3 account has buckets in multiple regions. The code should be updated to stop calling listBuckets and instead interact only with the configured spill bucket.
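To make the suggestion concrete, here is a minimal sketch of a check that touches only the spill bucket and never lists the account's buckets. Everything named here is hypothetical and not taken from the connector: `SpillBucketCheck`, `BucketProbe`, `BucketState`, and `checkSpillBucket` are illustration only. In a real connector the probe could plausibly wrap a single-bucket call such as the AWS SDK v1 `AmazonS3.headBucket(new HeadBucketRequest(bucket))`, but that wiring is an assumption and is left out so the sketch runs without the SDK.

```java
import java.util.Objects;

/** Sketch only: verify the configured spill bucket without calling ListBuckets. */
class SpillBucketCheck {

    /**
     * Hypothetical seam over a single-bucket S3 call.
     * Assumption: a real implementation would issue one request against
     * the named bucket (for example a HeadBucket call) and throw on failure.
     */
    @FunctionalInterface
    interface BucketProbe {
        void probe(String bucket, String expectedOwnerAccountId) throws Exception;
    }

    /** Hypothetical result type for this sketch. */
    enum BucketState { VALID, INVALID }

    /**
     * Probes only the spill bucket. No other bucket in the account is
     * touched, so buckets in other regions never enter the picture.
     */
    static BucketState checkSpillBucket(BucketProbe probe, String spillBucket, String accountId) {
        Objects.requireNonNull(probe, "probe");
        Objects.requireNonNull(spillBucket, "spillBucket");
        try {
            probe.probe(spillBucket, accountId);
            return BucketState.VALID;
        } catch (Exception e) {
            // Any failure (access denied, missing bucket, wrong owner) is
            // reported as INVALID rather than rethrown in this sketch.
            return BucketState.INVALID;
        }
    }
}
```

With a probe that accepts only one bucket name, `checkSpillBucket` returns `VALID` for that name and `INVALID` for any other. Whether failures should map to `INVALID` or be rethrown, as the current "Error while checking bucket ownership" path does, is a design choice this sketch does not settle.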

Screenshots / Exceptions / Errors

GENERIC_USER_ERROR: Encountered an exception[java.lang.RuntimeException] from your LambdaFunction[...] executed in context[retrieving meta-data] with message[Error while checking bucket ownership for ...] This query ran against the "..." database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: ...

Connector Details (please complete the following information):

Additional context

From the previously closed issue, two questions remain unanswered:

CloudWatch logs:

2024-01-14 22:07:29 ... WARN CompositeHandler:116 - handleRequest: Completed with an exception.
java.lang.RuntimeException: Error while checking bucket ownership for ...
    at com.amazonaws.athena.connector.lambda.domain.spill.SpillLocationVerifier.updateBucketState(SpillLocationVerifier.java:100) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.domain.spill.SpillLocationVerifier.checkBucketAuthZ(SpillLocationVerifier.java:74) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.MetadataHandler.doHandleRequest(MetadataHandler.java:288) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:144) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:112) [task/:?]
    at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:925) [aws-lambda-java-runtime-0.2.0.jar:?]
    at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:268) [aws-lambda-java-runtime-0.2.0.jar:?]
    at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:207) [aws-lambda-java-runtime-0.2.0.jar:?]
    at lambdainternal.AWSLambda.main(AWSLambda.java:196) [aws-lambda-java-runtime-0.2.0.jar:?]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: ...=; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561) ~[task/:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541) ~[task/:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5520) ~[task/:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5467) ~[task/:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5461) ~[task/:?]
    at com.amazonaws.services.s3.AmazonS3Client.listBuckets(AmazonS3Client.java:1056) ~[task/:?]
    at com.amazonaws.services.s3.AmazonS3Client.listBuckets(AmazonS3Client.java:1062) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.domain.spill.SpillLocationVerifier.updateBucketState(SpillLocationVerifier.java:88) ~[task/:?]

evbo commented 10 months ago

This issue is discussed here: https://stackoverflow.com/a/67261680/1080804

@akuzin1 Instead of listBuckets, could getBucketLocation be called, filtering out any buckets not in the current Lambda's region?
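A rough sketch of that filtering idea, under stated assumptions: the bucket names are already known (for instance, just the configured spill bucket), and the location lookup is passed in as a plain function so the example runs without the AWS SDK. `RegionBucketFilter`, `normalize`, and `bucketsInRegion` are hypothetical names, not connector code. In practice the lookup could be backed by the SDK v1 `AmazonS3.getBucketLocation(String)` and the Lambda's region read from the `AWS_REGION` environment variable. Whether such a lookup succeeds through the VPC endpoint for buckets in other regions is not established in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch only: keep the buckets whose location matches the Lambda's region. */
class RegionBucketFilter {

    /**
     * Maps legacy location values to region names. Assumption: the lookup
     * may report us-east-1 as null, empty, or "US", and eu-west-1 as "EU".
     */
    static String normalize(String location) {
        if (location == null || location.isEmpty() || "US".equals(location)) {
            return "us-east-1";
        }
        if ("EU".equals(location)) {
            return "eu-west-1";
        }
        return location;
    }

    /**
     * Returns the subset of bucket names located in lambdaRegion.
     * locationLookup stands in for a per-bucket location call.
     */
    static List<String> bucketsInRegion(List<String> bucketNames,
                                        Function<String, String> locationLookup,
                                        String lambdaRegion) {
        List<String> inRegion = new ArrayList<>();
        for (String name : bucketNames) {
            if (lambdaRegion.equals(normalize(locationLookup.apply(name)))) {
                inRegion.add(name);
            }
        }
        return inRegion;
    }
}
```

Passing the lookup in as a function keeps the region comparison testable on its own. It also makes clear that the bucket names must come from somewhere other than listBuckets for this to avoid the original failure.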