Closed NBaySellier closed 6 months ago
Sounds fair to me, should be addressed in #2727
@jaidisido Forgive me if I'm wrong but in that MR, the s3_output is not actually being propagated to any of the functions that are being called within to_iceberg, right? Therefore the actual functionality doesn't seem to have changed?
Describe the bug
We cannot pass `s3_output` to `athena.to_iceberg`. Within `athena.to_iceberg`, the function `_start_query_execution` is called in multiple places, and none of those calls passes an `s3_output` parameter. Instead, a default `s3_output` is constructed from the `boto3_session`, based on the `account_id` and `region`. This can lead to unexpected access-related issues, as the caller may not have access to this bucket. For example:
InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: Unable to verify/create output bucket aws-athena-query-results-XXXX-YYYY
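The fallback described above can be sketched roughly as follows. This is an illustration, not the library's code: `default_athena_output` and `resolve_s3_output` are hypothetical names, the account ID and bucket are made-up values, and the exact URI format (scheme and trailing slash) is an assumption built around the bucket name in the error message.

```python
from typing import Optional


def default_athena_output(account_id: str, region: str) -> str:
    """Build a default results location from account and region (assumed format)."""
    return f"s3://aws-athena-query-results-{account_id}-{region}/"


def resolve_s3_output(s3_output: Optional[str], account_id: str, region: str) -> str:
    """Hypothetical helper: prefer the caller's location, else fall back to the default."""
    if s3_output is not None:
        return s3_output
    return default_athena_output(account_id, region)


# No explicit location: the account/region default bucket is selected.
print(resolve_s3_output(None, "123456789012", "eu-west-1"))

# Explicit location: the default bucket is never involved.
print(resolve_s3_output("s3://my-team-bucket/athena-results/", "123456789012", "eu-west-1"))
```

When no location is supplied, every query ends up targeting the default bucket, which is where the permission error surfaces if the caller cannot access it.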
This could be fixed by allowing the user to explicitly pass `s3_output` to `athena.to_iceberg`, which would then be passed down to the corresponding function calls.
How to Reproduce
The error only occurs if we do not have, e.g., StartQueryExecution permission on the bucket created by default, `aws-athena-query-results-ACCOUNT-REGION`.
Call
Expected behavior
No access-related issues for an automatically generated bucket that I did not specify. Instead, I should be able to pass the `s3_output` location myself.
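A minimal sketch of the expected propagation, using stand-ins rather than the real library: `to_iceberg_sketch` and `start_query_execution_stub` are hypothetical, the SQL strings are placeholders, and the stub only records what it receives instead of contacting Athena.

```python
from typing import List, Optional

# Records the s3_output each stubbed query received, so the flow can be inspected.
received: List[Optional[str]] = []


def start_query_execution_stub(sql: str, s3_output: Optional[str] = None) -> None:
    """Stand-in for the internal query call; it only records its s3_output argument."""
    received.append(s3_output)


def to_iceberg_sketch(table: str, s3_output: Optional[str] = None) -> None:
    """Hypothetical wrapper that forwards s3_output to every query it issues."""
    start_query_execution_stub(f"CREATE TABLE IF NOT EXISTS {table} ...", s3_output=s3_output)
    start_query_execution_stub(f"INSERT INTO {table} ...", s3_output=s3_output)


to_iceberg_sketch("my_table", s3_output="s3://my-team-bucket/athena-results/")
print(received)
```

The point of the sketch is that each inner call receives the caller's location, so no call falls back to a bucket derived from the account and region.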
Your project
No response
Screenshots
No response
OS
Mac M1
Python version
3.9
AWS SDK for pandas version
3.7
Additional context
No response