aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Can't query shared Lake Formation cross-account table #426

Closed whrocha closed 3 years ago

whrocha commented 3 years ago

Describe the bug
I can query a table shared through Lake Formation with a cross account via Athena, but the same table cannot be queried through Wrangler.

ctas_approach=False works only for tables without list columns; my table has one, so the Wrangler error message tells me to use ctas_approach=True.

error message

UnsupportedType: List data type is not support with ctas_approach=False. Please use ctas_approach=True for List columns.
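For completeness, here is a minimal sketch of the call that raises the error above (same table, database and output bucket as in the reproduction below):

import awswrangler as wr

# Sketch only: reading a Lake Formation shared table that contains a list
# column with ctas_approach=False raises UnsupportedType as shown above.
df = wr.athena.read_sql_query(
    "SELECT * FROM table_name LIMIT 10",
    database="db_credit_engine",
    s3_output="s3://bucket-name",
    ctas_approach=False,
)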

But when I set ctas_approach=True, I get this error:

AccessDeniedException: An error occurred (AccessDeniedException) when calling the DeleteTable operation: Insufficient Lake Formation permission(s): Required Drop on temp_table_f225f0cb9ad4491390163ca0d0c16f98

The point is, the same SQL query runs without errors in the Athena console.

To Reproduce

import awswrangler as wr

df = wr.athena.read_sql_query(
    "SELECT * FROM table_name LIMIT 10",
    database="db_credit_engine",
    s3_output="s3://bucket-name",
    ctas_approach=True,
)

Complete error message:

---------------------------------------------------------------------------
QueryFailed                               Traceback (most recent call last)
~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/athena/_read.py in _resolve_query_without_cache(sql, database, ctas_approach, categories, chunksize, s3_output, workgroup, encryption, kms_key, keep_files, ctas_temp_table_name, use_threads, boto3_session)
    484                 data_source=data_source,
--> 485                 s3_output=_s3_output,
    486                 keep_files=keep_files,

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/athena/_read.py in _resolve_query_without_cache_ctas(sql, database, s3_output, keep_files, chunksize, categories, encryption, workgroup, kms_key, wg_config, name, use_threads, boto3_session)
    392                 "Please, define all columns names in your query. (E.g. 'SELECT MAX(col1) AS max_col1, ...')"
--> 393             )
    394         if "Column type is unknown" in msg:

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/athena/_read.py in _resolve_query_without_cache_ctas(sql, database, s3_output, keep_files, chunksize, categories, encryption, workgroup, kms_key, wg_config, name, use_threads, boto3_session)
    375             raise exceptions.InvalidCtasApproachQuery(
--> 376                 "Is not possible to wrap this query into a CTAS statement. Please use ctas_approach=False."
    377             )

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/athena/_utils.py in _get_query_metadata(query_execution_id, boto3_session, categories, query_execution_payload)
    168             reason: str = query_execution_payload["Status"]["StateChangeReason"]
--> 169             raise exceptions.QueryFailed(f"Query error: {reason}")
    170         _query_execution_payload: Dict[str, Any] = query_execution_payload

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/athena/_utils.py in wait_query(query_execution_id, boto3_session)
    646     --------
--> 647     >>> import awswrangler as wr
    648     >>> res = wr.athena.wait_query(query_execution_id='query-execution-id')

QueryFailed: Insufficient permissions to execute the query.  Insufficient Lake Formation permission(s) on temp_table_f225f0cb9ad4491390163ca0d0c16f98 . You may need to manually clean the data at location 's3://aws-athena-query-results-650770491766/tables/64ac979a-3c68-45cb-8a86-2e83666d429a' before retrying. Athena will not delete data in your account.

During handling of the above exception, another exception occurred:

AccessDeniedException                     Traceback (most recent call last)
<ipython-input-84-c2c3cff2a4f5> in <module>
      4                               database="db_credit_engine",
      5                               s3_output="s3://aws-athena-query-results-650770491766/",
----> 6                               ctas_approach=True)
      7 
      8 df

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/_config.py in wrapper(*args_raw, **kwargs)
    261                 del args[name]
    262                 args = {**args, **keywords}
--> 263         return function(**args)
    264 
    265     wrapper.__doc__ = _inject_config_doc(doc=function.__doc__, available_configs=available_configs)

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/athena/_read.py in read_sql_query(sql, database, ctas_approach, categories, chunksize, s3_output, workgroup, encryption, kms_key, keep_files, ctas_temp_table_name, use_threads, boto3_session, max_cache_seconds, max_cache_query_inspections)
    692             "Queries with ctas_approach=True (default) does not support "
    693             "data_source values different than None and 'AwsDataCatalog'. "
--> 694             "Please check the related tutorial for more details "
    695             "(https://github.com/awslabs/aws-data-wrangler/blob/master/"
    696             "tutorials/006%20-%20Amazon%20Athena.ipynb)"

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/athena/_read.py in _resolve_query_without_cache(sql, database, ctas_approach, categories, chunksize, s3_output, workgroup, encryption, kms_key, keep_files, ctas_temp_table_name, use_threads, boto3_session)
    486                 keep_files=keep_files,
    487                 chunksize=chunksize,
--> 488                 categories=categories,
    489                 encryption=encryption,
    490                 workgroup=workgroup,

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/_config.py in wrapper(*args_raw, **kwargs)
    261                 del args[name]
    262                 args = {**args, **keywords}
--> 263         return function(**args)
    264 
    265     wrapper.__doc__ = _inject_config_doc(doc=function.__doc__, available_configs=available_configs)

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/awswrangler/catalog/_delete.py in delete_table_if_exists(database, table, catalog_id, boto3_session)
     81         return True
     82     except client_glue.exceptions.EntityNotFoundException:
---> 83         return False
     84 
     85 

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    314                     "%s() only accepts keyword arguments." % py_operation_name)
    315             # The "self" in this scope is referring to the BaseClient.
--> 316             return self._make_api_call(operation_name, kwargs)
    317 
    318         _api_call.__name__ = str(py_operation_name)

~/miniconda/envs/crc-sandbox/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    633             error_code = parsed_response.get("Error", {}).get("Code")
    634             error_class = self.exceptions.from_code(error_code)
--> 635             raise error_class(parsed_response, operation_name)
    636         else:
    637             return parsed_response

AccessDeniedException: An error occurred (AccessDeniedException) when calling the DeleteTable operation: Insufficient Lake Formation permission(s): Required Drop on temp_table_f225f0cb9ad4491390163ca0d0c16f98
igorborgest commented 3 years ago

To work with nested types in the results you need to query with ctas_approach=True, and that requires permission to drop the temporary table it creates. Check the related tutorial for more details.
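If the AccessDeniedException above persists, a minimal sketch of granting the missing Lake Formation DROP permission with boto3 could look like the following. The principal ARN is a placeholder, and granting DROP on every table in the database via a table wildcard is an assumption; adjust the scope to your own setup.

import boto3

lakeformation = boto3.client("lakeformation")

# Hypothetical principal: the IAM role that runs the awswrangler query.
principal_arn = "arn:aws:iam::111122223333:role/analytics-role"

# Grant DROP on all tables in the database so the CTAS temporary tables
# created by ctas_approach=True can be deleted when the query finishes.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": principal_arn},
    Resource={
        "Table": {
            "DatabaseName": "db_credit_engine",
            "TableWildcard": {},
        }
    },
    Permissions=["DROP"],
)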