Closed: sandra-selfdecode closed this 6 months ago
@sandra-selfdecode we had a breaking change in 3.7.0 that might be related to this, but I can't tell without more details. In short, Glue tables of type GOVERNED are not supported anymore. Is your Glue table of that type by any chance?
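One way to answer the table-type question is to read the `TableType` field that the Glue `get_table` API returns. The snippet below is only a sketch: the helper names `glue_table_type` and `is_governed` are hypothetical, and the database and table names in the usage comment are placeholders, not values from this issue.

```python
from typing import Any, Mapping


def glue_table_type(table_response: Mapping[str, Any]) -> str:
    """Return the TableType from a Glue get_table response, or "" if absent."""
    return str(table_response.get("Table", {}).get("TableType", ""))


def is_governed(glue_client: Any, database: str, table: str) -> bool:
    """Report whether the Glue table is of type GOVERNED.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    response = glue_client.get_table(DatabaseName=database, Name=table)
    return glue_table_type(response) == "GOVERNED"


# Usage sketch (needs AWS credentials; names are placeholders):
#   import boto3
#   print(is_governed(boto3.client("glue"), "my_database", "my_table"))
```

A table created with `table_type="EXTERNAL_TABLE"` would report `EXTERNAL_TABLE` here, so `is_governed` would return `False` for it.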
To help debug, can you please provide:
wr.s3.to_parquet(path='s3://bucket/...')
Does GOVERNED include EXTERNAL? It's not Lake Formation, but it's made with this CDK code:
return CfnTable(
    scope,
    id,
    catalog_id=ENVIRONMENT.aws_account_id,
    database_name=ATHENA_DATABASE_NAME,
    table_input=CfnTable.TableInputProperty(
        name=table_name,
        description=description,
        parameters={
            "EXTERNAL": "TRUE",
            "has_encrypted_data": False,
            "parquet.compression": "GZIP",
        },
        partition_keys=[
            CfnTable.ColumnProperty(name=key[0], type=key[1])
            for key in partition_keys
        ]
        if partition_keys
        else None,
        storage_descriptor=CfnTable.StorageDescriptorProperty(
            columns=[
                CfnTable.ColumnProperty(name=column[0], type=column[1])
                for column in columns
            ],
            input_format=(
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
            ),
            output_format=(
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
            ),
            location=location,
            serde_info=CfnTable.SerdeInfoProperty(
                serialization_library=(
                    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
                ),
            ),
        ),
        table_type="EXTERNAL_TABLE",
    ),
)
Here is the write command:
return wr.s3.to_parquet(
    df=df,
    dataset=True,
    compression="gzip",
    use_threads=True,
    partition_cols=[partition],
    database=DATABASE_NAME,
    table=TABLE,
    dtype=column_dtypes,
    mode="overwrite_partitions",
)
We believe we have identified the issue, and it should be fixed with #2711. A patch release (3.7.1) will follow.
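Until the patch release is installed, a deployment could check whether it is running the affected version. This is a rough sketch that assumes only 3.7.0 is affected, as described in this thread; `parse_version`, `is_affected`, and `installed_awswrangler_version` are hypothetical helper names, not part of awswrangler.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric parts of a version string, e.g. "3.7.1rc1" -> (3, 7, 1)."""
    numbers = []
    for piece in text.split(".")[:3]:
        leading = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(leading) if leading else 0)
    return tuple(numbers)


def is_affected(installed: str) -> bool:
    """Assumption from this thread: the regression is in 3.7.0 and fixed in 3.7.1."""
    return parse_version(installed) == (3, 7, 0)


def installed_awswrangler_version() -> Optional[str]:
    """Return the installed awswrangler version, or None if it is not installed."""
    try:
        return version("awswrangler")
    except PackageNotFoundError:
        return None
```

For example, `is_affected("3.7.0")` is true, while `is_affected("3.6.0")` and `is_affected("3.7.1")` are false.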
After upgrading to 3.7.0, our Lambda stopped working with the error:
Why is adding this permission necessary? I do not want the Lambda to be able to create a new table within the database; I only want it to be able to add Parquet files to an existing table.