lautarortega closed this issue 1 month ago
Hi @lautarortega, it looks like the table has not been updated correctly. Can you share the query you used to update the table?
Hi @kukushking! Yes, sure. This is the command I used.
ALTER TABLE test_table CHANGE COLUMN new_age age int
Thanks, I am able to reproduce this with the following snippet:
import awswrangler as wr
import pandas as pd

DATABASE = "default"
BUCKET = "<REDACTED>"
TABLE_NAME = "iceberg1"

# Create the Iceberg table with an initial row.
data = {
    'first_name': ['John'],
    'city': ['Nashville'],
}
df = pd.DataFrame(data)
wr.athena.to_iceberg(
    df=df,
    database=DATABASE,
    table=TABLE_NAME,
    table_location=f"s3://{BUCKET}/{TABLE_NAME}",
    temp_path=f"s3://{BUCKET}/temp/{TABLE_NAME}",
    schema_evolution=True,
    keep_files=False,
)

# Rename the column through Athena.
wr.athena.start_query_execution(f"ALTER TABLE {TABLE_NAME} CHANGE COLUMN first_name new_first_name string", database=DATABASE)

# Write a row that uses the renamed column; this second write fails with QueryFailed.
data = {
    'new_first_name': ['Lily'],
    'city': ['Ontario'],
}
df = pd.DataFrame(data)
wr.athena.to_iceberg(
    df=df,
    database=DATABASE,
    table=TABLE_NAME,
    table_location=f"s3://{BUCKET}/{TABLE_NAME}",
    temp_path=f"s3://{BUCKET}/temp/{TABLE_NAME}",
    schema_evolution=True,
    keep_files=False,
)
Traceback (most recent call last):
...
raise exceptions.QueryFailed(response["Status"].get("StateChangeReason"))
awswrangler.exceptions.QueryFailed: COLUMN_NOT_FOUND: Insert column name does not exist in target table: first_name. If a data manifest file was generated at 's3://<REDACTED>/f6a7953b-dbe7-4699-a5ce-2f13168f0253-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.
Looking into the fix.
This relates to https://github.com/apache/iceberg/issues/7584, in which Glue still displays old columns as if they were present in the schema, while subsequent INSERT statements include the columns that are no longer considered "current" by Iceberg.
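For anyone who wants to check their own table, here is a rough sketch of how the stale columns could be filtered out. It assumes the Glue catalog tags each column of an Iceberg table with an `iceberg.field.current` parameter set to "true" or "false"; the helper name `current_iceberg_columns` and the sample table dict are made up for illustration and are not part of awswrangler or of the actual fix.

```python
from typing import Any, Dict, List


def current_iceberg_columns(glue_table: Dict[str, Any]) -> List[str]:
    """Return the names of columns that Glue marks as current (hypothetical helper)."""
    columns = glue_table["StorageDescriptor"]["Columns"]
    return [
        col["Name"]
        for col in columns
        # Assumption: a column without the parameter is treated as current.
        if col.get("Parameters", {}).get("iceberg.field.current", "true") == "true"
    ]


# Hand-written sample shaped like the "Table" entry of a Glue GetTable response
# after the rename; the values are illustrative, not captured from a real table.
example_table = {
    "StorageDescriptor": {
        "Columns": [
            {"Name": "first_name", "Type": "string",
             "Parameters": {"iceberg.field.current": "false"}},
            {"Name": "city", "Type": "string",
             "Parameters": {"iceberg.field.current": "true"}},
            {"Name": "new_first_name", "Type": "string",
             "Parameters": {"iceberg.field.current": "true"}},
        ]
    }
}

print(current_iceberg_columns(example_table))
```

With a real table, the dict would come from something like boto3's `glue.get_table(DatabaseName=..., Name=...)["Table"]`. If the renamed-away column shows up with the flag set to "false", that would match the behaviour described in the linked Iceberg issue.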
Describe the bug
I created an Iceberg table in Athena through AWS Wrangler, then renamed a column with an Athena query. When I try to write more rows to the table using a DataFrame that has the new column name, I get this error:
QueryFailed: TYPE_MISMATCH: Insert query has mismatched column types: Table: [varchar, integer, varchar], Query: [varchar, integer, varchar, integer]. If a data manifest file was generated at 's3://aws-athena-query-results-494340620388-eu-west-1/68cb1983-852e-4f12-9d17-7af88520b02a-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.
Because the error mentions this S3 path, I suspect it is not using the temp path I passed as an argument.
How to Reproduce
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Mac
Python version
3.10
AWS SDK for pandas version
3.9.1
Additional context
No response