Closed · nlm4145 closed this issue 2 months ago
I have tried this snippet and it seems to do what you are describing without issue:
import awswrangler as wr
import pandas as pd

# `path`, `schema`, `redshift_table`, `redshift_con`, and
# `databases_parameters` come from the test fixtures.
df = pd.DataFrame({"foo": ["a", "b", "c"]})
wr.s3.to_parquet(df, f"{path}test.parquet")
with redshift_con.cursor() as cursor:
    cursor.execute(
        f"""
        CREATE TABLE {schema}.{redshift_table} (
            id BIGINT IDENTITY(1, 1),
            foo VARCHAR(100),
            PRIMARY KEY(id)
        );
        """
    )
wr.redshift.copy_from_files(
    path=path,
    path_suffix=".parquet",
    con=redshift_con,
    table=redshift_table,
    data_format="parquet",
    schema=schema,
    iam_role=databases_parameters["redshift"]["role"],
)
df_out = wr.redshift.read_sql_table(redshift_table, redshift_con)
which returns:

   id foo
0   4   b
1   2   a
2   6   c
Describe the bug
As reported in the linked issue, it appears we are still unable to run copy_from_files when copying Parquet data into a Redshift table that has an IDENTITY column. The same load works with to_sql, but not with copy_from_files.
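Since the report notes that to_sql succeeds where copy_from_files fails, one interim workaround (a sketch of mine, not from the report — the helper name is made up, and it assumes the Parquet data fits in memory) is to read the file back into a DataFrame and append it with to_sql:

```python
def load_via_to_sql(wr, parquet_path, con, schema, table):
    """Workaround sketch: read the Parquet data into a DataFrame and
    append it with to_sql, which the report says handles identity
    columns correctly (Redshift generates the id values on insert).

    `wr` is the awswrangler module; the keyword arguments follow its
    public wr.redshift.to_sql signature.
    """
    df = wr.s3.read_parquet(parquet_path)
    wr.redshift.to_sql(df=df, con=con, table=table, schema=schema, mode="append")
    return df
```

This trades the parallel COPY path for a slower INSERT-based load, so it is only practical for modest data volumes.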
How to Reproduce
The copy_from_files call fails with:

redshift_connector.error.ProgrammingError: {'S': 'ERROR', 'C': '42601', 'M': 'NOT NULL column without DEFAULT must be included in column list', 'F': '../src/pg/src/backend/commands/commands_copy.c', 'L': '2836', 'R': 'DoTheCopy'}
Expected behavior
Expected the copy query to succeed and the resulting table to contain the same data as the parquet file, with the identity column auto-incrementing.
Your project
No response
Screenshots
No response
OS
Linux
Python version
3.11.7
AWS SDK for pandas version
3.8.0
Additional context
No response