aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Unable to use copy_from_files to load into a redshift table w/ an identity column. #2871

Closed: nlm4145 closed this issue 2 months ago

nlm4145 commented 3 months ago

Describe the bug

In reference to this earlier issue, it appears we are still unable to run copy_from_files when copying Parquet data into a Redshift table that has an IDENTITY column. The same load works with to_sql, but not with copy_from_files.
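For context, a minimal sketch of the to_sql path that does work, assuming an already-open redshift_connector connection and hypothetical table and schema names (presumably it succeeds because to_sql generates INSERT statements with an explicit column list, so the IDENTITY column is never referenced):

    import awswrangler as wr
    import pandas as pd

    # Hypothetical sample data; the IDENTITY column is intentionally
    # absent from the DataFrame so Redshift can generate its values.
    df = pd.DataFrame({"foo": ["a", "b", "c"]})

    wr.redshift.to_sql(
        df=df,
        con=redshift_con,   # assumed open redshift_connector connection
        table="my_table",   # hypothetical table with an IDENTITY column
        schema="public",
        mode="append",
    )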

How to Reproduce

  1. Upload Parquet data to S3.
  2. Create a Redshift table with an IDENTITY column for said file.
  3. Run awswrangler.redshift.copy_from_files() to copy the file into the target table (a sketch follows the error message below). This returns the error:

    redshift_connector.error.ProgrammingError: {'S': 'ERROR', 'C': '42601', 'M': 'NOT NULL column without DEFAULT must be included in column list', 'F': '../src/pg/src/backend/commands/commands_copy.c', 'L': '2836', 'R': 'DoTheCopy'}
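A minimal sketch of the failing call, assuming hypothetical names (S3 prefix, connection, table, IAM role) that mirror the steps above:

    import awswrangler as wr

    # Assumed setup: Parquet files already uploaded under the prefix below,
    # and the target table created with an IDENTITY column, e.g.
    # CREATE TABLE public.my_table (id BIGINT IDENTITY(1, 1), foo VARCHAR(100));
    wr.redshift.copy_from_files(
        path="s3://my-bucket/staging/",  # hypothetical S3 prefix
        con=redshift_con,                # assumed open redshift_connector connection
        table="my_table",                # hypothetical target table
        schema="public",
        data_format="parquet",
        iam_role="arn:aws:iam::123456789012:role/my-redshift-role",  # hypothetical
    )
    # Raises the ProgrammingError shown above (code 42601): the COPY runs
    # without a column list, and the IDENTITY column is NOT NULL with no DEFAULT.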

Expected behavior

Expected the copy query to succeed and the resulting table to contain the same data as the parquet file, with the identity column auto-incrementing.

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.11.7

AWS SDK for pandas version

3.8.0

Additional context

No response

jaidisido commented 3 months ago

I have tried this snippet and it seems to do what you are describing without issue:

    import awswrangler as wr
    import pandas as pd

    # path, schema, redshift_table, redshift_con and databases_parameters
    # come from the surrounding test fixtures.
    df = pd.DataFrame({"foo": ["a", "b", "c"]})
    wr.s3.to_parquet(df, f"{path}test.parquet")

    # Create the target table with an auto-generated IDENTITY column.
    with redshift_con.cursor() as cursor:
        cursor.execute(
            f"""
            CREATE TABLE {schema}.{redshift_table} (
                id BIGINT IDENTITY(1, 1),
                foo VARCHAR(100),
                PRIMARY KEY(id)
            );
            """
        )

    # COPY the Parquet file into the table; the IDENTITY column is
    # populated by Redshift rather than read from the file.
    wr.redshift.copy_from_files(
        path=path,
        path_suffix=".parquet",
        con=redshift_con,
        table=redshift_table,
        data_format="parquet",
        schema=schema,
        iam_role=databases_parameters["redshift"]["role"],
    )
    df_out = wr.redshift.read_sql_table(redshift_table, redshift_con)

returns (note the id values are not consecutive: Redshift IDENTITY columns guarantee uniqueness, not sequential or ordered values)

       id foo
    0   4   b
    1   2   a
    2   6   c
2   6   c