aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Error: upsert to governed table #1091

Closed: alwem82 closed this issue 2 years ago

alwem82 commented 2 years ago

When trying to upsert into a governed table, I got the error below. The combination of the two keys (registration_dttm and id) is unique. Could anyone shed some light on this issue?

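Code:

```python
import awswrangler as wr

# Read the current contents of the governed table through Lake Formation
delta_df = wr.lakeformation.read_sql_query(
    sql="SELECT * FROM governed_demo_table;",
    database="governed_demo",
)

# Derive a segment label from salary
delta_df.loc[delta_df["salary"] > 100000, "segment"] = "firstclass"
delta_df.loc[delta_df["salary"] <= 100000, "segment"] = "secondclass"

# Upsert back into the same table using a composite primary key
wr.s3.merge_upsert_table(
    delta_df=delta_df,
    database="governed_demo",
    table="governed_demo_table",
    primary_key=["registration_dttm", "id"],
)
```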
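Since the report relies on (registration_dttm, id) being unique, it is worth confirming that directly in the DataFrame before looking elsewhere. A minimal pandas sketch; the helper name `find_duplicate_keys` and the sample rows are hypothetical, and the same call can be run on `delta_df` and on the existing table contents.

```python
import pandas as pd


def find_duplicate_keys(df: pd.DataFrame, primary_key: list) -> pd.DataFrame:
    """Hypothetical helper: return every row whose composite key appears more than once."""
    # keep=False marks all occurrences of a duplicated key, not only the later repeats
    mask = df.duplicated(subset=primary_key, keep=False)
    return df[mask]


# Hypothetical sample data standing in for delta_df
delta_df = pd.DataFrame(
    {
        "registration_dttm": ["2022-01-01", "2022-01-01", "2022-01-02"],
        "id": [1, 1, 1],
        "salary": [90000, 120000, 150000],
    }
)

dupes = find_duplicate_keys(delta_df, ["registration_dttm", "id"])
print(len(dupes))  # 2 (rows 0 and 1 share the same key)
```

An empty result means the composite key really is unique in that frame.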

Error message:

```
FailedQualityCheck                        Traceback (most recent call last)
in
----> 1 wr.s3.merge_upsert_table(
      2     delta_df=delta_df,
      3     database='governed_demo',
      4     table='governed_demo_table',
      5     primary_key=['registration_dttm','id']

~/opt/anaconda3/lib/python3.9/site-packages/awswrangler/s3/_merge_upsert_table.py in merge_upsert_table(delta_df, database, table, primary_key, boto3_session)
    113     existing_df = wr.s3.read_parquet_table(database=database, table=table, boto3_session=boto3_session)
    114     # Check if data quality inside dataframes to be merged are sufficient
--> 115     if _is_data_quality_sufficient(existing_df=existing_df, delta_df=delta_df, primary_key=primary_key):
    116         # If data quality is sufficient then merge upsert the table
    117         _update_existing_table(

~/opt/anaconda3/lib/python3.9/site-packages/awswrangler/s3/_merge_upsert_table.py in _is_data_quality_sufficient(existing_df, delta_df, primary_key)
     68     if len(error_messages) > 0:
     69         _logger.info("error_messages %s", error_messages)
---> 70         raise FailedQualityCheck("Data quality is insufficient to allow a merge. Please check errors above.")
     71     return True
     72 

FailedQualityCheck: Data quality is insufficient to allow a merge. Please check errors above.
```

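The exception says "Please check errors above", but the traceback shows the helper reporting those errors with `_logger.info(...)`, and Python does not print INFO records under its default logging setup. A minimal stdlib sketch for surfacing them, run before calling `merge_upsert_table`; it assumes (unverified here) that awswrangler creates its module loggers with `logging.getLogger(__name__)`, so the relevant logger is a child of `"awswrangler"`.

```python
import logging

# Attach a stderr handler to the root logger; other libraries stay at WARNING.
# (basicConfig is a no-op if the root logger already has handlers.)
logging.basicConfig(level=logging.WARNING, format="%(name)s %(levelname)s %(message)s")

# Assumption: the module logger is named after the module, e.g.
# "awswrangler.s3._merge_upsert_table", so raising the parent's level covers it.
logging.getLogger("awswrangler").setLevel(logging.INFO)

child = logging.getLogger("awswrangler.s3._merge_upsert_table")
print(child.getEffectiveLevel() == logging.INFO)  # True
```

With this in place, the `error_messages ...` line should appear just before the `FailedQualityCheck` traceback, naming which check failed.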
jaidisido commented 2 years ago

Hi @alwem82, first I want to clarify that row-level upserts are currently not supported for Glue governed tables. My understanding is that they are on the Glue roadmap but not yet available, so you won't be able to use the merge_upsert_table method on a governed table.

That being said, the error above is more generic. It is raised by the `_is_data_quality_sufficient` helper, which checks for discrepancies between the delta DataFrame and the existing table you are attempting to upsert into. One way to diagnose the issue is to compare the two DataFrames and see what caused the check to fail.
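A rough pandas sketch of such a comparison. `describe_upsert_discrepancies` is hypothetical and does not reproduce awswrangler's actual checks; it only lists differences that plausibly matter for a merge: columns present in one frame only, dtype mismatches on shared columns, and missing or duplicated primary keys. In the reported snippet, for instance, if `segment` was not already a column of the table, the `.loc` assignments would add it to `delta_df` only. Reading one side through Athena and the other through `read_parquet_table` may also yield different dtypes for the same column.

```python
import pandas as pd


def describe_upsert_discrepancies(existing_df, delta_df, primary_key):
    """Hypothetical helper: list differences that could make a merge check fail."""
    problems = []
    # Columns present in one frame but not the other
    only_existing = sorted(set(existing_df.columns) - set(delta_df.columns))
    only_delta = sorted(set(delta_df.columns) - set(existing_df.columns))
    if only_existing:
        problems.append(f"columns only in existing_df: {only_existing}")
    if only_delta:
        problems.append(f"columns only in delta_df: {only_delta}")
    # Shared columns whose dtypes differ (e.g. int64 vs float64)
    for col in existing_df.columns.intersection(delta_df.columns):
        if existing_df[col].dtype != delta_df[col].dtype:
            problems.append(
                f"dtype mismatch in {col!r}: {existing_df[col].dtype} vs {delta_df[col].dtype}"
            )
    # Primary key columns must exist and be unique in each frame
    for name, df in (("existing_df", existing_df), ("delta_df", delta_df)):
        missing = [k for k in primary_key if k not in df.columns]
        if missing:
            problems.append(f"{name} is missing key columns {missing}")
        elif df.duplicated(subset=primary_key).any():
            problems.append(f"{name} has duplicate primary keys")
    return problems


# Hypothetical sample frames
existing = pd.DataFrame(
    {"registration_dttm": ["2022-01-01", "2022-01-02"], "id": [1, 2], "salary": [90000, 120000]}
)
delta = pd.DataFrame(
    {"registration_dttm": ["2022-01-01", "2022-01-01"], "id": [1, 1], "salary": [95000.0, None]}
)

problems = describe_upsert_discrepancies(existing, delta, ["registration_dttm", "id"])
for p in problems:
    print(p)  # a salary dtype mismatch, then duplicate keys in delta_df
```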