awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

GlueContext.purgeTable does not work for Governed Tables #129

Closed tjtoll closed 2 years ago

tjtoll commented 2 years ago

I tried to call purge_table inside a governed table transaction, but I receive the error below. Passing transactionId either as a direct argument to purge_table or inside the options dict did not work.

com.amazonaws.services.glue.model.InvalidInputException: Specify Either Transaction Id or Query AsOf Time (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: 12935b4e-5b84-47b4-9c8b-12ff59a8c916; Proxy: null)

Here is what I tried:

dest_path = <s3 path>
db = <database name>
tbl = <table name>

tx_id = glue_context.start_transaction(False)

sink = glue_context.getSink(
    connection_type="s3", path=dest_path,
    enableUpdateCatalog=True,
    transactionId=tx_id
)
sink.setFormat("glueparquet")
sink.setCatalogInfo(
    catalogDatabase=db, catalogTableName=tbl
)

try:
    glue_context.purge_table(db, tbl,
                             options={'transactionId': tx_id})
    sink.writeFrame(glue_data_frame)
    glue_context.commit_transaction(tx_id)
except Exception:
    glue_context.cancel_transaction(tx_id)
    raise
job.commit()

moomindani commented 2 years ago

Thank you for reporting this issue. Currently, governed tables do not support the PurgeTable operation.

Instead, you need to make UpdateTableObjects calls to delete the objects.
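
For readers looking for a starting point, here is a minimal sketch of that workaround, not an official implementation. It assumes a Lake Formation client created with boto3.client("lakeformation"), and it assumes the transaction ID returned by glue_context.start_transaction can be passed to the Lake Formation API unchanged. The helper names purge_governed_table and build_delete_operations are hypothetical, and the batch size of 100 reflects my understanding of the per-call limit on WriteOperations, so please verify it against the current API reference.

```python
def build_delete_operations(partition_objects):
    """Convert the 'Objects' list of a GetTableObjects response into
    DeleteObject write operations for UpdateTableObjects."""
    operations = []
    for partition in partition_objects:
        partition_values = partition.get("PartitionValues")
        for obj in partition.get("Objects", []):
            delete = {"Uri": obj["Uri"], "ETag": obj["ETag"]}
            # Only include PartitionValues for partitioned tables.
            if partition_values:
                delete["PartitionValues"] = partition_values
            operations.append({"DeleteObject": delete})
    return operations


def purge_governed_table(lf_client, database, table, transaction_id,
                         batch_size=100):
    """Delete every object registered in a governed table.

    lf_client is assumed to be boto3.client("lakeformation").
    The caller is responsible for committing or cancelling the transaction.
    Returns the number of objects submitted for deletion.
    """
    common = {
        "DatabaseName": database,
        "TableName": table,
        "TransactionId": transaction_id,
    }

    # List all objects first, following NextToken pagination, so that
    # deletions do not interfere with the listing.
    operations = []
    next_token = None
    while True:
        request = dict(common)
        if next_token:
            request["NextToken"] = next_token
        page = lf_client.get_table_objects(**request)
        operations.extend(build_delete_operations(page.get("Objects", [])))
        next_token = page.get("NextToken")
        if not next_token:
            break

    # Submit the deletions in batches (assumed limit: 100 per call).
    for start in range(0, len(operations), batch_size):
        lf_client.update_table_objects(
            WriteOperations=operations[start:start + batch_size],
            **common,
        )
    return len(operations)
```

In the job from the original report, this would take the place of the purge_table call inside the try block, for example purge_governed_table(lf_client, db, tbl, tx_id), followed by sink.writeFrame and commit_transaction as before. Note that this only removes the objects from the governed table's manifest; whether the underlying S3 files also need to be deleted is a separate question that this sketch does not address.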