The problem
Hello,
I'm working on a Change Data Capture (CDC) job, and my goal is to replicate data from a Parquet file into a Delta table by applying the required inserts, updates and deletes. I followed the tutorial at https://docs.delta.io/latest/delta-update.html#write-change-data-into-a-delta-table but for some reason the Delta table ends up with no updates or deletes applied.
The code
I'm using Delta Lake 0.7.0 with the following Spark configs.
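For reference, a minimal sketch of what a Delta Lake 0.7.0 session setup typically looks like. The app name and the `build_session` helper are hypothetical, and the package coordinate assumes a Scala 2.12 build of Spark 3.0; the settings actually used for this issue may differ.

```python
# Settings documented for Delta Lake 0.7.0 on Spark 3.0 (Scala 2.12 build assumed).
DELTA_SPARK_CONF = {
    "spark.jars.packages": "io.delta:delta-core_2.12:0.7.0",
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_session(app_name="cdc-to-delta"):
    """Create a SparkSession with the Delta settings applied (hypothetical helper)."""
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_SPARK_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```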
Reading a Parquet file from S3 and adding a deleted column. The Op column contains U, D or I, meaning update, delete or insert.
Creating the Delta table and saving on S3.
Creating two auxiliary DataFrames.
Creating unionDf, which contains the I and D records and the last U record.
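The selection rule for unionDf can be sketched in plain Python (not Spark) to make the intent explicit: keep every I and D record, and only the most recent U record per key. The column names `id` and `ts` and the function `latest_changes` are assumptions for illustration; in Spark the same idea is usually expressed with `row_number()` over a window partitioned by the key.

```python
def latest_changes(rows, key="id", order="ts"):
    """Keep every I and D row, plus only the most recent U row per key.

    `rows` is a list of dicts; `key` and `order` are hypothetical column names.
    """
    kept = [r for r in rows if r["Op"] in ("I", "D")]
    last_update = {}
    for r in rows:
        if r["Op"] == "U":
            current = last_update.get(r[key])
            # Replace the stored update only if this one is newer.
            if current is None or r[order] > current[order]:
                last_update[r[key]] = r
    return kept + list(last_update.values())


# Hypothetical change records, not the data from this issue.
changes = [
    {"id": 1, "Op": "U", "ts": 1},
    {"id": 1, "Op": "U", "ts": 2},
    {"id": 2, "Op": "I", "ts": 1},
    {"id": 3, "Op": "D", "ts": 1},
]
reduced = latest_changes(changes)
```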
This DataFrame contains the following test records
It also has other columns and records that are not convenient to show here.
Reading the Delta table that I just saved to S3
Executing the Merge
I tried to use dataframe.Op = 'D' instead of dataframe.value.deleted = true, but the results were the same.
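For comparison, a minimal sketch of a merge that applies deletes, updates and inserts with the Delta Lake Python API. The key column `id`, the aliases `t` and `s`, and the helper names are assumptions, not the code from this issue; it also assumes the source and target share the same columns so that `whenMatchedUpdateAll` and `whenNotMatchedInsertAll` can map them by name.

```python
def merge_condition(key="id", target="t", source="s"):
    """Build the join condition that decides whether a source row is 'matched'."""
    return f"{target}.{key} = {source}.{key}"


# Clause conditions, evaluated against the aliased source rows.
DELETE_WHEN = "s.Op = 'D'"
UPDATE_WHEN = "s.Op = 'U'"
INSERT_WHEN = "s.Op != 'D'"


def apply_changes(spark, table_path, changes_df, key="id"):
    """Merge a DataFrame of change records into the Delta table at table_path."""
    from delta.tables import DeltaTable  # imported lazily; requires delta-core

    target = DeltaTable.forPath(spark, table_path)
    (
        target.alias("t")
        .merge(changes_df.alias("s"), merge_condition(key))
        .whenMatchedDelete(condition=DELETE_WHEN)       # matched key, Op = D
        .whenMatchedUpdateAll(condition=UPDATE_WHEN)    # matched key, Op = U
        .whenNotMatchedInsertAll(condition=INSERT_WHEN) # unmatched key, not a delete
        .execute()
    )
```

In a merge, the update and delete clauses only fire for source rows whose key matches an existing target row. If the merge condition never evaluates to true, every source row falls through to the not-matched branch and is inserted, so the condition is worth checking when only inserts appear.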
Reading and printing the deltaTable that I just used.
My test records in this Delta table end up like this
And this visualization
like this
This means the merge only inserts records; it neither updates nor deletes existing ones.