Closed (lymedo closed 1 month ago)
In your first code sample, are you expecting whenNotMatchedInsert to be executed for rows where the condition above it is not met?
.whenMatchedUpdate(
condition="source.row_hash <> target.row_hash AND target.row_is_current == 1",
As I understand it, whenNotMatchedInsert is only executed for source rows that do not satisfy the merge condition itself:
.merge(
df_merge.alias('source'),
f"source.reference_code = target.reference_code"
)
That is why your second code sample works. I'm assuming f"source.reference_code = target.reference_code" always matches for the changed rows, so they never reach whenNotMatchedInsert.
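To make the routing described above concrete, here is a toy model in plain Python. It is only a sketch of the clause semantics as described in this comment, not Delta Lake's implementation: the function name simulate_merge and the sample hashes are made up for illustration, and the column names are taken from the snippets above.

```python
# Toy model of MERGE clause routing (plain Python, not Delta Lake itself).
# Assumption: one whenMatchedUpdate clause with the condition quoted above,
# and one unconditional whenNotMatchedInsert clause.

def simulate_merge(target, source):
    """Return (updated_codes, inserted_codes) for the merge described above."""
    updated, inserted = [], []
    for src in source:
        # Merge condition: source.reference_code = target.reference_code
        matches = [t for t in target if t["reference_code"] == src["reference_code"]]
        if matches:
            # Matched rows can only be handled by whenMatchedUpdate,
            # and only when its extra condition holds.
            for tgt in matches:
                if src["row_hash"] != tgt["row_hash"] and tgt["row_is_current"] == 1:
                    updated.append(tgt["reference_code"])
        else:
            # Only rows with no match at all reach whenNotMatchedInsert.
            inserted.append(src["reference_code"])
    return updated, inserted


target = [{"reference_code": "A", "row_hash": "h1", "row_is_current": 1}]
source = [
    {"reference_code": "A", "row_hash": "h2"},  # changed record
    {"reference_code": "B", "row_hash": "h3"},  # brand-new record
]

updated, inserted = simulate_merge(target, source)
print(updated, inserted)  # ['A'] ['B']
```

In this model the changed record A is matched, so its old row is updated, but nothing inserts its new version. Only B, which has no match, is inserted.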
I've realised what I need to make this work as expected. The source DataFrame requires two rows for each changed record: one with a merge key matching the reference_code (so the existing row is updated) and one with a NULL merge key (so the new version is inserted).
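A minimal sketch of that staging step, using plain Python dictionaries rather than a DataFrame. The helper name stage_scd2_rows and the merge_key column are hypothetical. It also assumes the merge condition is changed to compare source.merge_key with target.reference_code, so that a NULL key never matches and the row falls through to the insert clause.

```python
# Sketch of staging source rows for an SCD2 merge (plain Python, illustrative only).
# Assumption: the merge condition becomes
#   source.merge_key = target.reference_code
# so rows with merge_key None (NULL) never match and are inserted.

def stage_scd2_rows(updates, target):
    """Return staged source rows: two per changed record, one per other record."""
    current_hash = {
        t["reference_code"]: t["row_hash"]
        for t in target
        if t["row_is_current"] == 1
    }
    staged = []
    for row in updates:
        code = row["reference_code"]
        # Keyed row: closes the current target row if changed, or inserts if new.
        staged.append({"merge_key": code, **row})
        if code in current_hash and current_hash[code] != row["row_hash"]:
            # NULL-keyed copy: never matches, so it is inserted as the new version.
            staged.append({"merge_key": None, **row})
    return staged


target = [{"reference_code": "A", "row_hash": "h1", "row_is_current": 1}]
updates = [
    {"reference_code": "A", "row_hash": "h2"},  # changed record
    {"reference_code": "B", "row_hash": "h3"},  # brand-new record
]

staged = stage_scd2_rows(updates, target)
for row in staged:
    print(row["merge_key"], row["reference_code"])
```

With this staging, record A appears twice in the source (once keyed, once with a NULL key) and record B appears once, which matches the two-row requirement described above.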
Bug
Describe the problem
I've followed the SCD2 pattern in the docs, but the new versions of changed records are not inserted as part of the merge; only the existing record in the target table is updated. New records are inserted as expected.
Am I misunderstanding the behaviour?
Steps to reproduce
Observed results
The effective-to date on record A is updated as expected. Record B is inserted. The updated version of record A is not inserted.
It only works if I add a second merge:
Expected results
I was expecting to see the original record A updated, and both the new version of record A and the new record B inserted.
Further details
Environment information
Databricks Runtime 15.4 LTS