felipepessoto opened this issue 1 year ago.
We just ran into this exact bug using Delta 2.2.0 and 2.4.0, and I really think it should be fixed.
Our workaround is to include the update condition in the merge condition as well.
For us, this reduced execution time from 55 minutes to 10 minutes.
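The workaround can be sketched as a small helper that folds the whenMatched update condition into the merge (ON) condition before the merge is built. This is a minimal sketch under stated assumptions. The function name fold_update_condition and its has_other_clauses parameter are hypothetical. The condition strings are illustrative SQL expressions. The merge is assumed to have a single conditional whenMatched update clause and nothing else. The helper refuses to fold when other clauses exist, because folding changes which pairs count as "matched": a source row whose key matches but fails the update condition would fall through to a whenNotMatched insert (adding a duplicate), and the corresponding target row could be hit by a whenNotMatchedBySource clause.

```python
def fold_update_condition(merge_condition: str, update_condition: str,
                          has_other_clauses: bool) -> str:
    """Return a merge condition that also requires the update condition.

    Hypothetical helper, only safe when the merge consists of a single
    conditional whenMatched update clause: folding changes which rows
    count as "not matched" for any other clause.
    """
    if has_other_clauses:
        raise ValueError(
            "folding the update condition into the merge condition changes "
            "the semantics of other (e.g. whenNotMatched) clauses"
        )
    # Parenthesize both sides so operator precedence (e.g. OR) is preserved.
    return f"({merge_condition}) AND ({update_condition})"


if __name__ == "__main__":
    on = "t.id = s.id"            # illustrative merge (ON) condition
    upd = "t.value <> s.value"    # illustrative whenMatched update condition
    print(fold_update_condition(on, upd, has_other_clauses=False))
    # prints: (t.id = s.id) AND (t.value <> s.value)
```

The folded string would then be passed as the condition argument of DeltaTable.merge in the delta-spark Python API, with the whenMatchedUpdate clause otherwise left as it was.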
@johanl-db, can you tell whether this issue is covered by #1827?
@keen85 If your merge only contains whenMatched update/delete clauses (and no whenNotMatched or whenNotMatchedBySource clauses), then it will benefit from https://github.com/delta-io/delta/pull/1851 (part of a series of improvements in https://github.com/delta-io/delta/issues/1827). You will need to upgrade to Delta 3.0 or above to benefit from that change.
Thanks a lot, @johanl-db.
Should this issue be marked as resolved, then?
Feature request
Overview
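The method findTouchedFiles in MergeIntoCommand filters files only by the merge condition (the ON clause). As a result, the merge rewrites every target file that contains a row satisfying the ON clause, potentially the entire table, even when the whenMatched clause condition is false for all of those rows.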
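To make the difference concrete, here is a toy, pure-Python model of the file-selection step. It is not Delta's implementation: target files are plain lists of rows, and a file counts as "touched" if any of its rows is matched by some source row. TargetFile and touched_files are hypothetical names, and the conditions are Python predicates standing in for SQL expressions. With matched_cond omitted, the model mirrors the behavior described above (ON clause only); passing matched_cond models the pruning this issue asks for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
Cond = Callable[[Row, Row], bool]  # (target_row, source_row) -> bool


@dataclass
class TargetFile:
    path: str
    rows: List[Row]


def touched_files(files: List[TargetFile], source: List[Row], on: Cond,
                  matched_cond: Optional[Cond] = None) -> List[str]:
    """Return paths of target files that would be rewritten.

    matched_cond=None: only the ON clause is used (behavior in the issue).
    Otherwise a pair must also satisfy the whenMatched condition.
    """
    touched = []
    for f in files:
        for t in f.rows:
            if any(on(t, s) and (matched_cond is None or matched_cond(t, s))
                   for s in source):
                touched.append(f.path)
                break  # one qualifying row is enough to rewrite the file
    return touched


if __name__ == "__main__":
    # Hypothetical layout: 4 files with one row each; the source has the
    # same keys and values, so the ON clause matches every target row but
    # no row actually needs an update.
    files = [TargetFile(f"part-{i}", [{"id": i, "value": 0}]) for i in range(4)]
    source = [{"id": i, "value": 0} for i in range(4)]
    on = lambda t, s: t["id"] == s["id"]
    changed = lambda t, s: t["value"] != s["value"]
    print(len(touched_files(files, source, on)))           # 4
    print(len(touched_files(files, source, on, changed)))  # 0
```

Because Delta rewrites touched files in full, every non-updated row in them is copied; in the first call all four files would be rewritten with unchanged rows, which is the pattern of the observed metrics, while in the second call nothing would be rewritten.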
Motivation
This is a big problem when you merge two large tables where the ON clause matches most of the target table but the whenMatched condition is false for most of those matches, as in the example below.
Further details
Observed results
numTargetRowsCopied -> 1000
numOutputRows -> 1000
numTargetFilesRemoved -> 1000
numTargetFilesAdded -> 1000
Expected results
numTargetRowsCopied -> 0
numOutputRows -> 0
numTargetFilesRemoved -> 0
numTargetFilesAdded -> 0
Environment information
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?