ClickHouse / dbt-clickhouse

The ClickHouse plugin for dbt (data build tool)
Apache License 2.0

Skip lightweight delete when there is no data to delete #370

Open Magicbeanbuyer opened 1 month ago

Magicbeanbuyer commented 1 month ago

Summary

ClickHouse's lightweight delete function, which utilizes the mutation ALTER TABLE table UPDATE _row_exists = 0 WHERE condition, indiscriminately mutates all data parts of a given table. This occurs regardless of whether the data parts contain rows that need to be deleted or not, as detailed in this issue.

Furthermore, when lightweight delete operations and regular merges are performed on the same table concurrently, the lightweight delete operation is forced to wait for the merge to complete before it can proceed. This leads to unnecessary delays when there are no rows to delete, rendering the DELETE FROM... operation wasteful.

Therefore, I propose implementing a preliminary check to determine if data in the table meets the deletion criteria before initiating a lightweight delete. If no matching data is found, the delete operation should be skipped, optimizing resource usage and improving overall efficiency.
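The proposed check can be sketched roughly as follows. This is an illustration of the idea only, not the code in this PR: `delete_if_matching`, `make_fake_executor`, and the `execute` callable are hypothetical names, and `execute` stands in for whatever the adapter uses to run a statement and return rows. The probe uses `SELECT 1 ... LIMIT 1` on the assumption that stopping at the first match is cheaper than counting every matching row.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical executor type: takes a SQL string and returns the result rows.
Execute = Callable[[str], Sequence[Sequence[object]]]


def delete_if_matching(execute: Execute, table: str, condition: str) -> bool:
    """Issue a lightweight delete only if at least one row matches.

    Returns True when DELETE FROM was issued, False when it was skipped.
    """
    # Probe for a single matching row before touching any data parts.
    probe = f"SELECT 1 FROM {table} WHERE {condition} LIMIT 1"
    if not execute(probe):
        return False  # nothing to delete, so skip the mutation entirely

    execute(f"DELETE FROM {table} WHERE {condition}")
    return True


def make_fake_executor(matching_rows: int) -> Tuple[Execute, List[str]]:
    """Build an in-memory stand-in for a ClickHouse connection (demo only)."""
    issued: List[str] = []

    def execute(sql: str) -> Sequence[Sequence[object]]:
        issued.append(sql)  # record every statement that was sent
        if sql.startswith("SELECT") and matching_rows > 0:
            return [(1,)]
        return []

    return execute, issued


if __name__ == "__main__":
    execute, issued = make_fake_executor(matching_rows=0)
    skipped = not delete_if_matching(execute, "events", "day = '2024-01-01'")
    print(skipped, issued)
```

With no matching rows, only the probe query is sent and the `DELETE FROM` statement is never issued, so the table's data parts are left untouched.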

CLAassistant commented 1 month ago

CLA assistant check
All committers have signed the CLA.

BentsiLeviav commented 3 weeks ago

Hi again

Thank you for your contribution! Before we review your PR, please add a short description, with a link to this PR, to the changelog (please keep the current format of the Changelog file).