ClickHouse / clickhouse-kafka-connect

ClickHouse Kafka Connector
Apache License 2.0

Support Delete mode #31

Open mshustov opened 1 year ago

mshustov commented 1 year ago

@cwurm requested support for tombstone messages to delete records from storage. This use case is essential for customers who share the same message pipeline among several databases and expect records to be removed from every destination.

Depends on lightweight deletes https://github.com/ClickHouse/ClickHouse/pull/42126

mlivirov commented 9 months ago

I've spent some time figuring out how to handle this scenario until this feature is available, and I just want to share my findings.

For those who use Debezium to read data from the source database, there is an SMT available that adds a "__deleted" field for deleted records.

For details, see: https://debezium.io/documentation/reference/stable/transformations/event-flattening.html

This field can be stored along with the other fields in a ReplacingMergeTree table.

After that, a TTL rule can be added to the table to clean up all records that are marked for deletion, like the following:

alter table database.table modify TTL timestamp + interval 1 hour where __deleted = 'true'
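
To make the workaround above a bit more concrete, here is a minimal sketch of what the table could look like end to end. The database, table, and column names (`example_db.orders`, `id`, `payload`, `updated_at`) are hypothetical and only for illustration; the only thing taken from the comment above is the `__deleted` column holding the string `'true'` for deleted records. It also spells out the `DELETE` keyword in the TTL clause, which is the form shown in the ClickHouse documentation.

```sql
-- Hypothetical table; names and columns are illustrative only.
CREATE TABLE example_db.orders
(
    id         UInt64,
    payload    String,
    updated_at DateTime,
    -- Assumed to arrive as the string 'true' or 'false' from the Debezium SMT.
    __deleted  String
)
ENGINE = ReplacingMergeTree(updated_at)  -- keep the row with the latest updated_at per key
ORDER BY id
-- Drop rows flagged as deleted once they are at least one hour old.
TTL updated_at + INTERVAL 1 HOUR DELETE WHERE __deleted = 'true';

-- Until the TTL has been applied, filter flagged rows out at query time.
SELECT id, payload
FROM example_db.orders FINAL
WHERE __deleted = 'false';
```

Note that TTL rules are applied when ClickHouse merges data parts, so flagged rows may stay visible for some time after the hour has passed. That is why the sketch also filters on `__deleted` in the `SELECT`.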