PeerDB-io / peerdb

Fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage
https://peerdb.io
Other
2.17k stars 88 forks

clickhouse: fix normalizing updates to primary key #2113

Closed: serprex closed this 20 hours ago

serprex commented 1 day ago

When a row's primary key is updated, the row becomes duplicated on the destination because its old version is never marked as outdated. This change generates a deletion record for the old key alongside each such update.
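A toy Python model of the destination shows why the duplicate appears and how a deletion record removes it. It assumes a ReplacingMergeTree-style table that keeps only the highest-version row per ordering key and hides rows flagged as deleted. The Row fields and the merge and normalize_update helpers are hypothetical illustrations, not PeerDB's normalization code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical destination row; field names are illustrative only.
    key: int              # destination ordering key (mirrors the Postgres primary key)
    value: str
    version: int          # monotonically increasing change version
    is_deleted: bool = False


def merge(rows: list[Row]) -> dict[int, Row]:
    """Collapse rows roughly the way a ReplacingMergeTree-style table does:
    keep the highest-version row per key, then hide rows marked deleted."""
    latest: dict[int, Row] = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.version > current.version:
            latest[row.key] = row
    return {k: r for k, r in latest.items() if not r.is_deleted}


def normalize_update(old_key: int, new_row: Row, emit_delete: bool) -> list[Row]:
    """Turn one source UPDATE into destination rows (hypothetical helper).
    If the key changed and emit_delete is set, also emit a deletion record
    for the old key so the previous version stops being visible."""
    out = [new_row]
    if emit_delete and old_key != new_row.key:
        out.insert(0, Row(key=old_key, value="", version=new_row.version, is_deleted=True))
    return out


# Row 1 is inserted, then an UPDATE changes its primary key from 1 to 2.
insert = Row(key=1, value="alice", version=1)
update = Row(key=2, value="alice", version=2)

without_fix = merge([insert, *normalize_update(1, update, emit_delete=False)])
with_fix = merge([insert, *normalize_update(1, update, emit_delete=True)])

print(sorted(without_fix))  # [1, 2]: the old version of the row survives
print(sorted(with_fix))     # [2]
```

Giving the deletion record the same version as the update is enough in this model, because it only has to outrank the old row under the old key.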

This is disabled by default because of its performance impact; it can be enabled with PEERDB_CLICKHOUSE_ENABLE_PRIMARY_UPDATE.
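A rough Python sketch of the opt-in switch and what it costs. It assumes the flag is read from the environment and counts as enabled only for a case-insensitive "true", and that, when enabled, every update writes one extra deletion record. Both are assumptions for illustration; primary_update_enabled and rows_written are hypothetical names, not PeerDB functions.

```python
import os
from typing import Mapping, Optional

FLAG = "PEERDB_CLICKHOUSE_ENABLE_PRIMARY_UPDATE"


def primary_update_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the opt-in flag is set.

    Assumption: the value is parsed as a case-insensitive "true";
    PeerDB's real parsing rules may differ."""
    source = os.environ if env is None else env
    return source.get(FLAG, "").strip().lower() == "true"


def rows_written(num_inserts: int, num_updates: int, enabled: bool) -> int:
    """Destination rows for one batch. Assumption: with the flag on,
    every update also writes one deletion record for the old key."""
    per_update = 2 if enabled else 1
    return num_inserts + num_updates * per_update


# A batch with 1,000 inserts and 9,000 updates:
print(rows_written(1_000, 9_000, primary_update_enabled({})))               # 10000
print(rows_written(1_000, 9_000, primary_update_enabled({FLAG: "true"})))   # 19000
```

In an update-heavy workload the extra deletion records nearly double the rows written, which is the performance cost that motivates leaving the flag off by default.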

Caveat: if a custom ordering key uses a column that is not part of the replica identity, the old row's value for that column may be missing, so this fix can fail to prevent the duplicate.
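Why the caveat arises can be sketched in Python. Under Postgres's default replica identity, the old tuple in an UPDATE's logical-decoding record carries only the primary-key columns, so a deletion record can be keyed correctly only if every destination ordering column is among them. old_ordering_key is a hypothetical helper and the column names are made up.

```python
from typing import Mapping, Optional, Sequence, Tuple


def old_ordering_key(
    old_tuple: Mapping[str, object], ordering_columns: Sequence[str]
) -> Optional[Tuple[object, ...]]:
    """Build the destination ordering key of the row's previous version.

    Returns None when the change record lacks a required column; in that
    case no correct deletion record can be generated (hypothetical helper)."""
    if not all(col in old_tuple for col in ordering_columns):
        return None
    return tuple(old_tuple[col] for col in ordering_columns)


# With the default replica identity, the old tuple holds only the key columns.
old_tuple = {"id": 1}

print(old_ordering_key(old_tuple, ["id"]))            # (1,)
print(old_ordering_key(old_tuple, ["region", "id"]))  # None: "region" is not in the replica identity
```

With REPLICA IDENTITY FULL the old tuple contains every column, so the lookup would succeed for any ordering key, at the cost of larger change records.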

serprex commented 1 day ago

Given that updating primary keys is quite rare (it is difficult to do in Postgres if you use foreign keys), it may be better to keep this disabled by default so that updates don't generate twice as many rows on the destination.