Resync is being used more and more. Users of our connectors, particularly ClickHouse, sometimes want to make small table definition changes on the target or source and then repopulate the data. This is in addition to the usual recovery use cases of resync.
Failure points
RenameTables (or resync in general) can currently fail in the following ways:
If a column was added to the source table between the original mirror kickoff and the resync, and no row was inserted afterwards, that column is never added to the target via schema changes. On resync, the _resync table and the original table then have different schemas, causing the soft-delete transfer step to fail. This can then lead to:
RenameTables processes some tables but not all and then hits a failure. On retry it tries to handle the first table again, but that table's _resync table has already been dropped because the earlier attempt succeeded for it.
Resyncing again midway through a resync (initial load) can result in duplicate data in the _resync table if the initial load of that _resync table was already done.
Fixes
In light of these scenarios, this PR puts in place the following guards:
Perform CREATE OR REPLACE in SetupNormalize() when it is called during a resync
If the _resync table does not exist, skip rename for the table
If the original target table does not exist, skip just the soft-delete transfer step
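The three guards above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `renamePlan`, `planRename`, and `createVerb` are hypothetical names, the existence flags are assumed to come from a prior catalog lookup on the target, and the non-resync `CREATE TABLE IF NOT EXISTS` verb is an assumption.

```go
package main

// renamePlan describes what the rename step would do for a single table.
// Hypothetical type for illustration; not taken from the actual codebase.
type renamePlan struct {
	SkipRename         bool // the _resync table is missing, so there is nothing to rename
	TransferSoftDelete bool // copy soft-deleted rows from the original table before renaming
}

// planRename applies the two existence guards for one table.
func planRename(resyncExists, originalExists bool) renamePlan {
	if !resyncExists {
		// Most likely renamed by an earlier, partially successful attempt: skip the table.
		return renamePlan{SkipRename: true}
	}
	// The rename goes ahead; the soft-delete transfer only runs if the original still exists.
	return renamePlan{TransferSoftDelete: originalExists}
}

// createVerb picks the DDL prefix used when setting up the target table.
// During a resync the _resync table is replaced, so a repeated resync starts from an empty table.
func createVerb(isResync bool) string {
	if isResync {
		return "CREATE OR REPLACE TABLE"
	}
	// Assumed behaviour outside of resync; not stated in this PR.
	return "CREATE TABLE IF NOT EXISTS"
}
```

Under this sketch, a retry after a partial failure sees `planRename(false, true)` for the tables that were already renamed and skips them, instead of failing on the missing _resync table.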
TODO:
[x] Functionally test the failure points mentioned above for the warehouse peers.
[x] Also implement clearing of stats after resync: done by #2029