Fixes an issue in the TargetVerifier if there happened to be any configured schema or table rewrites.
This patch correctly handles any configured rewrites by reversing the configurations to lookup DML events using the Source TableSchemaCache from DML events streamed on the Target.
Note that If there are duplicate values in the original rewrites, an error will be raised and Target Verification will not be able to be used as Ghostferry will not be able to correctly map the DML back to the Source.
Details
Previous to this fix, if running Ghostferry with schema or table rewrites, it would fail to inspect binlog events on the Target, and, as a result, fail to update its position in the StateTracker.
As you can see in the snippet below, the BinlogStreamer in the TargetVerifier would silently return from any events that did not exist in the TableSchema, which is correct, because Ghostferry should only ever care about the schema and tables for which it is configured:
However, because the TargetVerifier may be running against a Target with configured rewrites (differently named databases or tables), these lookups must either:
Be mapped back to the original TableSchema of the Source or
Be mapped to another TableSchema with the configured rewrites applied.
I originally attempted the second approach, but abandoned the idea after failing to track down some supposed side effects that were occurring on the f.Tables (the Source TableSchemaCache) within an allotted amount of time. Eventually, I was able to get a working solution to the first approach, which is this PR.
Other considerations
We should consider rethinking the TableSchema and TableSchemaCache components and how they are used within the Ferry. The abstractions are difficult to work with and made this patch a bit more difficult than it should have been.
Performing the lookups and applying the rewrites to the DML events in all the various places (BatchWriter, InlineVerifier, BinlogWriter, and now BinlogStreamer) is a bit of a duplicated effort. It may make more sense to apply the rewrites to the applicable TableSchemas and TableSchemaCaches once and then hand them off to their respective components.
Overview
Fixes an issue in the
TargetVerifier
if there happened to be any configured schema or table rewrites.This patch correctly handles any configured rewrites by reversing the configurations to lookup DML events using the Source
TableSchemaCache
from DML events streamed on the Target.Note that If there are duplicate values in the original rewrites, an error will be raised and Target Verification will not be able to be used as Ghostferry will not be able to correctly map the DML back to the Source.
Details
Previous to this fix, if running Ghostferry with schema or table rewrites, it would fail to inspect binlog events on the Target, and, as a result, fail to update its position in the
StateTracker
.As you can see in the snippet below, the
BinlogStreamer
in theTargetVerifier
would silently return from any events that did not exist in theTableSchema
, which is correct, because Ghostferry should only ever care about the schema and tables for which it is configured:https://github.com/Shopify/ghostferry/blob/203cd1a16ef38300638770b1c9fcc7fa6e230ab0/binlog_streamer.go#L353-L356
However, because the
TargetVerifier
may be running against a Target with configured rewrites (differently named databases or tables), these lookups must either:TableSchema
of the Source orTableSchema
with the configured rewrites applied.I originally attempted the second approach, but abandoned the idea after failing to track down some supposed side effects that were occurring on the
f.Tables
(the SourceTableSchemaCache
) within an allotted amount of time. Eventually, I was able to get a working solution to the first approach, which is this PR.Other considerations
We should consider rethinking the
TableSchema
andTableSchemaCache
components and how they are used within the Ferry. The abstractions are difficult to work with and made this patch a bit more difficult than it should have been.Performing the lookups and applying the rewrites to the DML events in all the various places (
BatchWriter
,InlineVerifier
,BinlogWriter
, and nowBinlogStreamer
) is a bit of a duplicated effort. It may make more sense to apply the rewrites to the applicableTableSchema
s andTableSchemaCache
s once and then hand them off to their respective components.cc @Shopify/db-engineering