Shopify / ghostferry

The swiss army knife of live data migrations
https://shopify.github.io/ghostferry
MIT License
694 stars 65 forks source link

Fix BinlogStreamer for TargetVerifier by correctly handling rewrites #258

Closed fjordan closed 3 years ago

fjordan commented 3 years ago

Overview

Fixes an issue in the TargetVerifier if there happened to be any configured schema or table rewrites.

This patch correctly handles any configured rewrites by reversing the configurations to lookup DML events using the Source TableSchemaCache from DML events streamed on the Target.

Note that If there are duplicate values in the original rewrites, an error will be raised and Target Verification will not be able to be used as Ghostferry will not be able to correctly map the DML back to the Source.

Details

Previous to this fix, if running Ghostferry with schema or table rewrites, it would fail to inspect binlog events on the Target, and, as a result, fail to update its position in the StateTracker.

As you can see in the snippet below, the BinlogStreamer in the TargetVerifier would silently return from any events that did not exist in the TableSchema, which is correct, because Ghostferry should only ever care about the schema and tables for which it is configured:

https://github.com/Shopify/ghostferry/blob/203cd1a16ef38300638770b1c9fcc7fa6e230ab0/binlog_streamer.go#L353-L356

However, because the TargetVerifier may be running against a Target with configured rewrites (differently named databases or tables), these lookups must either:

  1. Be mapped back to the original TableSchema of the Source or
  2. Be mapped to another TableSchema with the configured rewrites applied.

I originally attempted the second approach, but abandoned the idea after failing to track down some supposed side effects that were occurring on the f.Tables (the Source TableSchemaCache) within an allotted amount of time. Eventually, I was able to get a working solution to the first approach, which is this PR.

Other considerations

We should consider rethinking the TableSchema and TableSchemaCache components and how they are used within the Ferry. The abstractions are difficult to work with and made this patch a bit more difficult than it should have been.

Performing the lookups and applying the rewrites to the DML events in all the various places (BatchWriter, InlineVerifier, BinlogWriter, and now BinlogStreamer) is a bit of a duplicated effort. It may make more sense to apply the rewrites to the applicable TableSchemas and TableSchemaCaches once and then hand them off to their respective components.

cc @Shopify/db-engineering