Running Ghostferry with the source database being a replica

Running Ghostferry with a replica as the source database is subject to a race condition: when we stop the binlog streamer (and start the cutover stage), pending writes on the source database's master might not yet have propagated to the replica's binlog. Since Ghostferry knows nothing about these upstream servers, it could miss those writes and thus corrupt data.

I'm not yet sure whether we should integrate a check that the upstream binlog position matches the replica's binlog position directly into ghostferry.Ferry. However, we can provide it as an API in the library and integrate it into copydb.
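One possible shape for that API, sketched in Go under stated assumptions: BinlogPosition, Compare and ReplicaCaughtUp are hypothetical names, not part of Ghostferry. The idea is to capture the master's position (File and Position from SHOW MASTER STATUS) once writes to the source have stopped, then compare it against the Relay_Master_Log_File / Exec_Master_Log_Pos pair that SHOW SLAVE STATUS reports on the replica. Only once the replica has executed up to that point would stopping the binlog streamer be safe.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BinlogPosition is a hypothetical type identifying a point in a MySQL
// master's binary log: the file name (e.g. "mysql-bin.000042") plus an offset.
type BinlogPosition struct {
	File string
	Pos  uint64
}

// fileIndex extracts the numeric suffix of a binlog file name,
// e.g. 42 from "mysql-bin.000042".
func fileIndex(name string) (uint64, error) {
	dot := strings.LastIndex(name, ".")
	if dot < 0 || dot == len(name)-1 {
		return 0, fmt.Errorf("malformed binlog file name %q", name)
	}
	return strconv.ParseUint(name[dot+1:], 10, 64)
}

// Compare returns -1, 0 or 1 if p is before, equal to, or after q.
// Assumption: both positions refer to the same master's binlog sequence.
func (p BinlogPosition) Compare(q BinlogPosition) (int, error) {
	pi, err := fileIndex(p.File)
	if err != nil {
		return 0, err
	}
	qi, err := fileIndex(q.File)
	if err != nil {
		return 0, err
	}
	switch {
	case pi < qi:
		return -1, nil
	case pi > qi:
		return 1, nil
	case p.Pos < q.Pos:
		return -1, nil
	case p.Pos > q.Pos:
		return 1, nil
	}
	return 0, nil
}

// ReplicaCaughtUp reports whether the replica has executed everything up to
// masterPos. replicaExec is expected to be built from Relay_Master_Log_File
// and Exec_Master_Log_Pos (SHOW SLAVE STATUS on the replica); masterPos from
// File and Position (SHOW MASTER STATUS on the master).
func ReplicaCaughtUp(masterPos, replicaExec BinlogPosition) (bool, error) {
	c, err := replicaExec.Compare(masterPos)
	if err != nil {
		return false, err
	}
	return c >= 0, nil
}

func main() {
	master := BinlogPosition{File: "mysql-bin.000042", Pos: 1200}
	replica := BinlogPosition{File: "mysql-bin.000042", Pos: 980}
	ok, err := ReplicaCaughtUp(master, replica)
	fmt.Println("caught up:", ok, err)
}
```

This only covers the master-to-replica hop: after the check passes, the streamer would still need to read up to the corresponding point in the replica's own binlog before cutover.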

@BoGs @pushrax @hkdsun

Note that this is also an issue with the master if sync_binlog != 1. In these cases we recommend calling FLUSH BINARY LOGS, which I assume flushes all pending writes to disk, since it closes the current binary log file and opens a new one with a different file name. To my knowledge, we've never tested this scenario.

Shopify/ghostferry issue #19