Shopify / ghostferry

The swiss army knife of live data migrations
https://shopify.github.io/ghostferry
MIT License

Running Ghostferry with the source database being a replica #19

Closed shuhaowu closed 6 years ago

shuhaowu commented 6 years ago

Running Ghostferry with a replica as the source database is subject to a race condition: when we stop the binlog streamer (and start the cutover stage), pending writes on the source database's master might not yet have propagated to the binlogs of the replicas. Since Ghostferry has no knowledge of these upstream servers, it could miss those writes and thus cause data corruption.

I'm not yet sure whether we should integrate a check that the upstream binlog position matches the replica's binlog position directly into ghostferry.Ferry. However, we can provide it as an API in the library and integrate it into copydb.
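To make the proposed check concrete, here is a minimal sketch of what such a helper could look like. It is not Ghostferry's actual API: `BinlogPosition`, `PositionFunc`, `Compare`, and `WaitUntilCaughtUp` are hypothetical names, and the two position sources are passed in as plain functions. In a real setup they would presumably read `File`/`Position` from `SHOW MASTER STATUS` on the master and `Relay_Master_Log_File`/`Exec_Master_Log_Pos` from `SHOW SLAVE STATUS` on the replica. The sketch also assumes writes to the master have already been stopped, so the master's position is read only once.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// BinlogPosition is a (file, offset) coordinate in the master's binary log.
// Hypothetical type for illustration; not part of Ghostferry.
type BinlogPosition struct {
	File string // e.g. "mysql-bin.000042"
	Pos  uint64
}

// PositionFunc reports a binlog coordinate. Hypothetical: callers supply
// one function for the master and one for the replica.
type PositionFunc func() (BinlogPosition, error)

// fileSeq extracts the numeric suffix of a binlog file name.
func fileSeq(name string) (uint64, error) {
	i := strings.LastIndex(name, ".")
	if i < 0 {
		return 0, fmt.Errorf("binlog file %q has no numeric suffix", name)
	}
	return strconv.ParseUint(name[i+1:], 10, 64)
}

// Compare returns -1, 0, or 1 when a is behind, equal to, or ahead of b.
func Compare(a, b BinlogPosition) (int, error) {
	sa, err := fileSeq(a.File)
	if err != nil {
		return 0, err
	}
	sb, err := fileSeq(b.File)
	if err != nil {
		return 0, err
	}
	switch {
	case sa < sb:
		return -1, nil
	case sa > sb:
		return 1, nil
	case a.Pos < b.Pos:
		return -1, nil
	case a.Pos > b.Pos:
		return 1, nil
	}
	return 0, nil
}

// WaitUntilCaughtUp reads the master's position once, then polls the
// replica until it has reached that position or the timeout elapses.
func WaitUntilCaughtUp(master, replica PositionFunc, interval, timeout time.Duration) error {
	target, err := master()
	if err != nil {
		return err
	}
	deadline := time.Now().Add(timeout)
	for {
		cur, err := replica()
		if err != nil {
			return err
		}
		c, err := Compare(cur, target)
		if err != nil {
			return err
		}
		if c >= 0 {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("replica at %v did not reach master position %v within %s", cur, target, timeout)
		}
		time.Sleep(interval)
	}
}
```

A caller such as copydb could invoke something like this after writes to the master stop and before the binlog streamer is stopped, and abort the cutover if it returns an error.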

@BoGs @pushrax @hkdsun

Note that this is also an issue with the master if sync_binlog != 1. In these cases we recommend that you call FLUSH BINARY LOGS, which I assume will flush all pending writes to disk, as it closes the current binary log file and opens a new one with a different file name. To my knowledge, we've never tested this scenario.
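The recommendation above could be sketched roughly as follows. This is an illustration only: `FlushIfNotSynced` is a hypothetical helper, and both callbacks are assumptions standing in for real database calls (for example `SELECT @@global.sync_binlog` and `FLUSH BINARY LOGS` issued through `database/sql`). Whether the flush actually closes the gap is, as noted, untested.

```go
package main

import "fmt"

// FlushIfNotSynced calls flush when the server's sync_binlog setting is
// anything other than 1, and reports whether a flush was issued.
// Hypothetical helper: syncBinlog is expected to return the value of
// @@global.sync_binlog, and flush is expected to run FLUSH BINARY LOGS.
func FlushIfNotSynced(syncBinlog func() (uint64, error), flush func() error) (bool, error) {
	v, err := syncBinlog()
	if err != nil {
		return false, fmt.Errorf("reading sync_binlog: %w", err)
	}
	if v == 1 {
		// The binlog is synced to disk on every commit; nothing to do.
		return false, nil
	}
	if err := flush(); err != nil {
		return false, fmt.Errorf("flushing binary logs: %w", err)
	}
	return true, nil
}
```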

hkdsun commented 6 years ago

There is still some follow-up work to be done to make this API nicer, but I think we can consider this complete.