Running Ghostferry with a replica as the source database is subject to a race condition: when we stop the binlog streamer and start the cutover stage, writes still pending on the source database's master might not yet have propagated to the replica's binlog. Since Ghostferry has no knowledge of these upstream servers, it could miss those writes and thus cause data corruption.
I'm not yet sure whether we should integrate a check that the replica's binlog position matches the upstream binlog position directly into `ghostferry.Ferry`. However, we can provide it as an API in the library and integrate it into `copydb`.
@BoGs @pushrax @hkdsun
Note that this is also an issue on the master if `sync_binlog != 1`. In that case we recommend calling `FLUSH BINARY LOGS`, which I assume flushes all pending writes to disk, since it closes the current binary log file and opens a new one with a different file name. To my knowledge, we've never tested this scenario.