Closed sunxiayi closed 1 month ago
Hi @sunxiayi!
Thank you for your pull request and welcome to our community.
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
I am sorry for a partial review, but a full review depends on how some of the current comments are going to be resolved.
What is the plan with the two sets of binlog positions, once the mismatch is fixed or its absence is confirmed?
On the client side, we can RESET REPLICA and set global gtid_purged so it can resume replication from that gtid.
Right, but I am asking specifically about the two sets of binlog positions that should be equal: one from pfs.log_status and one from the binlog itself.
If the two sets match, we would use that gtid to resume replication on the client. If they are not the same, we would abort the clone, log the mismatch details, and then make a plan to fix the bug.
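A minimal sketch of that check, with every type and field name here hypothetical (the patch's actual structures are not shown in this thread):

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// All names below are placeholders for illustration, not the patch's types.
struct Synchronization_coordinates {
  std::string gtid;         // GTID set reported for this position
  std::string binlog_file;  // binlog file name
  uint64_t offset;          // byte offset within binlog_file
};

// Resume replication only when the coordinates from pfs.log_status agree
// with the ones read from the binlog itself; otherwise abort and log.
bool can_resume(const Synchronization_coordinates &from_log_status,
                const Synchronization_coordinates &from_binlog) {
  if (from_log_status.gtid == from_binlog.gtid &&
      from_log_status.binlog_file == from_binlog.binlog_file &&
      from_log_status.offset == from_binlog.offset) {
    // Match: the client may RESET REPLICA, set gtid_purged to this gtid,
    // and resume replication from it.
    return true;
  }
  // Mismatch: abort the clone and record both sets for investigation.
  std::cerr << "clone: coordinate mismatch: log_status gtid="
            << from_log_status.gtid << ", binlog gtid=" << from_binlog.gtid
            << '\n';
  return false;
}
```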
@sunxiayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
When synchronizing engines, get the synchronization coordinates from the `P_S.log_status` table and send them from the server plugin to the client plugin. Upon receiving them, the client plugin writes them to the file `#clone/#synchronization_coordinates`.
Approach
Protocol
Add a new response type `COM_RES_GTID_V4` and update the latest protocol version to `CLONE_PROTOCOL_VERSION_V4`.
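For illustration, a hedged sketch of what the protocol additions might look like; the existing members are elided and all numeric values are placeholders, since the real definitions live in the clone plugin's protocol headers:

```cpp
#include <cstdint>

// Illustrative only: existing responses are elided and values are
// placeholders; this patch extends the plugin's real definitions.
enum Clone_response : uint8_t {
  // ... existing COM_RES_* responses ...
  COM_RES_GTID_V4 = 42,  // new: carries the synchronization coordinates
};

// The negotiated protocol version is bumped so both ends can detect
// support for the new response type.
constexpr uint32_t CLONE_PROTOCOL_VERSION_V4 = 0x0400;  // placeholder value
```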
Common
The `synchronize_engines()` call was in `Ha_clone_common_cbk`, but it is actually only called from the server or local side, not the client. So I moved this function into the child classes (client, server, local), with the client implementation being a no-op. The reason I make this change is that I want to send the gtid from the plugin layer without calling into the engine again, and moving `synchronize_engines()` into `Server_Cbk` has the advantage that I can get the server handle, while in `Ha_clone_common_cbk` I cannot.
The log_status query and set_log_stop steps are moved into a new function, `synchronize_logs`, under `Ha_clone_common_cbk`. `populate_synchronization_coordinates` populates a Key_Values data structure of the coordinates. We want to record both the gtid from log_status and the gtid from binlog_file/offset, because they seem to be out of sync in prod and we need a way to confirm this.
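A rough sketch of the resulting hierarchy, with simplified signatures; return values and the body of `synchronize_logs()` are placeholders:

```cpp
#include <cassert>

class Ha_clone_common_cbk {
 public:
  virtual ~Ha_clone_common_cbk() = default;

  // Moved out of this class: each role now provides its own implementation.
  virtual int synchronize_engines() = 0;

 protected:
  // Shared step: run the log_status query and set_log_stop, and populate
  // the Key_Values coordinates via populate_synchronization_coordinates().
  int synchronize_logs() { return 0; /* shared implementation elided */ }
};

class Server_Cbk : public Ha_clone_common_cbk {
 public:
  int synchronize_engines() override {
    // Here the server handle is reachable (it is not from
    // Ha_clone_common_cbk), so the coordinates can be sent to the client
    // right after the logs are synchronized.
    return synchronize_logs();
  }
};

class Client_Cbk : public Ha_clone_common_cbk {
 public:
  int synchronize_engines() override {
    // No-op on the client; calling it is a bug, so error out.
    assert(false);
    return 1;  // placeholder error code
  }
};

class Local_Cbk : public Ha_clone_common_cbk {
 public:
  int synchronize_engines() override {
    // Local clone: synchronize logs, then persist the coordinates through
    // the client handle.
    return synchronize_logs();
  }
};
```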
Client
`synchronize_engines()` is a no-op and errors out upon being called.
Upon receiving `COM_RES_GTID_V4`, deserialize the coordinates and persist them in `#clone/#synchronization_coordinates`.
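A self-contained sketch of the client-side persistence step, assuming `Key_Values` is a vector of string pairs and a simple `key=value` line format (both are assumptions; the patch's actual serialization is not shown here):

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the deserialized payload: ordered key/value string pairs.
using Key_Values = std::vector<std::pair<std::string, std::string>>;

// Write the coordinates under the clone directory.
bool persist_synchronization_coordinates(const std::string &clone_dir,
                                         const Key_Values &coords) {
  std::ofstream out(clone_dir + "/#clone/#synchronization_coordinates",
                    std::ios::trunc);  // replace any file from a prior clone
  if (!out) return false;
  for (const auto &[key, value] : coords) out << key << '=' << value << '\n';
  return static_cast<bool>(out);
}
```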
Server
`synchronize_engines()` calls `synchronize_logs` and gets the server handle to send the coordinates one by one, utilizing existing helper functions.
Local
`synchronize_engines()` calls `synchronize_logs`, then gets the client handle to persist the coordinates in `#clone/#synchronization_coordinates`.
Handle version mismatch
On the server, only send the coordinates if the negotiated version is >= V4. On the client, the `#synchronization_coordinates` file is cleaned up at the start of a clone.
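A sketch of both halves of the compatibility handling; the version constant's value is a placeholder (see the protocol sketch above):

```cpp
#include <cstdio>
#include <cstdint>
#include <string>

constexpr uint32_t CLONE_PROTOCOL_VERSION_V4 = 0x0400;  // placeholder value

// Donor: only emit COM_RES_GTID_V4 to clients that negotiated V4 or later.
bool should_send_coordinates(uint32_t negotiated_version) {
  return negotiated_version >= CLONE_PROTOCOL_VERSION_V4;
}

// Recipient: drop any stale coordinates file when a new clone starts, so an
// old file can never be mistaken for this clone's output.
void clean_stale_coordinates(const std::string &clone_dir) {
  std::remove((clone_dir + "/#clone/#synchronization_coordinates").c_str());
}
```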
Test Plan:
MTR
local_create_synchronization_coordinates
remote_create_synchronization_coordinates
Local clone
Test on my debug build on a devserver after `install plugin clone SONAME 'mysql_clone.so';`:
1/ `mysql> CLONE LOCAL DATA DIRECTORY = '/home/sunxiayi/mysql/mysql-fork/_build-8.0-Debug/mysql-test/var/tmp/mysqld.1/data_new';`
2/ Check that the synchronization coordinate from log_status is the same as in the file.
Remote clone, normal
1/ Take udb35350.ftw5:3301 as the test server and install my debug build. Take udb12221.atn5:3301 as the client server and install my debug build. Install the plugin on both servers.
2/ On udb12221.atn5:3301, issue the remote clone command.
3/ Check the new file is written correctly.
4/ Repeat the clone; the file is overwritten correctly.
Remote clone, sev scenario, apply-logs
Issue a clone command using an instance whose mysql.gtid_executed table has a hole as the donor, in the raft world. This also copies from secondary to secondary, meaning the binlog is an apply-log. Check that the file on the recipient is correct.
Remote clone, donor and client version mismatch
Only update the recipient: the clone finishes, and the file is not there.
Only update the donor: the clone finishes, and the file is not there.
Update both and do a successful clone, then only update the client: the file is not there.
Differential Revision: https://phabricator.intern.facebook.com/D55614528