github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.39k stars 1.26k forks source link

Changes to the migrated tables are not applied to the ghost table #847

Closed eric239 closed 4 years ago

eric239 commented 4 years ago

This is a question / asking for help / an advice.

I'm probably missing something obvious but can't nail it and will greatly appreciate some assistance from the community...

The problem, as I observe it, is that while gh-ost happily copies all data as it was at the start of the migration, it does not take new changes, so something is screwed up with the binlog. Basically I'm running gh-ost with postpone-cut-over and letting it finish with the main copy, then running a single DML stmt (same behavior with insert or update) against the original table, seeing it reflected in the original table on the replica, but not seeing it in the ghost table. Actually allowing the cut-over to happen (obviously you can deduce I'm running this against a test schema) results in the new table missing the change.

I am running slightly tweaked (*) the latest master of gh-ost (built locally) against RDS setup. I know for sure master-to-replica replication is happening (and the lag is low) and I do know for sure the replica itself is setup correctly to expose binlog in ROW mode. In fact, we're running an ETL tool Fivetran which does something very similar to gh-ost in terms of hooking up to binlog stream of a replica, and that tool works fine.

(*) The slight tweak disables bailing out on triggers -- I intend to deal with them separately but as far as I can tell the presence of a trigger on the migrated table should not impact the process.

Here's the command line:

./gh-ost \
  --user ghost \
  --password xxx \
  --host replica_host \
  --initially-drop-ghost-table \
  --assume-master-host master_host \
  --database test_schema \
  --assume-rbr \
  --max-lag-millis 2000 \
  --chunk-size 1000 \
  --postpone-cut-over-flag-file /tmp/postpone-cut-over.lock \
  --table=test_table \
  --alter='add column test_col int, add column test_col1 int default 123' \
  --execute \
  --verbose

The output appears to be normal:

2020-05-26 23:36:36 INFO starting gh-ost d3c93cff30ed09bf4dfb000b5c64a52bfcbe31b7
2020-05-26 23:36:36 INFO Migrating `test_schema`.`test_table`
2020-05-26 23:36:36 INFO connection validated on replica_host:3306
2020-05-26 23:36:36 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `test_schema`.*
2020-05-26 23:36:36 INFO binary logs validated on replica_host:3306
2020-05-26 23:36:36 INFO Inspector initiated on ip-10-2-1-144:3306, version 5.7.26-log
2020-05-26 23:36:36 INFO Table found. Engine=InnoDB
2020-05-26 23:36:36 INFO Estimated number of rows via EXPLAIN: 12809304
2020-05-26 23:36:36 INFO Master forced to be master_host:3306
2020-05-26 23:36:36 INFO log_slave_updates validated on replica_host:3306
2020-05-26 23:36:36 INFO connection validated on replica_host:3306
2020-05-26 23:36:36 INFO Connecting binlog streamer at mysql-bin-changelog.001933:11998917
[2020/05/26 23:36:36] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql replica_host 3306 ghost    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/05/26 23:36:36] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin-changelog.001933, 11998917)
[2020/05/26 23:36:36] [info] binlogsyncer.go:203 register slave for master server replica_host:3306
[2020/05/26 23:36:36] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.001933, 11998917)
2020-05-26 23:36:36 INFO rotate to next log from mysql-bin-changelog.001933:0 to mysql-bin-changelog.001933
2020-05-26 23:36:36 INFO connection validated on master_host:3306
2020-05-26 23:36:36 INFO connection validated on master_host:3306
2020-05-26 23:36:36 INFO will use time_zone='UTC' on applier
2020-05-26 23:36:36 INFO Examining table structure on applier
2020-05-26 23:36:36 INFO Applier initiated on ip-10-2-0-216:3306, version 5.6.44-log
2020-05-26 23:36:36 INFO Dropping table `test_schema`.`_test_table_gho`
2020-05-26 23:36:36 INFO Table dropped
2020-05-26 23:36:36 INFO Dropping table `test_schema`.`_test_table_ghc`
2020-05-26 23:36:36 INFO Table dropped
2020-05-26 23:36:36 INFO Creating changelog table `test_schema`.`_test_table_ghc`
2020-05-26 23:36:36 INFO Changelog table created
2020-05-26 23:36:36 INFO Creating ghost table `test_schema`.`_test_table_gho`
2020-05-26 23:36:36 INFO Ghost table created
2020-05-26 23:36:36 INFO Altering ghost table `test_schema`.`_test_table_gho`
2020-05-26 23:36:36 INFO Ghost table altered
2020-05-26 23:36:36 INFO Created postpone-cut-over-flag-file: /tmp/postpone-cut-over.lock
2020-05-26 23:36:36 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-05-26 23:36:37 INFO Intercepted changelog state GhostTableMigrated
2020-05-26 23:36:37 INFO Handled changelog state GhostTableMigrated
2020-05-26 23:36:37 INFO Chosen shared unique key is PRIMARY
2020-05-26 23:36:37 INFO Shared columns are id,company_id,return_code
2020-05-26 23:36:37 INFO Listening on unix socket file: /tmp/gh-ost.test_schema.test_table.sock
2020-05-26 23:36:37 INFO Migration min values: [1022]
2020-05-26 23:36:37 INFO Migration max values: [13090632]
2020-05-26 23:36:37 INFO Waiting for first throttle metrics to be collected
2020-05-26 23:36:37 ERROR parsing time "" as "2006-01-02T15:04:05.999999999Z07:00": cannot parse "" as "2006"
2020-05-26 23:36:37 INFO First throttle metrics collected
# Migrating `test_schema`.`test_table`; Ghost table is `test_schema`.`_test_table_gho`
# Migrating ip-10-2-0-216:3306; inspecting ip-10-2-1-144:3306; executing on automation
# Migration started at Tue May 26 23:36:36 +0000 2020
# chunk-size: 1000; max-lag-millis: 2000ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/postpone-cut-over.lock [set]
# Serving on unix socket: /tmp/gh-ost.test_schema.test_table.sock
Copy: 0/12809304 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin-changelog.001933:12025560; Lag: 0.00s, State: migrating; ETA: N/A
-- [Skipping some]
Copy: 2060000/12809304 16.1%; Applied: 0; Backlog: 0/1000; Time: 38s(total), 38s(copy); streamer: mysql-bin-changelog.001933:129421238; Lag: 0.04s, State: migrating; ETA: 3m18s
Copy: 2127000/12809304 16.6%; Applied: 0; Backlog: 0/1000; Time: 39s(total), 39s(copy); streamer: mysql-bin-changelog.001933:133358057; Lag: 0.04s, State: migrating; ETA: 3m15s
[2020/05/26 23:37:16] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.001934, 4)
[2020/05/26 23:37:16] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.001934, 4)
2020-05-26 23:37:16 INFO rotate to next log from mysql-bin-changelog.001934:134218542 to mysql-bin-changelog.001934
2020-05-26 23:37:16 INFO rotate to next log from mysql-bin-changelog.001934:0 to mysql-bin-changelog.001934
Copy: 2195000/12809304 17.1%; Applied: 0; Backlog: 0/1000; Time: 40s(total), 40s(copy); streamer: mysql-bin-changelog.001934:2961566; Lag: 0.04s, State: migrating; ETA: 3m13s
-- [Skipping some]
Copy: 12759000/12809304 99.6%; Applied: 0; Backlog: 0/1000; Time: 3m23s(total), 3m23s(copy); streamer: mysql-bin-changelog.001938:66782594; Lag: 0.04s, State: migrating; ETA: 0s
[2020/05/26 23:40:00] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.001939, 4)
[2020/05/26 23:40:00] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.001939, 4)
2020-05-26 23:40:00 INFO rotate to next log from mysql-bin-changelog.001939:66782651 to mysql-bin-changelog.001939
2020-05-26 23:40:00 INFO rotate to next log from mysql-bin-changelog.001939:0 to mysql-bin-changelog.001939
Copy: 12808000/12809304 100.0%; Applied: 0; Backlog: 0/1000; Time: 3m24s(total), 3m24s(copy); streamer: mysql-bin-changelog.001939:1855075; Lag: 0.44s, State: migrating; ETA: 0s
Copy: 13051000/12809304 101.9%; Applied: 0; Backlog: 0/1000; Time: 3m30s(total), 3m30s(copy); streamer: mysql-bin-changelog.001939:14181748; Lag: 3.14s, State: throttled, lag=3.041316s; ETA: due
2020-05-26 23:40:21 INFO Row copy complete
Copy: 13086785/13086785 100.0%; Applied: 0; Backlog: 0/1000; Time: 3m44s(total), 3m44s(copy); streamer: mysql-bin-changelog.001939:20943578; Lag: 0.54s, State: migrating; ETA: due
Copy: 13086785/13086785 100.0%; Applied: 0; Backlog: 0/1000; Time: 3m45s(total), 3m44s(copy); streamer: mysql-bin-changelog.001939:22848326; Lag: 0.04s, State: postponing cut-over; ETA: due
tomkrouper commented 4 years ago

Could you re-run with the -debug option and comment with the output please?

eric239 commented 4 years ago

@tomkrouper thanks for checking in. Here's my very simple setup

create table test_table (
    id int AUTO_INCREMENT PRIMARY KEY , 
    name varchar(20), 
    updated_at timestamp ON UPDATE now()
);
insert into test_table (name) VALUE ('John');

here's the initial log from running gh-ost (options as described above + --debug); db is the master host alias, db-ro is the replica host alias.

2020-05-27 19:57:27 INFO starting gh-ost d3c93cff30ed09bf4dfb000b5c64a52bfcbe31b7
2020-05-27 19:57:27 INFO Migrating `test_schema`.`test_table`
2020-05-27 19:57:27 INFO connection validated on db-ro:3306
2020-05-27 19:57:27 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `test_schema`.*
2020-05-27 19:57:27 INFO binary logs validated on db-ro:3306
2020-05-27 19:57:27 INFO Inspector initiated on ip-10-2-1-144:3306, version 5.7.26-log
2020-05-27 19:57:27 INFO Table found. Engine=InnoDB
2020-05-27 19:57:27 DEBUG Estimated number of rows via STATUS: 1
2020-05-27 19:57:27 DEBUG Validated no foreign keys exist on table
2020-05-27 19:57:27 DEBUG Validated no triggers exist on table
2020-05-27 19:57:27 INFO Estimated number of rows via EXPLAIN: 1
2020-05-27 19:57:27 DEBUG Potential unique keys in test_table: [PRIMARY (auto_increment): [id]; has nullable: false]
2020-05-27 19:57:27 INFO Master forced to be db:3306
2020-05-27 19:57:27 INFO log_slave_updates validated on db-ro:3306
2020-05-27 19:57:27 INFO connection validated on db-ro:3306
2020-05-27 19:57:27 DEBUG Streamer binlog coordinates: mysql-bin-changelog.002184:19548168
2020-05-27 19:57:27 INFO Connecting binlog streamer at mysql-bin-changelog.002184:19548168
[2020/05/27 19:57:27] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql db-ro 3306 ghost    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/05/27 19:57:27] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin-changelog.002184, 19548168)
[2020/05/27 19:57:27] [info] binlogsyncer.go:203 register slave for master server db-ro:3306
2020-05-27 19:57:27 DEBUG Beginning streaming
[2020/05/27 19:57:27] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.002184, 19548168)
2020-05-27 19:57:27 INFO rotate to next log from mysql-bin-changelog.002184:0 to mysql-bin-changelog.002184
2020-05-27 19:57:27 INFO connection validated on db:3306
2020-05-27 19:57:27 INFO connection validated on db:3306
2020-05-27 19:57:27 INFO will use time_zone='UTC' on applier
2020-05-27 19:57:27 INFO Examining table structure on applier
2020-05-27 19:57:27 INFO Applier initiated on ip-10-2-0-216:3306, version 5.6.44-log
2020-05-27 19:57:27 INFO Dropping table `test_schema`.`_test_table_gho`
2020-05-27 19:57:27 INFO Table dropped
2020-05-27 19:57:27 INFO Dropping table `test_schema`.`_test_table_ghc`
2020-05-27 19:57:27 INFO Table dropped
2020-05-27 19:57:27 INFO Creating changelog table `test_schema`.`_test_table_ghc`
2020-05-27 19:57:27 INFO Changelog table created
2020-05-27 19:57:27 INFO Creating ghost table `test_schema`.`_test_table_gho`
2020-05-27 19:57:27 INFO Ghost table created
2020-05-27 19:57:27 INFO Altering ghost table `test_schema`.`_test_table_gho`
2020-05-27 19:57:27 DEBUG ALTER statement: alter /* gh-ost */ table `test_schema`.`_test_table_gho` add column test_col1 int default 123
2020-05-27 19:57:27 INFO Ghost table altered
2020-05-27 19:57:27 INFO Created postpone-cut-over-flag-file: /tmp/postpone-cut-over.lock
2020-05-27 19:57:27 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-05-27 19:57:27 INFO Intercepted changelog state GhostTableMigrated
2020-05-27 19:57:27 INFO Handled changelog state GhostTableMigrated
2020-05-27 19:57:27 DEBUG ghost table migrated
2020-05-27 19:57:27 DEBUG Potential unique keys in _test_table_gho: [PRIMARY (auto_increment): [id]; has nullable: false]
2020-05-27 19:57:27 INFO Chosen shared unique key is PRIMARY
2020-05-27 19:57:27 INFO Shared columns are id,name,updated_at
2020-05-27 19:57:27 INFO Listening on unix socket file: /tmp/gh-ost.test_schema.test_table.sock
2020-05-27 19:57:27 DEBUG Reading migration range according to key: PRIMARY
2020-05-27 19:57:27 INFO Migration min values: [1]
2020-05-27 19:57:27 DEBUG Reading migration range according to key: PRIMARY
2020-05-27 19:57:27 INFO Migration max values: [1]
2020-05-27 19:57:27 INFO Waiting for first throttle metrics to be collected
2020-05-27 19:57:27 INFO First throttle metrics collected
2020-05-27 19:57:27 DEBUG Operating until row copy is complete
# Migrating `test_schema`.`test_table`; Ghost table is `test_schema`.`_test_table_gho`
# Migrating ip-10-2-0-216:3306; inspecting ip-10-2-1-144:3306; executing on automation
# Migration started at Wed May 27 19:57:27 +0000 2020
# chunk-size: 1000; max-lag-millis: 2000ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/postpone-cut-over.lock [set]
# Serving on unix socket: /tmp/gh-ost.test_schema.test_table.sock
2020-05-27 19:57:27 DEBUG Getting nothing in the write queue. Sleeping...
Copy: 0/1 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin-changelog.002184:19563250; Lag: 0.03s, State: migrating; ETA: N/A
2020-05-27 19:57:28 DEBUG Issued INSERT on range: [1]..[1]; iteration: 0; chunk-size: 1000
2020-05-27 19:57:28 DEBUG Iteration complete: no further range to iterate
2020-05-27 19:57:28 DEBUG Getting nothing in the write queue. Sleeping...
2020-05-27 19:57:28 INFO Row copy complete
Copy: 0/1 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin-changelog.002184:19642544; Lag: 0.02s, State: migrating; ETA: N/A
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin-changelog.002184:19642823; Lag: 0.02s, State: migrating; ETA: due
2020-05-27 19:57:28 DEBUG checking for cut-over postpone
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin-changelog.002184:19732409; Lag: 0.02s, State: postponing cut-over; ETA: due
2020-05-27 19:57:29 DEBUG Getting nothing in the write queue. Sleeping...

at this point, I ran the following

mysql: ghost@db [test_schema]> select * from test_table;
+----+------+------------+
| id | name | updated_at |
+----+------+------------+
|  1 | John | NULL       |
+----+------+------------+
1 row in set (0.00 sec)

mysql: ghost@db [test_schema]> select * from _test_table_gho;
+----+------+------------+-----------+
| id | name | updated_at | test_col1 |
+----+------+------------+-----------+
|  1 | John | NULL       |       123 |
+----+------+------------+-----------+
1 row in set (0.00 sec)

mysql: ghost@db [test_schema]> insert into test_table (name) values ('Tom');
Query OK, 1 row affected (0.00 sec)

mysql: ghost@db [test_schema]> commit;
Query OK, 0 rows affected (0.00 sec)

mysql: ghost@db [test_schema]> select * from test_table;
+----+------+------------+
| id | name | updated_at |
+----+------+------------+
|  1 | John | NULL       |
|  2 | Tom  | NULL       |
+----+------+------------+
2 rows in set (0.00 sec)

-- wait a few seconds...

mysql: ghost@db [test_schema]> select * from _test_table_gho;
+----+------+------------+-----------+
| id | name | updated_at | test_col1 |
+----+------+------------+-----------+
|  1 | John | NULL       |       123 |
+----+------+------------+-----------+
1 row in set (0.01 sec)

clearly the new row is not getting propagated (I rechecked in a minute just to be sure).

I completed the cut-over

2020-05-27 20:01:28 DEBUG Getting nothing in the write queue. Sleeping...
2020-05-27 20:01:28 DEBUG checking for cut-over postpone
2020-05-27 20:01:28 DEBUG checking for cut-over postpone: complete
2020-05-27 20:01:28 INFO Grabbing voluntary lock: gh-ost.2534488.lock
2020-05-27 20:01:28 INFO Setting LOCK timeout as 6 seconds
2020-05-27 20:01:28 INFO Looking for magic cut-over table
2020-05-27 20:01:28 INFO Creating magic cut-over table `test_schema`.`_test_table_del`
2020-05-27 20:01:28 INFO Magic cut-over table created
2020-05-27 20:01:28 INFO Locking `test_schema`.`test_table`, `test_schema`.`_test_table_del`
2020-05-27 20:01:28 INFO Tables locked
2020-05-27 20:01:28 INFO Session locking original & magic tables is 2534488
2020-05-27 20:01:28 INFO Writing changelog state: AllEventsUpToLockProcessed:1590609688203455267
2020-05-27 20:01:28 INFO Waiting for events up to lock
2020-05-27 20:01:28 INFO Intercepted changelog state AllEventsUpToLockProcessed
2020-05-27 20:01:28 INFO Handled changelog state AllEventsUpToLockProcessed
2020-05-27 20:01:29 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1590609688203455267
2020-05-27 20:01:29 INFO Done waiting for events up to lock; duration=929.916822ms
# Migrating `test_schema`.`test_table`; Ghost table is `test_schema`.`_test_table_gho`
# Migrating ip-10-2-0-216:3306; inspecting ip-10-2-1-144:3306; executing on automation
# Migration started at Wed May 27 19:57:27 +0000 2020
# chunk-size: 1000; max-lag-millis: 2000ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/postpone-cut-over.lock
# Serving on unix socket: /tmp/gh-ost.test_schema.test_table.sock
2020-05-27 20:01:29 DEBUG Getting nothing in the write queue. Sleeping...
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 4m1s(total), 1s(copy); streamer: mysql-bin-changelog.002185:9436329; Lag: 0.02s, State: migrating; ETA: due
2020-05-27 20:01:29 INFO Setting RENAME timeout as 3 seconds
2020-05-27 20:01:29 INFO Session renaming tables is 2534490
2020-05-27 20:01:29 INFO Issuing and expecting this to block: rename /* gh-ost */ table `test_schema`.`test_table` to `test_schema`.`_test_table_del`, `test_schema`.`_test_table_gho` to `test_schema`.`test_table`
2020-05-27 20:01:29 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2020-05-27 20:01:29 INFO Checking session lock: gh-ost.2534488.lock
2020-05-27 20:01:29 INFO Connection holding lock on original table still exists
2020-05-27 20:01:29 INFO Will now proceed to drop magic table and unlock tables
2020-05-27 20:01:29 INFO Dropping magic cut-over table
2020-05-27 20:01:29 INFO Releasing lock from `test_schema`.`test_table`, `test_schema`.`_test_table_del`
2020-05-27 20:01:29 INFO Tables unlocked
2020-05-27 20:01:29 INFO Tables renamed
2020-05-27 20:01:29 INFO Lock & rename duration: 982.403088ms. During this time, queries on `test_table` were blocked
2020-05-27 20:01:29 INFO Looking for magic cut-over table
[2020/05/27 20:01:29] [info] binlogsyncer.go:164 syncer is closing...
[2020/05/27 20:01:29] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2020-05-27 20:01:29 DEBUG Done streaming
2020-05-27 20:01:29 INFO Closed streamer connection. err=<nil>
[2020/05/27 20:01:29] [info] binlogsyncer.go:179 syncer is closed
2020-05-27 20:01:29 INFO Dropping table `test_schema`.`_test_table_ghc`
2020-05-27 20:01:29 INFO Table dropped
2020-05-27 20:01:29 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2020-05-27 20:01:29 INFO -- drop table `test_schema`.`_test_table_del`
2020-05-27 20:01:29 INFO Done migrating `test_schema`.`test_table`
2020-05-27 20:01:29 INFO Removing socket file: /tmp/gh-ost.test_schema.test_table.sock
2020-05-27 20:01:29 INFO Tearing down inspector
2020-05-27 20:01:29 INFO Tearing down applier
2020-05-27 20:01:29 DEBUG Tearing down...
2020-05-27 20:01:29 INFO Tearing down streamer
2020-05-27 20:01:29 INFO Tearing down throttler
2020-05-27 20:01:29 DEBUG Tearing down...
# Done

And, no surprise, the newly migrated table does lack the id=2, name='Tom' row:

mysql: ghost@db [test_schema]> select * from test_table;
+----+------+------------+-----------+
| id | name | updated_at | test_col1 |
+----+------+------------+-----------+
|  1 | John | NULL       |       123 |
+----+------+------------+-----------+
1 row in set (0.01 sec)

So I'm back to scratching my head really hard -- clearly I'm missing something as so many ppl use the tool successfully... 🤯

tomkrouper commented 4 years ago

I'm curious why test_col is not getting created. You have:

  --alter='add column test_col int, add column test_col1 int default 123' \

But the alter is just adding test_col1

2020-05-27 19:57:27 INFO Altering ghost table `test_schema`.`_test_table_gho`
2020-05-27 19:57:27 DEBUG ALTER statement: alter /* gh-ost */ table `test_schema`.`_test_table_gho` add column test_col1 int default 123
2020-05-27 19:57:27 INFO Ghost table altered

@jfudally, do you have time to look at this this week? If not I can probably take a look.

jfudally commented 4 years ago

@tomkrouper Yeah I can take a look this week

eric239 commented 4 years ago

I'm curious why test_col is not getting created. You have:

@tomkrouper @jfudally begging pardon, I only added the single column in the 2nd (more contrived) test, so there is no issue with the DDL. Just with the data not being picked up from the binlog stream.

eric239 commented 4 years ago

for the reference, here's the full cmd line for the 2nd run

./gh-ost \
  --user ghost \
  --password xxx \
  --host db-ro \
  --initially-drop-ghost-table \
  --assume-master-host=db \
  --database=test_schema \
  --assume-rbr \
  --max-lag-millis=2000 \
  --chunk-size=1000 \
  --postpone-cut-over-flag-file=/tmp/postpone-cut-over.lock \
  --table=test_table \
  --alter='add column test_col1 int default 123' \
  --execute \
  --verbose \
  --debug
jfudally commented 4 years ago

@eric239 👋 So far I've tested on a local cluster using the community MySQL version 5.7.26 and that seems to work just fine. I'm currently spinning up a test RDS cluster to test next. Are you running this against MySQL in RDS, or MySQL compatible Aurora?

eric239 commented 4 years ago

@jfudally thanks for taking this. We're running "regular" (not Aurora) MySQL RDS.

I've tried with both our dev "cluster" where master and replica are 5.7 and prod one where master is 5.6 and replica is 5.7 (temporary arrangement).

Focusing on the last run (reported in this issue), the master is 5.6.44, binlog on master is mixed. The replica is 5.7.26, binlog on replica is row. Happy to share any other parameters / config that might be helpful.

Do I understand correctly that nothing in the log itself provides a clue?

eric239 commented 4 years ago

👋 @jfudally any updates? I'm happy to run any specific tests if you can direct me

jfudally commented 4 years ago

👋 @eric239 So I've been able to spin up an RDS cluster using MySQL 5.7.26, and I'm having a little bit of difficulty getting ghost to run with the same command you provided above when using ROW based replication. Can you verify with the following command on the primary and replica?:

 select @@global.log_bin, @@global.binlog_format;
jfudally commented 4 years ago

Ah I didn't realize RDS required backups enabled on the replicas for the parameters to actually take effect...

So @eric239 I've been able to run the same scenario as you posted above, and everything seems to be working as expected. It might also be helpful to see the parameters from the parameter group used to configure the primary and replica node.

Here's what I ran:

Initial table:

mysql> select * from test_table;
+----+------+------------+
| id | name | updated_at |
+----+------+------------+
|  1 | John | NULL       |
+----+------+------------+
1 row in set (0.08 sec)

Command run:

./bin/gh-ost \
  --user *** \
  --password *** \
  --host ***.rds.amazonaws.com \
  --initially-drop-ghost-table \
  --assume-master-host ***.rds.amazonaws.com \
  --database test \
  --assume-rbr \
  --max-lag-millis 2000 \
  --chunk-size 1000 \
  --postpone-cut-over-flag-file /tmp/postpone-cut-over.lock \
  --table=test_table \
  --alter='add column test_col int, add column test_col1 int default 123' \
  --execute \
  --verbose

Initial output:

[2020/06/02 14:55:31] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql ***.rds.amazonaws.com 3306 admin    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/06/02 14:55:31] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin-changelog.000010, 439)
[2020/06/02 14:55:31] [info] binlogsyncer.go:203 register slave for master server ***.rds.amazonaws.com:3306
[2020/06/02 14:55:31] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000010, 439)
# Migrating `test`.`test_table`; Ghost table is `test`.`_test_table_gho`
# Migrating ip-10-1-5-136:3306; inspecting ip-1***:3306; executing on ***
# Migration started at Tue Jun 02 14:55:29 -0500 2020
# chunk-size: 1000; max-lag-millis: 2000ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# postpone-cut-over-flag-file: /tmp/postpone-cut-over.lock [set]
# Serving on unix socket: /tmp/gh-ost.test.test_table.sock
Copy: 0/1 0.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 0s(copy); streamer: mysql-bin-changelog.000010:7211; Lag: 0.12s, State: migrating; ETA: N/A
Copy: 0/1 0.0%; Applied: 0; Backlog: 0/1000; Time: 7s(total), 1s(copy); streamer: mysql-bin-changelog.000010:11795; Lag: 0.09s, State: migrating; ETA: N/A
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 7s(total), 1s(copy); streamer: mysql-bin-changelog.000010:14403; Lag: 0.09s, State: migrating; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 8s(total), 1s(copy); streamer: mysql-bin-changelog.000010:16740; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 9s(total), 1s(copy); streamer: mysql-bin-changelog.000010:21163; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 10s(total), 1s(copy); streamer: mysql-bin-changelog.000010:25383; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 11s(total), 1s(copy); streamer: mysql-bin-changelog.000010:29600; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 12s(total), 1s(copy); streamer: mysql-bin-changelog.000010:33822; Lag: 0.11s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 13s(total), 1s(copy); streamer: mysql-bin-changelog.000010:38042; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 14s(total), 1s(copy); streamer: mysql-bin-changelog.000010:42265; Lag: 0.10s, State: postponing cut-over; ETA: due

Ghost table:

mysql> select * from _test_table_gho;
+----+------+------------+----------+-----------+
| id | name | updated_at | test_col | test_col1 |
+----+------+------------+----------+-----------+
|  1 | John | NULL       |     NULL |       123 |
+----+------+------------+----------+-----------+
1 row in set (0.07 sec)

Insert on primary:

mysql> insert into test_table (name) values ('Tom');
Query OK, 1 row affected (0.08 sec)

Select on replica:

mysql> select * from _test_table_gho;
+----+------+------------+----------+-----------+
| id | name | updated_at | test_col | test_col1 |
+----+------+------------+----------+-----------+
|  1 | John | NULL       |     NULL |       123 |
|  2 | Tom  | NULL       |     NULL |       123 |
+----+------+------------+----------+-----------+
2 rows in set (0.07 sec)

Final log after cutover:

...
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 14s(total), 1s(copy); streamer: mysql-bin-changelog.000010:42265; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 15s(total), 1s(copy); streamer: mysql-bin-changelog.000010:46485; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 16s(total), 1s(copy); streamer: mysql-bin-changelog.000010:50707; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 17s(total), 1s(copy); streamer: mysql-bin-changelog.000010:54929; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 18s(total), 1s(copy); streamer: mysql-bin-changelog.000010:59145; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 19s(total), 1s(copy); streamer: mysql-bin-changelog.000010:63366; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 20s(total), 1s(copy); streamer: mysql-bin-changelog.000010:67587; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 21s(total), 1s(copy); streamer: mysql-bin-changelog.000010:71808; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 22s(total), 1s(copy); streamer: mysql-bin-changelog.000010:76031; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 23s(total), 1s(copy); streamer: mysql-bin-changelog.000010:80252; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 24s(total), 1s(copy); streamer: mysql-bin-changelog.000010:84473; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 25s(total), 1s(copy); streamer: mysql-bin-changelog.000010:88692; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 26s(total), 1s(copy); streamer: mysql-bin-changelog.000010:92915; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 27s(total), 1s(copy); streamer: mysql-bin-changelog.000010:97136; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 28s(total), 1s(copy); streamer: mysql-bin-changelog.000010:101357; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 29s(total), 1s(copy); streamer: mysql-bin-changelog.000010:105578; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 30s(total), 1s(copy); streamer: mysql-bin-changelog.000010:109799; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 31s(total), 1s(copy); streamer: mysql-bin-changelog.000010:114023; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 32s(total), 1s(copy); streamer: mysql-bin-changelog.000010:117495; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 33s(total), 1s(copy); streamer: mysql-bin-changelog.000010:120594; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 34s(total), 1s(copy); streamer: mysql-bin-changelog.000010:123316; Lag: 0.33s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 35s(total), 1s(copy); streamer: mysql-bin-changelog.000010:126040; Lag: 0.33s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 36s(total), 1s(copy); streamer: mysql-bin-changelog.000010:128387; Lag: 0.33s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 37s(total), 1s(copy); streamer: mysql-bin-changelog.000010:132980; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 38s(total), 1s(copy); streamer: mysql-bin-changelog.000010:137204; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 39s(total), 1s(copy); streamer: mysql-bin-changelog.000010:141426; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 40s(total), 1s(copy); streamer: mysql-bin-changelog.000010:145648; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 41s(total), 1s(copy); streamer: mysql-bin-changelog.000010:149872; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 42s(total), 1s(copy); streamer: mysql-bin-changelog.000010:154092; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 43s(total), 1s(copy); streamer: mysql-bin-changelog.000010:158314; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 44s(total), 1s(copy); streamer: mysql-bin-changelog.000010:162535; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 45s(total), 1s(copy); streamer: mysql-bin-changelog.000010:166758; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 46s(total), 1s(copy); streamer: mysql-bin-changelog.000010:170976; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 47s(total), 1s(copy); streamer: mysql-bin-changelog.000010:175200; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 48s(total), 1s(copy); streamer: mysql-bin-changelog.000010:179424; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 49s(total), 1s(copy); streamer: mysql-bin-changelog.000010:183271; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 50s(total), 1s(copy); streamer: mysql-bin-changelog.000010:187866; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 51s(total), 1s(copy); streamer: mysql-bin-changelog.000010:192088; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 52s(total), 1s(copy); streamer: mysql-bin-changelog.000010:196312; Lag: 0.20s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 53s(total), 1s(copy); streamer: mysql-bin-changelog.000010:200536; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 54s(total), 1s(copy); streamer: mysql-bin-changelog.000010:204760; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 55s(total), 1s(copy); streamer: mysql-bin-changelog.000010:208982; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 56s(total), 1s(copy); streamer: mysql-bin-changelog.000010:213204; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 57s(total), 1s(copy); streamer: mysql-bin-changelog.000010:217428; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 58s(total), 1s(copy); streamer: mysql-bin-changelog.000010:221652; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 59s(total), 1s(copy); streamer: mysql-bin-changelog.000010:225868; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m0s(total), 1s(copy); streamer: mysql-bin-changelog.000010:229717; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m5s(total), 1s(copy); streamer: mysql-bin-changelog.000010:249305; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m10s(total), 1s(copy); streamer: mysql-bin-changelog.000010:268514; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m15s(total), 1s(copy); streamer: mysql-bin-changelog.000010:287356; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m20s(total), 1s(copy); streamer: mysql-bin-changelog.000010:306950; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m25s(total), 1s(copy); streamer: mysql-bin-changelog.000010:326164; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m30s(total), 1s(copy); streamer: mysql-bin-changelog.000010:345366; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m35s(total), 1s(copy); streamer: mysql-bin-changelog.000010:360465; Lag: 0.18s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m40s(total), 1s(copy); streamer: mysql-bin-changelog.000010:377812; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m45s(total), 1s(copy); streamer: mysql-bin-changelog.000010:397032; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m50s(total), 1s(copy); streamer: mysql-bin-changelog.000010:416234; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m55s(total), 1s(copy); streamer: mysql-bin-changelog.000010:435452; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m0s(total), 1s(copy); streamer: mysql-bin-changelog.000010:454658; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m5s(total), 1s(copy); streamer: mysql-bin-changelog.000010:473875; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m10s(total), 1s(copy); streamer: mysql-bin-changelog.000010:493076; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m15s(total), 1s(copy); streamer: mysql-bin-changelog.000010:512290; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m20s(total), 1s(copy); streamer: mysql-bin-changelog.000010:531512; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m25s(total), 1s(copy); streamer: mysql-bin-changelog.000010:550726; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m30s(total), 1s(copy); streamer: mysql-bin-changelog.000010:569950; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m35s(total), 1s(copy); streamer: mysql-bin-changelog.000010:585426; Lag: 0.36s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m40s(total), 1s(copy); streamer: mysql-bin-changelog.000010:602400; Lag: 0.19s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m45s(total), 1s(copy); streamer: mysql-bin-changelog.000010:621614; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m50s(total), 1s(copy); streamer: mysql-bin-changelog.000010:640836; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m55s(total), 1s(copy); streamer: mysql-bin-changelog.000010:660052; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 3m0s(total), 1s(copy); streamer: mysql-bin-changelog.000010:679268; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 1; Backlog: 0/1000; Time: 3m30s(total), 1s(copy); streamer: mysql-bin-changelog.000010:792705; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 1; Backlog: 0/1000; Time: 4m0s(total), 1s(copy); streamer: mysql-bin-changelog.000010:899616; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 1; Backlog: 0/1000; Time: 4m30s(total), 1s(copy); streamer: mysql-bin-changelog.000010:1011759; Lag: 0.11s, State: postponing cut-over; ETA: due
[2020/06/02 15:00:00] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000011, 4)
[2020/06/02 15:00:00] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000011, 4)
Copy: 1/1 100.0%; Applied: 1; Backlog: 0/1000; Time: 5m0s(total), 1s(copy); streamer: mysql-bin-changelog.000011:93104; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 1/1 100.0%; Applied: 1; Backlog: 0/1000; Time: 5m30s(total), 1s(copy); streamer: mysql-bin-changelog.000011:205912; Lag: 0.09s, State: postponing cut-over; ETA: due
# Migrating `test`.`test_table`; Ghost table is `test`.`_test_table_gho`
# Migrating ip-***:3306; inspecting ip-***:3306; executing on ***
# Migration started at Tue Jun 02 14:55:29 -0500 2020
# chunk-size: 1000; max-lag-millis: 2000ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# postpone-cut-over-flag-file: /tmp/postpone-cut-over.lock 
# Serving on unix socket: /tmp/gh-ost.test.test_table.sock
Copy: 1/1 100.0%; Applied: 1; Backlog: 0/1000; Time: 5m38s(total), 1s(copy); streamer: mysql-bin-changelog.000011:232484; Lag: 0.17s, State: migrating; ETA: due
[2020/06/02 15:01:09] [info] binlogsyncer.go:164 syncer is closing...
[2020/06/02 15:01:09] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
[2020/06/02 15:01:09] [info] binlogsyncer.go:179 syncer is closed
# Done

Final select on replica:

mysql> select * from test_table;
+----+------+------------+----------+-----------+
| id | name | updated_at | test_col | test_col1 |
+----+------+------------+----------+-----------+
|  1 | John | NULL       |     NULL |       123 |
|  2 | Tom  | NULL       |     NULL |       123 |
+----+------+------------+----------+-----------+
2 rows in set (0.09 sec)
eric239 commented 4 years ago

@jfudally thank you for confirming gh-ost works as expected; I thought it would 😄

Hopefully you can help me shed some light on what is different in my setup.

Firstly, the settings:

And yes, replica has backups enabled for 1 day.

Secondly, I think the key line I see in your log that I don't see in mine is

Copy: 1/1 100.0%; Applied: 1; Backlog: 0/1000; Time: 5m0s(total), 1s(copy); streamer: mysql-bin-changelog.000011:93104; Lag: 0.09s, State: postponing cut-over; ETA: due

I think Applied: 1 actually refers to having picked up a change from the stream. Is that correct interpretation? In my case, it's always Applied: 0.

Is there any additional logging I can tweak to try to troubleshoot?

I'm also going to create brand new pair of DBs in our env as a sanity check.

Thanks again 🙏

jfudally commented 4 years ago

@eric239 I see your primary node is set to MIXED binlog_format. If the data coming through the binlog is in STATEMENT format, I don't think gh-ost will be able to parse it. I haven't tested this theory yet, but if you are provisioning a new cluster I would suggest setting the primary node to ROW as well, and see if that fixes the issue.

eric239 commented 4 years ago

@jfudally thank you for the hint; what I don't understand though is why how the master is configured would matter given that gh-ost is instructed to receive binlog stream from the replica which is replicating in ROW format.

jfudally commented 4 years ago

@eric239 Yes, yes you're right. Since we're connecting to the replica gh-ost is only writing to the primary, and not reading the binlog from it. I've just tested the scenario where the primary RDS node is MIXED and the replica ROW, and it seems to work just fine 🤷

./bin/gh-ost \
  --user *** \
  --password *** \
  --host ***.rds.amazonaws.com \
  --initially-drop-ghost-table \
  --assume-master-host ***.rds.amazonaws.com \
  --database test \
  --assume-rbr \
  --max-lag-millis 2000 \
  --chunk-size 1000 \
  --postpone-cut-over-flag-file /tmp/postpone-cut-over.lock \
  --table=test_table \
  --alter='add column test_col2 int default 123' \
  --execute \
  --verbose
Primary:
mysql> insert into test_table (name) values ('Jane');
Query OK, 1 row affected (0.07 sec)

Replica:
mysql> select * from _test_table_gho;
+----+------+------------+----------+-----------+-----------+
| id | name | updated_at | test_col | test_col1 | test_col2 |
+----+------+------------+----------+-----------+-----------+
|  1 | John | NULL       |     NULL |       123 |       123 |
|  2 | Tom  | NULL       |     NULL |       123 |       123 |
|  3 | Jane | NULL       |     NULL |       123 |       123 |
+----+------+------------+----------+-----------+-----------+
3 rows in set (0.06 sec)

And the log reflects Applied: 1; during the insert:

...
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 26s(total), 1s(copy); streamer: mysql-bin-changelog.000273:97374; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 27s(total), 1s(copy); streamer: mysql-bin-changelog.000273:101597; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 28s(total), 1s(copy); streamer: mysql-bin-changelog.000273:105821; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 29s(total), 1s(copy); streamer: mysql-bin-changelog.000273:110039; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 1; Backlog: 0/1000; Time: 30s(total), 1s(copy); streamer: mysql-bin-changelog.000273:114827; Lag: 0.09s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 1; Backlog: 0/1000; Time: 31s(total), 1s(copy); streamer: mysql-bin-changelog.000273:119051; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 1; Backlog: 0/1000; Time: 32s(total), 1s(copy); streamer: mysql-bin-changelog.000273:123275; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 1; Backlog: 0/1000; Time: 33s(total), 1s(copy); streamer: mysql-bin-changelog.000273:127497; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 1; Backlog: 0/1000; Time: 34s(total), 1s(copy); streamer: mysql-bin-changelog.000273:131721; Lag: 0.10s, State: postponing cut-over; ETA: due
Copy: 2/2 100.0%; Applied: 1; Backlog: 0/1000; Time: 35s(total), 1s(copy); streamer: mysql-bin-changelog.000273:135945; Lag: 0.09s, State: postponing cut-over; ETA: due
...
eric239 commented 4 years ago

@jfudally I finally confirmed that there is no issue with brand new RDS setup. I appreciate your time and apologize for not checking vanilla setup as the first step. I'll keep scratching my head and looking for reasons what make my "real" setup different but since there's no issue with gh-ost itself closing this.

tomkrouper commented 4 years ago

@eric239, if you find the issue, please report back so we can update the docs. Thanks @jfudally

eric239 commented 4 years ago

Epilogue: after many hours of digging through MySQL logs and engaging AWS support to no avail, the issue was solved by the (replica) instance reboot... 🤷 The "turn it off, wait, turn it on" magic.