github/gh-ost: GitHub's Online Schema-migration Tool for MySQL (MIT License)

Cutover not succeeding #687

Open · JGulbronson opened this issue 5 years ago

JGulbronson commented 5 years ago

When I try to run a migration to add an index on a relatively small table, the cutover never succeeds. I've run it with verbose output, and get the following:

2018-12-06 14:43:42 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 2; Backlog: 0/1000; Time: 1m35s(total), 39s(copy); streamer: mysql-bin-changelog.002933:13183392; State: migrating; ETA: due
2018-12-06 14:43:48 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:48 INFO Looking for magic cut-over table
2018-12-06 14:43:48 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:48 INFO Dropping magic cut-over table
2018-12-06 14:43:48 INFO Dropping table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:48 INFO Table dropped
2018-12-06 14:43:49 INFO Grabbing voluntary lock: gh-ost.12668.lock
2018-12-06 14:43:49 INFO Setting LOCK timeout as 6 seconds
2018-12-06 14:43:49 INFO Looking for magic cut-over table
2018-12-06 14:43:49 INFO Creating magic cut-over table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:50 INFO Magic cut-over table created
2018-12-06 14:43:50 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m40s(total), 39s(copy); streamer: mysql-bin-changelog.002933:13689223; State: migrating; ETA: due
2018-12-06 14:43:56 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:56 INFO Looking for magic cut-over table
2018-12-06 14:43:56 INFO Dropping magic cut-over table
2018-12-06 14:43:56 INFO Dropping table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:56 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:56 INFO Table dropped
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m45s(total), 39s(copy); streamer: mysql-bin-changelog.002933:14037660; State: migrating; ETA: due
2018-12-06 14:43:57 INFO Grabbing voluntary lock: gh-ost.12680.lock
2018-12-06 14:43:57 INFO Setting LOCK timeout as 6 seconds
2018-12-06 14:43:57 INFO Looking for magic cut-over table
2018-12-06 14:43:57 INFO Creating magic cut-over table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:57 INFO Magic cut-over table created
2018-12-06 14:43:57 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m50s(total), 39s(copy); streamer: mysql-bin-changelog.002933:14381575; State: migrating; ETA: due
2018-12-06 14:44:03 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:44:03 INFO Looking for magic cut-over table
2018-12-06 14:44:03 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:44:03 INFO Dropping magic cut-over table
2018-12-06 14:44:03 INFO Dropping table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:44:03 INFO Table dropped
2018-12-06 14:44:05 INFO Grabbing voluntary lock: gh-ost.12685.lock
2018-12-06 14:44:05 INFO Setting LOCK timeout as 6 seconds
2018-12-06 14:44:05 INFO Looking for magic cut-over table
2018-12-06 14:44:05 INFO Creating magic cut-over table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:44:05 INFO Magic cut-over table created
2018-12-06 14:44:05 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m55s(total), 39s(copy); streamer: mysql-bin-changelog.002933:15039450; State: migrating; ETA: due

It repeats this ad infinitum until I kill it after 3-5 minutes. We are using Aurora, but have RBR set and run on the master. We're able to run other migrations, just not this (relatively) small one, so I'm trying to figure out what could possibly cause this so my debugging can be a bit more guided.

So far I love the tool, and the fact that I don't have to worry about cleanup afterwards has been great for us. Just need to figure out this occasional issue!

JGulbronson commented 5 years ago

Update: my coworker was able to run the migration with no problem after I posted this. Is it possible something wasn't cleaned up properly the first time, and that prevented subsequent attempts from succeeding?
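
For what it's worth, a quick way I could have checked for leftovers from the first attempt (a sketch; the table-name pattern follows the log above):

```sql
-- leftover _retailers_del / _retailers_gho / _retailers_ghc tables from a
-- previous run would suggest it did not clean up after itself
SHOW TABLES FROM `COMPANY_prod` LIKE '\_retailers\_%';
```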

ggunson commented 5 years ago

@JGulbronson When gh-ost is in the cut-over process, it attempts (among other things) to get a write lock on the table being migrated; if it cannot (because of other connections' locks on the table), it times out on that lock wait (hence the Error 1205: Lock wait timeout exceeded; try restarting transaction). By default it times out after a few seconds, so that it won't block other activity on the table by continuing to wait, and then retries. See https://github.com/github/gh-ost/issues/82
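
Roughly, the failing lock step in your log corresponds to something like the following (a sketch; the sentry-table definition is an assumption, and the 6-second value matches your "Setting LOCK timeout as 6 seconds" line):

```sql
-- short session-level metadata-lock timeout, so a failed LOCK TABLES gives
-- up quickly (Error 1205) instead of blocking other traffic indefinitely
SET SESSION lock_wait_timeout = 6;
SELECT GET_LOCK('gh-ost.12668.lock', 0);
-- the "magic" cut-over sentry table (definition assumed for illustration)
CREATE /* gh-ost */ TABLE `COMPANY_prod`.`_retailers_del` (id int auto_increment PRIMARY KEY) ENGINE=InnoDB;
-- this statement times out if another connection holds a table lock, or an
-- open transaction has touched `retailers` and still holds metadata locks
LOCK /* gh-ost */ TABLES `COMPANY_prod`.`retailers` WRITE, `COMPANY_prod`.`_retailers_del` WRITE;
```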

I'd guess there might have been some activity on that table, like a long-running transaction holding locks, that blocked gh-ost from getting that exclusive table lock, though I can't say for sure.

If that happens again I'd suggest running

SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS\G

And see if there are running transactions holding locks on the table.
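
For example (a generic sketch, not specific to gh-ost):

```sql
-- long-running transactions show up here; one that has touched the
-- migrated table holds metadata locks until it commits, which is enough
-- to make gh-ost's LOCK TABLES time out with Error 1205
SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started;
```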

babinomec commented 5 years ago

I had the same problem (looping retries on cut-over), even on relatively unused tables, along with syncer errors during binlog streaming. All of this was resolved by using the latest master version (at the time, commit e48844de0bee9a8db611a06cd6080cac4dab25cb).

peppy commented 5 years ago

I'm seeing this same thing on a MySQL 8 host. This is on a table that is very rarely used and also very small (intentional, for testing):

2019-02-14 05:31:47 INFO Looking for magic cut-over table
2019-02-14 05:31:47 INFO Dropping magic cut-over table
2019-02-14 05:31:47 INFO Dropping table `osu`.`_phpbb_disallow_del`
2019-02-14 05:31:47 INFO Table dropped
2019-02-14 05:31:48 INFO Grabbing voluntary lock: gh-ost.15579113.lock
2019-02-14 05:31:48 INFO Setting LOCK timeout as 20 seconds
2019-02-14 05:31:48 INFO Looking for magic cut-over table
2019-02-14 05:31:48 INFO Creating magic cut-over table `osu`.`_phpbb_disallow_del`
2019-02-14 05:31:48 INFO Magic cut-over table created
2019-02-14 05:31:48 INFO Locking `osu`.`phpbb_disallow`, `osu`.`_phpbb_disallow_del`
2019-02-14 05:31:48 INFO Tables locked
2019-02-14 05:31:48 INFO Session locking original & magic tables is 15579113
2019-02-14 05:31:48 INFO Writing changelog state: AllEventsUpToLockProcessed:1550122308341602368
2019-02-14 05:31:48 INFO Intercepted changelog state AllEventsUpToLockProcessed
2019-02-14 05:31:48 INFO Handled changelog state AllEventsUpToLockProcessed
2019-02-14 05:31:48 INFO Waiting for events up to lock
2019-02-14 05:31:49 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1550122308341602368
2019-02-14 05:31:49 INFO Done waiting for events up to lock; duration=934.82674ms
# Migrating `osu`.`phpbb_disallow`; Ghost table is `osu`.`_phpbb_disallow_gho`
# Migrating db-master:3306; inspecting db-master:3306; executing on db-master
# Migration started at Thu Feb 14 05:25:55 +0000 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.phpbb_disallow.sock
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 5m53s(total), 1s(copy); streamer: mysql-bin.005307:279157087; State: migrating; ETA: due
2019-02-14 05:31:49 INFO Setting RENAME timeout as 10 seconds
2019-02-14 05:31:49 INFO Session renaming tables is 15579206
2019-02-14 05:31:49 INFO Issuing and expecting this to block: rename /* gh-ost */ table `osu`.`phpbb_disallow` to `osu`.`_phpbb_disallow_del`, `osu`.`_phpbb_disallow_gho` to `osu`.`phpbb_disallow`
2019-02-14 05:31:49 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2019-02-14 05:31:49 INFO Checking session lock: gh-ost.15579113.lock
2019-02-14 05:31:49 INFO Connection holding lock on original table still exists
2019-02-14 05:31:49 INFO Will now proceed to drop magic table and unlock tables
2019-02-14 05:31:49 INFO Dropping magic cut-over table
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 5m55s(total), 1s(copy); streamer: mysql-bin.005307:282332887; State: migrating; ETA: due
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m0s(total), 1s(copy); streamer: mysql-bin.005307:292088677; State: migrating; ETA: due
2019-02-14 05:31:59 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m5s(total), 1s(copy); streamer: mysql-bin.005307:303311125; State: migrating; ETA: due
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m10s(total), 1s(copy); streamer: mysql-bin.005307:313333415; State: migrating; ETA: due
2019-02-14 05:32:09 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2019-02-14 05:32:09 INFO Releasing lock from `osu`.`phpbb_disallow`, `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:09 INFO Tables unlocked
2019-02-14 05:32:09 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2019-02-14 05:32:09 INFO Looking for magic cut-over table
2019-02-14 05:32:09 INFO Dropping magic cut-over table
2019-02-14 05:32:09 INFO Dropping table `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:09 INFO Table dropped
2019-02-14 05:32:10 INFO Grabbing voluntary lock: gh-ost.15579113.lock
2019-02-14 05:32:10 INFO Setting LOCK timeout as 20 seconds
2019-02-14 05:32:10 INFO Looking for magic cut-over table
2019-02-14 05:32:10 INFO Creating magic cut-over table `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:10 INFO Magic cut-over table created
2019-02-14 05:32:10 INFO Locking `osu`.`phpbb_disallow`, `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:10 INFO Tables locked
2019-02-14 05:32:10 INFO Session locking original & magic tables is 15579113
2019-02-14 05:32:10 INFO Writing changelog state: AllEventsUpToLockProcessed:1550122330350015577
2019-02-14 05:32:10 INFO Intercepted changelog state AllEventsUpToLockProcessed
2019-02-14 05:32:10 INFO Handled changelog state AllEventsUpToLockProcessed
2019-02-14 05:32:10 INFO Waiting for events up to lock
Copy: 103/103 100.0%; Applied: 0; Backlog: 1/1000; Time: 6m15s(total), 1s(copy); streamer: mysql-bin.005307:323107210; State: migrating; ETA: due
2019-02-14 05:32:11 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1550122330350015577
2019-02-14 05:32:11 INFO Done waiting for events up to lock; duration=929.661604ms
# Migrating `osu`.`phpbb_disallow`; Ghost table is `osu`.`_phpbb_disallow_gho`
# Migrating db-master:3306; inspecting db-master:3306; executing on db-master
# Migration started at Thu Feb 14 05:25:55 +0000 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.phpbb_disallow.sock
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m15s(total), 1s(copy); streamer: mysql-bin.005307:324603780; State: migrating; ETA: due
2019-02-14 05:32:11 INFO Setting RENAME timeout as 10 seconds
2019-02-14 05:32:11 INFO Session renaming tables is 15579261
2019-02-14 05:32:11 INFO Issuing and expecting this to block: rename /* gh-ost */ table `osu`.`phpbb_disallow` to `osu`.`_phpbb_disallow_del`, `osu`.`_phpbb_disallow_gho` to `osu`.`phpbb_disallow`
2019-02-14 05:32:11 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2019-02-14 05:32:11 INFO Checking session lock: gh-ost.15579113.lock
2019-02-14 05:32:11 INFO Connection holding lock on original table still exists
2019-02-14 05:32:11 INFO Will now proceed to drop magic table and unlock tables
2019-02-14 05:32:11 INFO Dropping magic cut-over table
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m20s(total), 1s(copy); streamer: mysql-bin.005307:334280054; State: migrating; ETA: due
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m25s(total), 1s(copy); streamer: mysql-bin.005307:343985711; State: migrating; ETA: due
2019-02-14 05:32:21 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m30s(total), 1s(copy); streamer: mysql-bin.005307:355004881; State: migrating; ETA: due

SHOW ENGINE INNODB STATUS reports nothing of interest.

SHOW PROCESSLIST has just this one gh-ost line (the cut-over DROP, stuck waiting on a column statistics lock):

15579113,repl,Query,18,Waiting for column statistics lock,drop /* gh-ost */ table if exists `osu`.`_phpbb_disallow_del`

performance_schema.metadata_locks:

TABLESPACE      osu/_phpbb_disallow_del     139531072737760 INTENTION_EXCLUSIVE TRANSACTION GRANTED lock.cc:793 15580335    1245
TABLESPACE      osu/phpbb_disallow      139531073279360 INTENTION_EXCLUSIVE TRANSACTION GRANTED lock.cc:793 15580335    1245
COLUMN STATISTICS   osu _phpbb_disallow_del id  139531073479632 SHARED_READ STATEMENT   GRANTED sql_base.cc:544 15580335    1245
BACKUP LOCK             139531074200928 INTENTION_EXCLUSIVE TRANSACTION GRANTED sql_backup_lock.cc:97   15580335    1247
COLUMN STATISTICS   osu _phpbb_disallow_del db_roll_ptr 139531074610192 EXCLUSIVE   TRANSACTION GRANTED histogram.cc:207    15580335    1247
COLUMN STATISTICS   osu _phpbb_disallow_del db_trx_id   139531085308512 EXCLUSIVE   TRANSACTION GRANTED histogram.cc:207    15580335    1247
COLUMN STATISTICS   osu _phpbb_disallow_del id  139531073337008 EXCLUSIVE   TRANSACTION PENDING histogram.cc:207    15580335    1247
GLOBAL              139531073686400 INTENTION_EXCLUSIVE STATEMENT   GRANTED sql_base.cc:5345    15580335    1245
SCHEMA  osu         139531072340656 INTENTION_EXCLUSIVE TRANSACTION GRANTED sql_base.cc:5333    15580335    1245
TABLE   osu _phpbb_disallow_del     139531085562032 SHARED_NO_READ_WRITE    TRANSACTION GRANTED sql_parse.cc:5617   15580335    1245
TABLE   osu phpbb_disallow      139531072699024 SHARED_NO_READ_WRITE    TRANSACTION GRANTED sql_parse.cc:5617   15580335    1245
USER LEVEL LOCK     gh-ost.15580295.lock        139531072871808 EXCLUSIVE   EXPLICIT    GRANTED item_func.cc:4516   15580335    562
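
For reference, output like the above can be pulled with something like this (it requires the metadata-lock instrument, wait/lock/metadata/sql/mdl, to be enabled in performance_schema; it is on by default in 8.0):

```sql
SELECT object_type, object_schema, object_name,
       lock_type, lock_duration, lock_status,
       source, owner_thread_id, owner_event_id
FROM performance_schema.metadata_locks
ORDER BY owner_thread_id, owner_event_id;
```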
peppy commented 5 years ago

Additionally, manually running the rename works without issue (and runs instantly) outside of the cut-over process.

Also, using --cut-over=two-step works (as one would expect).

shlomi-noach commented 5 years ago

MySQL 8 introduced an atomic rename which is supported while holding table locks. This was developed specifically for gh-ost, and I should start using it...
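
For reference, a minimal illustration of that change, with hypothetical tables t1/t2; per the MySQL 8.0.13 release notes, RENAME TABLE is permitted under LOCK TABLES provided the renamed tables are WRITE-locked, or are products of earlier rename steps:

```sql
LOCK TABLES t1 WRITE, t2 WRITE;
-- allowed on 8.0.13+ while the locks are held; rejected on older versions
RENAME TABLE t1 TO t0, t2 TO t1;
UNLOCK TABLES;
```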

drogart commented 5 years ago

I'm seeing the same issue as peppy with 1.0.48, even when running on a test instance of MySQL 8.0.11 where the only traffic is from gh-ost itself.

shlomi-noach commented 5 years ago

If anyone feels like testing https://github.com/github/gh-ost/pull/715, please let me know. I do not have a test bed for this, yet.

Please be advised this is experimental; if possible, please first test-on-replica. If you plan on testing on master in production, please make sure you are prepared for risks (I'm hoping this runs smoothly, but since I haven't tested in prod I cannot in good faith claim this works well). Risks may include loss of transactions, lockdown of tables, and beyond.

drogart commented 5 years ago

I commented on the pull request.

tl;dr: still fails, but I think it's because, in our case, the 8.0.13 rename behavior requires that either all or none of the involved tables be write-locked.
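
A minimal illustration of that constraint, assuming gh-ost-like table names; only the original and sentry tables are locked, the ghost table is not:

```sql
LOCK TABLES t1 WRITE, t1_del WRITE;
-- fails on 8.0.13+ because t1_gho is involved in the rename but is not
-- write-locked (compare the manual ERROR 1100 repro later in this thread)
RENAME TABLE t1 TO t1_old, t1_gho TO t1;
```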

Thanks!

shlomi-noach commented 4 years ago

In an attempt to reproduce and fix the problem, I tested today with MySQL 8.0.18, and -- everything worked as expected?

I issued:

gh-ost \
  --user=gh-ost \
  --password=gh-ost \
  --host=localhost \
  --port=20821 \
  --assume-master-host=localhost:20819 \
  --database=test \
  --table=gh_ost_test \
  --alter='engine=innodb' \
  --exact-rowcount \
  --assume-rbr \
  --initially-drop-old-table \
  --initially-drop-ghost-table \
  --throttle-query='select timestampdiff(second, min(last_update), now()) < 5 from _gh_ost_test_ghc' \
  --serve-socket-file=/tmp/gh-ost.test.sock \
  --initially-drop-socket-file \
  --default-retries=3 \
  --chunk-size=10 \
  --verbose \
  --debug \
  --stack \
  --execute

I got:

2020-02-04 15:59:56 INFO starting gh-ost
2020-02-04 15:59:56 INFO Migrating `test`.`gh_ost_test`
2020-02-04 15:59:56 INFO connection validated on localhost:20821
2020-02-04 15:59:56 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2020-02-04 15:59:56 INFO binary logs validated on localhost:20821
2020-02-04 15:59:56 INFO Inspector initiated on shlomi-oct:20821, version 8.0.18
2020-02-04 15:59:56 INFO Table found. Engine=InnoDB
2020-02-04 15:59:56 DEBUG Estimated number of rows via STATUS: 3
2020-02-04 15:59:56 DEBUG Validated no foreign keys exist on table
2020-02-04 15:59:56 DEBUG Validated no triggers exist on table
2020-02-04 15:59:56 INFO Estimated number of rows via EXPLAIN: 3
2020-02-04 15:59:56 DEBUG Potential unique keys in gh_ost_test: [PRIMARY (auto_increment): [id]; has nullable: false]
2020-02-04 15:59:56 INFO Master forced to be localhost:20819
2020-02-04 15:59:56 INFO log_slave_updates validated on localhost:20821
2020-02-04 15:59:56 INFO connection validated on localhost:20821
2020-02-04 15:59:56 DEBUG Streamer binlog coordinates: mysql-bin.000001:203020
2020-02-04 15:59:56 INFO Connecting binlog streamer at mysql-bin.000001:203020
[2020/02/04 15:59:56] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql localhost 20821 gh-ost    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/02/04 15:59:56] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.000001, 203020)
[2020/02/04 15:59:56] [info] binlogsyncer.go:203 register slave for master server localhost:20821
2020-02-04 15:59:56 DEBUG Beginning streaming
[2020/02/04 15:59:56] [info] binlogsyncer.go:723 rotate to (mysql-bin.000001, 203020)
2020-02-04 15:59:56 INFO rotate to next log from mysql-bin.000001:0 to mysql-bin.000001
2020-02-04 15:59:56 INFO connection validated on localhost:20819
2020-02-04 15:59:56 INFO connection validated on localhost:20819
2020-02-04 15:59:56 INFO will use time_zone='SYSTEM' on applier
2020-02-04 15:59:56 INFO Examining table structure on applier
2020-02-04 15:59:56 INFO Applier initiated on shlomi-oct:20819, version 8.0.18
2020-02-04 15:59:56 INFO Dropping table `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 INFO Table dropped
2020-02-04 15:59:56 INFO Dropping table `test`.`_gh_ost_test_del`
2020-02-04 15:59:56 INFO Table dropped
2020-02-04 15:59:56 INFO Dropping table `test`.`_gh_ost_test_ghc`
2020-02-04 15:59:56 INFO Table dropped
2020-02-04 15:59:56 INFO Creating changelog table `test`.`_gh_ost_test_ghc`
2020-02-04 15:59:56 INFO Changelog table created
2020-02-04 15:59:56 INFO Creating ghost table `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 INFO Ghost table created
2020-02-04 15:59:56 INFO Altering ghost table `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 DEBUG ALTER statement: alter /* gh-ost */ table `test`.`_gh_ost_test_gho` drop column shushu, engine=innodb
2020-02-04 15:59:56 INFO Ghost table altered
2020-02-04 15:59:56 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-02-04 15:59:56 INFO Intercepted changelog state GhostTableMigrated
2020-02-04 15:59:56 INFO Handled changelog state GhostTableMigrated
2020-02-04 15:59:56 DEBUG ghost table migrated
2020-02-04 15:59:56 DEBUG Potential unique keys in _gh_ost_test_gho: [PRIMARY (auto_increment): [id]; has nullable: false]
2020-02-04 15:59:56 INFO Chosen shared unique key is PRIMARY
2020-02-04 15:59:56 INFO Shared columns are id,i,color
2020-02-04 15:59:56 INFO Listening on unix socket file: /tmp/gh-ost.test.sock
2020-02-04 15:59:56 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2020-02-04 15:59:56 DEBUG Reading migration range according to key: PRIMARY
2020-02-04 15:59:56 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2020-02-04 15:59:56 INFO Migration min values: [1]
2020-02-04 15:59:56 DEBUG Reading migration range according to key: PRIMARY
2020-02-04 15:59:56 INFO Migration max values: [3]
2020-02-04 15:59:56 INFO Waiting for first throttle metrics to be collected
2020-02-04 15:59:56 INFO First throttle metrics collected
2020-02-04 15:59:56 DEBUG Operating until row copy is complete
# Migrating `test`.`gh_ost_test`; Ghost table is `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 DEBUG Getting nothing in the write queue. Sleeping...
# Migrating shlomi-oct:20819; inspecting shlomi-oct:20821; executing on shlomi-oct
# Migration started at Tue Feb 04 15:59:56 +0200 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-query: select timestampdiff(second, min(last_update), now()) < 5 from _gh_ost_test_ghc
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.000001:206450; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.000001:211052; Lag: 0.01s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: mysql-bin.000001:215654; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 3s(total), 3s(copy); streamer: mysql-bin.000001:220252; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 4s(copy); streamer: mysql-bin.000001:224852; Lag: 0.01s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 5s(copy); streamer: mysql-bin.000001:229452; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
2020-02-04 16:00:01 DEBUG Issued INSERT on range: [1]..[3]; iteration: 0; chunk-size: 100
2020-02-04 16:00:01 DEBUG Iteration complete: no further range to iterate
2020-02-04 16:00:01 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-04 16:00:01 INFO Row copy complete
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 5s(copy); streamer: mysql-bin.000001:231913; Lag: 0.01s, State: migrating; ETA: due
2020-02-04 16:00:01 DEBUG checking for cut-over postpone
2020-02-04 16:00:01 DEBUG checking for cut-over postpone: complete
2020-02-04 16:00:01 INFO Grabbing voluntary lock: gh-ost.51.lock
2020-02-04 16:00:01 INFO Setting LOCK timeout as 6 seconds
2020-02-04 16:00:01 INFO Looking for magic cut-over table
2020-02-04 16:00:01 INFO Creating magic cut-over table `test`.`_gh_ost_test_del`
2020-02-04 16:00:01 INFO Magic cut-over table created
2020-02-04 16:00:01 INFO Locking `test`.`gh_ost_test`, `test`.`_gh_ost_test_del`
2020-02-04 16:00:01 INFO Tables locked
2020-02-04 16:00:01 INFO Session locking original & magic tables is 51
2020-02-04 16:00:01 INFO Writing changelog state: AllEventsUpToLockProcessed:1580824801554808000
2020-02-04 16:00:01 INFO Waiting for events up to lock
2020-02-04 16:00:01 INFO Intercepted changelog state AllEventsUpToLockProcessed
2020-02-04 16:00:01 INFO Handled changelog state AllEventsUpToLockProcessed
Copy: 3/3 100.0%; Applied: 0; Backlog: 1/1000; Time: 6s(total), 5s(copy); streamer: mysql-bin.000001:236706; Lag: 0.02s, State: migrating; ETA: due
2020-02-04 16:00:02 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-04 16:00:02 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1580824801554808000
2020-02-04 16:00:02 INFO Done waiting for events up to lock; duration=989.170997ms
# Migrating `test`.`gh_ost_test`; Ghost table is `test`.`_gh_ost_test_gho`
# Migrating shlomi-oct:20819; inspecting shlomi-oct:20821; executing on shlomi-oct
# Migration started at Tue Feb 04 15:59:56 +0200 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-query: select timestampdiff(second, min(last_update), now()) < 5 from _gh_ost_test_ghc
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 6s(total), 5s(copy); streamer: mysql-bin.000001:238424; Lag: 0.02s, State: migrating; ETA: due
2020-02-04 16:00:02 INFO Setting RENAME timeout as 3 seconds
2020-02-04 16:00:02 INFO Session renaming tables is 53
2020-02-04 16:00:02 INFO Issuing and expecting this to block: rename /* gh-ost */ table `test`.`gh_ost_test` to `test`.`_gh_ost_test_del`, `test`.`_gh_ost_test_gho` to `test`.`gh_ost_test`
2020-02-04 16:00:02 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2020-02-04 16:00:02 INFO Checking session lock: gh-ost.51.lock
2020-02-04 16:00:02 INFO Connection holding lock on original table still exists
2020-02-04 16:00:02 INFO Will now proceed to drop magic table and unlock tables
2020-02-04 16:00:02 INFO Dropping magic cut-over table
2020-02-04 16:00:02 INFO Releasing lock from `test`.`gh_ost_test`, `test`.`_gh_ost_test_del`
2020-02-04 16:00:02 INFO Tables unlocked
2020-02-04 16:00:02 INFO Tables renamed
2020-02-04 16:00:02 INFO Lock & rename duration: 1.006560267s. During this time, queries on `gh_ost_test` were blocked
2020-02-04 16:00:02 INFO Looking for magic cut-over table
[2020/02/04 16:00:02] [info] binlogsyncer.go:164 syncer is closing...
2020-02-04 16:00:02 DEBUG done streaming events
2020-02-04 16:00:02 DEBUG Done streaming
[2020/02/04 16:00:02] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2020-02-04 16:00:02 INFO Closed streamer connection. err=<nil>
[2020/02/04 16:00:02] [info] binlogsyncer.go:179 syncer is closed
2020-02-04 16:00:02 INFO Dropping table `test`.`_gh_ost_test_ghc`
2020-02-04 16:00:02 INFO Table dropped
2020-02-04 16:00:02 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2020-02-04 16:00:02 INFO -- drop table `test`.`_gh_ost_test_del`
2020-02-04 16:00:02 INFO Done migrating `test`.`gh_ost_test`
2020-02-04 16:00:02 INFO Removing socket file: /tmp/gh-ost.test.sock
2020-02-04 16:00:02 INFO Tearing down inspector
2020-02-04 16:00:02 INFO Tearing down applier
2020-02-04 16:00:02 DEBUG Tearing down...
2020-02-04 16:00:02 INFO Tearing down streamer
2020-02-04 16:00:02 INFO Tearing down throttler
2020-02-04 16:00:02 DEBUG Tearing down...
# Done
shlomi-noach commented 4 years ago

EDIT: this comment does not reflect the actual sequence of steps taken in gh-ost, so it's irrelevant. Keeping for posterity.


I'm confused as to how the above test worked. If I run the following manually on an 8.0.18 master (assuming I have two tables t1 and t2), I get:

master [localhost:20819] {msandbox} (d1) > lock tables t1 write;
Query OK, 0 rows affected (0.00 sec)

master [localhost:20819] {msandbox} (d1) > rename table t1 to t0, t2 to t1;
ERROR 1100 (HY000): Table 't2' was not locked with LOCK TABLES
shlomi-noach commented 4 years ago

In my comment above (https://github.com/github/gh-ost/issues/687#issuecomment-581927698) I did not reproduce the actual steps gh-ost takes, so it is irrelevant.
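
The actual cut-over uses two connections; roughly, as reconstructed from the cut-over log lines above (lock names and the sentry-table definition are illustrative):

```sql
-- session A: take the voluntary lock, create the "magic" sentry table,
-- then write-lock both the original table and the sentry
SELECT GET_LOCK('gh-ost.51.lock', 0);
CREATE /* gh-ost */ TABLE `test`.`_gh_ost_test_del` (id int auto_increment PRIMARY KEY) ENGINE=InnoDB;
LOCK /* gh-ost */ TABLES `test`.`gh_ost_test` WRITE, `test`.`_gh_ost_test_del` WRITE;

-- session B: issue the atomic swap; it blocks on session A's locks, and
-- could not proceed anyway while `_gh_ost_test_del` still exists
RENAME /* gh-ost */ TABLE `test`.`gh_ost_test` TO `test`.`_gh_ost_test_del`, `test`.`_gh_ost_test_gho` TO `test`.`gh_ost_test`;

-- session A: drop the sentry, then unlock; session B's queued RENAME is
-- prioritized, so the swap appears atomic to all other connections
DROP /* gh-ost */ TABLE IF EXISTS `test`.`_gh_ost_test_del`;
UNLOCK TABLES;
```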

peppy commented 4 years ago

Thanks for looking into this again. I will test against the latest mysql and report back in the coming days.

shlomi-noach commented 4 years ago

Also tried on 8.0.16 and again the master branch just worked.

peppy commented 4 years ago

I can still reproduce on master (ea339b602380d60921f6e686f29de69d6163c8f0)

root@db:~/gh-ost# mysql --version
mysql  Ver 8.0.16 for Linux on x86_64 (MySQL Community Server - GPL)

Trying my previous command:

root@db:~/gh-ost# ~/gh-ost/bin/gh-ost --max-load=Threads_running=25 --critical-load=Threads_running=1000 --chunk-size=1000 --throttle-control-replicas="slave_ip" --max-lag-millis=1500 --user="repl" --password="xxxx" --host=master_ip --verbose --exact-rowcount --default-retries=120 --allow-on-master --panic-flag-file=/tmp/ghost.panic.flag --database="osu" --table="osu_errors" --cut-over=two-step --alter="engine=innodb" --initially-drop-old-table --execute
2020-02-05 05:54:52 INFO starting gh-ost ea339b602380d60921f6e686f29de69d6163c8f0
2020-02-05 05:54:52 INFO Migrating `osu`.`osu_errors`
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `osu`.*
2020-02-05 05:54:52 INFO binary logs validated on master_ip:3306
2020-02-05 05:54:52 INFO Restarting replication on master_ip:3306 to make sure binlog settings apply to replication thread
2020-02-05 05:54:52 INFO Inspector initiated on db:3306, version 8.0.16
2020-02-05 05:54:52 INFO Table found. Engine=InnoDB
2020-02-05 05:54:52 INFO Estimated number of rows via EXPLAIN: 44
2020-02-05 05:54:52 INFO Recursively searching for replication master
2020-02-05 05:54:52 INFO Master found to be db:3306
2020-02-05 05:54:52 INFO log_slave_updates validated on master_ip:3306
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO Connecting binlog streamer at mysql-bin.019585:590087919
[2020/02/05 05:54:52] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql master_ip 3306 repl    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/02/05 05:54:52] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.019585, 590087919)
[2020/02/05 05:54:52] [info] binlogsyncer.go:203 register slave for master server master_ip:3306
2020-02-05 05:54:52 INFO rotate to next log from mysql-bin.019585:0 to mysql-bin.019585
[2020/02/05 05:54:52] [info] binlogsyncer.go:723 rotate to (mysql-bin.019585, 590087919)
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO will use time_zone='SYSTEM' on applier
2020-02-05 05:54:52 INFO Examining table structure on applier
2020-02-05 05:54:52 INFO Applier initiated on db:3306, version 8.0.16
2020-02-05 05:54:52 INFO Dropping table `osu`.`_osu_errors_del`
2020-02-05 05:54:52 INFO Table dropped
2020-02-05 05:54:52 INFO Dropping table `osu`.`_osu_errors_ghc`
2020-02-05 05:54:52 INFO Table dropped
2020-02-05 05:54:52 INFO Creating changelog table `osu`.`_osu_errors_ghc`
2020-02-05 05:54:52 INFO Changelog table created
2020-02-05 05:54:52 INFO Creating ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:54:52 INFO Ghost table created
2020-02-05 05:54:52 INFO Altering ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:54:53 INFO Ghost table altered
2020-02-05 05:54:53 INFO Intercepted changelog state GhostTableMigrated
2020-02-05 05:54:53 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-02-05 05:54:53 INFO Handled changelog state GhostTableMigrated
2020-02-05 05:54:53 INFO Chosen shared unique key is PRIMARY
2020-02-05 05:54:53 INFO Shared columns are error_id,username,user_id,version,osu_mode,game_mode,game_time,audio_time,culture,beatmap_id,beatmap_checksum,exception,feedback,stacktrace,iltrace,config,date
2020-02-05 05:54:53 INFO Listening on unix socket file: /tmp/gh-ost.osu.osu_errors.sock
2020-02-05 05:54:53 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2020-02-05 05:54:53 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2020-02-05 05:54:53 INFO Migration min values: [2267]
2020-02-05 05:54:53 INFO Migration max values: [2891180]
2020-02-05 05:54:53 INFO Waiting for first throttle metrics to be collected
2020-02-05 05:54:53 INFO First throttle metrics collected
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:54:52 +0000 2020
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.osu_errors.sock
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.019585:590298658; Lag: 0.01s, State: migrating; ETA: N/A
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:590567805; Lag: 0.01s, State: migrating; ETA: N/A
2020-02-05 05:54:54 INFO Row copy complete
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:590711632; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:54:54 INFO Locking `osu`.`osu_errors`
2020-02-05 05:54:54 INFO Table locked
2020-02-05 05:54:54 INFO Writing changelog state: AllEventsUpToLockProcessed:1580882094240375857
2020-02-05 05:54:54 INFO Intercepted changelog state AllEventsUpToLockProcessed
2020-02-05 05:54:54 INFO Handled changelog state AllEventsUpToLockProcessed
2020-02-05 05:54:54 INFO Waiting for events up to lock
Copy: 84/84 100.0%; Applied: 0; Backlog: 1/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:591293960; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:54:55 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1580882094240375857
2020-02-05 05:54:55 INFO Done waiting for events up to lock; duration=998.657843ms
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:54:52 +0000 2020
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.osu_errors.sock
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:591313049; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:54:55 INFO Renaming original table
2020-02-05 05:54:55 INFO Renaming ghost table
2020-02-05 05:54:55 INFO Tables renamed
2020-02-05 05:54:55 INFO Unlocking tables
2020-02-05 05:54:55 INFO Tables unlocked
[2020/02/05 05:54:55] [info] binlogsyncer.go:164 syncer is closing...
2020-02-05 05:54:55 INFO Closed streamer connection. err=<nil>
2020-02-05 05:54:55 INFO Dropping table `osu`.`_osu_errors_ghc`
[2020/02/05 05:54:55] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
[2020/02/05 05:54:55] [info] binlogsyncer.go:179 syncer is closed
2020-02-05 05:54:55 INFO Table dropped
2020-02-05 05:54:55 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2020-02-05 05:54:55 INFO -- drop table `osu`.`_osu_errors_del`
2020-02-05 05:54:55 INFO Done migrating `osu`.`osu_errors`
2020-02-05 05:54:55 INFO Removing socket file: /tmp/gh-ost.osu.osu_errors.sock
2020-02-05 05:54:55 INFO Tearing down inspector
2020-02-05 05:54:55 INFO Tearing down applier
2020-02-05 05:54:55 INFO Tearing down streamer
2020-02-05 05:54:55 INFO Tearing down throttler
# Done

Trying your command above (modified to work with my setup; the most important change to note is the addition of --allow-on-master):

root@db:~/gh-ost# ~/gh-ost/bin/gh-ost   --throttle-control-replicas="slave_ip" --max-lag-millis=1500 --user="repl" --password="xxxx" --host=master_ip   --database=osu   --allow-on-master   --table=osu_errors   --alter='engine=innodb'   --exact-rowcount   --assume-rbr   --initially-drop-old-table   --initially-drop-ghost-table   --serve-socket-file=/tmp/gh-ost.test.sock   --initially-drop-socket-file   --default-retries=3   --chunk-size=10   --verbose   --debug   --stack   --execute
2020-02-05 05:55:26 INFO starting gh-ost ea339b602380d60921f6e686f29de69d6163c8f0
2020-02-05 05:55:26 INFO Migrating `osu`.`osu_errors`
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `osu`.*
2020-02-05 05:55:26 INFO binary logs validated on master_ip:3306
2020-02-05 05:55:26 INFO Inspector initiated on db:3306, version 8.0.16
2020-02-05 05:55:26 INFO Table found. Engine=InnoDB
2020-02-05 05:55:26 DEBUG Estimated number of rows via STATUS: 35
2020-02-05 05:55:26 DEBUG Validated no foreign keys exist on table
2020-02-05 05:55:26 DEBUG Validated no triggers exist on table
2020-02-05 05:55:26 INFO Estimated number of rows via EXPLAIN: 44
2020-02-05 05:55:26 DEBUG Potential unique keys in osu_errors: [PRIMARY (auto_increment): [error_id]; has nullable: false]
2020-02-05 05:55:26 INFO Recursively searching for replication master
2020-02-05 05:55:26 DEBUG Looking for master on master_ip:3306
2020-02-05 05:55:26 INFO Master found to be db:3306
2020-02-05 05:55:26 INFO log_slave_updates validated on master_ip:3306
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 DEBUG Streamer binlog coordinates: mysql-bin.019585:600428610
2020-02-05 05:55:26 INFO Connecting binlog streamer at mysql-bin.019585:600428610
[2020/02/05 05:55:26] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql master_ip 3306 repl    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/02/05 05:55:26] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.019585, 600428610)
[2020/02/05 05:55:26] [info] binlogsyncer.go:203 register slave for master server master_ip:3306
2020-02-05 05:55:26 DEBUG Beginning streaming
[2020/02/05 05:55:26] [info] binlogsyncer.go:723 rotate to (mysql-bin.019585, 600428610)
2020-02-05 05:55:26 INFO rotate to next log from mysql-bin.019585:0 to mysql-bin.019585
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 INFO will use time_zone='SYSTEM' on applier
2020-02-05 05:55:26 INFO Examining table structure on applier
2020-02-05 05:55:26 INFO Applier initiated on db:3306, version 8.0.16
2020-02-05 05:55:26 INFO Dropping table `osu`.`_osu_errors_gho`
2020-02-05 05:55:26 INFO Table dropped
2020-02-05 05:55:26 INFO Dropping table `osu`.`_osu_errors_del`
2020-02-05 05:55:26 INFO Table dropped
2020-02-05 05:55:26 INFO Dropping table `osu`.`_osu_errors_ghc`
2020-02-05 05:55:26 INFO Table dropped
2020-02-05 05:55:26 INFO Creating changelog table `osu`.`_osu_errors_ghc`
2020-02-05 05:55:26 INFO Changelog table created
2020-02-05 05:55:26 INFO Creating ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:55:26 INFO Ghost table created
2020-02-05 05:55:26 INFO Altering ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:55:26 DEBUG ALTER statement: alter /* gh-ost */ table `osu`.`_osu_errors_gho` engine=innodb
2020-02-05 05:55:27 INFO Ghost table altered
2020-02-05 05:55:27 INFO Intercepted changelog state GhostTableMigrated
2020-02-05 05:55:27 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-02-05 05:55:27 DEBUG ghost table migrated
2020-02-05 05:55:27 INFO Handled changelog state GhostTableMigrated
2020-02-05 05:55:27 DEBUG Potential unique keys in _osu_errors_gho: [PRIMARY (auto_increment): [error_id]; has nullable: false]
2020-02-05 05:55:27 INFO Chosen shared unique key is PRIMARY
2020-02-05 05:55:27 INFO Shared columns are error_id,username,user_id,version,osu_mode,game_mode,game_time,audio_time,culture,beatmap_id,beatmap_checksum,exception,feedback,stacktrace,iltrace,config,date
2020-02-05 05:55:27 INFO Listening on unix socket file: /tmp/gh-ost.test.sock
2020-02-05 05:55:27 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2020-02-05 05:55:27 DEBUG Reading migration range according to key: PRIMARY
2020-02-05 05:55:27 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2020-02-05 05:55:27 INFO Migration min values: [2267]
2020-02-05 05:55:27 DEBUG Reading migration range according to key: PRIMARY
2020-02-05 05:55:27 INFO Migration max values: [2891180]
2020-02-05 05:55:27 INFO Waiting for first throttle metrics to be collected
2020-02-05 05:55:27 INFO First throttle metrics collected
2020-02-05 05:55:27 DEBUG Operating until row copy is complete
2020-02-05 05:55:27 DEBUG Getting nothing in the write queue. Sleeping...
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:55:26 +0000 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.019585:600638981; Lag: 0.01s, State: migrating; ETA: N/A
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:600946019; Lag: 0.01s, State: migrating; ETA: N/A
2020-02-05 05:55:28 DEBUG Issued INSERT on range: [2267]..[2891180]; iteration: 0; chunk-size: 100
2020-02-05 05:55:28 DEBUG Iteration complete: no further range to iterate
2020-02-05 05:55:28 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:28 INFO Row copy complete
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:601265342; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:28 DEBUG checking for cut-over postpone
2020-02-05 05:55:28 DEBUG checking for cut-over postpone: complete
2020-02-05 05:55:28 INFO Grabbing voluntary lock: gh-ost.49366677.lock
2020-02-05 05:55:28 INFO Setting LOCK timeout as 6 seconds
2020-02-05 05:55:28 INFO Looking for magic cut-over table
2020-02-05 05:55:28 INFO Creating magic cut-over table `osu`.`_osu_errors_del`
2020-02-05 05:55:28 INFO Magic cut-over table created
2020-02-05 05:55:28 INFO Locking `osu`.`osu_errors`, `osu`.`_osu_errors_del`
2020-02-05 05:55:28 INFO Tables locked
2020-02-05 05:55:28 INFO Session locking original & magic tables is 49366677
2020-02-05 05:55:28 INFO Writing changelog state: AllEventsUpToLockProcessed:1580882128472911106
2020-02-05 05:55:28 INFO Intercepted changelog state AllEventsUpToLockProcessed
2020-02-05 05:55:28 INFO Handled changelog state AllEventsUpToLockProcessed
2020-02-05 05:55:28 INFO Waiting for events up to lock
Copy: 84/84 100.0%; Applied: 0; Backlog: 1/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:601740149; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:29 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:29 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1580882128472911106
2020-02-05 05:55:29 INFO Done waiting for events up to lock; duration=974.286376ms
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:55:26 +0000 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:601746558; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:29 INFO Setting RENAME timeout as 3 seconds
2020-02-05 05:55:29 INFO Session renaming tables is 49366683
2020-02-05 05:55:29 INFO Issuing and expecting this to block: rename /* gh-ost */ table `osu`.`osu_errors` to `osu`.`_osu_errors_del`, `osu`.`_osu_errors_gho` to `osu`.`osu_errors`
2020-02-05 05:55:29 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2020-02-05 05:55:29 INFO Checking session lock: gh-ost.49366677.lock
2020-02-05 05:55:29 INFO Connection holding lock on original table still exists
2020-02-05 05:55:29 INFO Will now proceed to drop magic table and unlock tables
2020-02-05 05:55:29 INFO Dropping magic cut-over table
2020-02-05 05:55:29 INFO Releasing lock from `osu`.`osu_errors`, `osu`.`_osu_errors_del`
2020-02-05 05:55:29 INFO Tables unlocked
2020-02-05 05:55:29 INFO Tables renamed
2020-02-05 05:55:29 INFO Lock & rename duration: 1.145875074s. During this time, queries on `osu_errors` were blocked
2020-02-05 05:55:29 INFO Looking for magic cut-over table
[2020/02/05 05:55:29] [info] binlogsyncer.go:164 syncer is closing...
2020-02-05 05:55:29 INFO Closed streamer connection. err=<nil>
2020-02-05 05:55:29 INFO Dropping table `osu`.`_osu_errors_ghc`
2020-02-05 05:55:29 DEBUG Done streaming
[2020/02/05 05:55:29] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
[2020/02/05 05:55:29] [info] binlogsyncer.go:179 syncer is closed
2020-02-05 05:55:30 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:31 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:32 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:32 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
goroutine 83 [running]:
runtime/debug.Stack(0x5c, 0x100, 0xc000106230)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.logErrorEntry(0x2, 0x8a5f80, 0xc00040dcc0, 0xc0002b23c0, 0xba)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:178 +0xe7
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.Errore(...)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:224
github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils.ExecNoPrepare(0xc00017a000, 0xc0002b23c0, 0xba, 0xc0000fbea0, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils/sqlutils.go:283 +0x185
github.com/github/gh-ost/go/logic.(*Applier).WriteChangelog(0xc0001fc2a0, 0x8109d6, 0x9, 0xc0001fb660, 0x1e, 0xc0001fb660, 0x1e, 0xc0000fbf08, 0x67b203)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:292 +0x2d0
github.com/github/gh-ost/go/logic.(*Applier).InitiateHeartbeat.func1(0xc000144000, 0xc0000fbf00)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:313 +0x9d
github.com/github/gh-ost/go/logic.(*Applier).InitiateHeartbeat(0xc0001fc2a0)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:336 +0x122
created by github.com/github/gh-ost/go/logic.(*Migrator).initiateApplier
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/migrator.go:1074 +0x21e
2020-02-05 05:55:32 ERROR Error 1146: Table 'osu._osu_errors_ghc' doesn't exist
goroutine 165 [running]:
runtime/debug.Stack(0x4f, 0x200, 0xc000126300)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.logErrorEntry(0x2, 0x8a5f80, 0xc00040dd20, 0xc00016a240, 0xba)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:178 +0xe7
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.Errore(...)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:224
github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils.ExecNoPrepare(0xc00017a000, 0xc00016a240, 0xba, 0xc000504cd8, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils/sqlutils.go:283 +0x185
github.com/github/gh-ost/go/logic.(*Applier).WriteChangelog(0xc0001fc2a0, 0xc0004e2de0, 0x1e, 0xc00036c140, 0x98, 0xc0004e2de0, 0x1e, 0x1, 0xc000164680)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:292 +0x2d0
github.com/github/gh-ost/go/logic.(*Migrator).printStatus(0xc000108120, 0x1, 0xc0000ad950, 0x1, 0x1)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/migrator.go:973 +0x8db
created by github.com/github/gh-ost/go/logic.(*Migrator).initiateStatus
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/migrator.go:785 +0xab
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 1s(copy); streamer: mysql-bin.019585:601777417; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:32 INFO Table dropped
2020-02-05 05:55:32 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2020-02-05 05:55:32 INFO -- drop table `osu`.`_osu_errors_del`
2020-02-05 05:55:32 INFO Done migrating `osu`.`osu_errors`
2020-02-05 05:55:32 INFO Removing socket file: /tmp/gh-ost.test.sock
2020-02-05 05:55:32 INFO Tearing down inspector
2020-02-05 05:55:32 INFO Tearing down applier
2020-02-05 05:55:32 DEBUG Tearing down...
2020-02-05 05:55:32 INFO Tearing down streamer
2020-02-05 05:55:32 INFO Tearing down throttler
2020-02-05 05:55:32 DEBUG Tearing down...
# Done

Please let me know if I can provide any further details to help further the investigation.

shlomi-noach commented 4 years ago

@peppy it seems like your migration worked after one retry?

But, as I look into this issue, there are actually multiple cases reported by multiple people, and this got me confused. I think I'm working on something other than the issue you presented 😬

shlomi-noach commented 4 years ago

@peppy so what I'm trying to see is whether gh-ost works with MySQL 8.0, as per https://github.com/github/gh-ost/pull/715 (where you also participate). I'll continue the discussion there.

peppy commented 4 years ago

Aha, right.

And yes, the first one did work (with a weird error in the middle, though), as mentioned earlier in this thread, because I used --cut-over=two-step.

peppy commented 4 years ago

Actually on closer inspection it looks like both may have completed, if one is to ignore the error messages and stack traces.

shlomi-noach commented 4 years ago

if one is to ignore the error messages and stack traces.

Yes, I did mean the 2nd one worked, sorry for not being clear. Seems like the retries were successful. Now, this doesn't mean it will always work; perhaps some workload will cause an infinite number of retries -- not sure. In any case, what I was working on is irrelevant to this issue, so my past few comments can just be ignored 😛

jianhaiqing commented 3 years ago

As far as I know, 8.0.15 doesn't work, and versions after 8.0.15 do; the simulated operations are as follows (presumably a `sbtest`.`_sbtest1_del` sentry table also has to exist for step 1's LOCK TABLES to succeed):

create /* gh-ost */ table sbtest._sbtest1_ghc ( id int auto_increment primary key ) engine=InnoDB comment='_sbtest1_ghc';

create /* gh-ost */ table sbtest._sbtest1_gho ( id int auto_increment primary key ) engine=InnoDB comment='_sbtest1_ghc';

create /* gh-ost */ table sbtest.sbtest1 ( id int auto_increment primary key ) engine=InnoDB comment='_sbtest1_ghc';

- session 1:
```sql
-- step1
select get_lock('gh-ost.11.lock', 0);
lock /* gh-ost */ tables `sbtest`.`sbtest1` write, `sbtest`.`_sbtest1_del` write;
-- step 3
drop /* gh-ost */ table if exists `sbtest`.`_sbtest1_del`;  
-- blocked in step 3
```
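
- session 2 (presumably the elided step 2, matching the RENAME statements in the gh-ost logs above):
```sql
-- step 2: issued between step 1 and step 3; expected to block on
-- session 1's write locks until the sentry table is dropped
rename /* gh-ost */ table `sbtest`.`sbtest1` to `sbtest`.`_sbtest1_del`, `sbtest`.`_sbtest1_gho` to `sbtest`.`sbtest1`;
```
The drop blocking in step 3 matches the "Waiting for column statistics lock" processlist output earlier in this thread.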
shlomi-noach commented 3 years ago

This could be related to https://github.com/github/gh-ost/issues/799, fixed by https://github.com/openark/gh-ost/pull/14.

Otherwise (and I don't have the capacity to test), if, as @jianhaiqing suggests, this doesn't work up to 8.0.15 and does work after 8.0.15, then I don't intend to pursue the issue, and will at most update the docs to require < 8.0 or > 8.0.15.

jianhaiqing commented 3 years ago

Yes, I agree with you. Updating the docs is OK.