github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.43k stars 1.26k forks source link

ERROR Unable to acquire lock gh-ost.3726786.lock #569

Open chilumbugeorge opened 6 years ago

chilumbugeorge commented 6 years ago

Hi. When i run gh-ost on the master without the --execute using the command below:

./gh-ost --host=192.168.24.22 --user=ghost --password=ghostPass --database=test --table=tb1 --alter="ADD COLUMN x INT NOT NULL DEFAULT '0'" --chunk-size=2000 --max-load=Threads_connected=20 --allow-on-master`

I get the following error:

# throttle-additional-flag-file: /tmp/gh-ost.throttle [225/1843]
# Serving on unix socket: /tmp/gh-ost.test.tb1.sock
Copy: 0/0 100.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysqld-bin.000006:12839070; State: migrating; ETA: due
CREATE TABLE `_tb1_gho` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`x` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
2018/04/16 10:53:10 binlogsyncer.go:107: [info] syncer is closing...
2018/04/16 10:53:10 binlogstreamer.go:47: [error] close sync with err: sync is been closing...
2018/04/16 10:53:10 binlogsyncer.go:122: [info] syncer is closed
Copy: 0/6 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysqld-bin.000006:12839070; State: migrating; ETA: N/A
# Done

And when i run with the --execute option, i get the following error:

Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m0s(total), 1s(copy); streamer: mysqld-bin.000006:13108629; State: migrating; ETA: due
2018-04-16 10:57:12 ERROR Unable to acquire lock gh-ost.3726786.lock
2018-04-16 10:57:12 ERROR Unable to acquire lock gh-ost.3726786.lock
2018-04-16 10:57:12 FATAL Unable to acquire lock gh-ost.3726786.lock

Either way, the command does not seem to be running as expected. I have also tried running on the slave with the initial commands you provided and still no luck. The tool seems to be promising as an alternative to Percona's pt-online-schema-change, but sad cannot get it to run.

shlomi-noach commented 6 years ago

Hi, can you please paste the complete log output for the -execute execution? which MySQL version are you using?

Your first execution completes without error, by the way. This message: [error] close sync with err: sync is been closing... is invoked by an underlying library gh-ost uses, but does not indicate an application error. I understand seeing this message in the standard output is alarming.

chilumbugeorge commented 6 years ago

Hi Shlomi. Thanks for your quick response.

I am using MySQL version 5.7. To be precise, here is the version:

percona-xtradb-cluster-server-5.7 5.7.17-29.20-3.trusty

So when i run the command without the --execute on the master, i get the following:

2018/04/16 14:28:08 binlogsyncer.go:79: [info] create BinlogSyncer with config {99999 mysql master_ip 3306 ghost   false false <nil>}
2018/04/16 14:28:08 binlogsyncer.go:246: [info] begin to sync binlog from position (mysqld-bin.000006, 13113017)
2018/04/16 14:28:08 binlogsyncer.go:139: [info] register slave for master server master_ip:3306
2018/04/16 14:28:08 binlogsyncer.go:573: [info] rotate to (mysqld-bin.000006, 13113017)
# Migrating `test`.`tb1`; Ghost table is `test`.`_tb1_gho`
# Migrating nangang-vm-db-pxc-test-1:3306; inspecting nangang-vm-db-pxc-test-1:3306; executing on nangang-vm-db-pxc-test-1
# Migrating `test`.`tb1`; Ghost table is `test`.`_tb1_gho`
# Migrating nangang-vm-db-pxc-test-1:3306; inspecting nangang-vm-db-pxc-test-1:3306; executing on nangang-vm-db-pxc-test-1
# Migration started at Mon Apr 16 14:28:08 +0800 2018
# chunk-size: 2000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_connected=20; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.test.tb1.sock
# Migration started at Mon Apr 16 14:28:08 +0800 2018
# chunk-size: 2000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_connected=20; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.test.tb1.sock
Copy: 0/0 100.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysqld-bin.000006:13115079; State: migrating; ETA: due
CREATE TABLE `_tb1_gho` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `x` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
2018/04/16 14:28:08 binlogsyncer.go:107: [info] syncer is closing...
Copy: 0/6 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysqld-bin.000006:13115079; State: migrating; ETA: N/A
2018/04/16 14:28:09 binlogstreamer.go:47: [error] close sync with err: sync is been closing...
2018/04/16 14:28:09 binlogsyncer.go:122: [info] syncer is closed
# Done

And when i run with --execute, i get the following error:

# Migration started at Mon Apr 16 14:31:08 +0800 2018
# chunk-size: 2000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_connected=20; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.test.tb1.sock
Copy: 0/6 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysqld-bin.000006:13118634; State: migrating; ETA: N/A
Copy: 0/6 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysqld-bin.000006:13123049; State: migrating; ETA: N/A
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysqld-bin.000006:13123838; State: migrating; ETA: due
2018-04-16 14:31:10 ERROR Unable to acquire lock gh-ost.3739543.lock
2018-04-16 14:31:10 ERROR Unable to acquire lock gh-ost.3739543.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysqld-bin.000006:13128257; State: migrating; ETA: due
2018-04-16 14:31:11 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:11 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 3s(total), 1s(copy); streamer: mysqld-bin.000006:13132676; State: migrating; ETA: due
2018-04-16 14:31:12 ERROR Unable to acquire lock gh-ost.3739543.lock
2018-04-16 14:31:12 ERROR Unable to acquire lock gh-ost.3739543.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 1s(copy); streamer: mysqld-bin.000006:13137091; State: migrating; ETA: due
2018-04-16 14:31:13 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:13 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 1s(copy); streamer: mysqld-bin.000006:13141508; State: migrating; ETA: due
2018-04-16 14:31:14 ERROR Unable to acquire lock gh-ost.3739543.lock
2018-04-16 14:31:14 ERROR Unable to acquire lock gh-ost.3739543.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 6s(total), 1s(copy); streamer: mysqld-bin.000006:13145921; State: migrating; ETA: due
2018-04-16 14:31:15 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:15 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 7s(total), 1s(copy); streamer: mysqld-bin.000006:13150336; State: migrating; ETA: due
2018-04-16 14:31:16 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:16 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 8s(total), 1s(copy); streamer: mysqld-bin.000006:13154755; State: migrating; ETA: due
2018-04-16 14:31:17 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:17 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 9s(total), 1s(copy); streamer: mysqld-bin.000006:13159172; State: migrating; ETA: due
2018-04-16 14:31:18 ERROR Unable to acquire lock gh-ost.3739543.lock
2018-04-16 14:31:18 ERROR Unable to acquire lock gh-ost.3739543.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 10s(total), 1s(copy); streamer: mysqld-bin.000006:13163591; State: migrating; ETA: due
2018-04-16 14:31:19 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:19 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 11s(total), 1s(copy); streamer: mysqld-bin.000006:13168011; State: migrating; ETA: due
2018-04-16 14:31:20 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:20 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 12s(total), 1s(copy); streamer: mysqld-bin.000006:13172429; State: migrating; ETA: due
2018-04-16 14:31:21 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:21 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 13s(total), 1s(copy); streamer: mysqld-bin.000006:13176847; State: migrating; ETA: due
2018-04-16 14:31:22 ERROR Unable to acquire lock gh-ost.3739543.lock
2018-04-16 14:31:22 ERROR Unable to acquire lock gh-ost.3739543.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 14s(total), 1s(copy); streamer: mysqld-bin.000006:13181267; State: migrating; ETA: due
2018-04-16 14:31:23 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:31:23 ERROR Unable to acquire lock gh-ost.3739536.lock
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 15s(total), 1s(copy); streamer: mysqld-bin.000006:13185687; State: migrating; ETA: due
2018-04-16 14:31:24 ERROR Unable to acquire lock gh-ost.3739543.lock
2018-04-16 14:31:24 ERROR Unable to acquire lock gh-ost.3739543.lock
..
Copy: 8/8 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m0s(total), 1s(copy); streamer: mysqld-bin.000006:13384483; State: migrating; ETA: due
2018-04-16 14:32:09 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:32:09 ERROR Unable to acquire lock gh-ost.3739536.lock
2018-04-16 14:32:09 FATAL Unable to acquire lock gh-ost.3739536.lock
george.chilumbu@nangang-vm-db-pxc-test-1:/opt$
shlomi-noach commented 6 years ago

XtraDB Cluster unfortunately does not support GET_LOCK(): https://www.percona.com/doc/percona-xtradb-cluster/LATEST/limitation.html, which is why gh-ost is failing.

There's a discussion in #224, and see in particular https://github.com/github/gh-ost/issues/224#issuecomment-254593577 .

I don't have further insight at this time. For XtraDB cluster you may have to revert to pt-online-schema-change.

shlomi-noach commented 6 years ago

Hmm perhaps on a single-master setup GET_LOCK() can be executed. Please let me know if you do try it out.

chilumbugeorge commented 6 years ago

How unfortunate coz all our databases use PXC. So it means gh-ost cannot work for us.

chilumbugeorge commented 6 years ago

So for PXC, it still does not work if i run the command on the save instead of one of the masters?

shlomi-noach commented 6 years ago

So for PXC, it still does not work if i run the command on the save instead of one of the masters?

I wouldn't know. I was hoping you could tell me.

chilumbugeorge commented 6 years ago

On the slave (whose master is part of a PXC), it also does not run. So without the --execute and --allow-on-master params, i get the following:

load=Threads_connected=20
2018/04/16 18:48:52 binlogsyncer.go:79: [info] create BinlogSyncer with config {99999 mysql slave_ip 3306 ghost   false false <nil>}
2018/04/16 18:48:52 binlogsyncer.go:246: [info] begin to sync binlog from position (mysqld-bin.000029, 882245706)
2018/04/16 18:48:52 binlogsyncer.go:139: [info] register slave for master server slave_ip:3306
2018-04-16 18:49:52 FATAL connection was bad

And with the --execute param added, i get the following:

2018/04/16 18:50:50 binlogsyncer.go:79: [info] create BinlogSyncer with config {99999 mysql slave_ip 3306 ghost   false false <nil>}
2018/04/16 18:50:50 binlogsyncer.go:246: [info] begin to sync binlog from position (mysqld-bin.000029, 882245706)
2018/04/16 18:50:50 binlogsyncer.go:139: [info] register slave for master server slave_ip:3306
2018-04-16 18:51:50 FATAL connection was bad

The IP on the --host is that of the slave FYI. But even when i change to that of the master (for the heck of it), i get the following error:

2018-04-16 18:54:44 FATAL dial tcp master_ip:3306: getsockopt: no route to host

So it does not seem to work even if you run from the slave.