raniejade opened this issue 6 years ago
Thank you for the detailed report. I’ll look into it.
It finally worked, for some reason.
Copy: 883626763/883626763 100.0%; Applied: 33323200; Backlog: 12/1000; Time: 98h59m30s(total), 48h31m10s(copy); streamer: mysql-bin-changelog.077450:855325; State: postponing cut-over; ETA: due
# Migrating `media`.`media_bundle`; Ghost table is `media`.`_media_bundle_gho`
# Migrating ip-172-17-1-238:3306; inspecting ip-172-17-3-149:3306; executing on ip-10-0-9-219
# Migration started at Thu Apr 19 04:03:47 +0000 2018
# chunk-size: 1000; max-lag-millis: 500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-flag-file: ./throttle.flag
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 3
# postpone-cut-over-flag-file: ./cutover.flag [set]
# panic-flag-file: ./panic.flag
# Serving on unix socket: /tmp/gh-ost.media.media_bundle.sock
Copy: 883626763/883626763 100.0%; Applied: 33325625; Backlog: 33/1000; Time: 99h0m0s(total), 48h31m10s(copy); streamer: mysql-bin-changelog.077450:4508481; State: postponing cut-over; ETA: due
Copy: 883626763/883626763 100.0%; Applied: 33327979; Backlog: 72/1000; Time: 99h0m30s(total), 48h31m10s(copy); streamer: mysql-bin-changelog.077450:8145010; State: postponing cut-over; ETA: due
Copy: 883626763/883626763 100.0%; Applied: 33330435; Backlog: 13/1000; Time: 99h1m0s(total), 48h31m10s(copy); streamer: mysql-bin-changelog.077450:11853964; State: postponing cut-over; ETA: due
2018-04-23 07:04:51 INFO Grabbing voluntary lock: gh-ost.9124052.lock
2018-04-23 07:04:51 INFO Setting LOCK timeout as 6 seconds
2018-04-23 07:04:51 INFO Looking for magic cut-over table
2018-04-23 07:04:51 INFO Creating magic cut-over table `media`.`_media_bundle_del`
2018-04-23 07:04:52 INFO Magic cut-over table created
2018-04-23 07:04:52 INFO Locking `media`.`media_bundle`, `media`.`_media_bundle_del`
2018-04-23 07:04:52 INFO Tables locked
2018-04-23 07:04:52 INFO Session locking original & magic tables is 9124052
2018-04-23 07:04:52 INFO Writing changelog state: AllEventsUpToLockProcessed:1524467092042964910
2018-04-23 07:04:52 INFO Waiting for events up to lock
2018-04-23 07:04:52 INFO Intercepted changelog state AllEventsUpToLockProcessed
2018-04-23 07:04:52 INFO Handled changelog state AllEventsUpToLockProcessed
2018-04-23 07:04:52 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1524467092042964910
2018-04-23 07:04:52 INFO Done waiting for events up to lock; duration=403.148675ms
# Migrating `media`.`media_bundle`; Ghost table is `media`.`_media_bundle_gho`
# Migrating ip-172-17-1-238:3306; inspecting ip-172-17-3-149:3306; executing on ip-10-0-9-219
# Migration started at Thu Apr 19 04:03:47 +0000 2018
# chunk-size: 1000; max-lag-millis: 500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-flag-file: ./throttle.flag
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 3
# postpone-cut-over-flag-file: ./cutover.flag
# panic-flag-file: ./panic.flag
# Serving on unix socket: /tmp/gh-ost.media.media_bundle.sock
Copy: 883626763/883626763 100.0%; Applied: 33330765; Backlog: 0/1000; Time: 99h1m4s(total), 48h31m10s(copy); streamer: mysql-bin-changelog.077450:12390373; State: migrating; ETA: due
2018-04-23 07:04:52 INFO Setting RENAME timeout as 3 seconds
2018-04-23 07:04:52 INFO Session renaming tables is 9131747
2018-04-23 07:04:52 INFO Issuing and expecting this to block: rename /* gh-ost */ table `media`.`media_bundle` to `media`.`_media_bundle_del`, `media`.`_media_bundle_gho` to `media`.`media_bundle`
2018-04-23 07:04:52 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2018-04-23 07:04:52 INFO Checking session lock: gh-ost.9124052.lock
2018-04-23 07:04:52 INFO Connection holding lock on original table still exists
2018-04-23 07:04:52 INFO Will now proceed to drop magic table and unlock tables
2018-04-23 07:04:52 INFO Dropping magic cut-over table
2018-04-23 07:04:52 INFO Releasing lock from `media`.`media_bundle`, `media`.`_media_bundle_del`
2018-04-23 07:04:52 INFO Tables unlocked
2018-04-23 07:04:52 INFO Tables renamed
2018-04-23 07:04:52 INFO Lock & rename duration: 494.933783ms. During this time, queries on `media_bundle` were blocked
2018-04-23 07:04:52 INFO Looking for magic cut-over table
2018/04/23 07:04:52 binlogsyncer.go:107: [info] syncer is closing...
2018/04/23 07:04:52 binlogstreamer.go:47: [error] close sync with err: sync is been closing...
2018/04/23 07:04:52 binlogsyncer.go:122: [info] syncer is closed
2018-04-23 07:04:52 INFO Closed streamer connection. err=<nil>
2018-04-23 07:04:52 INFO Dropping table `media`.`_media_bundle_ghc`
2018-04-23 07:04:52 INFO Table dropped
2018-04-23 07:04:52 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2018-04-23 07:04:52 INFO -- drop table `media`.`_media_bundle_del`
2018-04-23 07:04:52 INFO Done migrating `media`.`media_bundle`
2018-04-23 07:04:52 INFO Removing socket file: /tmp/gh-ost.media.media_bundle.sock
2018-04-23 07:04:52 INFO Tearing down inspector
[MySQL] 2018/04/23 07:04:52 packets.go:118: write tcp 10.0.9.219:55570->10.0.16.20:3306: write: connection timed out
2018-04-23 07:04:52 INFO Tearing down applier
[MySQL] 2018/04/23 07:04:52 packets.go:118: write tcp 10.0.9.219:48704->10.0.18.162:3306: write: connection timed out
2018-04-23 07:04:52 INFO Tearing down streamer
2018-04-23 07:04:52 INFO Tearing down throttler
# Done
--- All done
An interesting takeaway from this: during the last few minutes before the cut-over I could still see some backlog being processed.
Copy: 883626763/883626763 100.0%; Applied: 33330435; Backlog: 13/1000; Time: 99h1m0s(total), 48h31m10s(copy); streamer: mysql-bin-changelog.077450:11853964; State: postponing cut-over; ETA: due
2018-04-23 07:04:51 INFO Grabbing voluntary lock: gh-ost.9124052.lock
However, on the previous (failed) runs the backlog was always 0, so gh-ost might have stopped receiving binlog events even before the rows were fully copied. I'm going to look at our previous logs to see if I can find something useful.
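For anyone watching a postponed or stuck cut-over, a quick way to check whether the applier is still consuming binlog events is to poll the migration's interactive socket and compare the Applied counter between samples. A minimal sketch, assuming the socket path printed in the status output above and a `nc` that supports `-U`:

```
# Ask the running gh-ost process for a fresh status line
# (socket path taken from the "Serving on unix socket" line above).
echo status | nc -U /tmp/gh-ost.media.media_bundle.sock

# Sample twice, 30s apart; if "Applied" does not move while writes are still
# hitting the original table, the streamer has likely stopped receiving events.
echo status | nc -U /tmp/gh-ost.media.media_bundle.sock | grep '^Copy:'
sleep 30
echo status | nc -U /tmp/gh-ost.media.media_bundle.sock | grep '^Copy:'
```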
We had this problem too. Why?

Copy: 5158614/5158614 100.0%; Applied: 0; Backlog: 0/1000; Time: 3m5s(total), 2m56s(copy); streamer: mysql-bin.000206:485009551; State: migrating; ETA: due
2018-05-14 17:10:26 ERROR Timeout while waiting for events up to lock
2018-05-14 17:10:26 ERROR
2018-05-14 17:10:26 ERROR Timeout while waiting for events up to lock
2018-05-14 17:10:26 INFO Looking for magic cut-over table
2018-05-14 17:10:26 INFO Will now proceed to drop magic table and unlock tables
2018-05-14 17:10:26 INFO Dropping magic cut-over table
2018-05-14 17:10:26 INFO Dropping magic cut-over table
Which MySQL version is involved? Parallel replication? Thread cache? Any other interesting setup?
Probably related to #403. Upon issuing a cut-over, `gh-ost` exits after several attempts to wait for the `AllEventsUpToLockProcessed` event to appear. I have verified that `AllEventsUpToLockProcessed` is indeed written to the changelog table.

This is how we run `gh-ost`. And these are the last few lines of the log file.

We are using MySQL 5.6.34 on RDS: 1 master and 3 read replicas, plus 1 more replica dedicated to the `gh-ost` migration.

We are using the default value for `CutOverLockTimeoutSeconds` (which is 3 seconds), but I don't think the timeout being too low is the issue. I did some digging and it looks like `gh-ost` doesn't receive any binlog events after issuing the cut-over. Based on the snippet below, I should see a line similar to `Intercepted changelog state AllEventsUpToLockProcessed` in the log file, but grepping only yields `Intercepted changelog state GhostTableMigrated`. Disclaimer: my go-fu is so-so, so I may have misread some parts of the code.

I tried to run the same migration twice (currently retrying for the 3rd time) and the result is exactly the same. We have already run several successful migrations on the same table and on others.
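For anyone wanting to reproduce the check, a rough sketch of how to verify both halves of that claim: that the hint actually landed in the changelog table, and which states the streamer intercepted according to the log. The table and host names are taken from the logs above; the log file path and credentials are placeholders:

```
# Was AllEventsUpToLockProcessed written to the changelog table?
# Run against the migrated host (and against the inspected replica, to confirm it replicated).
mysql -h ip-172-17-1-238 -e "SELECT * FROM media._media_bundle_ghc"

# Which changelog states did gh-ost actually intercept?
# (gh-ost.log is a placeholder for wherever the tool's output was captured.)
grep 'Intercepted changelog state' gh-ost.log
```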