github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.11k stars 1.23k forks source link

Clarification on `aurora_enable_repl_bin_log_filtering` in Aurora #1424

Open reillski opened 1 week ago

reillski commented 1 week ago

Hi!

In the RDS documentation it recommends disabling aurora_enable_repl_bin_log_filtering when using GH-OST in Aurora MySQL. Looking through the history I can see a few references to this potentially causing issues with cutovers, so I was looking to see if there was a reproducible case which can demonstrate this?

aurora_enable_repl_bin_log_filtering Background

In Aurora, the writer instance in a cluster sends redo log records to the aurora storage volume and reader nodes in the cluster. Since Aurora MySQL stores binary logs in the cluster volume, redo log records are also be generated for binary logs under the hood. This is all transparent to users, and the binary logs are presented like they would be in community MySQL. When aurora_enable_repl_bin_log_filtering is enabled, redo logs for binary log records will still be sent to Aurora storage, but they will not be sent to aurora readers within the same cluster. Without filtering, these redo logs will be sent to reader instances and discarded since binary logs are not accessible from readers. This can lead to unnecessary amplification of network traffic on the writer/reader so the recommendation is to leave aurora_enable_repl_bin_log_filtering enabled. Binlog filtering is always enabled in Aurora MySQL version 3, so the parameter was removed in the Aurora 3 major version. This setting should not be confused with MySQL binlog replication filtering.

Since binary logs are not accessible on readers, and aurora_enable_repl_bin_log_filtering only affects internal transportation of redo, I highly suspect this may not be the read cause of the cutover issues seen.

Theory

What I suspect is the real cause is the aurora_binlog_replication_max_yield_seconds parameter.

aurora_binlog_replication_max_yield_seconds Background

aurora_binlog_replication_max_yield_seconds was introduced along with aurora_binlog_read_buffer_size to improve read performance of binary log consumer threads. The idea here was, to increase the IO read size( aurora_binlog_read_buffer_size) for each binlog read req made by a consumer thread to improve IO efficiency/throughput of binlog consumer threads. Instead of reading binary logs from aurora storage in 8K chunks, you could configure the read buffer size of each binlog read request. The drawback was that if you were reading in larger chunks, it could lead to more contention with foreground transactions. To allow users configure this tradeoff aurora_binlog_replication_max_yield_seconds was introduced. Instead of constantly contending with foreground transactions on the active binary log you could configure binary log consumers to "backoff" or "yield" for a number of seconds defined by aurora_binlog_replication_max_yield_seconds.

Note: In Aurora version 2.10 the binlog IO cache was introduced(more information here) which removed the need for the above yield second configurations. The binary log I/O cache aims to minimize read I/O from the Aurora storage layer by keeping the most recent binary log change events in its circular cache on your writer DB instance. This enhancement is enabled by default, so should not require any additional configurations. from docs:

You don't need to adjust any configuration parameters to turn on this optimization. In particular, if you adjust the configuration parameter aurora_binlog_replication_max_yield_seconds to a nonzero value in earlier Aurora MySQL versions, set it back to zero for Aurora MySQL 2.10 and higher.

What I suspect here is that if aurora_binlog_replication_max_yield_seconds is configured to a non-zero value, the heartbeat in GH-OST cutovers will never reach zero for long enough to allow the cutover succeed, as it will read an event, sleep("yield"), repeat.

Test

I done some basic testing on my side to validate this and it seems to be the case, but would love to hear from others. With yield seconds set to 60, my heartbeatLag does not decrease:

mysql> select @@aurora_binlog_replication_max_yield_seconds,@@aurora_enable_repl_bin_log_filtering,@@aurora_version;
+-----------------------------------------------+----------------------------------------+------------------+
| @@aurora_binlog_replication_max_yield_seconds | @@aurora_enable_repl_bin_log_filtering | @@aurora_version |
+-----------------------------------------------+----------------------------------------+------------------+
|                                            60 |                                      1 | 2.11.2           |
+-----------------------------------------------+----------------------------------------+------------------+
1 row in set (0.00 sec)

$ gh-ost --max-load=Threads_running=25 --critical-load=Threads_running=1000 --chunk-size=1000 --max-lag-millis=1500 --user="user" --password="password" --host=<HOST> --allow-on-master --database="sysbench" --table="sbtest1" --verbose --alter="engine=innodb" --switch-to-rbr --allow-master-master --cut-over=default --exact-rowcount --concurrent-rowcount --default-retries=120 --panic-flag-file=/tmp/ghost.panic.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --execute

[2024/06/18 22:04:39] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000070, 4)
Copy: 247740/247740 100.0%; Applied: 1596183; Backlog: 1000/1000; Time: 43m0s(total), 9s(copy); streamer: mysql-bin-changelog.000070:4755374; Lag: 0.00s, HeartbeatLag: 62.91s, State: postponing cut-over; ETA: due
Copy: 247740/247740 100.0%; Applied: 1638082; Backlog: 0/1000; Time: 43m30s(total), 9s(copy); streamer: mysql-bin-changelog.000070:134208037; Lag: 0.00s, HeartbeatLag: 30.61s, State: postponing cut-over; ETA: due
Copy: 247740/247740 100.0%; Applied: 1638082; Backlog: 0/1000; Time: 44m0s(total), 9s(copy); streamer: mysql-bin-changelog.000070:134208037; Lag: 0.00s, HeartbeatLag: 60.61s, State: postponing cut-over; ETA: due
[2024/06/18 22:05:45] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000071, 4)
[2024/06/18 22:05:45] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000071, 4)
2024-06-18 22:05:45 INFO rotate to next log from mysql-bin-changelog.000071:134221467 to mysql-bin-changelog.000071
2024-06-18 22:05:45 INFO rotate to next log from mysql-bin-changelog.000071:0 to mysql-bin-changelog.000071
Copy: 247740/247740 100.0%; Applied: 1676342; Backlog: 1000/1000; Time: 44m30s(total), 9s(copy); streamer: mysql-bin-changelog.000071:124450137; Lag: 0.00s, HeartbeatLag: 31.31s, State: postponing cut-over; ETA: due
Copy: 247740/247740 100.0%; Applied: 1680895; Backlog: 0/1000; Time: 45m0s(total), 9s(copy); streamer: mysql-bin-changelog.000071:134204651; Lag: 0.00s, HeartbeatLag: 56.01s, State: postponing cut-over; ETA: due
[2024/06/18 22:06:48] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000072, 4)
2024-06-18 22:06:48 INFO rotate to next log from mysql-bin-changelog.000072:134218538 to mysql-bin-changelog.000072
2024-06-18 22:06:48 INFO rotate to next log from mysql-bin-changelog.000072:0 to mysql-bin-changelog.000072
[2024/06/18 22:06:48] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000072, 4)
Copy: 247740/247740 100.0%; Applied: 1713305; Backlog: 1000/1000; Time: 45m30s(total), 9s(copy); streamer: mysql-bin-changelog.000072:109020755; Lag: 0.00s, HeartbeatLag: 35.41s, State: postponing cut-over; ETA: due
Copy: 247740/247740 100.0%; Applied: 1723592; Backlog: 0/1000; Time: 46m0s(total), 9s(copy); streamer: mysql-bin-changelog.000072:134205251; Lag: 0.00s, HeartbeatLag: 51.61s, State: postponing cut-over; ETA: due
[2024/06/18 22:07:53] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000073, 4)
[2024/06/18 22:07:53] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000073, 4)
2024-06-18 22:07:53 INFO rotate to next log from mysql-bin-changelog.000073:134220372 to mysql-bin-changelog.000073
2024-06-18 22:07:53 INFO rotate to next log from mysql-bin-changelog.000073:0 to mysql-bin-changelog.000073
Copy: 247740/247740 100.0%; Applied: 1748532; Backlog: 1000/1000; Time: 46m30s(total), 9s(copy); streamer: mysql-bin-changelog.000073:89642975; Lag: 0.00s, HeartbeatLag: 41.61s, State: postponing cut-over; ETA: due

Shortly after I set aurora_binlog_replication_max_yield_seconds back to zero in my parameter group, the cutover succeeded:

mysql> select @@aurora_binlog_replication_max_yield_seconds,@@aurora_enable_repl_bin_log_filtering,@@aurora_version;
+-----------------------------------------------+----------------------------------------+------------------+
| @@aurora_binlog_replication_max_yield_seconds | @@aurora_enable_repl_bin_log_filtering | @@aurora_version |
+-----------------------------------------------+----------------------------------------+------------------+
|                                             0 |                                      1 | 2.11.2           |
+-----------------------------------------------+----------------------------------------+------------------+
1 row in set (0.00 sec)

2024-06-18 22:11:57 INFO Done waiting for events up to lock; duration=683.348399ms
# Migrating `sysbench`.`sbtest1`; Ghost table is `sysbench`.`_sbtest1_gho`
# Migrating ip-172-17-5-175:3306; inspecting ip-172-17-5-175:3306; executing on i-09fce0c3dab1a3eb2.ec2.internal
# Migration started at Tue Jun 18 21:21:39 +0000 2024
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.sysbench.sbtest1.sock
Copy: 247740/247740 100.0%; Applied: 1924559; Backlog: 0/1000; Time: 50m17s(total), 9s(copy); streamer: mysql-bin-changelog.000077:117908084; Lag: 0.00s, HeartbeatLag: 0.10s, State: migrating; ETA: due
2024-06-18 22:11:57 INFO Setting RENAME timeout as 3 seconds
2024-06-18 22:11:57 INFO Session renaming tables is 301
2024-06-18 22:11:57 INFO Issuing and expecting this to block: rename /* gh-ost */ table `sysbench`.`sbtest1` to `sysbench`.`_sbtest1_del`, `sysbench`.`_sbtest1_gho` to `sysbench`.`sbtest1`
2024-06-18 22:11:58 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2024-06-18 22:11:58 INFO Checking session lock: gh-ost.312.lock
2024-06-18 22:11:58 INFO Connection holding lock on original table still exists
2024-06-18 22:11:58 INFO Will now proceed to drop magic table and unlock tables
2024-06-18 22:11:58 INFO Dropping magic cut-over table
2024-06-18 22:11:58 INFO Releasing lock from `sysbench`.`sbtest1`, `sysbench`.`_sbtest1_del`
2024-06-18 22:11:58 INFO Tables unlocked
2024-06-18 22:11:58 INFO Tables renamed
2024-06-18 22:11:58 INFO Lock & rename duration: 1.721119655s. During this time, queries on `sbtest1` were blocked
[2024/06/18 22:11:58] [info] binlogsyncer.go:164 syncer is closing...
[2024/06/18 22:11:58] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2024-06-18 22:11:58 INFO Closed streamer connection. err=<nil>
[2024/06/18 22:11:58] [info] binlogsyncer.go:179 syncer is closed
2024-06-18 22:11:58 INFO Dropping table `sysbench`.`_sbtest1_ghc`
2024-06-18 22:11:58 INFO Table dropped
2024-06-18 22:11:58 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2024-06-18 22:11:58 INFO -- drop table `sysbench`.`_sbtest1_del`
2024-06-18 22:11:58 INFO Done migrating `sysbench`.`sbtest1`
2024-06-18 22:11:58 INFO Removing socket file: /tmp/gh-ost.sysbench.sbtest1.sock
2024-06-18 22:11:58 INFO Tearing down inspector
2024-06-18 22:11:58 INFO Tearing down applier
2024-06-18 22:11:58 INFO Tearing down streamer
2024-06-18 22:11:58 INFO Tearing down throttler
# Done

Would love to hear what you think.

Thanks, Marc

rbanks54 commented 1 week ago

You're referencing an Aurora v2 instance. Bump yourself up to v3 and try again and you'll likely run into the problems others are.

select @@aurora_binlog_replication_max_yield_seconds,@@aurora_enable_repl_bin_log_filtering,@@aurora_version;
+-----------------------------------------------+----------------------------------------+------------------+
| @@aurora_binlog_replication_max_yield_seconds | @@aurora_enable_repl_bin_log_filtering | @@aurora_version |
+-----------------------------------------------+----------------------------------------+------------------+
|                                             0 |                                      1 | 3.06.0           |
+-----------------------------------------------+----------------------------------------+------------------+

For me, gh-ost fails on Aurora v3

reillski commented 1 week ago

Thanks @rbanks54

I just tested on 3.06 and 3.07 and the cutover was successful. Using default parameters apart from binlog_format=row. Only way I can reliably repro is using aurora_binlog_replication_max_yield_seconds on 2.x. Tried using gh-ost versions: gh-ost-1.1.6-1 and gh-ost-1.1.5-1

3.06

mysql> select @@aurora_binlog_replication_max_yield_seconds,@@aurora_enable_repl_bin_log_filtering,@@aurora_version;
+-----------------------------------------------+----------------------------------------+------------------+
| @@aurora_binlog_replication_max_yield_seconds | @@aurora_enable_repl_bin_log_filtering | @@aurora_version |
+-----------------------------------------------+----------------------------------------+------------------+
|                                             0 |                                      1 | 3.06.0           |
+-----------------------------------------------+----------------------------------------+------------------+
1 row in set (0.00 sec)

$ gh-ost --max-load=Threads_running=25 --critical-load=Threads_running=1000 --chunk-size=1000 --max-lag-millis=1500 --user="user" --password="password" --host=<host>.us-east-1.rds.amazonaws.com --allow-on-master --database="sysbench" --table="sbtest1" --verbose --alter="engine=innodb" --switch-to-rbr --allow-master-master --cut-over=default --exact-rowcount --concurrent-rowcount --default-retries=120 --panic-flag-file=/tmp/ghost.panic.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --initially-drop-old-table --execute
2024-06-19 05:03:20 INFO starting gh-ost 1.1.5
2024-06-19 05:03:20 INFO Migrating `sysbench`.`sbtest1`
2024-06-19 05:03:20 INFO inspector connection validated on <host>.us-east-1.rds.amazonaws.com:3306
2024-06-19 05:03:20 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `sysbench`.*
2024-06-19 05:03:20 INFO binary logs validated on <host>.us-east-1.rds.amazonaws.com:3306
2024-06-19 05:03:20 INFO Restarting replication on <host>.us-east-1.rds.amazonaws.com:3306 to make sure binlog settings apply to replication thread
2024-06-19 05:03:20 INFO Inspector initiated on ip-172-17-5-173:3306, version 8.0.34
2024-06-19 05:03:20 INFO Table found. Engine=InnoDB
2024-06-19 05:03:20 INFO Estimated number of rows via EXPLAIN: 247316
2024-06-19 05:03:20 INFO Recursively searching for replication master
2024-06-19 05:03:20 INFO Master found to be ip-172-17-5-173:3306
2024-06-19 05:03:20 INFO log_slave_updates validated on <host>.us-east-1.rds.amazonaws.com:3306
2024-06-19 05:03:20 INFO streamer connection validated on <host>.us-east-1.rds.amazonaws.com:3306
2024-06-19 05:03:20 INFO Connecting binlog streamer at mysql-bin-changelog.000005:97727481
[2024/06/19 05:03:20] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql <host>.us-east-1.rds.amazonaws.com 3306 user    false false <nil> false UTC true 0 0s 0s 0 false}
[2024/06/19 05:03:20] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin-changelog.000005, 97727481)
[2024/06/19 05:03:20] [info] binlogsyncer.go:203 register slave for master server <host>.us-east-1.rds.amazonaws.com:3306
2024-06-19 05:03:20 INFO applier connection validated on <host>.us-east-1.rds.amazonaws.com:3306
2024-06-19 05:03:20 INFO applier connection validated on <host>.us-east-1.rds.amazonaws.com:3306
2024-06-19 05:03:20 INFO will use time_zone='SYSTEM' on applier
2024-06-19 05:03:20 INFO Examining table structure on applier
2024-06-19 05:03:20 INFO Applier initiated on ip-172-17-5-173:3306, version 8.0.34
2024-06-19 05:03:20 INFO Dropping table `sysbench`.`_sbtest1_del`
[2024/06/19 05:03:20] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000005, 97727481)
2024-06-19 05:03:20 INFO rotate to next log from mysql-bin-changelog.000005:0 to mysql-bin-changelog.000005
2024-06-19 05:03:20 INFO Table dropped
2024-06-19 05:03:20 INFO Dropping table `sysbench`.`_sbtest1_ghc`
2024-06-19 05:03:20 INFO Table dropped
2024-06-19 05:03:20 INFO Creating changelog table `sysbench`.`_sbtest1_ghc`
2024-06-19 05:03:20 INFO Changelog table created
2024-06-19 05:03:20 INFO Creating ghost table `sysbench`.`_sbtest1_gho`
2024-06-19 05:03:20 INFO Ghost table created
2024-06-19 05:03:20 INFO Altering ghost table `sysbench`.`_sbtest1_gho`
2024-06-19 05:03:20 INFO Ghost table altered
2024-06-19 05:03:20 INFO Altering ghost table AUTO_INCREMENT value `sysbench`.`_sbtest1_gho`
2024-06-19 05:03:20 INFO Ghost table AUTO_INCREMENT altered
2024-06-19 05:03:20 INFO Created postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
2024-06-19 05:03:20 INFO Waiting for ghost table to be migrated. Current lag is 0s
2024-06-19 05:03:20 INFO Intercepted changelog state GhostTableMigrated
2024-06-19 05:03:20 INFO Handled changelog state GhostTableMigrated
2024-06-19 05:03:20 INFO Chosen shared unique key is PRIMARY
2024-06-19 05:03:20 INFO Shared columns are id,k,c,pad
2024-06-19 05:03:20 INFO Listening on unix socket file: /tmp/gh-ost.sysbench.sbtest1.sock
2024-06-19 05:03:20 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2024-06-19 05:03:20 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2024-06-19 05:03:20 INFO Migration min values: [1]
2024-06-19 05:03:20 INFO Migration max values: [250000]
2024-06-19 05:03:20 INFO Waiting for first throttle metrics to be collected
2024-06-19 05:03:20 INFO Exact number of rows via COUNT: 249998
2024-06-19 05:03:20 INFO First throttle metrics collected
# Migrating `sysbench`.`sbtest1`; Ghost table is `sysbench`.`_sbtest1_gho`
# Migrating ip-172-17-5-173:3306; inspecting ip-172-17-5-173:3306; executing on i-09fce0c3dab1a3eb2.ec2.internal
# Migration started at Wed Jun 19 05:03:20 +0000 2024
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag [set]
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.sysbench.sbtest1.sock
Copy: 0/249998 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin-changelog.000005:97921758; Lag: 0.02s, HeartbeatLag: 0.02s, State: migrating; ETA: N/A
2024-06-19 05:03:20 INFO Intercepted changelog state ReadMigrationRangeValues
2024-06-19 05:03:20 INFO Handled changelog state ReadMigrationRangeValues
Copy: 33990/250092 13.6%; Applied: 417; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin-changelog.000005:105687523; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 6s
Copy: 66972/250177 26.8%; Applied: 826; Backlog: 4/1000; Time: 2s(total), 2s(copy); streamer: mysql-bin-changelog.000005:113281935; Lag: 0.01s, HeartbeatLag: 0.12s, State: migrating; ETA: 5s
Copy: 98944/250228 39.5%; Applied: 1208; Backlog: 5/1000; Time: 3s(total), 3s(copy); streamer: mysql-bin-changelog.000005:120652666; Lag: 0.01s, HeartbeatLag: 0.12s, State: migrating; ETA: 4s
Copy: 129909/250285 51.9%; Applied: 1629; Backlog: 13/1000; Time: 4s(total), 4s(copy); streamer: mysql-bin-changelog.000005:127882787; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 3s
[2024/06/19 05:03:25] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000006, 4)
[2024/06/19 05:03:25] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000006, 4)
2024-06-19 05:03:25 INFO rotate to next log from mysql-bin-changelog.000006:134290721 to mysql-bin-changelog.000006
2024-06-19 05:03:25 INFO rotate to next log from mysql-bin-changelog.000006:0 to mysql-bin-changelog.000006
Copy: 157851/250334 63.1%; Applied: 2030; Backlog: 3/1000; Time: 5s(total), 5s(copy); streamer: mysql-bin-changelog.000006:179404; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 2s
Copy: 185780/250375 74.2%; Applied: 2469; Backlog: 8/1000; Time: 6s(total), 6s(copy); streamer: mysql-bin-changelog.000006:6933020; Lag: 0.01s, HeartbeatLag: 0.12s, State: migrating; ETA: 2s
Copy: 213698/250403 85.3%; Applied: 2910; Backlog: 1/1000; Time: 7s(total), 7s(copy); streamer: mysql-bin-changelog.000006:13614396; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 1s
Copy: 243611/250410 97.3%; Applied: 3297; Backlog: 7/1000; Time: 8s(total), 8s(copy); streamer: mysql-bin-changelog.000006:20681642; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: due
2024-06-19 05:03:28 INFO Row copy complete
Copy: 249584/249584 100.0%; Applied: 3392; Backlog: 0/1000; Time: 8s(total), 8s(copy); streamer: mysql-bin-changelog.000006:22142335; Lag: 0.01s, HeartbeatLag: 0.04s, State: migrating; ETA: due
Copy: 249584/249584 100.0%; Applied: 3392; Backlog: 387/1000; Time: 9s(total), 8s(copy); streamer: mysql-bin-changelog.000006:23264031; Lag: 0.01s, HeartbeatLag: 0.12s, State: postponing cut-over; ETA: due
Copy: 249584/249584 100.0%; Applied: 4078; Backlog: 224/1000; Time: 10s(total), 8s(copy); streamer: mysql-bin-changelog.000006:25137638; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249584/249584 100.0%; Applied: 4558; Backlog: 214/1000; Time: 11s(total), 8s(copy); streamer: mysql-bin-changelog.000006:26805088; Lag: 0.01s, HeartbeatLag: 0.12s, State: postponing cut-over; ETA: due
Copy: 249584/249584 100.0%; Applied: 4885; Backlog: 389/1000; Time: 12s(total), 8s(copy); streamer: mysql-bin-changelog.000006:28456583; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249584/249584 100.0%; Applied: 5592; Backlog: 216/1000; Time: 13s(total), 8s(copy); streamer: mysql-bin-changelog.000006:30251766; Lag: 0.01s, HeartbeatLag: 0.12s, State: postponing cut-over; ETA: due
Copy: 249584/249584 100.0%; Applied: 6272; Backlog: 79/1000; Time: 14s(total), 8s(copy); streamer: mysql-bin-changelog.000006:32113736; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249584/249584 100.0%; Applied: 6391; Backlog: 550/1000; Time: 15s(total), 8s(copy); streamer: mysql-bin-changelog.000006:33782524; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
2024-06-19 05:03:35 INFO Grabbing voluntary lock: gh-ost.89.lock
2024-06-19 05:03:35 INFO Setting LOCK timeout as 6 seconds
2024-06-19 05:03:35 INFO Looking for magic cut-over table
2024-06-19 05:03:35 INFO Creating magic cut-over table `sysbench`.`_sbtest1_del`
2024-06-19 05:03:35 INFO Magic cut-over table created
2024-06-19 05:03:35 INFO Locking `sysbench`.`sbtest1`, `sysbench`.`_sbtest1_del`
2024-06-19 05:03:35 INFO Tables locked
2024-06-19 05:03:35 INFO Session locking original & magic tables is 89
2024-06-19 05:03:35 INFO Writing changelog state: AllEventsUpToLockProcessed:1718773415465582443
2024-06-19 05:03:35 INFO Intercepted changelog state AllEventsUpToLockProcessed
2024-06-19 05:03:35 INFO Handled changelog state AllEventsUpToLockProcessed
2024-06-19 05:03:35 INFO Waiting for events up to lock
2024-06-19 05:03:35 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1718773415465582443
2024-06-19 05:03:35 INFO Done waiting for events up to lock; duration=175.935047ms
# Migrating `sysbench`.`sbtest1`; Ghost table is `sysbench`.`_sbtest1_gho`
# Migrating ip-172-17-5-173:3306; inspecting ip-172-17-5-173:3306; executing on i-09fce0c3dab1a3eb2.ec2.internal
# Migration started at Wed Jun 19 05:03:20 +0000 2024
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.sysbench.sbtest1.sock
Copy: 249584/249584 100.0%; Applied: 7082; Backlog: 0/1000; Time: 15s(total), 8s(copy); streamer: mysql-bin-changelog.000006:34501096; Lag: 0.01s, HeartbeatLag: 0.04s, State: migrating; ETA: due
2024-06-19 05:03:35 INFO Setting RENAME timeout as 3 seconds
2024-06-19 05:03:35 INFO Session renaming tables is 83
2024-06-19 05:03:35 INFO Issuing and expecting this to block: rename /* gh-ost */ table `sysbench`.`sbtest1` to `sysbench`.`_sbtest1_del`, `sysbench`.`_sbtest1_gho` to `sysbench`.`sbtest1`
2024-06-19 05:03:35 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2024-06-19 05:03:35 INFO Checking session lock: gh-ost.89.lock
2024-06-19 05:03:35 INFO Connection holding lock on original table still exists
2024-06-19 05:03:35 INFO Will now proceed to drop magic table and unlock tables
2024-06-19 05:03:35 INFO Dropping magic cut-over table
2024-06-19 05:03:35 INFO Releasing lock from `sysbench`.`sbtest1`, `sysbench`.`_sbtest1_del`
2024-06-19 05:03:35 INFO Tables unlocked
2024-06-19 05:03:35 INFO Tables renamed
2024-06-19 05:03:35 INFO Lock & rename duration: 206.238705ms. During this time, queries on `sbtest1` were blocked
[2024/06/19 05:03:35] [info] binlogsyncer.go:164 syncer is closing...
[2024/06/19 05:03:35] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2024-06-19 05:03:35 INFO Closed streamer connection. err=<nil>
[2024/06/19 05:03:35] [info] binlogsyncer.go:179 syncer is closed
2024-06-19 05:03:35 INFO Dropping table `sysbench`.`_sbtest1_ghc`
2024-06-19 05:03:35 INFO Table dropped
2024-06-19 05:03:35 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2024-06-19 05:03:35 INFO -- drop table `sysbench`.`_sbtest1_del`
2024-06-19 05:03:35 INFO Done migrating `sysbench`.`sbtest1`
2024-06-19 05:03:35 INFO Removing socket file: /tmp/gh-ost.sysbench.sbtest1.sock
2024-06-19 05:03:35 INFO Tearing down inspector
2024-06-19 05:03:35 INFO Tearing down applier
2024-06-19 05:03:35 INFO Tearing down streamer
2024-06-19 05:03:35 INFO Tearing down throttler
# Done

3.07

mysql> select @@aurora_binlog_replication_max_yield_seconds,@@aurora_enable_repl_bin_log_filtering,@@aurora_version;
+-----------------------------------------------+----------------------------------------+------------------+
| @@aurora_binlog_replication_max_yield_seconds | @@aurora_enable_repl_bin_log_filtering | @@aurora_version |
+-----------------------------------------------+----------------------------------------+------------------+
|                                             0 |                                      1 | 3.07.0           |
+-----------------------------------------------+----------------------------------------+------------------+
1 row in set (0.00 sec)

[ec2-user@i-09fce0c3dab1a3eb2 ~]$ gh-ost --max-load=Threads_running=25 --critical-load=Threads_running=1000 --chunk-size=1000 --max-lag-millis=1500 --user="user" --password="password" --host=<host>us-east-1.rds.amazonaws.com --allow-on-master --database="sysbench" --table="sbtest1" --verbose --alter="engine=innodb" --switch-to-rbr --allow-master-master --cut-over=default --exact-rowcount --concurrent-rowcount --default-retries=120 --panic-flag-file=/tmp/ghost.panic.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --initially-drop-old-table --execute
2024-06-19 05:12:28 INFO starting gh-ost 1.1.5
2024-06-19 05:12:28 INFO Migrating `sysbench`.`sbtest1`
2024-06-19 05:12:28 INFO inspector connection validated on <host>us-east-1.rds.amazonaws.com:3306
2024-06-19 05:12:28 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `sysbench`.*
2024-06-19 05:12:28 INFO binary logs validated on <host>us-east-1.rds.amazonaws.com:3306
2024-06-19 05:12:28 INFO Restarting replication on <host>us-east-1.rds.amazonaws.com:3306 to make sure binlog settings apply to replication thread
2024-06-19 05:12:28 INFO Inspector initiated on ip-172-17-5-142:3306, version 8.0.36
2024-06-19 05:12:28 INFO Table found. Engine=InnoDB
2024-06-19 05:12:28 INFO Estimated number of rows via EXPLAIN: 246674
2024-06-19 05:12:28 INFO Recursively searching for replication master
2024-06-19 05:12:28 INFO Master found to be ip-172-17-5-142:3306
2024-06-19 05:12:28 INFO log_slave_updates validated on <host>us-east-1.rds.amazonaws.com:3306
2024-06-19 05:12:28 INFO streamer connection validated on <host>us-east-1.rds.amazonaws.com:3306
2024-06-19 05:12:28 INFO Connecting binlog streamer at mysql-bin-changelog.000006:9395302
[2024/06/19 05:12:28] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql <host>us-east-1.rds.amazonaws.com 3306 user    false false <nil> false UTC true 0 0s 0s 0 false}
[2024/06/19 05:12:28] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin-changelog.000006, 9395302)
[2024/06/19 05:12:28] [info] binlogsyncer.go:203 register slave for master server <host>us-east-1.rds.amazonaws.com:3306
2024-06-19 05:12:28 INFO applier connection validated on <host>us-east-1.rds.amazonaws.com:3306
2024-06-19 05:12:28 INFO applier connection validated on <host>us-east-1.rds.amazonaws.com:3306
2024-06-19 05:12:28 INFO will use time_zone='SYSTEM' on applier
2024-06-19 05:12:28 INFO Examining table structure on applier
2024-06-19 05:12:28 INFO Applier initiated on ip-172-17-5-142:3306, version 8.0.36
2024-06-19 05:12:28 INFO Dropping table `sysbench`.`_sbtest1_del`
2024-06-19 05:12:28 INFO Table dropped
2024-06-19 05:12:28 INFO Dropping table `sysbench`.`_sbtest1_ghc`
2024-06-19 05:12:28 INFO Table dropped
2024-06-19 05:12:28 INFO Creating changelog table `sysbench`.`_sbtest1_ghc`
[2024/06/19 05:12:28] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.000006, 9395302)
2024-06-19 05:12:28 INFO rotate to next log from mysql-bin-changelog.000006:0 to mysql-bin-changelog.000006
2024-06-19 05:12:28 INFO Changelog table created
2024-06-19 05:12:28 INFO Creating ghost table `sysbench`.`_sbtest1_gho`
2024-06-19 05:12:28 INFO Ghost table created
2024-06-19 05:12:28 INFO Altering ghost table `sysbench`.`_sbtest1_gho`
2024-06-19 05:12:28 INFO Ghost table altered
2024-06-19 05:12:28 INFO Altering ghost table AUTO_INCREMENT value `sysbench`.`_sbtest1_gho`
2024-06-19 05:12:28 INFO Ghost table AUTO_INCREMENT altered
2024-06-19 05:12:28 INFO Created postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
2024-06-19 05:12:28 INFO Intercepted changelog state GhostTableMigrated
2024-06-19 05:12:28 INFO Waiting for ghost table to be migrated. Current lag is 0s
2024-06-19 05:12:28 INFO Handled changelog state GhostTableMigrated
2024-06-19 05:12:28 INFO Chosen shared unique key is PRIMARY
2024-06-19 05:12:28 INFO Shared columns are id,k,c,pad
2024-06-19 05:12:28 INFO Listening on unix socket file: /tmp/gh-ost.sysbench.sbtest1.sock
2024-06-19 05:12:28 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2024-06-19 05:12:28 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2024-06-19 05:12:28 INFO Migration min values: [1]
2024-06-19 05:12:28 INFO Migration max values: [250000]
2024-06-19 05:12:28 INFO Waiting for first throttle metrics to be collected
2024-06-19 05:12:28 INFO Exact number of rows via COUNT: 249991
2024-06-19 05:12:28 INFO First throttle metrics collected
# Migrating `sysbench`.`sbtest1`; Ghost table is `sysbench`.`_sbtest1_gho`
# Migrating ip-172-17-5-142:3306; inspecting ip-172-17-5-142:3306; executing on i-09fce0c3dab1a3eb2.ec2.internal
# Migration started at Wed Jun 19 05:12:28 +0000 2024
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag [set]
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.sysbench.sbtest1.sock
Copy: 0/249991 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin-changelog.000006:9570832; Lag: 0.02s, HeartbeatLag: 0.02s, State: migrating; ETA: N/A
2024-06-19 05:12:28 INFO Intercepted changelog state ReadMigrationRangeValues
2024-06-19 05:12:28 INFO Handled changelog state ReadMigrationRangeValues
Copy: 29993/250100 12.0%; Applied: 442; Backlog: 4/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin-changelog.000006:16349500; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 7s
Copy: 59975/250178 24.0%; Applied: 832; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: mysql-bin-changelog.000006:23464017; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 6s
Copy: 88946/250229 35.5%; Applied: 1199; Backlog: 14/1000; Time: 3s(total), 3s(copy); streamer: mysql-bin-changelog.000006:30258476; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 5s
Copy: 115910/250285 46.3%; Applied: 1590; Backlog: 13/1000; Time: 4s(total), 4s(copy); streamer: mysql-bin-changelog.000006:36607789; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 4s
Copy: 142868/250339 57.1%; Applied: 2016; Backlog: 8/1000; Time: 5s(total), 5s(copy); streamer: mysql-bin-changelog.000006:43082810; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 3s
Copy: 169805/250371 67.8%; Applied: 2365; Backlog: 4/1000; Time: 6s(total), 6s(copy); streamer: mysql-bin-changelog.000006:49467553; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: 2s
Copy: 196750/250394 78.6%; Applied: 2755; Backlog: 9/1000; Time: 7s(total), 7s(copy); streamer: mysql-bin-changelog.000006:55926488; Lag: 0.01s, HeartbeatLag: 0.12s, State: migrating; ETA: 1s
Copy: 222669/250411 88.9%; Applied: 3139; Backlog: 0/1000; Time: 8s(total), 8s(copy); streamer: mysql-bin-changelog.000006:62177925; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: due
Copy: 247582/250414 98.9%; Applied: 3582; Backlog: 2/1000; Time: 9s(total), 9s(copy); streamer: mysql-bin-changelog.000006:68357393; Lag: 0.01s, HeartbeatLag: 0.02s, State: migrating; ETA: due
2024-06-19 05:12:37 INFO Row copy complete
Copy: 249569/249569 100.0%; Applied: 3612; Backlog: 0/1000; Time: 9s(total), 9s(copy); streamer: mysql-bin-changelog.000006:68840292; Lag: 0.01s, HeartbeatLag: 0.10s, State: migrating; ETA: due
Copy: 249569/249569 100.0%; Applied: 3615; Backlog: 441/1000; Time: 10s(total), 9s(copy); streamer: mysql-bin-changelog.000006:70195592; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 4333; Backlog: 290/1000; Time: 11s(total), 9s(copy); streamer: mysql-bin-changelog.000006:72053715; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 5053; Backlog: 100/1000; Time: 12s(total), 9s(copy); streamer: mysql-bin-changelog.000006:73833081; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 5210; Backlog: 529/1000; Time: 13s(total), 9s(copy); streamer: mysql-bin-changelog.000006:75499945; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 6058; Backlog: 209/1000; Time: 14s(total), 9s(copy); streamer: mysql-bin-changelog.000006:77389881; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 6578; Backlog: 206/1000; Time: 15s(total), 9s(copy); streamer: mysql-bin-changelog.000006:79224057; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 6900; Backlog: 460/1000; Time: 16s(total), 9s(copy); streamer: mysql-bin-changelog.000006:80953330; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 7765; Backlog: 132/1000; Time: 17s(total), 9s(copy); streamer: mysql-bin-changelog.000006:82906479; Lag: 0.02s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 8125; Backlog: 347/1000; Time: 18s(total), 9s(copy); streamer: mysql-bin-changelog.000006:84637821; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 8665; Backlog: 373/1000; Time: 19s(total), 9s(copy); streamer: mysql-bin-changelog.000006:86474673; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 9355; Backlog: 168/1000; Time: 20s(total), 9s(copy); streamer: mysql-bin-changelog.000006:88128534; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 9609; Backlog: 495/1000; Time: 21s(total), 9s(copy); streamer: mysql-bin-changelog.000006:89868954; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 10419; Backlog: 162/1000; Time: 22s(total), 9s(copy); streamer: mysql-bin-changelog.000006:91730606; Lag: 0.02s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 10869; Backlog: 270/1000; Time: 23s(total), 9s(copy); streamer: mysql-bin-changelog.000006:93477136; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 11270; Backlog: 374/1000; Time: 24s(total), 9s(copy); streamer: mysql-bin-changelog.000006:95108250; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 12093; Backlog: 106/1000; Time: 25s(total), 9s(copy); streamer: mysql-bin-changelog.000006:97023664; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 12363; Backlog: 414/1000; Time: 26s(total), 9s(copy); streamer: mysql-bin-changelog.000006:98721735; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 12986; Backlog: 309/1000; Time: 27s(total), 9s(copy); streamer: mysql-bin-changelog.000006:100533872; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 13745; Backlog: 62/1000; Time: 28s(total), 9s(copy); streamer: mysql-bin-changelog.000006:102297914; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 13905; Backlog: 442/1000; Time: 29s(total), 9s(copy); streamer: mysql-bin-changelog.000006:103944827; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 14605; Backlog: 263/1000; Time: 30s(total), 9s(copy); streamer: mysql-bin-changelog.000006:105781890; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 15438; Backlog: 11/1000; Time: 31s(total), 9s(copy); streamer: mysql-bin-changelog.000006:107730161; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 15458; Backlog: 482/1000; Time: 32s(total), 9s(copy); streamer: mysql-bin-changelog.000006:109295632; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
Copy: 249569/249569 100.0%; Applied: 16132; Backlog: 339/1000; Time: 33s(total), 9s(copy); streamer: mysql-bin-changelog.000006:111135451; Lag: 0.01s, HeartbeatLag: 0.02s, State: postponing cut-over; ETA: due
2024-06-19 05:13:01 INFO Grabbing voluntary lock: gh-ost.46.lock
2024-06-19 05:13:01 INFO Setting LOCK timeout as 6 seconds
2024-06-19 05:13:01 INFO Looking for magic cut-over table
2024-06-19 05:13:01 INFO Creating magic cut-over table `sysbench`.`_sbtest1_del`
2024-06-19 05:13:01 INFO Magic cut-over table created
2024-06-19 05:13:01 INFO Locking `sysbench`.`sbtest1`, `sysbench`.`_sbtest1_del`
2024-06-19 05:13:01 INFO Tables locked
2024-06-19 05:13:01 INFO Session locking original & magic tables is 46
2024-06-19 05:13:01 INFO Writing changelog state: AllEventsUpToLockProcessed:1718773981687017615
2024-06-19 05:13:01 INFO Intercepted changelog state AllEventsUpToLockProcessed
2024-06-19 05:13:01 INFO Handled changelog state AllEventsUpToLockProcessed
2024-06-19 05:13:01 INFO Waiting for events up to lock
2024-06-19 05:13:02 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1718773981687017615
2024-06-19 05:13:02 INFO Done waiting for events up to lock; duration=515.236968ms
# Migrating `sysbench`.`sbtest1`; Ghost table is `sysbench`.`_sbtest1_gho`
# Migrating ip-172-17-5-142:3306; inspecting ip-172-17-5-142:3306; executing on i-09fce0c3dab1a3eb2.ec2.internal
# Migration started at Wed Jun 19 05:12:28 +0000 2024
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.sysbench.sbtest1.sock
Copy: 249569/249569 100.0%; Applied: 16525; Backlog: 0/1000; Time: 33s(total), 9s(copy); streamer: mysql-bin-changelog.000006:111508579; Lag: 0.01s, HeartbeatLag: 0.04s, State: migrating; ETA: due
2024-06-19 05:13:02 INFO Setting RENAME timeout as 3 seconds
2024-06-19 05:13:02 INFO Session renaming tables is 47
2024-06-19 05:13:02 INFO Issuing and expecting this to block: rename /* gh-ost */ table `sysbench`.`sbtest1` to `sysbench`.`_sbtest1_del`, `sysbench`.`_sbtest1_gho` to `sysbench`.`sbtest1`
2024-06-19 05:13:02 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2024-06-19 05:13:02 INFO Checking session lock: gh-ost.46.lock
2024-06-19 05:13:02 INFO Connection holding lock on original table still exists
2024-06-19 05:13:02 INFO Will now proceed to drop magic table and unlock tables
2024-06-19 05:13:02 INFO Dropping magic cut-over table
2024-06-19 05:13:02 INFO Releasing lock from `sysbench`.`sbtest1`, `sysbench`.`_sbtest1_del`
2024-06-19 05:13:02 INFO Tables unlocked
2024-06-19 05:13:02 INFO Tables renamed
2024-06-19 05:13:02 INFO Lock & rename duration: 548.080798ms. During this time, queries on `sbtest1` were blocked
[2024/06/19 05:13:02] [info] binlogsyncer.go:164 syncer is closing...
[2024/06/19 05:13:02] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2024-06-19 05:13:02 INFO Closed streamer connection. err=<nil>
2024-06-19 05:13:02 INFO Dropping table `sysbench`.`_sbtest1_ghc`
[2024/06/19 05:13:02] [info] binlogsyncer.go:179 syncer is closed
2024-06-19 05:13:02 INFO Table dropped
2024-06-19 05:13:02 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2024-06-19 05:13:02 INFO -- drop table `sysbench`.`_sbtest1_del`
2024-06-19 05:13:02 INFO Done migrating `sysbench`.`sbtest1`
2024-06-19 05:13:02 INFO Removing socket file: /tmp/gh-ost.sysbench.sbtest1.sock
2024-06-19 05:13:02 INFO Tearing down inspector
2024-06-19 05:13:02 INFO Tearing down applier
2024-06-19 05:13:02 INFO Tearing down streamer
2024-06-19 05:13:02 INFO Tearing down throttler
# Done

Test: gh-ost command:

gh-ost --max-load=Threads_running=25 --critical-load=Threads_running=1000 --chunk-size=1000 --max-lag-millis=1500 --user="user" --password="password" --host=<host>.us-east-1.rds.amazonaws.com --allow-on-master --database="sysbench" --table="sbtest1" --verbose --alter="engine=innodb" --switch-to-rbr --allow-master-master --cut-over=default --exact-rowcount --concurrent-rowcount --default-retries=120 --panic-flag-file=/tmp/ghost.panic.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --initially-drop-old-table --execute

sysbench command running in parallel:

sysbench ./sysbench/src/lua/oltp_write_only.lua --db-driver=mysql --mysql-user=user --mysql-password=password --mysql-db=sysbench --mysql-host=<host>-east-1.rds.amazonaws.com --mysql-port=3306 --threads=10 --rand-type=uniform --db-ps-mode=disable --tables=5 --forced-shutdown --table-size=250000 --report-interval=1 --time=0 --skip-trx run
rbanks54 commented 1 week ago

Interesting! We'd tried it on slightly earlier aurora version and had problems. We bumped our minor version just the other day and hadn't tried since, but an attempt just now works.

We need to do a little more investigation as we may have made a mistake in our config that we didn't notice when making our earlier attempts

rbanks54 commented 1 week ago

Whelp... turns out we had a #facepalm level configuration issue. Thanks for posting this issue as it caused us to go back and triple check things (and find our obvious-in-hindsight problem)

reillski commented 6 days ago

Thanks for double checking this! Based on testing this seems to confirm

Since the default for aurora_binlog_replication_max_yield_seconds is zero in latest versions, I would recommend updating gh-ost docs to: