github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.4k stars 1.26k forks source link

Error running "gh-ost" on Aurora get errors #1159

Open Yacohen opened 2 years ago

Yacohen commented 2 years ago

I been running "gh-ost" against Aurora , and at the end of the sessions , I allways get errors (not encounter on regular MySql Servers, even do all the process appear to be successful

Running "gh-ost" using this command: cut-over=atomic

gh-ost --host=yacov-aurora-db.cluster-ci59a3jianey.us-east-2.rds.amazonaws.com \ --user=telegram \ --execute \ --password=Admin180 \ --database=telegram \ --table=social_monitor_m3u8_small \ --alter="DROP COLUMN x" \ --chunk-size=30000 \ --max-load=Threads_running=20 \ --allow-on-master \ --switch-to-rbr \ --cut-over=atomic \ --cut-over-lock-timeout-seconds=5 \ --initially-drop-ghost-table \ --initially-drop-old-table \ --initially-drop-socket-file \ --exact-rowcount \ --concurrent-rowcount \ --panic-flag-file=/tmp/ghost.panic.flag ֿ

I get those error at the end of the process:

Copy: 196000/199999 98.0%; Applied: 0; Backlog: 0/1000; Time: 2m32s(total), 2m18s(copy); streamer: mysql-bin-changelog.000068:24229615; Lag: 0.75s, HeartbeatLag: 0.80s, State: migrating; ETA: 2s Copy: 198000/199999 99.0%; Applied: 0; Backlog: 0/1000; Time: 2m33s(total), 2m19s(copy); streamer: mysql-bin-changelog.000068:24990974; Lag: 0.50s, HeartbeatLag: 0.51s, State: migrating; ETA: 1s Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m34s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25743114; Lag: 0.29s, HeartbeatLag: 0.29s, State: migrating; ETA: due Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m35s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25744498; Lag: 0.37s, HeartbeatLag: 0.42s, State: migrating; ETA: due

Migrating telegram.social_monitor_m3u8_small; Ghost table is telegram._social_monitor_m3u8_small_gho

Migrating ip-10-20-2-127:3306; inspecting ip-10-20-2-127:3306; executing on orarac-lab-01

Migration started at Wed Aug 03 11:49:27 +0300 2022

chunk-size: 2000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=20; critical-load: ; nice-ratio: 0.000000

throttle-additional-flag-file: /tmp/gh-ost.throttle

panic-flag-file: /tmp/ghost.panic.flag

Serving on unix socket: /tmp/gh-ost.telegram.social_monitor_m3u8_small.sock

Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m37s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25748079; Lag: 1.46s, HeartbeatLag: 1.47s, State: migrating; ETA: due 2022-08-03 11:52:12 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m45s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25749842; Lag: 6.97s, HeartbeatLag: 6.98s, State: migrating; ETA: due Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m40s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25749842; Lag: 1.96s, HeartbeatLag: 1.98s, State: migrating; ETA: due 2022-08-03 11:52:14 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m50s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25754144; Lag: 11.21s, HeartbeatLag: 11.21s, State: throttled, lag=10.007112s; ETA: due

Migrating telegram.social_monitor_m3u8_small; Ghost table is telegram._social_monitor_m3u8_small_gho

Migrating ip-10-20-2-127:3306; inspecting ip-10-20-2-127:3306; executing on orarac-lab-01

Migration started at Wed Aug 03 11:49:27 +0300 2022

chunk-size: 2000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=20; critical-load: ; nice-ratio: 0.000000

throttle-additional-flag-file: /tmp/gh-ost.throttle

panic-flag-file: /tmp/ghost.panic.flag

Serving on unix socket: /tmp/gh-ost.telegram.social_monitor_m3u8_small.sock

Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m54s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25762253; Lag: 1.56s, HeartbeatLag: 1.66s, State: migrating; ETA: due Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 2m55s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25764016; Lag: 0.35s, HeartbeatLag: 0.43s, State: migrating; ETA: due [2022/08/03 11:52:27] [info] binlogsyncer.go:164 syncer is closing... [2022/08/03 11:52:27] [error] binlogsyncer.go:631 connection was bad [2022/08/03 11:52:27] [error] binlogstreamer.go:77 close sync with err: Sync was closed [2022/08/03 11:52:27] [info] binlogsyncer.go:179 syncer is closed Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 3m0s(total), 2m20s(copy); streamer: mysql-bin-changelog.000068:25767175; Lag: 1.35s, HeartbeatLag: 1.42s, State: migrating; ETA: due 2022-08-03 11:52:29 ERROR sql: database is closed 2022-08-03 11:52:29 ERROR sql: database is closed

Done

2022-08-03 11:52:29 ERROR sql: database is closed

Running "gh-ost" using this command: cut-over=two-step

gh-ost --host=yacov-aurora-db.cluster-ci59a3jianey.us-east-2.rds.amazonaws.com \ --user=telegram \ --execute \ --password=Admin180 \ --database=telegram \ --table=social_monitor_m3u8_small \ --alter="ADD COLUMN x INT(12) NOT NULL DEFAULT 0" \ --chunk-size=30000 \ --max-load=Threads_running=20 \ --allow-on-master \ --switch-to-rbr \ --cut-over=two-step \ --cut-over-lock-timeout-seconds=5 \ --initially-drop-ghost-table \ --initially-drop-old-table \ --initially-drop-socket-file \ --exact-rowcount \ --concurrent-rowcount \ --panic-flag-file=/tmp/ghost.panic.flag ֿ

Return less error but steel:

Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m0s(total), 40s(copy); streamer: mysql-bin-changelog.000068:67576070; Lag: 0.24s, HeartbeatLag: 30.33s, State: migrating; ETA: due Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m5s(total), 40s(copy); streamer: mysql-bin-changelog.000068:73800252; Lag: 0.30s, HeartbeatLag: 31.83s, State: migrating; ETA: due Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m10s(total), 40s(copy); streamer: mysql-bin-changelog.000068:85012903; Lag: 0.22s, HeartbeatLag: 31.23s, State: migrating; ETA: due

Migrating telegram.social_monitor_m3u8_small; Ghost table is telegram._social_monitor_m3u8_small_gho

Migrating ip-10-20-2-127:3306; inspecting ip-10-20-2-127:3306; executing on orarac-lab-01

Migration started at Wed Aug 03 11:59:36 +0300 2022

chunk-size: 30000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=20; critical-load: ; nice-ratio: 0.000000

throttle-additional-flag-file: /tmp/gh-ost.throttle

panic-flag-file: /tmp/ghost.panic.flag

Serving on unix socket: /tmp/gh-ost.telegram.social_monitor_m3u8_small.sock

Copy: 199999/199999 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m14s(total), 40s(copy); streamer: mysql-bin-changelog.000068:101173009; Lag: 0.30s, HeartbeatLag: 0.32s, State: migrating; ETA: due [2022/08/03 12:00:52] [info] binlogsyncer.go:164 syncer is closing... [2022/08/03 12:00:52] [error] binlogstreamer.go:77 close sync with err: sync is been closing... [2022/08/03 12:00:52] [info] binlogsyncer.go:179 syncer is closed

Done

Aurora Configuration:

mysql> show variables like 'aurora_enable_repl_bin_log_filtering'; +--------------------------------------+-------+ | Variable_name | Value | +--------------------------------------+-------+ | aurora_enable_repl_bin_log_filtering | OFF |
+--------------------------------------+-------+ 1 row in set (0.14 sec)

mysql> show variables like 'binlog_format'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | binlog_format | ROW | +---------------+-------+ 1 row in set (0.14 sec)

mysql> show variables like 'aurora_parallel_query'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | aurora_parallel_query | ON | +-----------------------+-------+ 1 row in set (0.13 sec)

mysql> show variables like 'binlog_row_image'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | binlog_row_image | FULL | +------------------+-------+ 1 row in set (0.14 sec)

Errors where reproducible on two different environments runs on Aurora each time.

"gh-ost" run on

Linux host 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux

"gh-ost" version

[root@orarac-lab-01 ~]# gh-ost --version 1.1.5

Aurora Version:

mysql> select AURORA_VERSION(); select @@aurora_version; show variables like '%version'; +------------------+ | AURORA_VERSION() | +------------------+ | 2.10.2 | +------------------+ 1 row in set (0.15 sec)

+------------------+ | @@aurora_version | +------------------+ | 2.10.2 | +------------------+ 1 row in set (0.15 sec)

+------------------+-----------------------+ | Variable_name | Value | +------------------+-----------------------+ | aurora_version | 2.10.2 | | innodb_version | 5.7.12 | | protocol_version | 10 | | tls_version | TLSv1,TLSv1.1,TLSv1.2 | | version | 5.7.12-log | +------------------+-----------------------+ 5 rows in set (0.14 sec)

spihiker commented 1 year ago

I have the same error.My mysql version:8.0.32.

Copy: 1300000/1300000 100.0%; Applied: 0; Backlog: 0/1000; Time: 1m0s(total), 53s(copy); streamer: binlog.000007:81508511; Lag: 0.23s, HeartbeatLag: 0.11s, State: migrating; ETA: due 2023-05-04 16:06:05 INFO Tables renamed 2023-05-04 16:06:05 INFO Lock & rename duration: 4.449586975s. During this time, queries on user were blocked [2023/05/04 16:06:05] [info] binlogsyncer.go:164 syncer is closing... [2023/05/04 16:06:05] [error] binlogsyncer.go:631 connection was bad 2023-05-04 16:06:05 INFO Closed streamer connection. err= [2023/05/04 16:06:05] [error] binlogstreamer.go:77 close sync with err: Sync was closed [2023/05/04 16:06:05] [info] binlogsyncer.go:179 syncer is closed 2023-05-04 16:06:05 INFO Dropping table mytest._user_ghc 2023-05-04 16:06:06 INFO Table dropped