github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.25k stars 1.25k forks source link

Lock wait timeout exceeded during cutover #1364

Open nickelodean opened 8 months ago

nickelodean commented 8 months ago

Azure Single Server. MySQL verison 8.0.24 Ghost Version - gh-ost-1.1.4-1.x86_64.rpm

ghost command:

gh-ost \ --assume-rbr \ --allow-on-master \ --azure \ --max-load=Threads_running=10 \ --critical-load=Threads_running=50 \ --critical-load-interval-millis=10000 \ --critical-load-hibernate-seconds=100 \ --chunk-size=1000 \ --user="$user@$master" \ --password="$pass" \ --ssl \ --assume-master-host=$master \ --database="$database" \ --table="test_summary" \ --verbose \ --host=$master \ --alter="drop index index_secure_account_id_created,add index index_secure_account_id(secure_account_id)" \ --cut-over=default \ --default-retries=120 \ --panic-flag-file=/tmp/ghost.panic.flag \ --initially-drop-ghost-table \ --initially-drop-old-table \ --exact-rowcount \ --cut-over-lock-timeout-seconds=10 \ -- Also tried max 31536000 --hooks-path=/root/ghost_hooks/ \ --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \ --execute

error log

Copy: 1559201/1559201 100.0%; Applied: 9; Backlog: 0/1000; Time: 14m22s(total), 5m36s(copy); streamer: mysql-bin.000039:96604335; Lag: 0.06s, HeartbeatLag: 0.10s, State: migrating; ETA: due 2024-01-06 19:59:38 INFO Setting RENAME timeout as 10 seconds 2024-01-06 19:59:38 INFO Session renaming tables is 713699 2024-01-06 19:59:38 INFO Issuing and expecting this to block: rename / gh-ost / table testdb.test_summary to testdb._test_summary_del, testdb._test_summary_gho to testdb.test_summary 2024-01-06 19:59:38 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to) 2024-01-06 19:59:38 INFO Checking session lock: gh-ost.713695.lock 2024-01-06 19:59:38 INFO Connection holding lock on original table still exists 2024-01-06 19:59:38 INFO Will now proceed to drop magic table and unlock tables 2024-01-06 19:59:38 INFO Dropping magic cut-over table Copy: 1559201/1559201 100.0%; Applied: 9; Backlog: 0/1000; Time: 14m25s(total), 5m36s(copy); streamer: mysql-bin.000039:96622242; Lag: 0.07s, HeartbeatLag: 0.08s, State: migrating; ETA: due Copy: 1559201/1559201 100.0%; Applied: 9; Backlog: 0/1000; Time: 14m30s(total), 5m36s(copy); streamer: mysql-bin.000039:96645621; Lag: 0.07s, HeartbeatLag: 0.08s, State: migrating; ETA: due 2024-01-06 19:59:48 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction Copy: 1559201/1559201 100.0%; Applied: 9; Backlog: 0/1000; Time: 14m35s(total), 5m36s(copy); streamer: mysql-bin.000039:96668998; Lag: 0.07s, HeartbeatLag: 0.08s, State: migrating; ETA: due Copy: 1559201/1559201 100.0%; Applied: 9; Backlog: 0/1000; Time: 14m40s(total), 5m36s(copy); streamer: mysql-bin.000039:96692371; Lag: 0.07s, HeartbeatLag: 0.08s, State: migrating; ETA: due 2024-01-06 19:59:58 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction 2024-01-06 19:59:58 INFO Releasing lock from testdb.test_summary, testdb._test_summary_del 2024-01-06 19:59:58 INFO Tables unlocked 2024-01-06 19:59:58 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction 2024-01-06 19:59:59 INFO executing gh-ost-on-begin-postponed hook: /root/ghost_hooks/gh-ost-on-begin-postponed-hook Mail Delivery Status Report will be mailed to .

Copy: 1559201/1559201 100.0%; Applied: 9; Backlog: 0/1000; Time: 14m45s(total), 5m36s(copy); streamer: mysql-bin.000039:96715743; Lag: 0.07s, HeartbeatLag: 0.08s, State: postponing cut-over; ETA: due

| innodb_lock_wait_timeout | 50 | | innodb_old_blocks_pct | 37 | | innodb_old_blocks_time | 1000 | | innodb_print_all_deadlocks | OFF | | innodb_status_output_locks | OFF | | innodb_table_locks | ON | | key_cache_block_size | 1024 | | lock_wait_timeout | 31536000 |

unable to cutover, see no locks in db from innodb.trx . This seems like a bug.Can you please suggest next steps.

dontstopbelieveing commented 6 months ago

Run with max-lag-millis less than 3 seconds when running on master See https://github.com/github/gh-ost/issues/1387 for my experience with this parameter when using --allow-on-master

nickelodean commented 5 months ago

I have used max-lag-millis , tried it with 1sec , 2sec , still get same error. I removed the postpone flag, still get same error.