github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.3k stars 1.25k forks source link

gh-ost hang on INFO Waiting for ghost table to be migrated. Current lag is 0s in MySQL 8.0.21 #884

Open barucho opened 4 years ago

barucho commented 4 years ago

gh-ost run is not passing the stage

INFO Waiting for ghost table to be migrated. Current lag is 0s
| @@version |
+-----------+
| 8.0.21    |
2020-09-13 19:18:55 INFO starting gh-ost 1.0.49
2020-09-13 19:18:55 INFO Migrating `test`.`T1`
2020-09-13 19:18:55 INFO connection validated on pdb-mysql-pii-sdl-playground1b.42.net:3306
2020-09-13 19:18:55 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2020-09-13 19:18:55 INFO binary logs validated on pdb-mysql-pii-sdl-playground1b.42.net:3306
2020-09-13 19:18:55 INFO Inspector initiated on pdb-mysql-pii-sdl-playground1b.42.net:3306, version 8.0.21
2020-09-13 19:18:55 INFO Table found. Engine=InnoDB
2020-09-13 19:18:55 DEBUG Estimated number of rows via STATUS: 80
2020-09-13 19:18:55 DEBUG Validated no foreign keys exist on table
2020-09-13 19:18:55 DEBUG Validated no triggers exist on table
2020-09-13 19:18:55 INFO Estimated number of rows via EXPLAIN: 80
2020-09-13 19:18:55 DEBUG Potential unique keys in T1: [PRIMARY (auto_increment): [ID]; has nullable: false]
2020-09-13 19:18:55 INFO Master forced to be pdb-mysql-pii-sdl-playground0a.42.net:3306
2020-09-13 19:18:55 INFO log_slave_updates validated on pdb-mysql-pii-sdl-playground1b.42.net:3306
2020-09-13 19:18:55 INFO connection validated on pdb-mysql-pii-sdl-playground1b.42.net:3306
2020-09-13 19:18:55 DEBUG Streamer binlog coordinates: mysql-bin.000003:1539502
2020-09-13 19:18:55 INFO Connecting binlog streamer at mysql-bin.000003:1539502
[2020/09/13 19:18:55] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql pdb-mysql-pii-sdl-playground1b.42.net 3306 dbschema    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/09/13 19:18:55] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.000003, 1539502)
[2020/09/13 19:18:55] [info] binlogsyncer.go:203 register slave for master server pdb-mysql-pii-sdl-playground1b.42.net:3306
2020-09-13 19:18:55 DEBUG Beginning streaming
2020-09-13 19:18:55 INFO rotate to next log from mysql-bin.000003:0 to mysql-bin.000003
[2020/09/13 19:18:55] [info] binlogsyncer.go:723 rotate to (mysql-bin.000003, 1539502)
2020-09-13 19:18:55 INFO connection validated on pdb-mysql-pii-sdl-playground0a.42.net:3306
2020-09-13 19:18:55 INFO connection validated on pdb-mysql-pii-sdl-playground0a.42.net:3306
2020-09-13 19:18:55 INFO will use time_zone='SYSTEM' on applier
2020-09-13 19:18:55 INFO Examining table structure on applier
2020-09-13 19:18:55 INFO Applier initiated on pdb-mysql-pii-sdl-playground0a.42.net:3306, version 8.0.21
2020-09-13 19:18:55 INFO Dropping table `test`.`_T1_gho`
2020-09-13 19:18:55 INFO Table dropped
2020-09-13 19:18:55 INFO Dropping table `test`.`_T1_del`
2020-09-13 19:18:55 INFO Table dropped
2020-09-13 19:18:55 INFO Dropping table `test`.`_T1_ghc`
2020-09-13 19:18:55 INFO Table dropped
2020-09-13 19:18:55 INFO Creating changelog table `test`.`_T1_ghc`
2020-09-13 19:18:55 INFO Changelog table created
2020-09-13 19:18:55 INFO Creating ghost table `test`.`_T1_gho`
2020-09-13 19:18:55 INFO Ghost table created
2020-09-13 19:18:55 INFO Altering ghost table `test`.`_T1_gho`
2020-09-13 19:18:55 DEBUG ALTER statement: alter /* gh-ost */ table `test`.`_T1_gho` add e varchar(2),add index e_idx (e)
2020-09-13 19:18:55 INFO Ghost table altered
2020-09-13 19:18:55 INFO Waiting for ghost table to be migrated. Current lag is 0s
barucho commented 4 years ago
+----------------+
| Tables_in_test |
+----------------+
| T1             |
| T2             |
| _T1_ghc        |
| _T1_gho        |
+----------------+
4 rows in set (0.00 sec)
select * from _T1_ghc;
+-----+---------------------+------------------------------+--------------------------------+
| id  | last_update         | hint                         | value                          |
+-----+---------------------+------------------------------+--------------------------------+
|   1 | 2020-09-13 19:29:10 | heartbeat                    | 2020-09-13T19:29:10.945231577Z |
|   2 | 2020-09-13 19:18:55 | state                        | GhostTableMigrated             |
| 256 | 2020-09-13 19:18:55 | state at 1600024735341670734 | GhostTableMigrated             |
+-----+---------------------+------------------------------+--------------------------------+
3 rows in set (0.00 sec)
*************************** 1. row ***************************
       Table: _T1_gho
Create Table: CREATE TABLE `_T1_gho` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Mail` varchar(50) NOT NULL,
  `e` varchar(2) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `e_idx` (`e`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
esorek commented 3 years ago

any update on this issue?

shlomi-noach commented 3 years ago

Sorry, not tracking this repo as before. I don't suppose you have any replication filters? master-master can be tricky for various reasons. I'm curious, as a very quick shortcut, what happens when you try to migrate directly on one of the masters?

gh-ost --allow-on-master -host=pdb-mysql-pii-sdl-playground0a.42.net ...

(remove -allow-master-master, remove -assume-master-host)

lioraltarescu commented 3 years ago

Hey @shlomi-noach , Thanks for the response . we tried what you suggested . Im attaching the error below

FATAL There seems to be a master-master setup at server:3306. This is unsupported. Bailing out

shlomi-noach commented 3 years ago

Thanks. Now, if you re-add --allow-master-master?

shmuelmwix commented 3 years ago

Hi, we tried re-adding --allow-master-master in our gh-ost command, and we're back to square one - getting Waiting for ghost table to be migrated. Current lag is 0s again.

We also tried running it from the Master, a co-Master and a slave in the cluster (3 different attempts, each with different setups for each attempt).

This is the command we ran:

/usr/bin/gh-ost --max-load='Threads_running=500' --critical-load='Threads_running=1000' --initially-drop-ghost-table --initially-drop-old-table --chunk-size=600 --max-lag-millis=3000 --throttle-control-replicas=mysql-test1 --user="ghost_user" --password=****** --host=mysql-test0 --database="test" --table="table1" --debug --verbose --alter="DROP COLUMN prices" --assume-rbr --allow-master-master --allow-on-master --cut-over=default --concurrent-rowcount --default-retries=120 --panic-flag-file=/tmp/ghost.panic.flag --exact-rowcount --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag

Thanks!

lioraltarescu commented 3 years ago

Another thing i wanted to add @shlomi-noach is that the gh-ost process is stuck on this line "INFO Waiting for ghost table to be migrated. Current lag is 0s" After digging a bit in your code it looks like it doesn't receive a ghostTableMigrated binlog event which prevent it from moving forward the and it might be related to the mysql character set we configured which is utf8mb4 .

lioraltarescu commented 3 years ago

Hello again , I finally found the issue and managed to resolve it! In our mysql 8 clusters we enabled this parameter binlog_transaction_compression which had an impact on the gh-ost process itself .Once I disabled it ( it's disabled by default ) It works both from a slave and from the master. I was testing it on a dev cluster which didn't have a lot of traffic so i had to restart both the writable master and the host on which the gh-ost was running on ( after disabling the param in my.cnf) in order to get the gh-ost to start working .

shlomi-noach commented 3 years ago

Good finding! I'm actually unfamiliar with this flag, and I see it was introduced in 8.0.20. gh-ost should probably check that this flag is disabled.

m00dawg commented 1 month ago

Resurrecting an old but still open bug, it seems like things like Maxwell can read the compressed binlogs ok - is there an avenue to enable this with gh-ost? I recall gh-ost might be in need of a maintainer as @shlomi-noach has moved onto other things which makes me think this might not be fixed anytime soon though thought I might ask anyway in hopes that maybe it's something as simple as updating a client connector or..?