github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License

Replication is broken: Slave_IO_Running: No, Slave_SQL_Running: No #983

Open artem-dronov opened 3 years ago

artem-dronov commented 3 years ago

Good day! gh-ost cannot find the replication master. We previously promoted a slave to master, so running SHOW SLAVE STATUS on the master now shows Slave_IO_Running: No, Slave_SQL_Running: No. What should we do in this case?

./gh-ost --host=master_host --user=test_user --password=????? \
  --database=test_db --table=test --assume-rbr --allow-on-master \
  --verbose --alter="add test SMALLINT NULL" --chunk-size=3000 --assume-rbr \
  --max-load=Threads_connected=20 > gh.log &

2021-06-02 09:08:53 INFO starting gh-ost 1.1.1
2021-06-02 09:08:53 INFO Migrating test_db.test
2021-06-02 09:08:53 INFO inspector connection validated on master_host:3306
2021-06-02 09:08:53 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on test_db.*
2021-06-02 09:08:53 INFO binary logs validated on master_host:3306
2021-06-02 09:08:53 INFO Inspector initiated on slave_host:3306, version 5.7.32-log
2021-06-02 09:08:53 INFO Table found. Engine=InnoDB
2021-06-02 09:08:53 INFO Estimated number of rows via EXPLAIN: 75277111
2021-06-02 09:08:53 INFO Recursively searching for replication master
2021-06-02 09:08:53 INFO Tearing down inspector
2021-06-02 09:08:53 FATAL Replication on master_host:3306 is broken: Slave_IO_Running: No, Slave_SQL_Running: No. Please make sure replication runs before using gh-ost.
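The FATAL line suggests that gh-ost treats any host with leftover replication settings as a replica and refuses to continue when its threads are stopped. The sketch below is not gh-ost's code (gh-ost is written in Go); it is a small shell illustration of the three cases, and the function name `replication_state` is made up for this example. It assumes the input is the text printed by `mysql -e 'SHOW SLAVE STATUS\G'`.

```shell
# Classify the text printed by: mysql -e 'SHOW SLAVE STATUS\G'
# Reads that output on stdin and prints one of:
#   standalone  empty result, no replication configured on this host
#   running     both replication threads report Yes
#   broken      replication is configured but a thread is not running
replication_state() {
  awk -F': ' '
    /Slave_IO_Running:/  { io  = $2 }
    /Slave_SQL_Running:/ { sql = $2 }
    END {
      if (io == "" && sql == "")            print "standalone"
      else if (io == "Yes" && sql == "Yes") print "running"
      else                                  print "broken"
    }'
}
```

A promoted master that still carries its old replica configuration lands in the "broken" case, which matches the error above. Once the configuration is cleared, the same query returns an empty set and the host reads as "standalone".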

shlomi-noach commented 3 years ago

You should run RESET SLAVE ALL on your current primary, assuming there's nothing in the relay logs that you need. That's the normal thing to do after promoting a new primary. But that's just my suggestion; make sure you understand what you are doing.
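For readers who have not done this before, the suggestion can be sketched as a short statement sequence. This is an illustration under assumptions, not a vetted runbook: the function name `reset_replica_config_sql` is made up here, and the host and user in the usage comment are the placeholders from the original report. The function only prints the statements so they can be reviewed before being piped to a client.

```shell
# Print the statements to run on the promoted primary, in order.
# STOP SLAVE is harmless if the replication threads are already stopped.
# RESET SLAVE ALL deletes the relay logs and discards the stored
# master connection settings (host, port, user, password).
# The final SHOW SLAVE STATUS should then return an empty set.
reset_replica_config_sql() {
  cat <<'SQL'
STOP SLAVE;
RESET SLAVE ALL;
SHOW SLAVE STATUS;
SQL
}

# Possible usage (placeholders, adjust to your environment):
#   reset_replica_config_sql | mysql --host=master_host --user=test_user -p
```

Run this only against the host that is now the primary; on a working replica it would wipe the settings that replica needs to keep replicating.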

artem-dronov commented 3 years ago

> You should run RESET SLAVE ALL on your current primary, assuming there's nothing in the relay logs that you need. That's the normal thing to do after promoting a new primary. But that's just my suggestion; make sure you understand what you are doing.

Does this action not affect the health of the primary node?

shlomi-noach commented 3 years ago

Right now your primary node thinks it's a replica. If it really is the primary, it shouldn't think so. That's risky, because it may try to reconnect to whatever master_host is currently listed.

Define "affect the health"? RESET SLAVE ALL does not affect production traffic and is a non-blocking operation.

artem-dronov commented 3 years ago

I'm sorry, I haven't done this before. I'm more worried about whether the execution of this command on the master node will affect the synchronization of other slave nodes connected to the master.

shlomi-noach commented 3 years ago

Running this command on the primary node (master) does not affect the synchronization of the replicas connected to the primary node.

Obviously do not run it on the replicas.
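If you want reassurance that the replicas stay attached, one option is to count their connections on the primary before and after the reset. The sketch below assumes the tab-separated output of `mysql --batch -e 'SHOW PROCESSLIST'`, where each connected replica holds a thread whose Command column reads "Binlog Dump" (or "Binlog Dump GTID" with GTID auto-positioning). The function name `count_connected_replicas` is made up for this example.

```shell
# Count replica connections in the tab-separated output of:
#   mysql --batch -e 'SHOW PROCESSLIST'
# Columns are: Id, User, Host, db, Command, Time, State, Info,
# so the Command value is field 5. The header row does not match.
count_connected_replicas() {
  awk -F'\t' '$5 ~ /^Binlog Dump/ { n++ } END { print n + 0 }'
}

# Possible usage (placeholders from the original report):
#   mysql --batch --host=master_host --user=test_user -p \
#     -e 'SHOW PROCESSLIST' | count_connected_replicas
```

The count should be the same before and after RESET SLAVE ALL, since that statement only touches the primary's own leftover replica configuration. Note that the user needs the PROCESS privilege to see threads owned by other accounts.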