ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects
GNU General Public License v2.0

mysql: variable master_host empty on slave reboot #1841

Open Systhom opened 1 year ago

Hello,

I'm having trouble with a MariaDB cluster (2 nodes, master-slave) on Debian 11. I don't know what to do anymore.

Environment:

Node1:
  OS: Debian 11
  Kernel: 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21)
  Versions: resource-agents (4.7.0-1), pacemaker (2.0.5-2), corosync (3.1.2-2), mariadb (10.5.18-0+deb11u1)

Node2:
  OS: Debian 11
  Kernel: 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21)
  Versions: resource-agents (4.7.0-1), pacemaker (2.0.5-2), corosync (3.1.2-2), mariadb (10.5.18-0+deb11u1)

The output of crm configure show is attached: crm_configure_show.txt

Problem:

When I restart Node2 (the slave), it rejoins the cluster correctly:

$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: Node1 (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Thu Jan 26 12:04:57 2023
  * Last change:  Thu Jan 26 11:39:58 2023 by root via cibadmin on Node2
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ Node1 Node2 ]

Full List of Resources:
  * VIP (ocf::heartbeat:IPaddr2):        Started Node1
  * Clone Set: MYSQLREPLICATOR [MYSQL] (promotable):
    * Masters: [ Node1 ]
    * Slaves: [ Node2 ]

However, Node2 does not recover its replication configuration: SHOW SLAVE STATUS; returns nothing. The Node2 logs contain the following messages, which show that replication is not running:

Jan 25 16:29:38  mysql(MYSQL)[22862]:    INFO: No MySQL master present - clearing replication state
Jan 25 16:29:39  mysql(MYSQL)[22862]:    WARNING: MySQL Slave IO threads currently not running.
Jan 25 16:29:39  mysql(MYSQL)[22862]:    ERROR: MySQL Slave SQL threads currently not running.
Jan 25 16:29:39  mysql(MYSQL)[22862]:    ERROR: See  for details
Jan 25 16:29:39  mysql(MYSQL)[22862]:    ERROR: ERROR 1200 (HY000) at line 1: Misconfigured slave: MASTER_HOST was not set; Fix in config file or with CHANGE MASTER TO

With trace mode enabled, I can see that the variable master_host is empty:

+ [ -n  ]
+ [ 0 -eq 0 ]
+ [ 1 -a ! -z  ]
+ return 0
+ echo
+ tr -d
+ master_host=
+ [  -a  != Node2 ]
+ ocf_log info No MySQL master present - clearing replication state
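
The trace above suggests logic along the following lines. This is only a minimal sketch reconstructed from the trace, not the agent's actual code: the function name choose_replication_action and its two arguments are hypothetical, and the assumption that tr -d strips spaces is inferred from the blank argument in the trace.

```shell
#!/bin/sh
# Hypothetical helper reconstructed from the trace; NOT the agent's real code.
#   $1: value of OCF_RESKEY_CRM_meta_notify_master_uname
#   $2: name of the local node
choose_replication_action() {
    # Assumption: the `tr -d` call in the trace removes spaces.
    master_host=$(echo "$1" | tr -d ' ')

    # Replicate only if a master is known and it is not the local node.
    if [ -n "$master_host" ] && [ "$master_host" != "$2" ]; then
        echo "replicate from $master_host"
    else
        echo "clear replication state"
    fi
}

# Empty master name, as in the trace: replication state is cleared.
choose_replication_action "" "Node2"
# Master name set to the other node: replication would be configured.
choose_replication_action "Node1" "Node2"
```

With an empty master name the sketch takes the same branch as the trace, which matches the "No MySQL master present" message in the log.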

master_host is empty because the notify environment variables passed to the agent are also empty, in particular OCF_RESKEY_CRM_meta_notify_master_uname:

OCF_RESKEY_CRM_meta_notify=true
OCF_RESKEY_CRM_meta_notify_active_resource=
OCF_RESKEY_CRM_meta_notify_active_uname=
OCF_RESKEY_CRM_meta_notify_all_uname=Node1 Node2
OCF_RESKEY_CRM_meta_notify_available_uname=Node2 Node1
OCF_RESKEY_CRM_meta_notify_demote_resource=
OCF_RESKEY_CRM_meta_notify_demote_uname=
OCF_RESKEY_CRM_meta_notify_inactive_resource=MYSQL:0 MYSQL:1
OCF_RESKEY_CRM_meta_notify_master_resource=
OCF_RESKEY_CRM_meta_notify_master_uname=
OCF_RESKEY_CRM_meta_notify_promote_resource=
OCF_RESKEY_CRM_meta_notify_promote_uname=
OCF_RESKEY_CRM_meta_notify_slave_resource=
OCF_RESKEY_CRM_meta_notify_slave_uname=
OCF_RESKEY_CRM_meta_notify_start_resource=MYSQL:0
OCF_RESKEY_CRM_meta_notify_start_uname=Node2
OCF_RESKEY_CRM_meta_notify_stop_resource=
OCF_RESKEY_CRM_meta_notify_stop_uname=
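
To spot the empty entries in a dump like the one above more quickly, a small filter can help. This is a minimal sketch: the function name list_empty_notify_vars is hypothetical, and it assumes the dump is available as NAME=VALUE lines on standard input (for example from env, or from a saved trace file).

```shell
#!/bin/sh
# Hypothetical diagnostic helper: reads NAME=VALUE lines on stdin and prints
# the names of the OCF_RESKEY_CRM_meta_notify_* variables whose value is empty.
list_empty_notify_vars() {
    grep -E '^OCF_RESKEY_CRM_meta_notify_[A-Za-z_]+=$' | cut -d= -f1
}

# Example: check two lines taken from the dump above.
printf '%s\n' \
    'OCF_RESKEY_CRM_meta_notify_master_uname=' \
    'OCF_RESKEY_CRM_meta_notify_start_uname=Node2' \
    | list_empty_notify_vars
```

Only variables that are set but empty are reported; variables that are missing from the input entirely are not.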

Since this is a production environment, I performed a bare-metal restore of both machines onto two test machines, and the problem does not occur there. Production has a heavy write load, but the servers are far from saturated.

Pacemaker log from before and after the slave reboot: log_pacemaker.txt

Thank you in advance for all the help you can give me.

Best regards