ewwhite / zfs-ha

ZFS High-Availability NAS

pacemaker errors #24

Open Tualua opened 5 years ago

Tualua commented 5 years ago

Hi, I'm trying to reproduce this setup in my homelab and have reached this step: pcs stonith create fence-vol1

After I create the stonith resource, its state is Stopped:

[root@stor-node1 ~]# pcs status
Cluster name: stor-cluster
Stack: corosync
Current DC: stor-node2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Thu Mar 21 16:03:58 2019
Last change: Thu Mar 21 15:59:18 2019 by root via cibadmin on stor-node2

2 nodes configured
1 resource configured

Online: [ stor-node1 stor-node2 ]

Full list of resources:

 fence-vol1     (stonith:fence_scsi):   Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

And I get the following errors in the Pacemaker log:

  [root@stor-node1 ~]# journalctl -u pacemaker
  -- Logs begin at Thu 2019-03-21 14:02:46 CST, end at Thu 2019-03-21 16:01:01 CST. --
  Mar 21 15:57:42 stor-node1.homelab.mvdnet.org systemd[1]: Started Pacemaker High Availability Cluster Manager.
  Mar 21 15:57:42 stor-node1.homelab.mvdnet.org pacemakerd[27805]:   notice: Additional logging available in /var/log/pacemaker.log
  Mar 21 15:57:42 stor-node1.homelab.mvdnet.org pacemakerd[27805]:   notice: Switching to /var/log/cluster/corosync.log
  Mar 21 15:57:42 stor-node1.homelab.mvdnet.org pacemakerd[27805]:   notice: Additional logging available in /var/log/cluster/corosync.log
  Mar 21 15:57:42 stor-node1.homelab.mvdnet.org pacemakerd[27805]:   notice: Starting Pacemaker 1.1.19-8.el7_6.4
  Mar 21 15:57:42 stor-node1.homelab.mvdnet.org pacemakerd[27805]:   notice: Quorum acquired
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org pacemakerd[27805]:   notice: Node stor-node1 state is now member
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org pacemakerd[27805]:   notice: Node stor-node2 state is now member
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org pengine[27812]:   notice: Additional logging available in /var/log/cluster/corosync.log
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:   notice: Additional logging available in /var/log/cluster/corosync.log
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:   notice: /var/lib/pacemaker/cib/cib.xml not found: No such file or directory
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:  warning: Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:  warning: Primary configuration corrupt or unusable, trying backups in /var/lib/pacemaker/cib
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:  warning: Continuing with an empty configuration.
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: Additional logging available in /var/log/cluster/corosync.log
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: Connecting to cluster infrastructure: corosync
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: Additional logging available in /var/log/cluster/corosync.log
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org lrmd[27810]:   notice: Additional logging available in /var/log/cluster/corosync.log
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org attrd[27811]:   notice: Additional logging available in /var/log/cluster/corosync.log
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org attrd[27811]:   notice: Connecting to cluster infrastructure: corosync
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org attrd[27811]:   notice: Node stor-node1 state is now member
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: Node stor-node1 state is now member
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:   notice: Connecting to cluster infrastructure: corosync
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:   notice: Node stor-node1 state is now member
  Mar 21 15:57:43 stor-node1.homelab.mvdnet.org cib[27808]:   notice: Node stor-node2 state is now member
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: Connecting to cluster infrastructure: corosync
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: Quorum acquired
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: Node stor-node2 state is now member
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org attrd[27811]:   notice: Node stor-node2 state is now member
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: Node stor-node1 state is now member
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: Node stor-node2 state is now member
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org crmd[27813]:  warning: Support for 'notification-agent' and 'notification-target' cluster options is deprecated and will be removed in a future release (use alerts fe
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: The local CRM is operational
  Mar 21 15:57:44 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: State transition S_STARTING -> S_PENDING
  Mar 21 15:58:05 stor-node1.homelab.mvdnet.org crmd[27813]:   notice: State transition S_PENDING -> S_NOT_DC
  Mar 21 15:59:19 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: Added 'fence-vol1' to the device list (1 active devices)
  Mar 21 15:59:19 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: fence-vol1 can not fence (on) stor-node1: static-list
  Mar 21 15:59:19 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: Operation on of stor-node2 by <no-one> for crmd.447@stor-node2.1840ba08: No such device
  Mar 21 15:59:19 stor-node1.homelab.mvdnet.org crmd[27813]:    error: Unfencing of stor-node2 by <anyone> failed: No such device (-19)
  Mar 21 15:59:19 stor-node1.homelab.mvdnet.org stonith-ng[27809]:   notice: Operation on of stor-node1 by <no-one> for crmd.447@stor-node2.0ae64991: No such device
  Mar 21 15:59:19 stor-node1.homelab.mvdnet.org crmd[27813]:    error: Unfencing of stor-node1 by <anyone> failed: No such device (-19)

Similar messages appear on the other node. Network communication is OK and firewalld is disabled.

This is not my first attempt: I have tried different kernels and both physical and virtual machines, and it always ends the same way.

What am I doing wrong?

ewwhite commented 5 years ago

Can you show your stonith device creation string?
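
If the original command line is no longer at hand, the stored options can be read back from the cluster. This is a minimal sketch, assuming pcs 0.9.x as shipped with EL7 (which the el7 Pacemaker build in the log suggests), where `pcs stonith show <id> --full` prints a stonith resource's configured options. The `show_stonith` function and the `PCS` override variable are hypothetical helpers, not part of the zfs-ha guide.

```shell
#!/bin/sh
# Sketch: print the stored options of a stonith resource.
# PCS is a hypothetical override so the wrapper can be dry-run with PCS=echo;
# by default it calls the real pcs binary.
PCS="${PCS:-pcs}"

show_stonith() {
    # $1: stonith resource id, e.g. fence-vol1 from the pcs status output above
    "$PCS" stonith show "$1" --full
}
```

Calling `show_stonith fence-vol1` on either node should list the attributes the resource was created with, which is the information being asked for here.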