ewwhite / zfs-ha

ZFS High-Availability NAS

Should fence_mpath agent be utilized instead of the fence_scsi agent? #26

Open rcproam opened 5 years ago

rcproam commented 5 years ago

This is not an issue with the current design. Possibly label as enhancement?

In particular, due to the documented issue "RHEL 7 High Availability and Resilient Storage Pacemaker cluster experiences a fence race condition between nodes during network outages while using fence_scsi with multipath storage", would it be more reliable to use the fence_mpath agent than the fence_scsi agent? I've encountered an issue very similar to the one described here: https://access.redhat.com/solutions/3201072

Red Hat recommends using the fence_mpath agent instead of fence_scsi to resolve this particular issue; however, fence_mpath is more complex to configure and will likely come with its own caveats/issues. https://access.redhat.com/articles/3078811

I still need to test the fence_mpath agent with my particular buildout to confirm whether it resolves the fencing / SCSI reservation issue I've encountered, but I'm opening this issue in case others have time to test the fence_mpath agent before I can.
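
For anyone comparing the two, here is a rough sketch of what the pcs stonith configuration for each agent might look like, assuming a single shared multipath device at /dev/mapper/mpatha and the node names shown (all values are placeholders; the fence_mpath form is my reading of the Red Hat article and also needs the matching reservation_key setup in multipath.conf):

# fence_scsi: one stonith resource, reservation keys derived from node IDs (sketch)
pcs stonith create fence-scsi fence_scsi \
    pcmk_host_list="cluster-nas1 cluster-nas2" \
    pcmk_reboot_action="off" \
    devices="/dev/mapper/mpatha" \
    meta provides=unfencing

# fence_mpath: reservation keys assigned explicitly per node and managed
# through mpathpersist (sketch based on the Red Hat article; untested here)
pcs stonith create fence-mpath fence_mpath \
    pcmk_host_map="cluster-nas1:1;cluster-nas2:2" \
    pcmk_host_argument="key" \
    pcmk_reboot_action="off" \
    devices="/dev/mapper/mpatha" \
    meta provides=unfencing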

rcproam commented 5 years ago

Description of fence_mpath agent and how it functions compared to fence_scsi:

fence_mpath: new fence agent for dm-multipath based on mpathpersist

Previously, the scenario of multipath on top of underlying SCSI devices was handled by fence_scsi, which works correctly but has some limitations. The most important is that unfencing is executed only once, so all paths have to be available when it runs. The new fence agent handles this situation properly, since most of it is managed by mpathpersist, which is part of dm-multipath. https://lists.fedorahosted.org/pipermail/cluster-commits/2014-November/004033.html
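
Side note on inspecting this: assuming the shared LUN shows up as /dev/mapper/mpatha and /dev/sdb (both placeholders), the registrations each agent works with can be listed roughly like this:

# keys registered through the multipath layer (what fence_mpath manages)
mpathpersist --in --read-keys -d /dev/mapper/mpatha
mpathpersist --in --read-reservation -d /dev/mapper/mpatha

# keys on an individual SCSI path (what fence_scsi registers per device)
sg_persist --in --read-keys --device=/dev/sdb
sg_persist --in --read-reservation --device=/dev/sdb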

ewwhite commented 5 years ago

I'd still see if you can debug your specific issue. I don't know of anyone using fence_mpath for this type of setup, and there are plenty of folks using this guide with success.

Please note what I mentioned about diverse heartbeat network paths.
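
To illustrate what I mean, a minimal corosync 2.x sketch of redundant heartbeat rings (subnets are placeholders, and pcs normally generates corosync.conf for you, so treat this as illustrative only):

totem {
    version: 2
    cluster_name: zfs-ha
    rrp_mode: passive            # keep a second ring as a standby heartbeat path

    interface {
        ringnumber: 0
        bindnetaddr: 198.51.100.0    # internal interconnect
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 203.0.113.0     # front-side network as the second path
        mcastport: 5407
    }
}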

rcproam commented 5 years ago

Thanks @ewwhite, I will try to debug some more. I'm still trying to understand how the pcs resource start and stop timeouts affect failover, as the suggested 90 seconds seems like a very large value (IIRC the NFS TCP session timeout is only about 60 seconds). Also, my particular deployment uses a SuperMicro Storage Bridge Bay (SBB), which includes an internal Ethernet interconnect between the nodes that I am using for heartbeats.
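
For reference, those timeouts can be inspected and adjusted per resource roughly like this (the resource name zfs-pool is a placeholder for whatever the primitive is called in a given deployment):

# show the operations (and their timeouts) currently defined on a resource
pcs resource show zfs-pool

# adjust the start/stop operation timeouts on one resource
pcs resource update zfs-pool op start timeout=90s op stop timeout=90s

# or set a cluster-wide default operation timeout
pcs resource op defaults timeout=90s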

rcproam commented 5 years ago

So I placed node#2 (cluster-nas2) into standby, then shut it down completely. When I subsequently start node#2 back up, it causes Pacemaker to crash on node#1. Below is an excerpt from the syslog on node#1 showing the sequence:

Apr 8 01:35:41 svr-lf-nas1 crmd[2850]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Apr 8 01:50:41 svr-lf-nas1 crmd[2850]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 8 01:50:41 svr-lf-nas1 pengine[2849]: notice: On loss of CCM Quorum: Ignore
Apr 8 01:50:41 svr-lf-nas1 pengine[2849]: notice: Calculated transition 3481, saving inputs in /var/lib/pacemaker/pengine/pe-input-367.bz2
Apr 8 01:50:41 svr-lf-nas1 crmd[2850]: notice: Transition 3481 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-367.bz2): Complete
Apr 8 01:50:41 svr-lf-nas1 crmd[2850]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Apr 8 02:05:41 svr-lf-nas1 crmd[2850]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 8 02:05:41 svr-lf-nas1 pengine[2849]: notice: On loss of CCM Quorum: Ignore
Apr 8 02:05:41 svr-lf-nas1 pengine[2849]: notice: Calculated transition 3482, saving inputs in /var/lib/pacemaker/pengine/pe-input-367.bz2
Apr 8 02:05:41 svr-lf-nas1 crmd[2850]: notice: Transition 3482 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-367.bz2): Complete
Apr 8 02:05:41 svr-lf-nas1 crmd[2850]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Apr 8 02:17:01 svr-lf-nas1 CRON[13384]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 8 02:18:52 svr-lf-nas1 crmd[2850]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 8 02:18:52 svr-lf-nas1 pengine[2849]: notice: On loss of CCM Quorum: Ignore
Apr 8 02:18:52 svr-lf-nas1 pengine[2849]: notice: Calculated transition 3483, saving inputs in /var/lib/pacemaker/pengine/pe-input-368.bz2
Apr 8 02:18:52 svr-lf-nas1 crmd[2850]: notice: Transition 3483 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-368.bz2): Complete
Apr 8 02:18:52 svr-lf-nas1 crmd[2850]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Apr 8 02:19:15 svr-lf-nas1 crmd[2850]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 8 02:19:15 svr-lf-nas1 pengine[2849]: notice: On loss of CCM Quorum: Ignore
Apr 8 02:19:15 svr-lf-nas1 pengine[2849]: notice: Scheduling Node cluster-nas2 for shutdown
Apr 8 02:19:15 svr-lf-nas1 pengine[2849]: notice: Calculated transition 3484, saving inputs in /var/lib/pacemaker/pengine/pe-input-369.bz2
Apr 8 02:19:15 svr-lf-nas1 crmd[2850]: notice: Transition 3484 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-369.bz2): Complete
Apr 8 02:19:15 svr-lf-nas1 crmd[2850]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Apr 8 02:19:15 svr-lf-nas1 crmd[2850]: notice: do_shutdown of peer cluster-nas2 is complete
Apr 8 02:19:15 svr-lf-nas1 cib[2845]: notice: Node cluster-nas2 state is now lost
Apr 8 02:19:15 svr-lf-nas1 cib[2845]: notice: Purged 1 peers with id=2 and/or uname=cluster-nas2 from the membership cache
Apr 8 02:19:15 svr-lf-nas1 corosync[2768]: notice [TOTEM ] A new membership (198.51.100.1:884) was formed. Members left: 2
Apr 8 02:19:15 svr-lf-nas1 corosync[2768]: notice [QUORUM] Members[1]: 1
Apr 8 02:19:15 svr-lf-nas1 corosync[2768]: notice [MAIN ] Completed service synchronization, ready to provide service.
Apr 8 02:19:15 svr-lf-nas1 corosync[2768]: [TOTEM ] A new membership (198.51.100.1:884) was formed. Members left: 2
Apr 8 02:19:15 svr-lf-nas1 corosync[2768]: [QUORUM] Members[1]: 1
Apr 8 02:19:15 svr-lf-nas1 corosync[2768]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 8 02:19:15 svr-lf-nas1 pacemakerd[2840]: notice: Node cluster-nas2 state is now lost
Apr 8 02:19:15 svr-lf-nas1 crmd[2850]: notice: Node cluster-nas2 state is now lost
Apr 8 02:19:15 svr-lf-nas1 crmd[2850]: notice: do_shutdown of peer cluster-nas2 is complete
Apr 8 02:19:15 svr-lf-nas1 stonith-ng[2846]: notice: Node cluster-nas2 state is now lost
Apr 8 02:19:15 svr-lf-nas1 stonith-ng[2846]: notice: Purged 1 peers with id=2 and/or uname=cluster-nas2 from the membership cache
Apr 8 02:19:15 svr-lf-nas1 attrd[2848]: notice: Node cluster-nas2 state is now lost
Apr 8 02:19:15 svr-lf-nas1 attrd[2848]: notice: Removing all cluster-nas2 attributes for peer loss
Apr 8 02:19:15 svr-lf-nas1 attrd[2848]: notice: Lost attribute writer cluster-nas2
Apr 8 02:19:15 svr-lf-nas1 attrd[2848]: notice: Purged 1 peers with id=2 and/or uname=cluster-nas2 from the membership cache
Apr 8 02:19:25 svr-lf-nas1 kernel: [3133621.758535] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Down
Apr 8 02:19:27 svr-lf-nas1 ntpd[2809]: Deleting interface #11 eno3, 198.51.100.1#123, interface stats: received=0, sent=0, dropped=0, active_time=3133254 secs
Apr 8 02:19:28 svr-lf-nas1 kernel: [3133624.730941] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Up 10 Mbps Full Duplex, Flow Control: RX/TX
Apr 8 02:19:30 svr-lf-nas1 ntpd[2809]: Listen normally on 12 eno3 198.51.100.1:123
Apr 8 02:20:33 svr-lf-nas1 kernel: [3133689.895368] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Down
Apr 8 02:20:35 svr-lf-nas1 ntpd[2809]: Deleting interface #12 eno3, 198.51.100.1#123, interface stats: received=0, sent=0, dropped=0, active_time=65 secs
Apr 8 02:20:37 svr-lf-nas1 kernel: [3133692.983744] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Up 10 Mbps Full Duplex, Flow Control: RX/TX
Apr 8 02:20:38 svr-lf-nas1 ntpd[2809]: Listen normally on 13 eno3 198.51.100.1:123
Apr 8 02:20:42 svr-lf-nas1 kernel: [3133698.535494] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Down
Apr 8 02:20:44 svr-lf-nas1 ntpd[2809]: Deleting interface #13 eno3, 198.51.100.1#123, interface stats: received=0, sent=0, dropped=0, active_time=6 secs
Apr 8 02:20:45 svr-lf-nas1 kernel: [3133701.371873] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Up 10 Mbps Full Duplex, Flow Control: RX/TX
Apr 8 02:20:47 svr-lf-nas1 ntpd[2809]: Listen normally on 14 eno3 198.51.100.1:123
Apr 8 02:21:12 svr-lf-nas1 kernel: [3133728.815903] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Down
Apr 8 02:21:14 svr-lf-nas1 ntpd[2809]: Deleting interface #14 eno3, 198.51.100.1#123, interface stats: received=0, sent=0, dropped=0, active_time=27 secs
Apr 8 02:21:38 svr-lf-nas1 kernel: [3133754.760563] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Apr 8 02:21:38 svr-lf-nas1 kernel: [3133754.760633] igb 0000:05:00.0 eno3: Link Speed was downgraded by SmartSpeed
Apr 8 02:21:40 svr-lf-nas1 ntpd[2809]: Listen normally on 15 eno3 198.51.100.1:123
Apr 8 02:22:35 svr-lf-nas1 kernel: [3133811.692929] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Down
Apr 8 02:22:37 svr-lf-nas1 ntpd[2809]: Deleting interface #15 eno3, 198.51.100.1#123, interface stats: received=0, sent=0, dropped=0, active_time=57 secs
Apr 8 02:23:34 svr-lf-nas1 kernel: [3133870.401931] igb 0000:05:00.0 eno3: igb: eno3 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Apr 8 02:23:34 svr-lf-nas1 kernel: [3133870.401997] igb 0000:05:00.0 eno3: Link Speed was downgraded by SmartSpeed
Apr 8 02:23:35 svr-lf-nas1 corosync[2768]: notice [TOTEM ] A new membership (198.51.100.1:888) was formed. Members joined: 2
Apr 8 02:23:35 svr-lf-nas1 corosync[2768]: [TOTEM ] A new membership (198.51.100.1:888) was formed. Members joined: 2
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: notice: do_shutdown of peer cluster-nas2 is complete
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: error: Node cluster-nas2[2] appears to be online even though we think it is dead
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: notice: Node cluster-nas2 state is now member
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: notice: State transition S_IDLE -> S_INTEGRATION
Apr 8 02:23:35 svr-lf-nas1 corosync[2768]: notice [QUORUM] Members[2]: 1 2
Apr 8 02:23:35 svr-lf-nas1 corosync[2768]: notice [MAIN ] Completed service synchronization, ready to provide service.
Apr 8 02:23:35 svr-lf-nas1 corosync[2768]: [QUORUM] Members[2]: 1 2
Apr 8 02:23:35 svr-lf-nas1 pacemakerd[2840]: notice: Node cluster-nas2 state is now member
Apr 8 02:23:35 svr-lf-nas1 corosync[2768]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 8 02:23:35 svr-lf-nas1 cib[2845]: notice: Node cluster-nas2 state is now member
Apr 8 02:23:35 svr-lf-nas1 attrd[2848]: notice: Node cluster-nas2 state is now member
Apr 8 02:23:35 svr-lf-nas1 stonith-ng[2846]: notice: Node cluster-nas2 state is now member
Apr 8 02:23:35 svr-lf-nas1 attrd[2848]: notice: Recorded attribute writer: cluster-nas2
Apr 8 02:23:35 svr-lf-nas1 cib[2845]: error: Cannot perform modification with no data
Apr 8 02:23:35 svr-lf-nas1 cib[2845]: warning: Completed cib_modify operation for section status: Invalid argument (rc=-22, origin=cluster-nas2/crmd/35, version=0.256.6)
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: warning: Another DC detected: cluster-nas2 (op=noop)
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: notice: State transition S_ELECTION -> S_INTEGRATION
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: notice: Syncing the Cluster Information Base from cluster-nas2 to rest of cluster
Apr 8 02:23:35 svr-lf-nas1 crmd[2850]: notice: Requested version
Apr 8 02:23:35 svr-lf-nas1 attrd[2848]: notice: Updating all attributes after cib_refresh_notify event
Apr 8 02:23:36 svr-lf-nas1 ntpd[2809]: Listen normally on 16 eno3 198.51.100.1:123
Apr 8 02:23:36 svr-lf-nas1 stonith-ng[2846]: notice: Operation reboot of cluster-nas1 by cluster-nas2 for crmd.2716@cluster-nas2.f04b7ab5: OK
Apr 8 02:23:36 svr-lf-nas1 stonith-ng[2846]: notice: Operation on of cluster-nas2 by cluster-nas2 for crmd.2716@cluster-nas2.fbcdacb2: OK
Apr 8 02:23:37 svr-lf-nas1 crmd[2850]: crit: We were allegedly just fenced by cluster-nas2 for cluster-nas2!
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: warning: The crmd process (2850) can no longer be respawned, shutting the cluster down.
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Shutting down Pacemaker
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Stopping pengine
Apr 8 02:23:37 svr-lf-nas1 kernel: [3133873.286736] sd 0:0:13:0: Parameters changed
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: warning: new_event_notification (2847-2850-7): Bad file descriptor (9)
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: warning: Notification of client crmd/09eb8595-7f1f-4169-aa4a-8935aa1fb4b6 failed
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: warning: Notification of client crmd/09eb8595-7f1f-4169-aa4a-8935aa1fb4b6 failed
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: warning: Notification of client crmd/09eb8595-7f1f-4169-aa4a-8935aa1fb4b6 failed
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: warning: Notification of client crmd/09eb8595-7f1f-4169-aa4a-8935aa1fb4b6 failed
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: warning: Notification of client crmd/09eb8595-7f1f-4169-aa4a-8935aa1fb4b6 failed
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: warning: Notification of client crmd/09eb8595-7f1f-4169-aa4a-8935aa1fb4b6 failed
Apr 8 02:23:37 svr-lf-nas1 pengine[2849]: notice: Caught 'Terminated' signal
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Stopping attrd
Apr 8 02:23:37 svr-lf-nas1 attrd[2848]: notice: Caught 'Terminated' signal
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Stopping lrmd
Apr 8 02:23:37 svr-lf-nas1 lrmd[2847]: notice: Caught 'Terminated' signal
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Stopping stonith-ng
Apr 8 02:23:37 svr-lf-nas1 stonith-ng[2846]: notice: Caught 'Terminated' signal
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Stopping cib
Apr 8 02:23:37 svr-lf-nas1 cib[2845]: notice: Caught 'Terminated' signal
Apr 8 02:23:37 svr-lf-nas1 cib[2845]: notice: Disconnected from Corosync
Apr 8 02:23:37 svr-lf-nas1 cib[2845]: notice: Disconnected from Corosync
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Shutdown complete
Apr 8 02:23:37 svr-lf-nas1 pacemakerd[2840]: notice: Attempting to inhibit respawning after fatal error
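
In case it's useful for comparing notes, these are the sorts of commands that should show the fencing history and remaining registrations after an event like the above (device path is a placeholder, not my actual LUN):

# fencing events recorded by stonithd
stonith_admin --history '*' --verbose

# overall cluster and fence-device state
pcs status --full
pcs stonith show --full

# which reservation keys are still registered on the shared device
sg_persist --in --read-keys --device=/dev/sdb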

ewwhite commented 5 years ago

Can you show me the pcs resource creation string you used for the fencing?

Maybe also the cluster creation string... and your hosts files?
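
Roughly the sort of detail I'm after, shown here with placeholder values (not your actual config):

# cluster creation
pcs cluster auth cluster-nas1 cluster-nas2 -u hacluster
pcs cluster setup --name zfs-ha cluster-nas1 cluster-nas2

# fencing resource creation
pcs stonith create fence-scsi fence_scsi \
    pcmk_host_list="cluster-nas1 cluster-nas2" \
    devices="/dev/mapper/mpatha" \
    meta provides=unfencing

# /etc/hosts on both nodes (cluster names plus any heartbeat aliases)
198.51.100.1   cluster-nas1
198.51.100.2   cluster-nas2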

ewwhite commented 5 years ago

Any updates? @rcproam

rcproam commented 5 years ago

Thanks so much for following up on this @ewwhite, and my apologies for the delay. My spare time has been focused on tax preparation this week.

Anyhow, I did try configuring the fence_mpath agent devices, but unfortunately unfencing didn't work for me :-\

I'll try reverting to the fence_scsi agent tonight and provide the info you requested.
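
Rough plan for the revert, with placeholder resource/device names (not yet run):

# drop the fence_mpath attempt and clear any failed fence actions
pcs stonith delete fence-mpath
pcs resource cleanup

# after recreating fence_scsi per the guide, re-check the registrations
sg_persist --in --read-keys --device=/dev/sdb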

BTW, are you receiving email at your @ewwhite.net address? I sent an email last week. If you're located in Chicago, maybe we could meet up one day? I'd like to learn more about your consulting business in case I have the opportunity to refer some new business to you.