Open calippus opened 1 year ago
fence_scsi is meant to cut off access to shared storage, so that e.g. your database or other resource(s) aren't able to write to it when the node fails.
To reboot the node you should use any of the redfish/ipmilan agents (for iLO or iDRAC, etc), or fence_xvm with fence-virtd on the host node for virtual machines.
For other scenarios you can use fence_sbd with poison pill on shared storage.
That's a VMware environment (so no ipmilan agents), and I also don't have the access needed to use fence_vmware_soap.
One can use fence_scsi as a stonith device, and by the definition of stonith fence_scsi should do the job. I have used this agent before on a different system, and it worked there.
Anyway, as the log shows, Operation 'reboot' has been started. It should be able to reboot.
If you want a "real" reboot you could still go for an SBD setup.
SBD relies heavily on a reliable watchdog. This makes SBD on VMware a bit critical: everything available below vSphere 7 was softdog, and from vSphere 7 on there is a virtual watchdog implementation. Both, to my current knowledge, have issues guaranteeing a reliable reboot within a defined timeout in certain scenarios (migration, pausing, ...). With that in mind you could still go for an SBD setup, depending on what your cluster is intended for (test cluster, ...). As you have already set up SCSI fencing, you could try putting fence_scsi and fence_sbd in a topology (fence_scsi first and fence_sbd second, on the same level). If you keep e.g. your database on the SCSI device, this would guarantee protection against database corruption and still give you a quite reliable reboot of the fenced node. I haven't done any testing with this setup, but it should work.
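For illustration, the topology idea above could look roughly like the following. This is an untested dry-run sketch in Python that only prints `pcs stonith level add` commands and runs nothing. The device IDs `scsi-fence` and `sbd-fence` are hypothetical placeholders for the existing fence_scsi and fence_sbd stonith resources, and `node1` and `node2` are the node names used in this issue. Depending on the pcs version, the device IDs on one level are separated by spaces or by commas, so check the pcs man page before running anything.

```python
def topology_commands(nodes, scsi_id="scsi-fence", sbd_id="sbd-fence", level=1):
    """Build one 'pcs stonith level add' command per node.

    Both devices go on the same level, fence_scsi first and fence_sbd second.
    The device IDs are hypothetical; replace them with your own resource IDs.
    """
    return [
        ["pcs", "stonith", "level", "add", str(level), node, scsi_id, sbd_id]
        for node in nodes
    ]


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for argv in topology_commands(["node1", "node2"]):
        print(" ".join(argv))
```

Pacemaker counts a level as successful only when every device on it succeeds, which is why both agents share one level here instead of sitting on separate levels.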
From the fencing configuration above it looks as if you're running a 2-node cluster. This gives you basically 2 options for SBD: poison-pill (fence_sbd) with a shared disk, or watchdog-fencing if you add either a qdevice or a 3rd node for real quorum-forming. If your pacemaker version is current enough (the easiest check is whether /usr/sbin/fence_watchdog exists) you can use watchdog in a topology in the same way as fence_sbd with poison-pill.
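The existence check mentioned above could be scripted as follows, as a minimal sketch in Python. Only the path /usr/sbin/fence_watchdog comes from the comment; the function name `has_fence_watchdog` is made up for this example, and the presence of the file says nothing about whether the watchdog itself is reliable on your platform.

```python
import os


def has_fence_watchdog(path="/usr/sbin/fence_watchdog"):
    """Return True if the fence_watchdog agent exists at path and is executable."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if has_fence_watchdog():
        print("fence_watchdog found: watchdog fencing can be used in a topology")
    else:
        print("fence_watchdog not found: pacemaker is probably too old for this")
```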
Thanks a lot for the information and explanation. I read that SBD is not supported on VMware, which is why I didn't try it (see: https://access.redhat.com/articles/3131271).
But I will try configuring it now, let's see.
Meanwhile, I have installed watchdog together with fence_scsi. First tests are successful so far.
That is why I was very careful in suggesting SBD for your scenario - but as it had already been mentioned ...
You are right: for just getting the "real" reboot, using the watchdog daemon in combination with fence_scsi should be a possibility. Personally I have no experience with that combination, and I haven't looked into how it is done in detail, neither the setup nor the implementation. But I would assume that on recovery of resources that don't use the disk (like an IP address) you might have to be careful, as I'm not sure whether there is a mechanism that guarantees enough time is left for the watchdog to trigger (leaving aside, of course, the uncertainty whether the watchdog triggers within the given timeout at all).
Hello,
I am having a problem with fencing in our environment.
When I manually fence from node2 to node1
The fence operation is "OK", but the node is not rebooting. Pacemaker shuts down and then the node stays alive; see the logs from node1:
This is the stonith configuration;
Why is the node not rebooting? I couldn't find a solution. Could you please help with this matter? Thanks in advance.