Closed OndrejHome closed 5 years ago
Hmm I have discussed that with the pacemaker developers before I decided which RC to return in a probe situation and as far as I remember they claimed that a m/s resource can return master or slave. However I will review that code part and will discuss that again :)
If the issue that code change is intended to fix is only for situations where a resource clean up is done while a resource is in unmanaged mode then I don't think it makes much sense, because I can not see a good reason why a resource cleanup would be done on a resource that is unmanaged.
When putting a resource in unmanaged mode it means that pacemaker is told not to perform any actions besides monitoring on the resource, but doing a resource cleanup might require that actions have to be performed on the resource, and therefore doing a resource cleanup while the resource is in unmanaged mode can lead to unexpected results and should therefore be avoided in my opinion.
I will discuss that code change in one of the next days online in the LinuxLab with Frank. I need to understand, why this change is really needed and if it also could break something. I think about the classical (currently supported) situation, where a customer sets the cluster to maintenance(!), works heavy on the SAP HANAs like doing updates, sets the resource to un-managed with an following cleanup, brings the cluster to ready and activates the managed-mode for the resource. In such situations the SAP HANA roles could be exchanged, and then I should NOT return OCF_RUNNING_MASTER on the secondary side during the probe. This procedure currently works like a charm and customer have reported back that they are very sufficient with that.
Again (like discussed in the other issue about promote) - we need to map a multi-state SAP HANA to the multi-state pacemaker resource. We need to hide some details against the cluster to get him calm and NOT doing unwanted resource actions. I would prefer to have a new pacemaker construct which really would allow multi-state resources. SAP HANA could not directly (1:1) mapped to the STOPPED-start->DEMOTED->promote->PROMOTED-demote->DEMOTED-stop->STOPPED Unfortunately it is much more complex.
Restarting an IP Address might be ok and fast, but restarting a SAP HANA instance could take up to hours!
However Frank and I will review your code-change-suggestion and if it fits the above needs of supporting still the SAP or cluster maintenance it has a good chance to find its way to the project code.
As there was no significant response here and as I don't have access to any testing systems any more I'm closing this issue. Feel free to reach out for questions if interested.
Hi Fabian,
I have noticed that in place below (L2160) SAPHana returns only
$OCF_SUCCESS
when the it is probed. This seems to not be and issue during cluster startup, but it leads to "not optimal" behaviour when the probe is needed during normal cluster operation. During normal cluster operation the probe can happen when the resource is "cleaned up" (pcs resource cleanup
). In such case the node that was running the 'Master' of SAPHana will first detect$OCF_SUCCESS
(which means 'Slave') and on next monitor it detects correctly 'Master'. However when this is done during the time that resource is in 'unmanaged' state, then cluster will consider the$OCF_SUCCESS
return code on probe as 'Failed Master' and as soon as the resource is again managed it will do 'restart' to resolve this - which is not good. In normal scenario (as managed resource) when cleanup is done I can see that after probe the cluster will do actually 'promote' operation to get from reported 'Slave' back to 'Master' - that is pretty quick, but it seems unnecessary if the resource was detected as Master in first place.I believe that code below can be just slightly adjusted to allow proper detection of Master state during probe. Just wanted to ask what was the motivation for returning just
$OCF_SUCCESS
when doing probe instead of more advanced detection of state? Is it a concern that 'probe' will change any node attributes?Possible change that only takes care of first part
If the above makes sense I can prepare a pull request for the changes that should improved the detection of Master when doing probe in scenario where $lss (landscape status) reports 2/3/4. Please let me know what you think about this approach.