fmherschel / SAPHanaSR-old-private


SAPHana returns $OCF_SUCCESS when doing 'probe' on node that was running Master #44

Closed. OndrejHome closed this issue 5 years ago.

OndrejHome commented 6 years ago

Hi Fabian,

I have noticed that in the place below (L2160) SAPHana returns only $OCF_SUCCESS when it is probed. This does not seem to be an issue during cluster startup, but it leads to "not optimal" behaviour when a probe is needed during normal cluster operation. During normal cluster operation a probe can happen when the resource is cleaned up (pcs resource cleanup). In that case the node that was running the 'Master' of SAPHana will first report $OCF_SUCCESS (which means 'Slave'), and only on the next monitor will it correctly detect 'Master'.

However, when this is done while the resource is in 'unmanaged' state, the cluster will interpret the $OCF_SUCCESS return code from the probe as 'Failed Master', and as soon as the resource is managed again it will do a 'restart' to resolve this - which is not good. In the normal scenario (managed resource), when a cleanup is done I can see that after the probe the cluster actually performs a 'promote' operation to get from the reported 'Slave' back to 'Master' - that is pretty quick, but it seems unnecessary if the resource had been detected as Master in the first place.
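For reference, this is the command sequence that triggers the probe in question (a rough sketch only; the resource name SAPHana_HDB_00-master is just an example, assuming a pcs-managed cluster):

    # Cleanup on a managed resource: the probe reports 'Slave', the cluster
    # then promotes the node back to 'Master' (quick, but unnecessary).
    pcs resource cleanup SAPHana_HDB_00-master

    # Cleanup while the resource is unmanaged: the probe result is later
    # treated as 'Failed Master' and the resource gets restarted.
    pcs resource unmanage SAPHana_HDB_00-master
    pcs resource cleanup SAPHana_HDB_00-master
    pcs resource manage SAPHana_HDB_00-master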

I believe the code below can be slightly adjusted to allow proper detection of the Master state during a probe. I just wanted to ask what the motivation was for returning only $OCF_SUCCESS when doing a probe instead of a more advanced detection of the state? Is it a concern that a 'probe' would change node attributes?

        2 | 3 | 4 ) # WARN, INFO or OK
            if ocf_is_probe; then
                rc=$OCF_SUCCESS
            else
                LPTloc=$(date '+%s')
                lpa_set_lpt $LPTloc $NODENAME
                lpa_push_lpt $LPTloc
                if [ "$promoted" -eq 1 ]; then
                    set_hana_attribute "$NODENAME" "PRIM" ${ATTR_NAME_HANA_SYNC_STATUS[@]}
                    rc=$OCF_RUNNING_MASTER
                else
                    if [ "$init_attribute" -eq 1 ]; then
                        set_hana_attribute ${NODENAME} "DEMOTED" ${ATTR_NAME_HANA_CLONE_STATE[@]}
                        rc=$OCF_RUNNING_MASTER
                    else
                        rc=$OCF_SUCCESS
                    fi
                fi
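For context, the 2 | 3 | 4 values in the case statement above are the landscape status ($lss) returned by SAP's landscapeHostConfiguration.py (exit code 4 = OK, 3 = INFO, 2 = WARNING). It can be checked manually roughly like this (the SID 'HDB' is just an example):

    # Run as the <sid>adm user; the exit code is the landscape status.
    su - hdbadm -c "HDBSettings.sh landscapeHostConfiguration.py; echo rc=\$?"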

Possible change that only takes care of the first part:

...
        2 | 3 | 4 ) # WARN, INFO or OK
                LPTloc=$(date '+%s')
                if ! ocf_is_probe; then
                    lpa_set_lpt $LPTloc $NODENAME
                    lpa_push_lpt $LPTloc
                fi
                if [ "$promoted" -eq 1 ]; then
                    if ! ocf_is_probe; then
                        set_hana_attribute "$NODENAME" "PRIM" ${ATTR_NAME_HANA_SYNC_STATUS[@]}
                    fi
                    rc=$OCF_RUNNING_MASTER
                else
                    if [ "$init_attribute" -eq 1 ]; then
                        if ! ocf_is_probe; then
                            set_hana_attribute ${NODENAME} "DEMOTED" ${ATTR_NAME_HANA_CLONE_STATE[@]}
                        fi
                        rc=$OCF_RUNNING_MASTER
                    else
                        rc=$OCF_SUCCESS
                    fi
                    if ! ocf_is_probe; then
                        ... rest of the code ...
                    fi
...

If the above makes sense I can prepare a pull request with the changes, which should improve the detection of Master when doing a probe in the scenario where $lss (landscape status) reports 2/3/4. Please let me know what you think about this approach.

fmherschel commented 6 years ago

Hmm, I discussed that with the pacemaker developers before I decided which RC to return in a probe situation, and as far as I remember they claimed that an m/s resource can return master or slave. However, I will review that code part and will discuss it again :)
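For reference, the return codes in question for a monitor/probe of a master/slave resource are $OCF_SUCCESS (running as Slave), $OCF_RUNNING_MASTER (running as Master) and $OCF_NOT_RUNNING (stopped). A minimal sketch of a probe branch that reports the role without touching node attributes (is_local_primary is a hypothetical stand-in for whatever role detection the RA already performs):

    if ocf_is_probe; then
        # Only report the detected role; never modify node attributes
        # from within a probe.
        if is_local_primary; then          # hypothetical role check
            rc=$OCF_RUNNING_MASTER
        else
            rc=$OCF_SUCCESS                # running, but as Slave
        fi
    fi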

fdanapfel commented 6 years ago

If the issue that this code change is intended to fix only occurs in situations where a resource cleanup is done while the resource is in unmanaged mode, then I don't think it makes much sense, because I cannot see a good reason why a resource cleanup would be done on a resource that is unmanaged.

Putting a resource into unmanaged mode means that pacemaker is told not to perform any actions on the resource besides monitoring. A resource cleanup, however, might require that actions be performed on the resource, so doing a cleanup while the resource is unmanaged can lead to unexpected results and should therefore be avoided in my opinion.

fmherschel commented 6 years ago

I will discuss that code change online in the LinuxLab with Frank in one of the next days. I need to understand why this change is really needed and whether it could also break something. I am thinking of the classical (currently supported) situation where a customer sets the cluster to maintenance(!), works heavily on the SAP HANA instances (e.g. doing updates), sets the resource to unmanaged with a following cleanup, brings the cluster back to ready and re-activates managed mode for the resource. In such situations the SAP HANA roles could have been exchanged, and then I should NOT return OCF_RUNNING_MASTER on the secondary side during the probe. This procedure currently works like a charm and customers have reported back that they are very satisfied with it.
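For reference, the maintenance procedure described above corresponds roughly to the following sequence (a sketch only; the resource name msl_SAPHana_HDB_HDB00 is just an example, crmsh shown, pcs works analogously):

    # 1. Put the whole cluster into maintenance mode.
    crm configure property maintenance-mode=true

    # 2. ... heavy work on the SAP HANA instances (updates, takeover, ...) ...

    # 3. Set the resource to unmanaged and clean up its recorded state.
    crm resource unmanage msl_SAPHana_HDB_HDB00
    crm resource cleanup msl_SAPHana_HDB_HDB00

    # 4. End cluster maintenance mode.
    crm configure property maintenance-mode=false

    # 5. Manage the resource again; the following probes must detect the
    #    (possibly exchanged) Master/Slave roles correctly.
    crm resource manage msl_SAPHana_HDB_HDB00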

Again (as discussed in the other issue about promote) - we need to map the multi-state SAP HANA system onto a multi-state pacemaker resource. We need to hide some details from the cluster to keep it calm and NOT have it perform unwanted resource actions. I would prefer a new pacemaker construct which would really allow multi-state resources. SAP HANA cannot be directly (1:1) mapped to the STOPPED -start-> DEMOTED -promote-> PROMOTED -demote-> DEMOTED -stop-> STOPPED lifecycle. Unfortunately it is much more complex.

Restarting an IP address might be OK and fast, but restarting an SAP HANA instance can take hours!

However, Frank and I will review your suggested code change, and if it fits the needs described above of still supporting SAP or cluster maintenance, it has a good chance of finding its way into the project code.

OndrejHome commented 5 years ago

As there was no significant response here and I no longer have access to any test systems, I'm closing this issue. Feel free to reach out with questions if interested.