ClusterLabs / fence-agents


Azure_ARM fence agent - pcmk_delay_max and priority-fencing-delay #494

Open db6thomas opened 2 years ago

db6thomas commented 2 years ago

To avoid a fence race, the parameters named in the title (pcmk_delay_max and priority-fencing-delay) can be used. References:

https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/high-availability-guide-rhel-pacemaker
https://www.suse.com/support/kb/doc/?id=000019110
https://access.redhat.com/solutions/5110521

Neither option works with the Azure_ARM fence agent in versions 4.7.1 and 4.9.1: both are ignored and the fence race still happens.

Is this intentional, or is support simply not implemented yet?

oalbrigt commented 2 years ago

These parameters depend on which version of pacemaker you're running.

Can you post your output of rpm -qa | grep pacemaker?

db6thomas commented 2 years ago

Hello,

azr-sd01:~ # rpm -qa |grep pacemaker
libpacemaker3-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
pacemaker-remote-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
pacemaker-cli-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
libpacemaker-devel-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
pacemaker-cts-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.noarch
pacemaker-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64

This is the Pacemaker that comes integrated with Db2, which is why you see db2pcmk in the package names. No changes were made to Pacemaker or Corosync apart from the new packaging. We tested the same Pacemaker version on AWS, and there the parameters work and the fence race can be avoided.

oalbrigt commented 2 years ago

pcmk_delay_max, pcmk_delay_base and priority-fencing-delay are applied by pacemaker (fenced) before it executes the action on the fence agent, while other delay parameters are passed to the fence agent itself.

The reason why fence_azure_arm is behaving differently might be due to code in fence_aws to avoid race conditions: https://github.com/ClusterLabs/fence-agents/pull/323/files

Maybe you should use pcmk_delay_base instead? It sets the static base delay, to which the random delay is added (base + random value).
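
For readers following along, here is a rough sketch of the "base + random" idea mentioned above. It is only an illustration of the documented behaviour (a static delay plus a random part, with the sum kept at or below the maximum), not Pacemaker's actual code. The function name `fencing_delay` and the example values are hypothetical, and the handling of a maximum at or below the base is a simplifying assumption.

```python
import random


def fencing_delay(delay_base: float, delay_max: float, rng: random.Random) -> float:
    """Hypothetical helper: static base delay plus a random part, capped at delay_max."""
    if delay_max <= delay_base:
        # Assumption for this sketch: with no room left for a random part,
        # only the static base delay applies.
        return delay_base
    # Random part drawn so that base + random never exceeds delay_max.
    return delay_base + rng.uniform(0, delay_max - delay_base)


if __name__ == "__main__":
    rng = random.Random()
    # In a two-node cluster each node draws its own delay; the node with the
    # shorter delay fences its peer first, which breaks the tie.
    for node in ("node1", "node2"):
        delay = fencing_delay(delay_base=5, delay_max=15, rng=rng)
        print(f"{node}: waits {delay:.1f}s before fencing its peer")
```

Because the delay is applied on the Pacemaker side before the agent is called, the same reasoning should hold regardless of which fence agent is configured.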

oalbrigt commented 2 years ago

If you have further issues, you can try the mailing list: http://oss.clusterlabs.org/mailman/listinfo/users where users and developers of all the ClusterLabs projects can answer.