ClusterLabs / fence-agents

Fence agents

Fence with fence_pve fails in pacemaker with stacktrace #580

Status: Open. crowtrobot opened this issue 4 months ago.

crowtrobot commented 4 months ago

Some background:
I'm running fence-agents as part of a new Pacemaker cluster I'm setting up: a simple two-node NFS server cluster running Debian 12 (bookworm) in VMs on Proxmox, built as a way to learn Pacemaker. Nothing here is critical or even production; I'm just learning the basics. All packages are the versions from the official Debian repositories (fence-agents 4.12.1-1).

The issue: calling fence_pve manually works, but when Pacemaker tries to fence with fence_pve, the following error appears in the logs:

May 13 16:51:27.132 nfs-test-1 pacemaker-fenced    [1095] (internal_stonith_action_execute)     info: Attempt 2 to execute fence_pve (list). remaining timeout is 59
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [ Traceback (most recent call last): ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [   File "/usr/sbin/fence_pve", line 240, in <module> ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [     main() ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [   File "/usr/sbin/fence_pve", line 235, in main ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [     result = fence_action(None, options, set_power_status, get_power_status, get_outlet_list, reboot_cycle) ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [   File "/usr/share/fence/fencing.py", line 975, in fence_action ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [     print(outlet_id + options["--separator"] + alias) ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [           ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_op_output)       info: fence_pve_list_2of2[1706] error output [ TypeError: unsupported operand type(s) for +: 'int' and 'str' ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [ Traceback (most recent call last): ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [   File "/usr/sbin/fence_pve", line 240, in <module> ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [     main() ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [   File "/usr/sbin/fence_pve", line 235, in main ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [     result = fence_action(None, options, set_power_status, get_power_status, get_outlet_list, reboot_cycle) ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [   File "/usr/share/fence/fencing.py", line 975, in fence_action ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [     print(outlet_id + options["--separator"] + alias) ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [           ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (log_action)  warning: fence_pve[1706] stderr: [ TypeError: unsupported operand type(s) for +: 'int' and 'str' ]
May 13 16:51:29.017 nfs-test-1 pacemaker-fenced    [1095] (update_remaining_timeout)    info: Attempted to execute agent fence_pve (list) the maximum number of times (2) allowed

I don't know if this is the right way to do it (I'm a newbie at Python), but I fixed it with a simple find-and-replace in /usr/share/fence/fencing.py, replacing all occurrences of "print(outlet_id + options["--separator"]" with "print(str(outlet_id) + options["--separator"]", and that seems to have worked.
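The traceback shows outlet_id arriving as an int, which fits Proxmox VM IDs being numeric, so `int + str` fails. Below is a minimal sketch of the failure and of the str() workaround. The helper `format_outlets_str` and the shape of `outlets` (ID mapped to an `(alias, status)` tuple) are hypothetical stand-ins for the "list" loop in fencing.py, not its actual code; the VM IDs, names, and the comma separator are made up for illustration.

```python
# Hypothetical stand-in for the "list" output loop in fencing.py.
# Assumption: outlets maps outlet_id -> (alias, status).
def format_outlets_str(outlets, separator):
    lines = []
    for outlet_id, (alias, _status) in outlets.items():
        # str() converts an integer VM ID before concatenation
        lines.append(str(outlet_id) + separator + alias)
    return lines


outlets = {100: ("nfs-test-1", "on"), 101: ("nfs-test-2", "on")}

# Without str(), int + str raises the same TypeError as in the log
try:
    _ = 100 + "," + "nfs-test-1"
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'int' and 'str'

for line in format_outlets_str(outlets, ","):
    print(line)
```

Note that this workaround only covers outlet_id. If alias were also an int, the second `+` would still raise.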

oalbrigt commented 4 months ago

The easiest way is probably to do "{}{}{}".format(outlet_id, options["--separator"], alias) to avoid any pitfalls like alias being an integer.
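A sketch of the suggested `str.format()` approach. Because format() converts each argument itself, it works whether outlet_id, the separator, or alias is an int or a str. The function name `format_outlet_line` and the sample values are hypothetical, chosen only for illustration.

```python
# Hypothetical helper illustrating the "{}{}{}".format(...) suggestion.
def format_outlet_line(outlet_id, separator, alias):
    # str.format() formats each argument, so no explicit str() calls are needed
    return "{}{}{}".format(outlet_id, separator, alias)


print(format_outlet_line(100, ",", "nfs-test-1"))  # 100,nfs-test-1
print(format_outlet_line(100, ",", 42))            # 100,42

# By contrast, str() on outlet_id alone still fails for an integer alias
try:
    _ = str(100) + "," + 42
except TypeError as exc:
    print(exc)
```

An f-string (`f"{outlet_id}{separator}{alias}"`) would behave the same way on Python 3.6 and later.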

oalbrigt commented 3 months ago

This is getting fixed in https://github.com/ClusterLabs/fence-agents/pull/586/commits/a4502b3bf15a3be2ebd64b6829cd4f6641f2506b