ClusterLabs / fence-agents

Fence agents

fence_ipmilan fails when method is set to cycle #596

Open Jazyy opened 1 week ago

Jazyy commented 1 week ago

Hi all, in a 2-node Pacemaker cluster with fence_ipmilan configured with method=cycle and action=reboot, if one node is forcibly powered off, the other node cannot fence it and keeps reporting “Connection timed out”.

After debugging, it appears that the fence action fails because the command '/usr/bin/ipmitool -I lanplus -H xx.xx.xx.xx -p 623 -U Administrator -P [set] -L ADMINISTRATOR -N 2 chassis power cycle' does not succeed. The error message for this command is: 'Set Chassis Power Control to Cycle failed: Command not supported in present state'.
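The failing step comes down to how ipmitool's exit status and output are interpreted. The sketch below classifies the result of a `chassis power cycle` call. The failure text is copied from the logs; "Command not supported in present state" is ipmitool's wording for IPMI completion code D5h. The success text `Chassis Power Control: Cycle` is an assumption, since the logs never show a successful cycle. `classify_cycle_result` is a hypothetical helper, not part of fence-agents.

```python
import re

# Assumed success text for "chassis power cycle" (not shown in the logs above).
SUCCESS_PATTERN = re.compile(r"chassis power control: cycle", re.IGNORECASE)
# Failure text copied from the DEBUG lines in the logs (IPMI completion code D5h).
NOT_SUPPORTED = "command not supported in present state"


def classify_cycle_result(returncode: int, output: str) -> str:
    """Return 'ok', 'rejected-in-present-state', or 'error' for one cycle attempt.

    Hypothetical helper for illustration only.
    """
    if returncode == 0 and SUCCESS_PATTERN.search(output):
        return "ok"
    if NOT_SUPPORTED in output.lower():
        # The BMC understood the request but refused it in its current power state.
        return "rejected-in-present-state"
    return "error"


if __name__ == "__main__":
    # Return code and message as they appear in the DEBUG lines of the logs.
    msg = "Set Chassis Power Control to Cycle failed: Command not supported in present state"
    print(classify_cycle_result(1, msg))  # prints "rejected-in-present-state"
```

Telling "rejected" apart from a generic "error" matters here: a refusal means the BMC was reachable, so the problem is not a network timeout.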

The logs are posted in the comment below.

My questions are:

  1. Is it appropriate to report a 'Connection timed out' error here?
  2. After a node (host2) is forcibly powered off, the expected result is that the other node (host1) can fence host2 normally. Why does fencing fail when method=cycle?
  3. What is the difference between onoff and cycle in fence_ipmilan?
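A toy model may help frame questions 2 and 3. Per the logs, method=cycle sends a single `chassis power cycle` request, while onoff turns the node off and then on again with separate requests. The `SimulatedBMC` class and its rule that a cycle is refused while the chassis is off are assumptions drawn from the error in the logs. This is not a model of any particular BMC or of fencing.py itself, and all names are hypothetical.

```python
from dataclasses import dataclass, field


class NotSupportedInPresentState(Exception):
    """Models the BMC refusing a request ('Command not supported in present state')."""


@dataclass
class SimulatedBMC:
    """Toy BMC. Assumption: like host2's BMC, it refuses 'power cycle' while off."""
    powered_on: bool
    history: list = field(default_factory=list)

    def status(self) -> str:
        return "on" if self.powered_on else "off"

    def power_off(self) -> None:
        self.history.append("off")
        self.powered_on = False

    def power_on(self) -> None:
        self.history.append("on")
        self.powered_on = True

    def power_cycle(self) -> None:
        self.history.append("cycle")
        if not self.powered_on:
            raise NotSupportedInPresentState("Set Chassis Power Control to Cycle failed")
        # A real cycle briefly cuts power and restores it; the end state is on.
        self.powered_on = True


def reboot_cycle(bmc: SimulatedBMC) -> bool:
    """method=cycle: one cycle request; success depends on the BMC accepting it."""
    try:
        bmc.power_cycle()
        return True
    except NotSupportedInPresentState:
        return False


def reboot_onoff(bmc: SimulatedBMC) -> bool:
    """method=onoff: power off, confirm off, power on, confirm on."""
    bmc.power_off()
    if bmc.status() != "off":
        return False
    bmc.power_on()
    return bmc.status() == "on"


if __name__ == "__main__":
    # host2 was forcibly powered off before fencing started.
    print("cycle:", reboot_cycle(SimulatedBMC(powered_on=False)))  # prints "cycle: False"
    print("onoff:", reboot_onoff(SimulatedBMC(powered_on=False)))  # prints "onoff: True"
```

Under this model, cycle depends entirely on the BMC accepting one request in the node's current state, while onoff only issues requests that make sense from either state.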
Jazyy commented 1 week ago

pacemaker logs:

Oct 11 14:48:58 host1 pacemaker-schedulerd[2984671]: warning: Calculated transition 16611 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-2593.bz2
Oct 11 14:48:58 host1 pacemaker-controld[2984672]: notice: Requesting fencing (reboot) targeting node host2
Oct 11 14:48:58 host1 pacemaker-fenced[2984668]: notice: Client pacemaker-controld.2984672 wants to fence (reboot) host2 using any device
Oct 11 14:48:58 host1 pacemaker-fenced[2984668]: notice: Requesting peer fencing (reboot) targeting host2
Oct 11 14:48:58 host1 pacemaker-fenced[2984668]: notice: Requesting that host1 perform 'reboot' action targeting host2
Oct 11 14:48:58 host1 /fence_ipmilan[2923680]: Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.xx.xx -p 623 -U Administrator -P [set] -L ADMINISTRATOR -N 2 chassis power status
Oct 11 14:48:58 host1 /fence_ipmilan[2923680]: Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.xx.xx -p 623 -U Administrator -P [set] -L ADMINISTRATOR -N 2 chassis power cycle
Oct 11 14:48:58 host1 /fence_ipmilan[2923680]: Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.xx.xx -p 623 -U Administrator -P [set] -L ADMINISTRATOR -N 2 chassis power cycle
Oct 11 14:49:00 host1 pacemaker-attrd[2984670]: notice: Setting ram_free[host1] in instance_attributes: 50.48 -> 226950
Oct 11 14:49:00 host1 pacemaker-controld[2984672]: notice: Transition 16611 aborted by status-1-ram_free doing modify ram_free=226950: Transient attribute change
Oct 11 14:49:00 host1 pacemaker-attrd[2984670]: notice: Setting ram_total[host1] in instance_attributes: 250.17 -> 256200
Oct 11 14:49:00 host1 /fence_ipmilan[2923680]: Connection timed out
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [ 2024-10-11 14:48:58,324 INFO: Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.xx.xx -p 623 -U Administrator -P [set] -L ADMINISTRATOR -N 2 chassis power status ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [ 2024-10-11 14:48:58,381 DEBUG: 0 Chassis Power is off ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [ 2024-10-11 14:48:58,382 INFO: Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.xx.xx -p 623 -U Administrator -P [set] -L ADMINISTRATOR -N 2 chassis power cycle ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [ 2024-10-11 14:48:58,567 DEBUG: 1  Set Chassis Power Control to Cycle failed: Command not supported in present state ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [ 2024-10-11 14:48:58,567 INFO: Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.xx.xx -p 623 -U Administrator -P [set] -L ADMINISTRATOR -N 2 chassis power cycle ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [ 2024-10-11 14:48:58,741 DEBUG: 1  Set Chassis Power Control to Cycle failed: Command not supported in present state ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [  ]
Oct 11 14:49:00 host1 pacemaker-fenced[2984668]: warning: fence_ipmilan[2923680] stderr: [ 2024-10-11 14:49:00,743 ERROR: Connection timed out ]
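In these logs, the root cause appears only in the DEBUG lines, not in the final ERROR line. The sketch below pulls ipmitool return codes out of the stderr lines that pacemaker-fenced relays, assuming the `<date> <time> DEBUG: <rc> <output>` layout seen above. It uses `finditer` so that entries fused onto one physical line are still found. `failed_commands` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# Matches relayed DEBUG entries, e.g.
# "stderr: [ 2024-10-11 14:48:58,567 DEBUG: 1  Set Chassis Power Control ... ]"
DEBUG_LINE = re.compile(
    r"stderr: \[\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"DEBUG: (?P<rc>\d+)\s+(?P<msg>.*?)\s*\]"
)


def failed_commands(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (timestamp, return code, output) for each DEBUG entry with rc != 0."""
    failures = []
    for line in lines:
        # finditer also catches several entries fused onto one physical line.
        for m in DEBUG_LINE.finditer(line):
            rc = int(m.group("rc"))
            if rc != 0:
                failures.append((m.group("ts"), rc, m.group("msg")))
    return failures
```

Applied to the excerpt above, it returns the two rejected cycle attempts logged at 14:48:58,567 and 14:48:58,741. The final `Connection timed out` line mentions neither of them.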
oalbrigt commented 1 week ago

The agent tries to report what has most likely happened, using an error code and a message: https://github.com/ClusterLabs/fence-agents/blob/main/lib/fencing.py.py#L579

That is why it's always a good idea to check the logs for more details recorded before the agent returned this error code and message.
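The sketch below shows why the final message can be generic. It is modeled only on what the logs show: two `chassis power cycle` attempts followed by `ERROR: Connection timed out`. As far as we can tell, fencing.py's reboot path for method=cycle retries the cycle function and, if no attempt succeeds, fails with its EC_TIMED_OUT code, whose message is "Connection timed out". The linked source is authoritative. `reboot_with_cycle` and `TIMED_OUT_MESSAGE` are hypothetical names, and the retry count of 2 is an assumption chosen to match the log.

```python
from typing import Callable, Optional

# Message text copied from the final ERROR line in the logs. Assumption: the
# agent reports this one fixed message whenever all cycle attempts fail,
# regardless of why each attempt failed.
TIMED_OUT_MESSAGE = "Connection timed out"


def reboot_with_cycle(cycle_fn: Callable[[], bool], retries: int) -> Optional[str]:
    """Call cycle_fn up to `retries` times.

    Returns None on success, or a generic error message once every attempt
    has failed. Hypothetical helper; not the fencing.py API.
    """
    for _ in range(retries):
        if cycle_fn():
            return None
    # The specific reason ("Command not supported in present state") is lost
    # here; it only survives in the DEBUG lines logged by each attempt.
    return TIMED_OUT_MESSAGE


if __name__ == "__main__":
    # A BMC that rejects every cycle request, as host2's BMC did while powered off.
    print(reboot_with_cycle(lambda: False, retries=2))  # prints "Connection timed out"
```

Because a single fixed message covers every failed-cycle case, the label alone cannot distinguish a real network timeout from a BMC refusing the request.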

Jazyy commented 5 days ago

Thanks for your reply. I'll read through fencing.py to learn more.