digimer closed this 5 months ago
Look in Cluster.pm->check_stonith_config(), around:
# Setup fence levels.
foreach my $node_name (sort {$a cmp $b} keys %{$fence_order})
{
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => $debug, list => { "something_changed->{$node_name}" => $something_changed->{$node_name} }});
if ($something_changed->{$node_name})
This is line 1654 as of when this bug was filed.
We need this code fixed right now. It's causing havoc in CI when multiple fence devices are defined.
http://anvil-ci-repo.ci.alteeve.com/testing-logs/dafaq.tar.gz
ipmi, apc, virt and delay are configured in the template but:
Fencing Levels:
Target: an-a01n01
Level 1 - ipmilan_node1
Level 2 - apc_snmp_node1_an-pdu01,apc_snmp_node1_an-pdu02
Resources Defaults:
Fence levels are incomplete for node1 (the virt and delay levels are absent) and completely missing for node2.
Configuring only IPMI in CI appears to do the trick:
Fencing Levels:
Target: an-a01n01
Level 1 - ipmilan_node1
Level 2 - delay_node1
Target: an-a01n02
Level 1 - ipmilan_node2
Level 2 - delay_node2
I have run a few manual tests adding only apc to the template (dropping gravitar/fence_virt). One time the strikers failed to join the db at an early stage, and another couple of times they configured the nodes correctly.
There is clearly something deep going on inside the fence config code that must be addressed ASAP.
This is leaving aside that simengine apc is severely broken for other reasons.
Changing the order of the fence devices in the config template does help to get the expected fence levels. For example:
Fencing Levels:
Target: an-a01n01
Level 1 - ipmilan_node1
Level 2 - apc_snmp_node1_an-pdu01,apc_snmp_node1_an-pdu02
Level 3 - virt_node1_gravitar
Level 4 - delay_node1
Target: an-a01n02
Level 1 - ipmilan_node2
Level 2 - apc_snmp_node2_an-pdu01,apc_snmp_node2_an-pdu02
Level 3 - virt_node2_gravitar
Level 4 - delay_node2
Thanks for the update. I will work on this Monday/tomorrow.
Currently, fence levels are only updated if a fence device changes. A case has been seen where the fence levels were missing and not repaired because the actual stonith device configs were fine.
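To illustrate the gap described above, here is a minimal sketch of a check that compares the expected fence levels with the ones actually configured, independently of whether any stonith device changed. This is not the code in Cluster.pm; the helper name fence_levels_differ(), the $expected and $configured hashes, and their layout (one array entry per level) are assumptions made for illustration. The device names are taken from the pcs output earlier in this issue.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Cluster.pm): returns 1 if the configured
# fence levels differ from the expected ones, 0 if they match. Each argument
# is an array reference of device strings, where index 0 is level 1.
sub fence_levels_differ
{
	my ($expected, $configured) = @_;
	$configured = [] if not defined $configured;	# Node has no levels at all.
	
	# A different number of levels is always a mismatch.
	return(1) if scalar(@{$expected}) != scalar(@{$configured});
	
	# Same count, so compare level by level, in order.
	foreach my $i (0 .. $#{$expected})
	{
		return(1) if $expected->[$i] ne $configured->[$i];
	}
	return(0);
}

# Assumed data, mirroring the broken CI result reported above.
my $expected = {
	"an-a01n01" => ["ipmilan_node1", "apc_snmp_node1_an-pdu01,apc_snmp_node1_an-pdu02", "virt_node1_gravitar", "delay_node1"],
	"an-a01n02" => ["ipmilan_node2", "apc_snmp_node2_an-pdu01,apc_snmp_node2_an-pdu02", "virt_node2_gravitar", "delay_node2"],
};
my $configured = {
	"an-a01n01" => ["ipmilan_node1", "apc_snmp_node1_an-pdu01,apc_snmp_node1_an-pdu02"],
};
my $something_changed = {};	# No stonith device config changed.

# Rebuild the levels if a device changed OR the levels themselves are wrong.
my %needs_rebuild;
foreach my $node_name (sort {$a cmp $b} keys %{$expected})
{
	my $differs = fence_levels_differ($expected->{$node_name}, $configured->{$node_name});
	$needs_rebuild{$node_name} = ($something_changed->{$node_name} or $differs) ? 1 : 0;
	print $node_name.": rebuild fence levels = ".$needs_rebuild{$node_name}."\n";
}
```

With this data, both nodes are flagged for a rebuild even though $something_changed is empty, which is the case the existing "if ($something_changed->{$node_name})" test would skip.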