ClusterLabs / anvil

The Anvil! Intelligent Availability™ Platform, mark 3

servers are not started on the expected anvil-node #339

Open fabbione opened 1 year ago

fabbione commented 1 year ago

According to the Anvil! design, all servers should (by default) be created and run on the same anvil-node. This is not the case.

Both when a server is created and during recovery, it often happens that one of the test servers ends up on a different anvil-node.

  * an-test-deploy1     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy2     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy3     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy4     (ocf::alteeve:server):   Started an-a01n02
  * an-test-deploy5     (ocf::alteeve:server):   Started an-a01n01

I have been able to trigger this issue both at create and recovery time.
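A minimal Python sketch for spotting this in test runs, assuming the status line format shown above: it groups started servers by node and reports whether they all share one. The helper names are hypothetical, and lines in any state other than "Started" (e.g. FAILED or Stopped) are simply skipped.

```python
import re

# Matches status lines such as:
#   "  * an-test-deploy1     (ocf::alteeve:server):   Started an-a01n01"
# Lines in other states (Stopped, FAILED, ...) do not match and are skipped.
STARTED_RE = re.compile(
    r"^\s*\*\s+(\S+)\s+\(ocf::alteeve:server\):\s+Started\s+(\S+)\s*$"
)


def servers_by_node(status_text):
    """Group started servers by the node they run on (hypothetical helper)."""
    placement = {}
    for line in status_text.splitlines():
        match = STARTED_RE.match(line)
        if match:
            server, node = match.groups()
            placement.setdefault(node, []).append(server)
    return placement


def all_on_one_node(status_text):
    """True when every started server shares a single node."""
    return len(servers_by_node(status_text)) <= 1


# Status lines copied from the first listing in this issue.
STATUS_OUTPUT = """\
  * an-test-deploy1     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy2     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy3     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy4     (ocf::alteeve:server):   Started an-a01n02
  * an-test-deploy5     (ocf::alteeve:server):   Started an-a01n01
"""

print(servers_by_node(STATUS_OUTPUT))
print(all_on_one_node(STATUS_OUTPUT))  # False: an-test-deploy4 is on an-a01n02
```

Run after every create or recovery test, a `False` here flags the split placement described above.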

fabbione commented 1 year ago

After a bunch of fencing events to test node recovery, I think this is what causes the servers to end up on different nodes:

Location Constraints:
  Resource: an-test-deploy1
    Enabled on:
      Node: an-a01n01 (score:200) (id:location-an-test-deploy1-an-a01n01-200)
      Node: an-a01n02 (score:100) (id:location-an-test-deploy1-an-a01n02-100)
    Constraint: location-an-test-deploy1
      Rule: score=-INFINITY (id:location-an-test-deploy1-rule)
        Expression: drbd-fenced_an-test-deploy1 eq 1 (id:location-an-test-deploy1-rule-expr)
  Resource: an-test-deploy2
    Enabled on:
      Node: an-a01n01 (score:100) (id:location-an-test-deploy2-an-a01n01-100)
      Node: an-a01n02 (score:200) (id:location-an-test-deploy2-an-a01n02-200)
    Constraint: location-an-test-deploy2
      Rule: score=-INFINITY (id:location-an-test-deploy2-rule)
        Expression: drbd-fenced_an-test-deploy2 eq 1 (id:location-an-test-deploy2-rule-expr)
  Resource: an-test-deploy3
    Enabled on:
      Node: an-a01n01 (score:100) (id:location-an-test-deploy3-an-a01n01-100)
      Node: an-a01n02 (score:200) (id:location-an-test-deploy3-an-a01n02-200)
    Constraint: location-an-test-deploy3
      Rule: score=-INFINITY (id:location-an-test-deploy3-rule)
        Expression: drbd-fenced_an-test-deploy3 eq 1 (id:location-an-test-deploy3-rule-expr)
  Resource: an-test-deploy4
    Enabled on:
      Node: an-a01n02 (score:100) (id:location-an-test-deploy4-an-a01n02-100)
      Node: an-a01n01 (score:200) (id:location-an-test-deploy4-an-a01n01-200)
    Constraint: location-an-test-deploy4
      Rule: score=-INFINITY (id:location-an-test-deploy4-rule)
        Expression: drbd-fenced_an-test-deploy4 eq 1 (id:location-an-test-deploy4-rule-expr)
  Resource: an-test-deploy5
    Enabled on:
      Node: an-a01n01 (score:200) (id:location-an-test-deploy5-an-a01n01-200)
      Node: an-a01n02 (score:100) (id:location-an-test-deploy5-an-a01n02-100)
    Constraint: location-an-test-deploy5
      Rule: score=-INFINITY (id:location-an-test-deploy5-rule)
        Expression: drbd-fenced_an-test-deploy5 eq 1 (id:location-an-test-deploy5-rule-expr)

I'm not sure which part of Anvil! creates those constraints, though.
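To make the constraint dump easier to read, here is a simplified Python sketch that copies the location scores above into a dict and groups each server by its highest-scoring node. It assumes no `drbd-fenced_*` node attribute is set (so the -INFINITY rules do not apply) and ignores resource stickiness and any other constraints Pacemaker would also weigh; the helper names are hypothetical. Under those assumptions, the scores alone already split the servers across both nodes.

```python
# Location scores copied from the constraint output above:
# resource -> {node: score}
SCORES = {
    "an-test-deploy1": {"an-a01n01": 200, "an-a01n02": 100},
    "an-test-deploy2": {"an-a01n01": 100, "an-a01n02": 200},
    "an-test-deploy3": {"an-a01n01": 100, "an-a01n02": 200},
    "an-test-deploy4": {"an-a01n01": 200, "an-a01n02": 100},
    "an-test-deploy5": {"an-a01n01": 200, "an-a01n02": 100},
}


def preferred_node(node_scores):
    """Node with the highest location score (first one wins on a tie)."""
    return max(node_scores, key=node_scores.get)


def group_by_preferred_node(scores):
    """Map each preferred node to the resources whose scores favour it."""
    groups = {}
    for resource, node_scores in scores.items():
        groups.setdefault(preferred_node(node_scores), []).append(resource)
    return groups


print(group_by_preferred_node(SCORES))
# {'an-a01n01': ['an-test-deploy1', 'an-test-deploy4', 'an-test-deploy5'],
#  'an-a01n02': ['an-test-deploy2', 'an-test-deploy3']}
```

If all servers are meant to share one node, every resource should give its 200 score to the same node; here an-test-deploy2 and an-test-deploy3 prefer an-a01n02.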

digimer commented 1 year ago

I believe this has been resolved. If not, please let me know. If so, please close.

fabbione commented 1 year ago

Still an issue.

  * an-test-deploy1     (ocf::alteeve:server):   Started an-a01n02
  * an-test-deploy2     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy3     (ocf::alteeve:server):   Started an-a01n02
  * an-test-deploy4     (ocf::alteeve:server):   FAILED (blocked) [ an-a01n01 an-a01n02 ]
  * an-test-deploy5     (ocf::alteeve:server):   Started an-a01n02
digimer commented 1 year ago

Should be fixed by PR #408.

fabbione commented 1 year ago
  * an-test-deploy1     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy2     (ocf::alteeve:server):   Started an-a01n02
  * an-test-deploy3     (ocf::alteeve:server):   Stopped (disabled)
  * an-test-deploy4     (ocf::alteeve:server):   Started an-a01n01
  * an-test-deploy5     (ocf::alteeve:server):   Started an-a01n01
digimer commented 7 months ago

Need to update to ensure that new servers are provisioned on the subnode with the most servers.
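One possible shape for that placement rule, as a hypothetical Python sketch rather than Anvil! code: count the servers already running on each subnode, pick the one hosting the most, and give it the higher location score. The 200/100 scores mirror the constraints earlier in this issue; the function names and the alphabetical tie-break (used when both subnodes host the same number of servers, e.g. before any server exists) are assumptions.

```python
from collections import Counter

# Scores mirroring the location constraints seen in this issue.
PREFERRED_SCORE = 200
SECONDARY_SCORE = 100


def pick_subnode(subnodes, running):
    """Return the subnode already hosting the most servers (hypothetical helper).

    `running` maps server name -> subnode it currently runs on.
    Ties (e.g. no servers yet) fall back to the alphabetically first
    subnode, which is an assumption made for this sketch.
    """
    counts = Counter(running.values())  # missing subnodes count as 0
    return min(subnodes, key=lambda node: (-counts[node], node))


def location_scores(subnodes, preferred):
    """Build per-subnode location scores for a newly provisioned server."""
    return {
        node: PREFERRED_SCORE if node == preferred else SECONDARY_SCORE
        for node in subnodes
    }


subnodes = ["an-a01n01", "an-a01n02"]
running = {
    "an-test-deploy1": "an-a01n01",
    "an-test-deploy2": "an-a01n01",
    "an-test-deploy3": "an-a01n02",
}
target = pick_subnode(subnodes, running)
print(target)                             # an-a01n01
print(location_scores(subnodes, target))  # {'an-a01n01': 200, 'an-a01n02': 100}
```

Choosing the busiest subnode keeps new servers next to the existing ones, which matches the "all servers on one node" default described at the top of this issue.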