AtlasOfLivingAustralia / ala-install

Ansible playbooks for installing the ALA components
https://www.ala.org.au
Apache License 2.0

Add regular calls to solr rebalance script #205

Open ansell opened 6 years ago

ansell commented 6 years ago

The SolrCloud cluster can become unbalanced even when monit has not restarted any nodes. Another mechanism, such as cron, is needed to call the Solr rebalance script regularly so that all nodes are targeted in a balanced manner.
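For illustration only, here is a minimal Python sketch of what a regularly scheduled rebalance call could look like. It assumes the rebalance script wraps the Solr Collections API `REBALANCELEADERS` action, which this issue does not confirm. The function names, the base URL, and the collection name are hypothetical and would need to match the real deployment.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def rebalance_url(base_url: str, collection: str) -> str:
    """Build a Collections API REBALANCELEADERS request URL."""
    query = urlencode({
        "action": "REBALANCELEADERS",
        "collection": collection,
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def rebalance(base_url: str, collection: str, timeout: float = 120.0) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(rebalance_url(base_url, collection), timeout=timeout) as resp:
        return resp.read()


# Hypothetical values; nothing here contacts Solr until rebalance() is called.
print(rebalance_url("http://localhost:8983/solr", "biocache"))
```

A cron entry could then run a small wrapper that calls `rebalance(...)` at whatever interval suits the cluster. Note that `REBALANCELEADERS` only moves leadership to replicas that have the `preferredLeader` property set, so those properties would need to be in place first.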

ansell commented 6 years ago

As an example, I ran it manually a few minutes ago, after an outage that resolved itself with only one monit restart. The output showed that 4 of the non-restarted nodes (those not mentioned in alreadyLeaders) had lost their leader status. This is a problem because of the extra load it puts on the already under-resourced Solr nodes that took the leader status from those nodes:

# cat /tmp/rebalanceleaders-output.json 
{
  "responseHeader":{
    "status":0,
    "QTime":433},
  "alreadyLeaders":[
    "core_node3",[
      "status","success",
      "msg","Already leader",
      "shard","shard3",
      "nodeName","aws-sc3b.ala:8983_solr"],
    "core_node5",[
      "status","success",
      "msg","Already leader",
      "shard","shard5",
      "nodeName","aws-sc5b.ala:8983_solr"],
    "core_node6",[
      "status","success",
      "msg","Already leader",
      "shard","shard6",
      "nodeName","aws-sc6b.ala:8983_solr"],
    "core_node8",[
      "status","success",
      "msg","Already leader",
      "shard","shard8",
      "nodeName","aws-sc8b.ala:8983_solr"]]}
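
To spot which shards are missing from alreadyLeaders without reading the JSON by eye, something like the following Python sketch could be run over the saved output. It assumes the flat alternating name/value layout seen in the output above. The helper names are hypothetical, and the full list of shards has to be supplied by the caller because the response does not include it.

```python
import json


def pairs_to_dict(flat):
    """Turn a flat [key1, value1, key2, value2, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def already_leaders(response):
    """Map each core name under alreadyLeaders to its fields."""
    entries = pairs_to_dict(response.get("alreadyLeaders", []))
    return {name: pairs_to_dict(fields) for name, fields in entries.items()}


def shards_not_in_already_leaders(response, all_shards):
    """Return the shards from all_shards that alreadyLeaders does not mention."""
    led = {info["shard"] for info in already_leaders(response).values()}
    return sorted(set(all_shards) - led)


# Trimmed sample in the same layout as the output above.
sample = json.loads("""
{"responseHeader": {"status": 0, "QTime": 433},
 "alreadyLeaders": [
   "core_node3", ["status", "success", "msg", "Already leader",
                  "shard", "shard3", "nodeName", "aws-sc3b.ala:8983_solr"],
   "core_node5", ["status", "success", "msg", "Already leader",
                  "shard", "shard5", "nodeName", "aws-sc5b.ala:8983_solr"]]}
""")

# The shard list is illustrative; pass the cluster's real shard names.
print(shards_not_in_already_leaders(sample, ["shard3", "shard4", "shard5"]))
# prints ['shard4']
```

For the real file, `json.load(open("/tmp/rebalanceleaders-output.json"))` would replace the inline sample.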