ebi-gene-expression-group / atlas-web-single-cell

Single Cell Expression Atlas web application
Apache License 2.0

Add monitoring for our Solr Cloud cluster #469

Open ke4 opened 2 months ago

ke4 commented 2 months ago

Currently we have to check our SolrCloud cluster manually to find out whether it has any issues or is down. In either case, we have to restart it manually using a Jenkins job.

There is a better way to do this: a monitoring application that checks some critical parameters, detects when the servers are down and restarts them. We could investigate whether Monit covers all of the above.
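To give a rough idea of what "checking critical parameters" could mean for SolrCloud specifically, the Python sketch below reads the Collections API `CLUSTERSTATUS` response and lists every replica that is not in the `active` state or that sits on a node missing from `live_nodes`. The function names and the example base URL are assumptions for illustration. It uses only the standard library and has not been tried against our cluster.

```python
import json
import urllib.request


def find_unhealthy_replicas(cluster_status: dict) -> list[str]:
    """Given a parsed CLUSTERSTATUS response, describe every replica that is
    not 'active' or whose node is not listed in live_nodes."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    problems = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                where = f"{coll_name}/{shard_name}/{replica_name}"
                state = replica.get("state")
                node = replica.get("node_name")
                if state != "active":
                    problems.append(f"{where}: state={state}")
                elif node not in live_nodes:
                    problems.append(f"{where}: node {node} not live")
    return problems


def fetch_cluster_status(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch CLUSTERSTATUS from any one node.
    base_url is a placeholder such as 'http://solr-host:8983/solr'."""
    url = f"{base_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A wrapper script could exit with a non-zero status whenever `find_unhealthy_replicas` returns anything. Monit can act on that through a `check program` entry (`if status != 0 then alert`), though whether that is enough for our setup still needs investigating.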

ke4 commented 2 months ago

The following is from https://www.webfoobar.com/node/61, but it only works for a single Solr node, not for SolrCloud.

```
## Solr monitoring.

## Test the solr service.
check process solr with pidfile /var/solr/solr-8983.pid
  group solr
  start program = "/usr/bin/systemctl start solr"
  stop  program = "/usr/bin/systemctl stop solr"
  restart program  = "/usr/bin/systemctl restart solr"
  if failed port 8983 then restart
  if 3 restarts within 5 cycles then timeout
  depends on solr_bin
  depends on solr_init
  alert root@localhost only on {timeout}

## Test the process binary.
check file solr_bin with path /opt/solr/bin/solr
  group solr
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid solr then unmonitor
  if failed gid solr then unmonitor
  alert root@localhost

## Test the init scripts.
check file solr_init with path /etc/init.d/solr
  group solr
  if failed checksum then unmonitor
  if failed permission 744 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
  alert root@localhost
```
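The `if 3 restarts within 5 cycles then timeout` rule in the config above is what stops Monit from endlessly restarting a crash-looping node. The Python sketch below is a small model of that rule as we understand it. Restarts are remembered over a sliding window of cycles, and once the limit is reached the service is given up on instead of being restarted again. `RestartLimiter` and its return strings are hypothetical names. This only approximates the behaviour and is not Monit's actual implementation.

```python
from collections import deque


class RestartLimiter:
    """Hypothetical model of 'if <max_restarts> restarts within <window>
    cycles then timeout' for a single monitored service."""

    def __init__(self, max_restarts: int = 3, window: int = 5):
        self.max_restarts = max_restarts
        self.window = window
        self.cycle = 0
        self.restart_cycles = deque()  # cycle numbers at which we restarted
        self.timed_out = False

    def tick(self, port_ok: bool) -> str:
        """Run one monitoring cycle and return the action taken."""
        self.cycle += 1
        if self.timed_out:
            # After a timeout the service is no longer monitored.
            return "unmonitored"
        # Forget restarts that fall outside the last `window` cycles.
        while self.restart_cycles and self.restart_cycles[0] <= self.cycle - self.window:
            self.restart_cycles.popleft()
        if port_ok:
            return "ok"
        if len(self.restart_cycles) >= self.max_restarts:
            # Too many restarts recently: give up instead of restarting again.
            self.timed_out = True
            return "timeout"
        self.restart_cycles.append(self.cycle)
        return "restart"
```

With the port failing on every cycle, this gives three restarts and then a timeout. Occasional failures spread further apart than the window keep getting restarted indefinitely, which is the behaviour we would want for a node that only crashes now and then.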
ke4 commented 2 months ago

Adding this here in case we need to trigger a Jenkins job remotely: https://serverfault.com/questions/888176/how-to-trigger-jenkins-job-via-curl-command-remotely
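Triggering a job remotely comes down to an authenticated `POST` to the job's `/build` endpoint (or `/buildWithParameters` for a parameterised job). The sketch below only builds that request with Python's standard library. The Jenkins URL, the job name `restart-solrcloud`, the user and the API token are placeholders, not our real values.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def build_trigger_request(jenkins_url: str, job: str, user: str, api_token: str,
                          params: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST that queues a build of `job`.
    All arguments are placeholders supplied by the caller."""
    base = jenkins_url.rstrip("/")
    job_path = urllib.parse.quote(job, safe="")
    if params:
        url = f"{base}/job/{job_path}/buildWithParameters?{urllib.parse.urlencode(params)}"
    else:
        url = f"{base}/job/{job_path}/build"
    # HTTP basic auth with a Jenkins user name and that user's API token.
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )
```

Sending it would be `urllib.request.urlopen(req)`. As far as we know, recent Jenkins versions do not ask for a CSRF crumb when a user API token is used, but that should be confirmed on our instance. Jobs inside folders need a `/job/<folder>/job/<name>` path, which this sketch does not handle.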