AtlasOfLivingAustralia / ala-install

Ansible playbooks for installing the ALA components
https://www.ala.org.au
Apache License 2.0

systemd service (eg in exec-jar) to configurably restart on failure #646

Closed — matthewandrews closed this 8 months ago

matthewandrews commented 1 year ago

A useful option, particularly for production systems, is to configure an app's systemd service to automatically restart after a failure.

This might usefully apply to exec-jar and potentially other roles like tomcat and nginx.

Default would be the current behaviour (no restart); if a variable is set in inventory, the auto restart config will be added.
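
A minimal sketch of how such an opt-in setting could be templated into the role's systemd unit file; the variable name service_restart_on_failure and the surrounding unit lines are placeholders for illustration, not the merged implementation:

# systemd unit template excerpt (hypothetical variable and file names)
[Service]
ExecStart=/usr/bin/java -jar {{ jar_file }}
{% if service_restart_on_failure | default(false) %}
# restart the JVM whenever it exits with a failure status
Restart=on-failure
# back off briefly between restarts to avoid a tight crash loop
RestartSec=10
{% endif %}

With the variable unset, the template renders without the Restart lines, preserving the current behaviour described above.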

vjrj commented 1 year ago

I'm not sure if this is exactly what you're looking for, but if it helps, I use monit (apt install monit) with configurations like:

### tomcat-7
check process tomcat7 with pidfile /var/run/tomcat7.pid
  start program = "/usr/sbin/service tomcat7 start"
  stop program = "/usr/sbin/service tomcat7 stop"
  restart program = "/usr/sbin/service tomcat7 restart"

  if failed port 8080 then alert
  if failed port 8080 for 5 cycles then restart
### tomcat-8
check process tomcat8 with pidfile /var/run/tomcat8.pid
  start program = "/usr/sbin/service tomcat8 start"
  stop program = "/usr/sbin/service tomcat8 stop"
  restart program = "/usr/sbin/service tomcat8 restart"

  if failed port 8080 then alert
  if failed port 8080 for 5 cycles then restart
### tomcat-9
# tomcat9 does not write a pid file for some reason, so match on the process instead:
# https://serverfault.com/questions/270316/monit-check-process-without-pidfile
# https://mmonit.com/monit/documentation/monit.html#Process
check process tomcat9 
  matching "/usr/share/tomcat9/bin/tomcat-juli.jar"
  start program = "/usr/sbin/service tomcat9 start"
  stop program = "/usr/sbin/service tomcat9 stop"
  restart program = "/usr/sbin/service tomcat9 restart"

  if failed port 8080 then alert
  if failed port 8080 for 5 cycles then restart
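
For anyone adapting these, monit can validate the configuration before a reload; these are standard monit CLI commands:

# verify /etc/monit/monitrc and everything included from conf.d
sudo monit -t
# apply the new config and list the current state of each check
sudo monit reload
sudo monit summary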

I also have some default configurations to monitor things like fail2ban, mysql, nginx and sshd. I use:

# https://github.com/pgolm/ansible-role-monit
- hosts: all
  vars:
    monit_cycle: 120
    monit_log_destination: syslog
    monit_eventqueue_dir: /var/lib/monit/events
    monit_services:
      - name: sshd
        type: process
        target: /var/run/sshd.pid
        start: /usr/sbin/service sshd start
        stop: /usr/sbin/service sshd stop
      - name: google
        type: host
        target: google.com
        rules:
          - "if failed port 443 type tcpSSL protocol http then alert"
      - name: localhost
        type: system
        rules:
          - "if loadavg (1min) > 2 then alert"
          - "if loadavg (5min) > 2 then alert"
          - "if memory usage > 75% then alert"
          - "if cpu usage (user) > 70% for 8 cycles then alert"
          - "if cpu usage (system) > 40% for 8 cycles then alert"
          - "if cpu usage (wait) > 20%  for 8 cycles then alert"
      - name: fail2ban
        type: process
        target: /var/run/fail2ban/fail2ban.pid
        start: /etc/init.d/fail2ban force-start
        stop: /etc/init.d/fail2ban stop
        rules:
          - "if failed unixsocket /var/run/fail2ban/fail2ban.sock then restart"
          - "if 5 restarts within 5 cycles then timeout"
    monit_webinterface_enabled: true
    monit_webinterface_acl_rules:
      - "localhost"
      - "172.17.17.0/24"
    monit_webinterface_bind: localhost
    monit_mail_enabled: false
    monit_mailserver_host: localhost
    monit_mailserver_port: 25
    monit_mailserver_user: root
    monit_mailserver_password: XXXX
    monit_alert_addresses:
      - sysadmins@l-a.site
    monit_alert_mail_from: noreply@l-a.site
    monit_alert_mail_subject: alert
    monit_alert_mail_message: |+
      $EVENT Service $SERVICE
                 Date:        $DATE
                 Action:      $ACTION
                 Host:        $HOST
                 Description: $DESCRIPTION
      Your faithful employee,
            Monit
  roles:
    - { role: pgolm.monit }
  tags: monit
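
A play like the one above can then be applied on its own via its tag; the inventory and playbook names here are placeholders:

ansible-playbook -i inventories/production site.yml --tags monit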

And I copy the previous configs via:

- hosts: tomcat_servers
  tasks:
    - name: copy tomcat7 monit config
      copy:
        src: files/monit/tomcat7
        dest: /etc/monit/conf.d/tomcat7
        owner: root
        group: root
        mode: "0644"
      notify: reload monit
  handlers:
    - name: reload monit
      service:
        name: monit
        state: reloaded
  tags: monit

- hosts: tomcat8_servers
  tasks:
    - name: copy tomcat8 monit config
      copy:
        src: files/monit/tomcat8
        dest: /etc/monit/conf.d/tomcat8
        owner: root
        group: root
        mode: "0644"
      notify: reload monit
  handlers:
    - name: reload monit
      service:
        name: monit
        state: reloaded
  tags: monit

- hosts: tomcat9_servers
  tasks:
    - name: copy tomcat9 monit config
      copy:
        src: files/monit/tomcat9
        dest: /etc/monit/conf.d/tomcat9
        owner: root
        group: root
        mode: "0644"
      notify: reload monit
    - name: remove old tomcat8 monit file
      file:
        state: absent
        path: /etc/monit/conf.d/tomcat8
    - name: remove old apache monit file
      file:
        state: absent
        path: /etc/monit/conf.d/apache2
  handlers:
    - name: reload monit
      service:
        name: monit
        state: reloaded
  tags: monit
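
The three near-identical plays above could also be collapsed into one; a sketch, assuming a tomcat_version variable (e.g. tomcat7) is set in group_vars for each group, and not tested against this setup:

- hosts: tomcat_servers:tomcat8_servers:tomcat9_servers
  tasks:
    # tomcat_version is a hypothetical group_vars variable naming the config file
    - name: copy the monit config matching this host's tomcat version
      copy:
        src: "files/monit/{{ tomcat_version }}"
        dest: "/etc/monit/conf.d/{{ tomcat_version }}"
        owner: root
        group: root
        mode: "0644"
      notify: reload monit
  handlers:
    - name: reload monit
      service:
        name: monit
        state: reloaded
  tags: monit
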
matthewandrews commented 1 year ago

Thanks @vjrj - I have used monit before, and we may do that again at some point. The change I'm making here is in the systemd service config, but it will default to the current behaviour unless a specific variable is present in the inventory.

matthewandrews commented 8 months ago

Merged to master.