StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0

Can't disable st2timersengine #6039

Open DesireWithin opened 12 months ago

DesireWithin commented 12 months ago

SUMMARY

I followed the documentation (https://docs.stackstorm.com/reference/ha.html#blueprint-box) to install a highly available st2 deployment, but I can't disable st2timersengine after adding the following to st2.conf:

[timer]
enable = False
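
As a quick sanity check that the setting is present and reads as a boolean false, the fragment can be parsed with Python's standard `configparser`. This is only a sketch: st2 loads its configuration with its own loader, so this approximates how the value is read, and the helper name `timer_enabled`, the sample text, and the default of `True` are assumptions made for illustration.

```python
import configparser

# Sample config text modelled on the snippet in this report (values are placeholders).
SAMPLE = """
[coordination]
url = redis://:password@127.0.0.1:6379

[timer]
enable = False
"""


def timer_enabled(conf_text, default=True):
    """Return the boolean value of [timer] enable, or `default` if it is absent.

    The default of True is an assumption for this sketch, not a statement
    about st2's real default.
    """
    # interpolation=None so that '%' characters in passwords are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean("timer", "enable", fallback=default)


if __name__ == "__main__":
    print(timer_enabled(SAMPLE))
```

To check a real file, the same function could be called with the contents of /etc/st2/st2.conf read from disk.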

STACKSTORM VERSION

st2 3.8.0, on Python 3.6.9

OS, environment, install method

Ubuntu 18.04.6, installed via apt.

Steps to reproduce the problem

Add the configuration, then restart st2:

root@prod-stackstorm-03:/etc/apt/sources.list.d# tail -n 10 /etc/st2/st2.conf
...
db_name = st2
username = stackstorm
password = XXXX
compressors = zstd

[coordination]
url = redis://:Redis_XXXXX@10.XX.XX.XXX:6379

[timer]
enable = False

root@prod-stackstorm-03:/etc/st2# st2ctl restart
Failed to stop st2chatops.service: Unit st2chatops.service not loaded.
Failed to start st2chatops.service: Unit st2chatops.service not found.
##### st2 components status #####
st2actionrunner PID: 102513
st2actionrunner PID: 102515
st2actionrunner PID: 102517
st2actionrunner PID: 102519
st2actionrunner PID: 102521
st2actionrunner PID: 102523
st2actionrunner PID: 102525
st2actionrunner PID: 102527
st2actionrunner PID: 102529
st2actionrunner PID: 102531
st2api PID: 102539
st2stream PID: 102549
st2auth PID: 102559
st2garbagecollector PID: 102562
st2notifier PID: 102565
st2rulesengine PID: 102569
st2sensorcontainer PID: 102572
st2chatops is not running.
st2timersengine PID: 102577
st2workflowengine PID: 102580
st2scheduler PID: 102583
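
When comparing several HA nodes, it can help to reduce this status output to a mapping of component name to PIDs. The sketch below assumes the `<component> PID: <number>` line format shown above, which may differ between st2 versions; `parse_st2ctl_status` is a hypothetical helper, not part of st2.

```python
import re
from collections import defaultdict

# Matches lines such as "st2timersengine PID: 102577" (format assumed from the output above).
STATUS_LINE = re.compile(r"^(st2\w+) PID: (\d+)$")


def parse_st2ctl_status(text):
    """Map each running component name to the list of PIDs reported for it."""
    pids = defaultdict(list)
    for line in text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            pids[match.group(1)].append(int(match.group(2)))
    return dict(pids)


# Shortened copy of the status output from this report.
SAMPLE = """\
##### st2 components status #####
st2actionrunner PID: 102513
st2actionrunner PID: 102515
st2chatops is not running.
st2timersengine PID: 102577
"""

if __name__ == "__main__":
    status = parse_st2ctl_status(SAMPLE)
    # A component that is "not running" simply has no entry in the mapping.
    print("st2timersengine" in status)
```

A node where the timer engine is really stopped would have no `st2timersengine` key in the result.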

Expected Results

I expect st2timersengine not to be running.

Actual Results

st2timersengine is still running (PID 102577 in the output above), and I now have duplicate rule evaluations.

nzlosh commented 12 months ago

What coordination backend are you using? I've observed this behaviour in an HA setup that uses a Redis cluster as the coordination backend. Until a fix is found and released, I'm working around it by putting a simple lock in the workflow that uses the st2 kv store. (This could be adapted into an action that any workflow can call.)

version: 1.0

vars:
  - check_lock_delay: 2

tasks:
  write_execution_id:
    action: st2.kv.set
    input:
      key: <% ctx(st2).action %>_exec_id
      value: <% ctx(st2).action_execution_id %>
    next:
      - when: <% succeeded() %>
        do: wait_to_check_lock

  # Delay to allow all nodes to write to the kv store. (Adjust if nodes are heavily loaded and exceed delay)
  wait_to_check_lock:
    action: core.local
    input:
      cmd: sleep <% ctx(check_lock_delay) %>
    next: 
      - when: <% succeeded() %>
        do: read_execution_id

  read_execution_id:
    action: st2.kv.get
    input:
      key: <% ctx(st2).action %>_exec_id
    next: 
      - when: <% succeeded() and result().result = ctx().st2.action_execution_id %>
        do: proceed

  proceed:
    action: core.local
    input: 
      cmd: echo "ONLY A SINGLE WORKFLOW SHOULD REACH HERE"
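
The idea behind the workflow above is "last writer wins": every duplicate execution writes its own ID to the same key, waits, then proceeds only if the key still holds its ID. A minimal Python simulation of that logic follows, with a plain dict standing in for the st2 kv store; the key name and the `elect_winner` helper are hypothetical, for illustration only.

```python
def elect_winner(execution_ids, key="example.workflow_exec_id"):
    """Simulate the write / wait / read-back lock for duplicate executions.

    `execution_ids` is given in the order in which the executions write to the
    store. Returns the executions that would reach the `proceed` task.
    """
    kv_store = {}  # stand-in for the st2 kv store

    # Step 1 (write_execution_id): every execution overwrites the same key.
    for exec_id in execution_ids:
        kv_store[key] = exec_id

    # Step 2 (wait_to_check_lock): assumed long enough for all writes to land.

    # Step 3 (read_execution_id): only the execution whose ID survived proceeds.
    return [exec_id for exec_id in execution_ids if kv_store[key] == exec_id]


if __name__ == "__main__":
    print(elect_winner(["exec-a", "exec-b", "exec-c"]))
```

The simulation assumes every write finishes before any read. If a node writes after another node has already read the key, more than one execution could proceed, which is why the delay may need adjusting on heavily loaded nodes.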

DesireWithin commented 12 months ago

Yes, I'm using Redis as the coordination backend. I am looking for a solution that uses haproxy to monitor the st2timersengine process.

DesireWithin commented 7 months ago

I use keepalived to make sure only one st2timersengine is running.

MASTER config:

global_defs {
    # notification_email {
    #     your_email@example.com
    # }
    # notification_email_from keepalived@your_server.com
    # smtp_server localhost
    # smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script chk_program {
    script "/etc/keepalived/check_program.sh"
    interval 2
    weight -2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens4
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Sts_platform
    }
    track_script {
        chk_program
    }
    notify_master "/etc/keepalived/start_program.sh"
    notify_backup "/etc/keepalived/stop_program.sh"
}

BACKUP config:

global_defs {
    # notification_email {
    #     your_email@example.com
    # }
    # notification_email_from keepalived@your_server.com
    # smtp_server localhost
    # smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script chk_program {
    script "/etc/keepalived/check_program.sh"
    interval 2
    weight -2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens4
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Sts_platform
    }
    track_script {
        chk_program
    }
    notify_master "/etc/keepalived/start_program.sh"
    notify_backup "/etc/keepalived/stop_program.sh"
}

scripts: check_program.sh

#!/bin/bash
# Health check for keepalived: exit 0 if st2timersengine is active, 1 otherwise.

if systemctl is-active --quiet st2timersengine.service; then
  echo "st2timersengine.service is running normally."
  exit 0
else
  echo "Error: st2timersengine.service is not running normally."
  exit 1
fi

start_program.sh

#!/bin/bash
systemctl restart st2timersengine.service

stop_program.sh

#!/bin/bash
systemctl stop st2timersengine.service
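
To reason about when this setup fails over, the priority arithmetic can be sketched in Python. It assumes keepalived's documented behaviour that a negative `weight` on a tracked script lowers the instance priority while the script is failing, and that a positive one raises it while the script succeeds. `effective_priority` is a hypothetical helper, and the numbers are taken from the configs above.

```python
def effective_priority(base, weight, check_ok):
    """Model a VRRP instance's priority with one tracked script.

    Assumption: a negative weight is applied while the check fails,
    a positive weight while it succeeds; otherwise the base priority is used.
    """
    if weight < 0 and not check_ok:
        return base + weight
    if weight > 0 and check_ok:
        return base + weight
    return base


MASTER_BASE = 101  # priority in the MASTER config
BACKUP_BASE = 100  # priority in the BACKUP config
WEIGHT = -2        # weight of chk_program

if __name__ == "__main__":
    # On the BACKUP node stop_program.sh stops the service, so its check fails.
    print("backup, service stopped:", effective_priority(BACKUP_BASE, WEIGHT, False))
    print("master, service running:", effective_priority(MASTER_BASE, WEIGHT, True))
    print("master, service failed: ", effective_priority(MASTER_BASE, WEIGHT, False))
```

Under this model the BACKUP node, whose service is stopped, sits at 98, while a MASTER whose st2timersengine has failed sits at 99. If the model holds, failover with these numbers would depend on the MASTER node or keepalived itself becoming unavailable, so a larger negative weight may be worth testing.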