Open DesireWithin opened 12 months ago
What coordination backend are you using? I've observed this behaviour with an HA setup that uses Redis Cluster as the coordination backend. Until a fix is found and released, I'm working around it with a simple lock in the workflow that uses the st2 kv store. (This could be adapted into an action that any workflow can call.)
version: 1.0

vars:
  - check_lock_delay: 2

tasks:
  write_execution_id:
    action: st2.kv.set
    input:
      key: <% ctx(st2).action %>_exec_id
      value: <% ctx(st2).action_execution_id %>
    next:
      - when: <% succeeded() %>
        do: wait_to_check_lock

  # Delay to allow all nodes to write to the kv store. (Adjust if nodes are heavily loaded and exceed delay)
  wait_to_check_lock:
    action: core.local
    input:
      cmd: sleep <% ctx(check_lock_delay) %>
    next:
      - when: <% succeeded() %>
        do: read_execution_id

  read_execution_id:
    action: st2.kv.get
    input:
      key: <% ctx(st2).action %>_exec_id
    next:
      - when: <% succeeded() and result().result = ctx().st2.action_execution_id %>
        do: proceed

  proceed:
    action: core.local
    input:
      cmd: echo "ONLY A SINGLE WORKFLOW SHOULD REACH HERE"
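The write, wait, read-back logic of this workaround can be sketched outside StackStorm. The following is only a simulation under stated assumptions: a temporary file stands in for the st2 kv store, the function names `write_execution_id` and `holds_lock` are hypothetical, and the delay is omitted because the writes run sequentially here.

```shell
#!/bin/bash
# Simulation of the "write, wait, read back" lock from the workflow above.
# A temp file stands in for the st2 kv store; function names are hypothetical.

# Every execution overwrites the shared key with its own ID (last writer wins).
write_execution_id() {
    local store="$1" exec_id="$2"
    printf '%s\n' "$exec_id" > "$store"
}

# After the delay, an execution may proceed only if the key still holds its ID.
holds_lock() {
    local store="$1" exec_id="$2"
    [ "$(cat "$store")" = "$exec_id" ]
}

store="$(mktemp)"
write_execution_id "$store" "exec-A"   # duplicate workflow started on node A
write_execution_id "$store" "exec-B"   # duplicate workflow on node B writes last

for id in exec-A exec-B; do
    if holds_lock "$store" "$id"; then
        echo "$id proceeds"
    else
        echo "$id stops"
    fi
done
rm -f "$store"
```

In this sketch only `exec-B` proceeds, because its write was the last one. The approach relies on every duplicate execution having written its ID before any of them reads the key back, which is what the delay in the workflow is for.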
Yes, I'm using Redis as the coordination backend. I'm looking for a solution that uses HAProxy to monitor the st2timersengine process.
I use keepalived to make sure only one st2timersengine is running.
MASTER config:
global_defs {
    # notification_email {
    #     your_email@example.com
    # }
    # notification_email_from keepalived@your_server.com
    # smtp_server localhost
    # smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script chk_program {
    script "/etc/keepalived/check_program.sh"
    interval 2
    weight -2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens4
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Sts_platform
    }
    track_script {
        chk_program
    }
    notify_master "/etc/keepalived/start_program.sh"
    notify_backup "/etc/keepalived/stop_program.sh"
}
BACKUP config:
global_defs {
    # notification_email {
    #     your_email@example.com
    # }
    # notification_email_from keepalived@your_server.com
    # smtp_server localhost
    # smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script chk_program {
    script "/etc/keepalived/check_program.sh"
    interval 2
    weight -2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens4
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Sts_platform
    }
    track_script {
        chk_program
    }
    notify_master "/etc/keepalived/start_program.sh"
    notify_backup "/etc/keepalived/stop_program.sh"
}
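To make the interaction between `priority` and `weight -2` in these two configs concrete, here is a small arithmetic sketch. It assumes keepalived's behaviour for tracked scripts as I understand it: a negative weight is added to the priority while the check is failing, and a positive weight is added while it is passing. The helper `effective_priority` is hypothetical and not part of keepalived.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT CHECK_STATUS
# CHECK_STATUS is "ok" or "fail". Hypothetical helper, for illustration only.
effective_priority() {
    local base="$1" weight="$2" status="$3"
    if [ "$weight" -lt 0 ] && [ "$status" = "fail" ]; then
        # Negative weight: priority is lowered while the check fails.
        echo $(( base + weight ))
    elif [ "$weight" -gt 0 ] && [ "$status" = "ok" ]; then
        # Positive weight: priority is raised while the check passes.
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

echo "MASTER, check passing: $(effective_priority 101 -2 ok)"
echo "MASTER, check failing: $(effective_priority 101 -2 fail)"
echo "BACKUP, check passing: $(effective_priority 100 -2 ok)"
echo "BACKUP, check failing: $(effective_priority 100 -2 fail)"
```

Under these assumptions a MASTER with a failing check (99) ranks below a BACKUP whose check passes (100), but still above a BACKUP whose check also fails (98). That may matter here, since stop_program.sh stops st2timersengine on the BACKUP node and check_program.sh would then report a failure there as well.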
Scripts:
check_program.sh
#!/bin/bash
status=$(systemctl status st2timersengine.service)
if [ $? -eq 0 ]; then
    echo "st2timersengine.service is running normally."
    exit 0
else
    echo "Error: st2timersengine.service is not running normally."
    exit 1
fi
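A possible variant of the check, as a sketch only: `systemctl is-active --quiet` returns exit status 0 when the unit is active and non-zero otherwise, so the output does not need to be captured. The function name `service_is_active` is hypothetical.

```shell
#!/bin/bash
# Hypothetical variant of check_program.sh.
# Returns 0 if the given systemd unit is active, non-zero otherwise.
service_is_active() {
    systemctl is-active --quiet "$1"
}

if service_is_active st2timersengine.service; then
    echo "st2timersengine.service is active."
else
    echo "st2timersengine.service is not active."
fi
```

keepalived only looks at the script's exit status, so when used as a check script the call to `service_is_active st2timersengine.service` could simply be the last command in the file.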
start_program.sh
#!/bin/bash
systemctl restart st2timersengine.service
stop_program.sh
#!/bin/bash
systemctl stop st2timersengine.service
SUMMARY
I followed the documentation (https://docs.stackstorm.com/reference/ha.html#blueprint-box) to install a highly available st2, but I can't disable st2timersengine after I add:
STACKSTORM VERSION
st2 3.8.0, on Python 3.6.9
OS, environment, install method
Ubuntu 18.04.6, installed via apt.
Steps to reproduce the problem
Add the configuration, then restart st2:
Expected Results
I expect st2timersengine not to be running.
Actual Results
Now I have duplicate rule evaluations.