StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0

Delay Policy Concurrency = 1 #5905

Open mgazzali opened 1 year ago

mgazzali commented 1 year ago

Many actions are stuck in the scheduled state because they fall under a delay policy with concurrency = 1.

I count roughly 300+ stuck actions. Is this caused by the delay policy?

Problem statement: Many automations are stuck in the delayed status. This is expected to a degree, since the policy dictates that only one action may run at a time. However, we had 300 webhooks come in, which delayed the other 299 executions, and the instance then suddenly hung. The delayed status turns all subsequent actions into zombie actions.
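To put numbers on the backlog described above, here is a minimal toy model in Python. It is only a sketch of a generic concurrency limit that delays overflow, not StackStorm's scheduler code. The function name `simulate` and the 60-second run time are hypothetical and chosen purely for illustration.

```python
from collections import deque


def simulate(num_requests, threshold, run_seconds):
    """Toy model of a concurrency limit whose overflow action is 'delay'.

    Assumes all requests arrive at once (a webhook burst) and that every
    execution takes the same run_seconds. Both are simplifying assumptions.
    """
    running = 0
    delayed = deque()
    for request_id in range(num_requests):
        if running < threshold:
            running += 1                 # a slot is free, so it starts now
        else:
            delayed.append(request_id)   # over the limit, so it waits
    delayed_at_start = len(delayed)
    # Executions drain in batches of `threshold`; ceiling division gives
    # the number of batches needed to work through the whole burst.
    batches = -(-num_requests // threshold)
    return delayed_at_start, batches * run_seconds


# 300 webhooks, threshold 1, hypothetical 60 s per execution
delayed_count, drain_seconds = simulate(300, 1, 60)
print(delayed_count, drain_seconds)
```

Under these assumptions 299 executions wait behind the first one, and the time to drain the queue grows linearly with the size of the burst. This only illustrates why a long delayed queue is expected with a threshold of 1; it does not explain the hang itself.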

Workaround: Rename the delayed action and rerun it.

StackStorm version: 3.7

Setup & Config: Kubernetes HA. System requirements are aligned with the St2 docs.


mgazzali commented 1 year ago

The action is delayed, as shown in the image below.

mgazzali commented 1 year ago

An example of the delay policy is below:

name: ap_auto.concurrency
description: Limits the concurrent executions for ap_auto action.
enabled: true
resource_ref: autonomous_operations.ap_auto
policy_type: action.concurrency
parameters:
    action: delay
    threshold: 1

mgazzali commented 1 year ago

Hello, can anyone shed some light on this? The question is: does having this type of delay policy affect the workload and resources of the scheduler?