StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0

Fault-tolerance configuration for messaging/rabbitmq #4987

Open igcherkaev opened 4 years ago

igcherkaev commented 4 years ago

@nmaludy asked me to open an issue to discuss making certain retry parameters configurable. The idea is to make at least these two parameters configurable via st2.conf:

https://github.com/StackStorm/st2/blob/v3.2/st2common/st2common/transport/connection_retry_wrapper.py#L33-L35

I could see them going in the [messaging] section: https://github.com/StackStorm/st2/blob/v3.2/conf/st2.conf.sample#L183-L203
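To make the proposal concrete, here is a minimal sketch of reading two retry settings from a `[messaging]` section with fallbacks. The option names `cluster_retry` and `wait_between_cluster` and the default values are assumptions for illustration. st2 itself loads its configuration through oslo.config, so the standard-library `configparser` here only stands in for that.

```python
import configparser

# Placeholder defaults for illustration only; not taken from the st2 source.
DEFAULT_CLUSTER_RETRY = 2
DEFAULT_WAIT_BETWEEN_CLUSTER = 10.0


def load_retry_settings(conf_text):
    """Read the hypothetical retry options from a [messaging] section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback= is used when the section or the option is missing.
    retries = parser.getint(
        "messaging", "cluster_retry", fallback=DEFAULT_CLUSTER_RETRY
    )
    wait = parser.getfloat(
        "messaging", "wait_between_cluster", fallback=DEFAULT_WAIT_BETWEEN_CLUSTER
    )
    return retries, wait


SAMPLE_CONF = """
[messaging]
url = amqp://guest:guest@127.0.0.1:5672//
cluster_retry = 5
wait_between_cluster = 3
"""

retries, wait = load_retry_settings(SAMPLE_CONF)
print(retries, wait)
```

Keeping the current hard-coded values as the fallbacks would mean existing installations behave the same unless someone sets the new options.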

Also, make the following errors visible without enabling debug logging, so that users can see and monitor errors affecting st2 performance:

  1. https://github.com/StackStorm/st2/blob/v3.2/st2common/st2common/transport/connection_retry_wrapper.py#L143-L144
  2. https://github.com/StackStorm/st2/blob/v3.2/st2common/st2common/transport/connection_retry_wrapper.py#L155-L156
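As a rough illustration of the logging change being asked for, the sketch below retries a callable and reports each failed attempt at WARNING rather than DEBUG. The function name `run_with_retries`, the logger name, and the use of `ConnectionError` are hypothetical; this is not the code from `connection_retry_wrapper.py`.

```python
import logging
import time

LOG = logging.getLogger("retry_sketch")


def run_with_retries(func, retries=2, wait=0.0, sleep=time.sleep):
    """Call func(), retrying on ConnectionError and logging each failure."""
    for attempt in range(1, retries + 1):
        try:
            return func()
        except ConnectionError as exc:
            # WARNING instead of DEBUG, so the message is emitted under
            # default log settings and can be picked up by monitoring.
            LOG.warning(
                "Connection attempt %d/%d failed: %s", attempt, retries, exc
            )
            if attempt == retries:
                # Out of attempts: let the caller see the original error.
                raise
            sleep(wait)
```

With this shape, a failing broker connection leaves one log line per attempt without anyone having to turn on debug output first.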

And last but not least, perhaps it would be nice to implement the RabbitMQ heartbeat feature, to ensure connections are kept alive when they traverse firewalls or other connection trackers?

Per @nmaludy:

heartbeat (float) – Heartbeat interval in int/float seconds. Note that if heartbeats are enabled then the heartbeat_check() method must be called regularly, around once per second.

That, I believe, is not used anywhere in the st2 source code.
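For reference, the quoted description appears to match kombu's `Connection` `heartbeat` argument. Below is a minimal sketch of a consumer loop that calls `heartbeat_check()` about once per second between `drain_events()` calls. The function name `consume_with_heartbeat` and the `should_stop` callback are hypothetical, and `connection` is assumed to be something like `kombu.Connection(url, heartbeat=10)`. This is not how st2 consumes messages today.

```python
import socket


def consume_with_heartbeat(connection, should_stop, timeout=1.0):
    """Drain events in a loop, checking heartbeats roughly every `timeout` seconds.

    `connection` is assumed to expose heartbeat_check() and
    drain_events(timeout=...), as a kombu Connection does.
    """
    while not should_stop():
        # Sends a heartbeat if one is due and checks that the broker
        # is still sending its own.
        connection.heartbeat_check()
        try:
            # Blocks for at most `timeout` seconds waiting for a message.
            connection.drain_events(timeout=timeout)
        except socket.timeout:
            # No message arrived in time; loop around to the next heartbeat.
            pass
```

Running the check in the same loop as `drain_events()` avoids sharing one connection between threads, at the cost of tying the heartbeat cadence to the drain timeout.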

arm4b commented 4 years ago

To cross-link related items, here is an issue for adding RabbitMQ heartbeat configuration in st2.conf: https://github.com/StackStorm/st2/issues/4780