am2alertapi fix metrics reporting for multi-worker configuration

UWIT-UE / am2alertapi

Prometheus alertmanager to UW alertAPI

GNU General Public License v3.0


While diagnosing a recent outage, we noticed that the am2alertapi counters were not increasing as expected. The cause is the multi-worker configuration: the workers run as separate processes, so each worker keeps its own metrics independently, and a scrape sees only the counters of whichever worker happens to answer it. (The worker count was increased in October 2021 in https://github.com/UWIT-UE/am2alertapi/commit/4539a9ced987c19789263bf9b84ef7d76143738d)
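To illustrate the diagnosis, here is a minimal stdlib-only sketch of per-process counters. It does not use am2alertapi code; `REQUESTS_HANDLED`, `handle_requests`, and `run_workers` are hypothetical names, and it assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp

# Module-level counter: after a fork, every process has its own copy.
REQUESTS_HANDLED = 0


def handle_requests(n, results):
    """Simulate one worker handling n requests, then report its own count."""
    global REQUESTS_HANDLED
    for _ in range(n):
        REQUESTS_HANDLED += 1
    results.put(REQUESTS_HANDLED)


def run_workers(per_worker):
    """Start one process per entry and collect each worker's private count."""
    ctx = mp.get_context("fork")  # assumption: POSIX fork is available
    results = ctx.Queue()
    procs = [
        ctx.Process(target=handle_requests, args=(n, results))
        for n in per_worker
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    counts = sorted(results.get() for _ in procs)
    for p in procs:
        p.join()
    return counts


worker_counts = run_workers([3, 5])
print(worker_counts)     # each worker only knows about its own requests
print(REQUESTS_HANDLED)  # the parent process never saw any increments
```

No single process holds the total of 8, which matches the symptom described above.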

Research suggests one solution is to use prometheus_client multiprocess mode; an example is here: https://github.com/amitsaha/python-prometheus-demo/tree/master/flask_app_prometheus_multiprocessing
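A minimal sketch of what multiprocess mode might look like, not the actual am2alertapi code. The metric name `am2alertapi_alerts_received` and the helper `render_metrics` are hypothetical, and the temporary directory is only there so the sketch runs on its own. In a real deployment `PROMETHEUS_MULTIPROC_DIR` would presumably be set in the service environment and emptied on each restart.

```python
import os
import tempfile

# The variable must be set before prometheus_client is imported, because the
# library chooses its value storage (in-memory vs. mmap files) at import time.
# Assumption: a throwaway temp dir stands in for the real shared directory.
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", tempfile.mkdtemp())

from prometheus_client import (
    CollectorRegistry,
    Counter,
    generate_latest,
    multiprocess,
)

# Hypothetical metric; every worker writes its increments to its own file
# inside PROMETHEUS_MULTIPROC_DIR.
ALERTS_RECEIVED = Counter(
    "am2alertapi_alerts_received",
    "Alerts received from Alertmanager",
)


def render_metrics() -> bytes:
    """Build the scrape response by merging the files of all workers."""
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)


ALERTS_RECEIVED.inc()
ALERTS_RECEIVED.inc()
print(render_metrics().decode())
```

Whichever worker answers the scrape would call something like `render_metrics()` in its metrics handler, so the response covers all workers rather than just its own counters.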

Alternatively, add the worker number as a metric label and aggregate in Prometheus.
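A minimal sketch of the label approach, under stated assumptions: the metric name, the `record_alert` helper, and the use of the process ID as the worker identifier are all hypothetical. A stable worker number would be preferable if the server exposes one, since PIDs change on every restart and create new series.

```python
import os

from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry keeps the sketch self-contained.
registry = CollectorRegistry()

# Hypothetical metric with a "worker" label so each process reports
# its own distinguishable series.
ALERTS_RECEIVED = Counter(
    "am2alertapi_alerts_received",
    "Alerts received from Alertmanager",
    ["worker"],
    registry=registry,
)

# Assumption: the PID stands in for the worker number.
WORKER_ID = str(os.getpid())


def record_alert() -> None:
    """Count one alert against the current worker's series."""
    ALERTS_RECEIVED.labels(worker=WORKER_ID).inc()


for _ in range(3):
    record_alert()
print(generate_latest(registry).decode())
```

On the Prometheus side the per-worker series could then be combined with a query such as `sum without (worker) (rate(am2alertapi_alerts_received_total[5m]))`. Each scrape still reaches only one worker, so every series is updated only when its worker happens to answer.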

Here's a reference: https://echorand.me/posts/python-prometheus-monitoring-options/

am2alertapi fix metrics reporting for multi-worker configuration #22