While diagnosing a recent outage, we noticed that the am2alertapi counters were not increasing as expected. After some thought, the cause became clear: with multiple workers, each worker runs as a separate process and keeps its own metrics independently, so a scrape reflects only the worker that happens to answer it. (The worker count was increased in October 2021 in https://github.com/UWIT-UE/am2alertapi/commit/4539a9ced987c19789263bf9b84ef7d76143738d)
Research suggests two possible solutions. One is to use prometheus_client multiprocess mode, example here: https://github.com/amitsaha/python-prometheus-demo/tree/master/flask_app_prometheus_multiprocessing
The other is to add the worker number as a metric label and aggregate in Prometheus.
Here's a reference: https://echorand.me/posts/python-prometheus-monitoring-options/
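
To make the failure mode and the label-based option concrete, here is a small stdlib-only model. It does not use prometheus_client or real processes: `Worker`, `alerts_total`, and the two scrape helpers are hypothetical names for illustration, and the per-worker alert counts are made up. It only shows why an unlabelled counter appears to jump around when scrapes land on different workers, and how labelling by worker and summing recovers the total, which is roughly what a `sum` over the worker label would do in Prometheus.

```python
class Worker:
    """Models one server worker process with its own private counter."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.alerts_total = 0  # hypothetical stand-in for a per-process Counter

    def handle_alert(self):
        self.alerts_total += 1


def scrape_unlabelled(worker):
    # Every worker exposes the same series name, so scrapes overwrite each other.
    return {"alerts_total": worker.alerts_total}


def scrape_labelled(worker):
    # Adding the worker id to the key keeps each worker's series distinct.
    return {("alerts_total", worker.worker_id): worker.alerts_total}


# Three workers that happened to receive an uneven share of alerts
# (illustrative numbers only).
workers = [Worker(i) for i in range(3)]
for worker, handled in zip(workers, (5, 1, 3)):
    for _ in range(handled):
        worker.handle_alert()

# Without a label, consecutive scrapes answered by different workers
# yield a value that goes up and down instead of increasing.
unlabelled = [scrape_unlabelled(w)["alerts_total"] for w in workers]

# With a label, keep the latest sample per worker and sum across workers.
latest = {}
for w in workers:
    latest.update(scrape_labelled(w))
total = sum(latest.values())

print(unlabelled)  # [5, 1, 3]
print(total)       # 9
```

The model keeps the last sample seen for each worker. How Prometheus treats a worker's series when it is absent from a given scrape is not modelled here and would need checking before relying on this option.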