SODALITE-EU / monitoring-system

Monitoring system description and config files.

Dynamic alerting rules for refactoring #10

Open rosogon opened 3 years ago

rosogon commented 3 years ago

The M18 Rule-based refactorer used static rules to alert on high load/CPU usage. These rules should be dynamic, in the sense that each new application adds its own rules. According to https://prometheus.io/docs/prometheus/latest/configuration/configuration/, rule_files accepts a list of file globs, so this could be addressed as follows:

  1. the Prometheus configuration sets rule_files: [ "/etc/prometheus/rules*" ]
  2. the deployment of an application installs its rule files as /etc/prometheus/rules_
  3. the Prometheus server is restarted
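A minimal Python sketch of the three steps above, under stated assumptions: the rules_<app>.yml file naming, the CPU expression with its hypothetical "deployment" label, and the threshold parameter are all illustrative, not taken from the SODALITE rules. It writes one rule file per application, emulates which files the rules* glob would pick up, and, as an alternative to a full restart, asks Prometheus to reload through its POST /-/reload endpoint, which is only enabled when Prometheus runs with --web.enable-lifecycle.

```python
import glob
import os
import urllib.request

# Hypothetical per-application alerting rule. The expression, the "deployment"
# label and the alert name are illustrative assumptions.
RULE_TEMPLATE = """groups:
  - name: {app}
    rules:
      - alert: HighCpuUsage
        expr: avg(rate(node_cpu_seconds_total{{mode!="idle", deployment="{app}"}}[5m])) > {threshold}
        for: {duration}
        labels:
          severity: warning
          app: {app}
        annotations:
          summary: "CPU usage of {app} above {threshold}"
"""


def install_rule_file(rules_dir: str, app: str, threshold: float, duration: str = "5m") -> str:
    """Step 2: the application's deployment writes <rules_dir>/rules_<app>.yml (assumed naming)."""
    path = os.path.join(rules_dir, f"rules_{app}.yml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(RULE_TEMPLATE.format(app=app, threshold=threshold, duration=duration))
    return path


def matched_rule_files(rules_dir: str) -> list[str]:
    """Step 1: emulate which files a rule_files entry of <rules_dir>/rules* would load."""
    return sorted(glob.glob(os.path.join(rules_dir, "rules*")))


def reload_prometheus(base_url: str = "http://localhost:9090") -> int:
    """Step 3 as a reload instead of a restart; requires --web.enable-lifecycle."""
    req = urllib.request.Request(f"{base_url}/-/reload", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, calling install_rule_file for "app-a" and "app-b" in the same directory makes matched_rule_files return both rules_app-a.yml and rules_app-b.yml, while unrelated files such as other.yml are ignored. Sending the Prometheus process a SIGHUP also triggers a reload, if the HTTP endpoint is not enabled.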

Generating the rule files from the application SLA is still needed, but that will be addressed in another ticket.

rosogon commented 3 years ago

This relates to the problems raised by a multi-tenant Prometheus; https://github.com/cherti/PromAuthProxy is a project that could help with that.

The other alternative is to change the approach and run one Prometheus instance per deployment.

pmundt commented 3 years ago

I've also tested this under Kubernetes for the edge cases, and had success placing the Prometheus config in a Kubernetes ConfigMap and injecting a monitoring sidecar that sends a POST to the Prometheus server's config reload endpoint whenever the configuration changes. The process is roughly described here: https://www.weave.works/blog/prometheus-configmaps-continuous-deployment/
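A rough Python sketch of such a sidecar, assuming the ConfigMap is mounted as files inside the pod and Prometheus listens on localhost:9090 with --web.enable-lifecycle set. It hashes file contents rather than comparing modification times, because ConfigMap volume updates are applied through an atomic symlink swap; the function names and the polling interval are hypothetical.

```python
import hashlib
import time
import urllib.request


def config_digest(paths: list[str]) -> str:
    """Hash the names and contents of the watched files; any edit changes the digest."""
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.encode("utf-8"))
        with open(path, "rb") as f:  # open() follows the symlinks used by ConfigMap mounts
            h.update(f.read())
    return h.hexdigest()


def trigger_reload(base_url: str) -> int:
    """POST to Prometheus' reload endpoint; only enabled with --web.enable-lifecycle."""
    req = urllib.request.Request(f"{base_url}/-/reload", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def poll_once(paths, last_digest, base_url, reload=trigger_reload):
    """Reload Prometheus if the files changed since last_digest; return the current digest."""
    current = config_digest(paths)
    if current != last_digest:
        reload(base_url)
    return current


def watch(paths, base_url="http://localhost:9090", interval=10.0):
    """Sidecar main loop: poll forever and reload Prometheus on every change."""
    digest = config_digest(paths)
    while True:
        time.sleep(interval)
        try:
            digest = poll_once(paths, digest, base_url)
        except OSError as exc:
            # Prometheus unreachable or files mid-update: keep the old digest so the next poll retries.
            print(f"reload failed, retrying: {exc}")
```

Splitting out poll_once keeps the change-detection logic testable without a running Prometheus; the sidecar itself would just call watch() with the mounted file paths.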

jramosrivas commented 3 years ago

At the moment, the ruleserver offers a REST API to add alerts to and remove them from the Prometheus server; its functioning is described in the README.