canonical / prometheus-k8s-operator

https://charmhub.io/prometheus-k8s
Apache License 2.0
21 stars 35 forks source link

Break the Rules monolith. Add support for recording rules. #433

Closed rbarry82 closed 1 year ago

rbarry82 commented 1 year ago

Issue

prometheus_scrape doesn't currently support recording rules via any intentional mechanism -- they can be accidentally loaded via the AlertRules pathway, but none of the log messages or databags will be meaningful. We should support this use case, separated.

Solution

The actual handling of the rules is mostly identical, outside of keys to check for if we're validating single rule formats, and a string to use in log messages/alerts.

Make AlertRules into Rules(ABC) with an @abstractmethod+@property which returns a Literal _RulesType, which child classes implement to indicate the string to interpolate where needed. Similarly, extract the handling for alerts() into a more generalized method which takes the type as an argument in addition and maps it to a databag. All of the handling there is also identical (for now).

Remove every reference I could find to alert in docstrings, docs, and variable names where it should not be present, substituting the interpolated value or appending/amending docstrings to speak about both types. Rename CosTool.validate_alert_rules() to CosTool.validate_rules() (it isn't specific). Rename exceptions to more generic names.

Add constructor args and events for recording rules.

In this way, if/when we ever do need different handling (such as if we do not want to support "single rule file" formats for recording rules, or Loki, or other) or the relation data needs/APIs for either type diverge, they are already cleanly separated without breaking the current API.

In addition, this restructuring makes it possible to, once pydeps for charmcraft is implemented, move the [T]Rules machinery elsewhere, rather than trying to keep prometheus_scrape, prometheus_remote_write, loki_push_api, and others in sync, with independent patches needed any time a bug or feature comes up in any of them (with a little handling to smash metadata into scrape_metadata, even as just an optional metadata_key arg in the constructor for Rules())

Closes #428

Context

Also, mock.create_autospec() exists, and it's great. _dedupe_... was moved back into the class it belongs to (the only place that uses it), and test_functions uses create_autospec to test that function in an isolated way. We don't need to extract methods to the top level to make writing unit tests easier, and, as a common refrain, doing so makes new readers/authors or anyone writing a patch juggle more mental state while trying to keep track of whether it's really doing something generalized, used elsewhere, etc.

The problem domain of _dedupe_... is in the only class which ever calls it, and create_autospec() makes unit testing it in an isolated fashion just as easy as importing it directly from the top-level.

Testing Instructions

Deploy. Make sure there aren't regressions.

Release Notes

Add support for recording rules.

sed-i commented 1 year ago

The plot thickens? Federated rule groups:

name: MyGroupName
source_tenants: ["tenant-a", "tenant-b"]
rules:
  - record: sum:metric
    expr: sum(metric)
rbarry82 commented 1 year ago

Federated rule groups

Probably only if we ever get to the point of multi-tenancy, but at least cos-tool already supports this, and it's nominally easier to add (excluding Prometheus itself) if we get there.

rbarry82 commented 1 year ago

Closing to move this to cosl instead, with unification between all the variants (mostly the same as this patch, just small tweaks to meet in the middle for Prom+Loki)