autometrics-dev / autometrics-go

Easily add metrics to your system -- and actually understand them using automatically customized Prometheus queries
https://autometrics.dev
Apache License 2.0

Ideas for Alert generation #16

Closed gagbo closed 1 year ago

gagbo commented 1 year ago

Getting Prometheus alerts generation

As a reminder, the Rust implementation uses this kind of syntax to trigger alert generation for a single function:

#[autometrics(alerts(success_rate = 99.9%, latency(99% <= 200ms)))]
pub async fn handle_http_requests(req: Request) -> Result<Response, Error> {
  // ...
}

We want to provide a similar experience with the Go version by building on the //autometrics:doc directive that is currently used per function. This issue proposes a design for the feature, along with a few technical options for implementing it.

Reusing Sloth

The Rust implementation relies on Sloth, and there's no reason to avoid it here. If we're lucky, since Sloth is written in Go, we might be able to reuse the types from its library to serialize alerting rules (we could just build the relevant Go structures and marshal them).

New argument to autometrics directive

We probably want to tell the generator to create both documentation and alerts, so a matching syntax could be:

//autometrics:doc,alerts --success-rate 99.9 --latency-perc 99 --latency-threshold 200ms
func handleHttpRequests() (err error) {
        // ...
        return nil
}

That would let us split the directive arguments on commas, and then use something like shlex to parse the alerts part just like CLI flags.
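To make the parsing idea concrete, here is a minimal sketch of how such a directive could be handled. It is only an illustration: the AlertConfig type and the parseDirective helper are hypothetical names, strings.Fields stands in for a real shlex library (so quoted arguments are not supported), and the flags are the ones proposed above rather than anything the generator implements today.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
	"time"
)

// AlertConfig holds the alert settings of one directive (hypothetical type).
type AlertConfig struct {
	SuccessRate      float64
	LatencyPerc      float64
	LatencyThreshold time.Duration
}

const directivePrefix = "//autometrics:"

// parseDirective (hypothetical helper) splits the comma-separated feature
// list and, when "alerts" is one of the features, parses the trailing flags.
func parseDirective(comment string) ([]string, *AlertConfig, error) {
	if !strings.HasPrefix(comment, directivePrefix) {
		return nil, nil, fmt.Errorf("not an autometrics directive: %q", comment)
	}
	// strings.Fields is a stand-in for shlex: it splits on whitespace only.
	fields := strings.Fields(strings.TrimPrefix(comment, directivePrefix))
	if len(fields) == 0 {
		return nil, nil, fmt.Errorf("empty autometrics directive")
	}
	features := strings.Split(fields[0], ",")

	wantAlerts := false
	for _, f := range features {
		if f == "alerts" {
			wantAlerts = true
		}
	}
	if !wantAlerts {
		return features, nil, nil
	}

	cfg := &AlertConfig{}
	fs := flag.NewFlagSet("alerts", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report problems through the returned error only
	fs.Float64Var(&cfg.SuccessRate, "success-rate", 0, "target success rate, in percent")
	fs.Float64Var(&cfg.LatencyPerc, "latency-perc", 0, "latency percentile, in percent")
	fs.DurationVar(&cfg.LatencyThreshold, "latency-threshold", 0, "latency threshold, e.g. 200ms")
	if err := fs.Parse(fields[1:]); err != nil {
		return nil, nil, err
	}
	return features, cfg, nil
}

func main() {
	features, cfg, err := parseDirective(
		"//autometrics:doc,alerts --success-rate 99.9 --latency-perc 99 --latency-threshold 200ms")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("features=%v alerts=%+v\n", features, *cfg)
}
```

The standard flag package accepts both the -name and --name forms, so the double-dash syntax proposed above parses without extra work.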

New argument to the go:generate directive

The //go:generate autometrics directive will need to take an extra argument pointing to the global location of the Sloth alerts file, otherwise we won't know where to write. The syntax could be:

//go:generate autometrics --alerts-file ../../autometrics.sloth.yaml

Having a CLI flag instead of a positional argument (i.e. //go:generate autometrics ../../autometrics.sloth.yaml) helps with backwards compatibility later.
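As a rough sketch of the generator side, the flag could be read as shown here. The parseGeneratorArgs helper is a hypothetical name, and --alerts-file is the flag proposed in this issue, not an existing option. The sketch relies on the fact that go generate runs the command in the directory of the file containing the directive, so a relative path like the one above can be resolved against the working directory.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// parseGeneratorArgs (hypothetical helper) reads the proposed --alerts-file
// flag and returns it as an absolute path, or "" when the flag is absent.
func parseGeneratorArgs(args []string) (string, error) {
	var alertsFile string
	fs := flag.NewFlagSet("autometrics", flag.ContinueOnError)
	fs.StringVar(&alertsFile, "alerts-file", "", "path to the shared Sloth alerts file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if alertsFile == "" {
		// No flag given: keep the current behaviour and generate docs only.
		return "", nil
	}
	// Relative paths are resolved against the working directory, which
	// go generate sets to the directory of the annotated source file.
	return filepath.Abs(alertsFile)
}

func main() {
	alertsFile, err := parseGeneratorArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// go generate sets GOFILE to the name of the file holding the directive.
	fmt.Printf("source=%q alerts-file=%q\n", os.Getenv("GOFILE"), alertsFile)
}
```

Because the flag is optional, existing //go:generate autometrics lines would keep working unchanged.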

Data races

We want the calls to the generator to concatenate all the Sloth rules into a single file, so that sloth (the binary) can generate the Prometheus rules we want.

autometrics (the go-generator) is only called once per file, so we can safely generate one Sloth fragment per call to autometrics. Once that's done, we can use a small file-lock library like fslock to acquire a global, out-of-process lock on the shared output file, and safely concatenate the Sloth fragments into it.

We would still need a final step to generate the Prometheus rules. Each autometrics call could keep holding the lock while it execs sloth to regenerate the Prometheus rules, so the last call to acquire the lock would be the one producing the final file. As long as we can guarantee that the last process to hold the lock also sees the latest version of the Sloth rules, we're fine.
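The locking scheme could look roughly like the sketch below. To stay within the standard library it uses syscall.Flock as a stand-in for fslock, which makes it Unix-only, and it locks the shared file itself rather than a separate lock file. The appendLocked and demo names are hypothetical, and the regenerate callback is where exec-ing sloth would go; it is left as a callback here so the sketch does not assume anything about how sloth is invoked.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// appendLocked (hypothetical helper) appends one Sloth fragment to the shared
// file while holding an exclusive advisory lock, then runs regenerate (if any)
// before the lock is released.
func appendLocked(path string, fragment []byte, regenerate func() error) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	// Blocks until no other process holds the lock on this file.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	// Deferred calls run last-in first-out: unlock first, then close.
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	if _, err := f.Write(fragment); err != nil {
		return err
	}
	if regenerate != nil {
		// Still under the lock, so this sees every fragment written so far.
		return regenerate()
	}
	return nil
}

// demo appends two fragments to a temporary file and returns its contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "autometrics")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "autometrics.sloth.yaml")
	for _, frag := range []string{"# fragment from a.go\n", "# fragment from b.go\n"} {
		if err := appendLocked(path, []byte(frag), nil); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Since every process appends and regenerates while holding the same lock, whichever process acquires it last regenerates from a file that already contains all the earlier fragments.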

gagbo commented 1 year ago

Closing, as alert generation will instead follow what's done on the Rust side to keep compatibility with autometrics rules.