Prometheus Support for Metrics logging.

TLDR

Dynolog provides system telemetry at Meta as well as in open source environments. This change adds metric logging using Prometheus - an industry standard framework for logging/exporting metrics. It can also be leveraged by the Meta AI Research SuperCluster and other clusters built on open source infrastructure.

Prometheus

Prometheus is an open source tool for metrics collection and publishing. One can use it to monitor metrics remotely, graph them, and integrate with Grafana for visualization.

A core concept in Prometheus is its data model. It consists of labels - a list of attributes of entities to associate with the metric (e.g. "{nodename, gpu id}"), and metrics - numerical values that represent points in a time series.
The Prometheus server runs on the box or node. Typically, it uses a pull model, periodically obtaining the latest values of metrics and labels. (Visualized in the diagram above.)
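
To make the data model concrete, here is a minimal, standard-library-only sketch of how one labeled sample might be rendered in the Prometheus text exposition format, which is what a server sees when it pulls metrics. It is an illustration, not the code in this PR: `Sample`, `renderSample`, the metric name `gpu_utilization`, and the label values are all hypothetical, and the label is spelled `gpu_id` because Prometheus label names cannot contain spaces.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical type for illustration: one sample in the Prometheus data
// model is a metric name, a set of labels, and a numeric value.
struct Sample {
  std::string name;
  std::map<std::string, std::string> labels; // e.g. nodename, gpu_id
  double value;
};

// Renders a sample as one line of the text exposition format:
//   name{label1="v1",label2="v2"} value
// Assumption: label values contain no quotes, backslashes, or newlines,
// so no escaping is performed here.
std::string renderSample(const Sample& s) {
  std::ostringstream out;
  out << s.name;
  if (!s.labels.empty()) {
    out << '{';
    bool first = true;
    // std::map iterates in key order, so labels come out sorted by name.
    for (const auto& kv : s.labels) {
      if (!first) {
        out << ',';
      }
      out << kv.first << "=\"" << kv.second << '"';
      first = false;
    }
    out << '}';
  }
  out << ' ' << s.value;
  return out.str();
}
```

For example, calling `renderSample` on a sample named `gpu_utilization` with labels `nodename="node0"` and `gpu_id="3"` and value `87.5` yields `gpu_utilization{gpu_id="3",nodename="node0"} 87.5`. In a pull model, the node exposes lines like this over HTTP and the Prometheus server fetches them at each scrape.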

Implementation

We can leverage the prometheus-cpp library (https://github.com/jupp0r/prometheus-cpp/), which is straightforward to use.
