cda-group / arcon

State-first Streaming Applications in Rust
https://cda-group.github.io/arcon/
Apache License 2.0
175 stars 17 forks source link

Metrics Integration #225

Closed Sanskar95 closed 3 years ago

Sanskar95 commented 3 years ago

This PR integrate two types of metrics: Hardware counters Software metrics enabled using following feature flags respectively.

hardware_counters

metrics

Max-Meldrum commented 3 years ago

Nice work!

Below are some high-level comments, I'll also add some smaller review comments later.

Feature flags and default recorder

We can remove the arcon_metrics feature as we are not using metrics-util + that I think prometheus exporter should be its own feature flag under "prometheus_exporter". That is, if only metrics flag is enabled then we should default to a LogRecorder.

prometheus_exporter = ["metrics-exporter-prometheus", "metrics"] hardware_counters = ["perf-event", "metrics"]

// in src/metrics/
pub struct LogRecorder {
  logger: ArconLogger,
}

impl Recorder for LogRecorder {
   fn increment_counter(&self, key: &Key, value: u64) {
      info!(self.logger, "Key {}, Value {}", key, value);
   }
   ... so on
}
// Application::new()
#[cfg(all(feature = "prometheus_exporter", feature = "metrics"))]
// Prometheus recorder
#[cfg(feature = "metrics")]
// LogRecorder

Runtime_Metrics

For this part, there is no need to have the MetricValue trait and individual implementations of Counters. We can use the metrics macros directly.

// Node Setup
register_counter!("node_name_id_epoch_counter");
register_counter!("node_name_id_watermark_counter");
// Node Execution
increment_counter!("node_name_id_epoch_counter");
increment_counter!("node_name_id_watermark_counter");

We can still keep a NodeMetrics struct for other stuff that is needed. Like the Meter.

pub struct NodeMetrics {
  inbound_throughput: Meter,
}
// Node
self.metrics.inbound_throughput.mark_n(message.total_events());
gauge!("node_name_id_inbound_throughput", self.metrics.inbound_throughput.get_mean_rate());

Histogram

  1. I realised we should probably use histogram for perf-events rather than gauge.
  2. We might as well add histogram to the OperatorContext while we are at it.
Max-Meldrum commented 3 years ago

Looks good!

Closes #181, closes #215, closes #216, closes #217