Extend metric types and insight into rippled operation/state

MarkusTeufelberger commented 5 years ago

Currently rippled exposes metrics and/or information about its current operation via:

- the server_state/server_info RPC call (and a few other ones, such as counters or print)
- the /crawl endpoint
- a statsd-style push-based metrics implementation (via the lfkabbVF [library formerly known as beast by Vinnie Falco])
- the "normal" log (with various logging levels) and the perf log
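
For reference, the push model behind the statsd item can be sketched in a few lines of Python. This is only an illustration of the plain statsd line protocol (`<name>:<value>|<type>`), not rippled's actual implementation; the metric name `rippled.peers` and the collector address are assumptions made up for the example.

```python
import socket


def statsd_line(name: str, value: float, kind: str) -> str:
    """Format one metric in the plain statsd line protocol: <name>:<value>|<type>."""
    if kind not in ("c", "g", "ms"):  # counter, gauge, timer
        raise ValueError(f"unsupported statsd type: {kind}")
    return f"{name}:{value}|{kind}"


def push(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the sender never learns whether anyone received it.

    The host and port are assumptions (8125 is the conventional statsd port).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# "rippled.peers" is a hypothetical metric name used only for illustration.
line = statsd_line("rippled.peers", 21, "g")
```

The process being monitored has to know where its collector lives, which is the main practical difference from a pull-based setup.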

I'd like to suggest adding two more methods of exposing information, which might eventually replace some of the existing ones (statsd, maybe the perf log, maybe the /crawl endpoint):

- OpenMetrics (essentially the https://prometheus.io metrics format, standardized) for metrics data. Having data available in this format would really help in monitoring rippled instances, since statsd is push-based and modern metrics/monitoring platforms are often pull-based.
- OpenTelemetry (a merger of OpenTracing and OpenCensus). OpenTracing support would help in analyzing requests being processed by rippled: take a look at https://www.jaegertracing.io/ for example, and imagine being able to trace a request from your external load balancer to the RocksDB call(s) of the rippled cluster node and back! OpenCensus is a library that helps with collecting metrics and supplies various (fast!) primitives. These could then be exposed in OpenMetrics format.
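
To make the pull-based side concrete, here is a minimal Python sketch of a counter rendered in the Prometheus text exposition format and served over HTTP for a scraper to fetch. It is not a proposal for rippled's API: the `Counter` class, the metric name `rippled_rpc_requests_total`, the `/metrics` path, and the port are all assumptions for illustration, and the sketch skips label-value escaping that a real exporter needs.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A monotonically increasing metric with optional labels (illustrative only)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.values = {}  # sorted label tuple -> running total

    def inc(self, amount=1.0, **labels):
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0.0) + amount

    def render(self):
        """Return this counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self.values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical metric; rippled does not necessarily expose anything by this name.
REQUESTS = Counter("rippled_rpc_requests_total", "RPC requests handled.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve /metrics; the port is an arbitrary choice for the sketch."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `REQUESTS.inc(method="server_info")` wherever a request is handled and then running `serve()` would let a Prometheus-compatible scraper collect the counter on its own schedule, with no collector address configured in the monitored process.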

jerryzhou196 commented 3 months ago

@intelliot @kennyzlei @shawnxie999 Would be really interested in taking this feature up for fun. I'm actually an intern at Ripple working on the data platform team and just finishing up my last two weeks.

I was working on a feature with Databricks that involved aggregating Apache Spark's Prometheus metrics with an OTEL collector and pushing to Grafana across our data platform for cost observability. It's real-time, nearly seamless to integrate with Grafana, and has helped A TON with our observability.

Would be really interested in doing something similar to what Spark does, but for rippled's metrics. I think we could do it by simply converting all the existing metrics into the Prometheus time-series format and exposing them on some endpoint, providing an easy adapter for anyone interested in visualizing the metric data in time-series form.
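
As a rough Python sketch of what that conversion could look like, the snippet below flattens a nested, server_info-style dictionary into Prometheus sample lines. The `rippled_` prefix, the helper names, and the sample values are all made up for illustration; this is one possible approach, not how rippled or Clio actually does it.

```python
from numbers import Real


def flatten(prefix, data):
    """Yield (metric_name, value) for every numeric leaf in a nested dict."""
    for key, value in data.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            yield from flatten(name, value)
        elif isinstance(value, Real) and not isinstance(value, bool):
            yield name, value
        # Non-numeric leaves (e.g. a state string) are skipped in this sketch;
        # they would need to become labels or enumerated gauges instead.


def to_prometheus(prefix, data):
    """Render the numeric leaves as one '<name> <value>' sample per line."""
    return "".join(f"{name} {value}\n" for name, value in flatten(prefix, data))


# Sample input shaped like part of a server_info response; the values are invented.
sample = {
    "peers": 21,
    "io_latency_ms": 1,
    "server_state": "full",
    "validated_ledger": {"seq": 1000, "age": 3},
}

print(to_prometheus("rippled", sample), end="")
```

A real adapter would also need `# TYPE` metadata and a decision on which values are counters versus gauges, which a blind flattening like this cannot infer.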

Having this feature would be really useful to integrate with modern observability stacks involving things like OTEL collector and Grafana.

Is there anyone I can work with to implement this? Also, would this qualify for a grant?

kennyzlei commented 3 months ago

@jerryzhou196 thank you for your interest! For the Clio project, native Prometheus support was added recently by @kuznetsss: https://github.com/XRPLF/clio/blob/develop/docs/metrics-and-static-analysis.md. I think he may be a good person to reach out to for insights on how this could be applied to rippled.

In terms of grants, I believe this typically applies to individuals and organizations outside Ripple.

XRPLF / rippled

Extend metric types and insight into rippled operation/state #2962