Open MarkusTeufelberger opened 5 years ago
@intelliot @kennyzlei @shawnxie999 Would be really interested in taking this feature up for fun. I'm actually an intern at Ripple working on the data platform team and just finishing up my last two weeks.
I was working on a feature with Databricks that involved aggregating Apache Spark's Prometheus metrics with an OTEL collector and pushing to Grafana across our data platform for cost observability. It's real-time, nearly seamless to integrate with Grafana, and has helped A TON with our observability.
Would be really interested in doing something similar to what Spark does but for `rippled`'s metrics. I think we could do it by simply converting all the existing metrics into a Prometheus time-series format and exposing it on some endpoint, providing an easy adaptor for anyone interested in visualizing the metric data in time series form.
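To make the conversion idea concrete, here is a minimal sketch of rendering a set of values in the Prometheus text exposition format. The function names (`sanitize_name`, `escape_label_value`, `render_gauges`), the `rippled` prefix, and the sample metric are all hypothetical and not taken from the rippled codebase; only the output format itself follows the Prometheus convention of `# HELP`, `# TYPE`, and sample lines.

```python
import re


def sanitize_name(name: str) -> str:
    """Map an arbitrary name onto the characters Prometheus allows in metric names."""
    cleaned = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # Metric names may not start with a digit.
    return "_" + cleaned if cleaned[0].isdigit() else cleaned


def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(samples, prefix="rippled", labels=None):
    """Render {name: (help_text, value)} as gauge metrics in text exposition format.

    `prefix` and `labels` are illustrative assumptions, not existing rippled settings.
    """
    labels = labels or {}
    label_text = ",".join(
        f'{key}="{escape_label_value(str(val))}"' for key, val in sorted(labels.items())
    )
    suffix = "{" + label_text + "}" if label_text else ""
    lines = []
    for name, (help_text, value) in samples.items():
        metric = sanitize_name(f"{prefix}_{name}")
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric}{suffix} {float(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample; a real adaptor would read values from the node itself.
    print(render_gauges({"peers": ("Number of connected peers", 21)}, labels={"node": "a"}))
```

Treating everything as a gauge is a simplification; a real adaptor would need to decide per metric whether a counter or histogram is the better fit.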
Having this feature would be really useful to integrate with modern observability stacks involving things like OTEL collector and Grafana.
Is there anyone I can work with to implement this? Also, would this qualify for a grant?
@jerryzhou196 thank you for your interest! For the Clio project, native Prometheus support was added recently by @kuznetsss: https://github.com/XRPLF/clio/blob/develop/docs/metrics-and-static-analysis.md. I think he may be a good person to reach out to for insights on how this could be applied to rippled.
In terms of grants, I believe this typically applies to individuals and organizations outside Ripple.
Currently `rippled` exposes metrics and/or information about its current operation via:

- `server_state`/`server_info` RPC call (and a few other ones, such as `counters` or `print`)
- `/crawl` endpoint
- `statsd` style push-based metrics implementation (via the lfkabbVF [library formerly known as beast by Vinnie Falco])

I'd like to suggest adding 2 more methods of exposing information to this, maybe eventually replacing some of them (statsd + maybe the perf log + maybe the /crawl endpoint):
- … `rippled` instances, since `statsd` is push based and modern metrics/monitoring platforms are often pull based.
- … `rippled` (take a look at https://www.jaegertracing.io/ for example and imagine you'd be able to trace a request from your external load balancer back to your RocksDB call(s) of the `rippled` cluster node and back!), OpenCensus is a library that helps with collecting metrics and supplying various (fast!) primitives. These then could be exposed in OpenMetrics format.
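The push versus pull distinction above can be illustrated with a small sketch: instead of the process sending samples to a collector, it serves its current values over HTTP and the monitoring system scrapes them on its own schedule. Everything here is an assumption for illustration only (the `/metrics` path, the `COUNTERS` dictionary, the helper names); it uses only the Python standard library and is not how rippled or any particular library implements this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real node would read its own collectors.
COUNTERS = {"demo_requests_total": 0}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Render the current values at scrape time (pull model).
        body = "".join(
            f"# TYPE {name} counter\n{name} {value}\n" for name, value in COUNTERS.items()
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def start_metrics_server(port=0):
    """Serve /metrics on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(server):
    """Fetch the metrics page once, the way a pull-based collector would."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
```

With this shape the process does not need to know where its metrics end up; any number of scrapers can read the same endpoint.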