cs3org / reva

WebDAV/gRPC/HTTP high performance server to link high level clients to storage backends
https://reva.link
Apache License 2.0
165 stars 113 forks source link

Expose all prometheus metrics under /metrics #1070

Open labkode opened 3 years ago

labkode commented 3 years ago

@Daniel-WWU-IT @ishank011 I agree that we should expose all the necessary info under one endpoint only.

Daniel-WWU-IT commented 3 years ago

@labkode Makes sense, and also ensures that every site exposes the necessary system info. I will merge the sysinfo service into the Prometheus service right away.

Daniel-WWU-IT commented 3 years ago

@labkode @ishank011 See PR #1071 - the system information is now included in the main metrics.

I tried to use the OpenCensus Prometheus API, though I cannot add custom labels to a single metric; I could only add them to all exported metrics, which would result in a mess. There are other ways to export metrics which include labels, though these can't be combined with what is already there. So I chose to stick with using the Prometheus API directly (this is what OpenCensus does internally just as well); I only had to create a Prometheus registry shared by OpenCensus and my SysInfo system, which seems acceptable to me.

ishank011 commented 3 years ago

@Daniel-WWU-IT after discussions, we decided that we do not want to have info about or carry out operations for any stats that some service or package is exposing in the prometheus HTTP service. The current approach forces us to pass the prometheus client registry to the sysinfo package, and as we continue adding stats from different services, we'd have to do this for all of those and this wouldn't scale well.

So please take a look at this PR https://github.com/cs3org/reva/pull/1105 which uses the opencensus stats package to add labels. We can modify the sysinfo package to work this way and this will take away all dependencies from the HTTP service.

@redblom on similar lines, this is the approach we want to follow for the metrics package as well. The prometheus client would not impose any restrictions (regarding the driver) on any services. Each of those can just register themselves. Let's follow up on your PR regarding this.

Daniel-WWU-IT commented 3 years ago

@ishank011 I took a look at your PR and this seems like a good solution. I will incorporate the necessary changes and create a new PR.

redblom commented 3 years ago

@Daniel-WWU-IT @ishank011 : I was having a discussion on Gitter with @ishank011 regarding the accounting metrics part and I would like to proceed with that here (@Daniel-WWU-IT I'm aware that you may already started with additional coding based on PR #1105 on your part). I noticed that code is being moved around a bit due to changing insight. Anyways both accounting and sysinfo metrics should follow the same approach and implementation so I thought it would be wise to have a common agreement on that.

redblom commented 3 years ago

@Daniel-WWU-IT @ishank011 so initially I was under the impression that any metric (from any! module) would be registered via the metrics pkg and the prometheus service could then simply initialize that pkg and metrics data would be published. So that's how I came to PR #973. Also in this model metrics data are 'simply' put into a repository (file, db, ...) by the metrics producing module which is then read by the metrics pkg to be registered. But then SysInfo came about and that did not follow that route. So from @ishank011 I now understand that each module may register their metrics themselves.

This however leaves the issue of where/how to initialize these metrics modules. It can be done in the prometheus service, the same way dummy metrics is initialized. But that would mean that for each new metrics producing pkg the prometheus service would need to be expanded with a new import. So, can we come up with a solution where a package can be initialized via some configuration instead of hard coding into prometheus service?

Another option would be that we make the requirement for new metrics producing modules to put their metrics data in the repository through a very simple api (how I originally envisioned this). From thereon the metrics pkg picks up the data from the repository and does all the registering.

Daniel-WWU-IT commented 3 years ago

@redblom @ishank011 I don't know if writing metrics to a temporary file which is then read and used to, well, create metrics again, is such a good solution. You'd have to support quite a lot of features -- basically everything the OpenMetrics protocol offers -- to make everyone happy. In the SysInfo case, for example, the one metric it exposes has no meaningful value, it's all about accompanying labels.

My initial suggestion would be to first settle on a "metrics value" API everyone should use to create metrics... the "bare" Prometheus Client API, the various ones OpenCensus has, something else. Then Reva could offer some global registration functions (in a dedicated package I'd say) people can use to register their own metrics. The rest is background-magic. In your dummy driver, you're currently using OpenCensus views for this, so we could just specify that people need to create such a view and register that globally via a function Reva offers.

But the most important thing might be: This MUST be documented clearly -- everyone needs to know what internal features (or guidelines/idioms) Reva sports in order to use them.

redblom commented 3 years ago

@Daniel-WWU-IT @ishank011

@redblom @ishank011 I don't know if writing metrics to a temporary file which is then read and used to, well, create metrics again, is such a good solution. You'd have to support quite a lot of features -- basically everything the OpenMetrics protocol offers -- to make everyone happy. In the SysInfo case, for example, the one metric it exposes has no meaningful value, it's all about accompanying labels.

Yeah that has crossed my mind as well though accounting metrics are quite straightforward in that regard. But I understand that's not necessarily always the case. For accounting metrics it was this idea that gives the partners an opportunity right now to dump actual data instead of dummy data. But I also think this model is still under investigation.

My initial suggestion would be to first settle on a "metrics value" API everyone should use to create metrics... the "bare" Prometheus Client API, the various ones OpenCensus has, something else. Then Reva could offer some global registration functions (in a dedicated package I'd say) people can use to register their own metrics. The rest is background-magic. In your dummy driver, you're currently using OpenCensus views for this, so we could just specify that people need to create such a view and register that globally via a function Reva offers.

I agree, for me OpenCensus views work fine so I am happy to keep using these. If that works for you let's settle on that then.

But the most important thing might be: This MUST be documented clearly -- everyone needs to know what internal features (or guidelines/idioms) Reva sports in order to use them.

Absolutely! A recurring theme :)

Daniel-WWU-IT commented 3 years ago

@redblom @ishank011

From Ishank's PR, views should work fine for me, yeah. Haven't worked on this yet (luckily!), but I think we can agree on using them for all metrics. I just wasn't aware of how to add custom labels (aka tags in OpenCensus), that's why I didn't use them in the first place.

So that being settled, we should then come up with a central Reva-Metrics-Registry where anyone can just add a new view on the fly. From an API point of view, it should just be no more than something like RegisterMetricsView(myView)... something like that. Not sure if you can easily add new views on the fly, but at least I'd suppose so.

redblom commented 3 years ago

@Daniel-WWU-IT @ishank011 Hmm, I like the idea to make it as easy as possible for others to register metrics, but just this single method would not be worth the effort I think. So, there are 3 steps involved here:

  1. Create the view (eg. for number of site users): the measure/metric: m := &Metrics{ NumUsersMeasure: stats.Int64("cs3_org_sciencemesh_site_total_num_users", "The total number of users within this site", stats.UnitDimensionless) } and the view: v := &view.View{ Name: m.NumUsersMeasure.Name(), Description: m.NumUsersMeasure.Description(), Measure: m.NumUsersMeasure, Aggregation: view.LastValue(), }
  2. Register the view (as simple as): view.Register(v)
  3. Record metric data periodically: stats.Record(context.Background(), m.NumUsersMeasure.M(...10397...))

So if we facilitate central registering that would be step 2 which does not take away a lot of work. Now recording is initiated the 'user'. But maybe we could facilitate something there as well to make this central registering more valuable, but I'm not sure if there's really something to gain ... what do you think?

Daniel-WWU-IT commented 3 years ago

@redblom @ishank011

Of course there is a bit more involved than a single function call :) That was just about how to register a new view. Since OpenCensus already exposes such global variables for registering etc., it might at first glance not be worth the effort to put yet another layer on top of this. Then again, hiding some details and possibly putting the second part of step 1 and step 2 into a small extra module might make things easier API wise. And so people would only need to take a look into our "metrics" package and quickly understand what to do. So a user only needs to create such a Metrics objects and pass that along with a few extra params to our internal metrics registration function and later only needs to record new stats (and that could also be made a bit simpler). I'd have to look more closely at how things work in OpenCensus in order to be able to really judge how feasible and helpful this could and would be, though...

labkode commented 3 years ago

OpenCensus is already an abstraction for metrics, having our own abstraction is an overkill, at least for now, where the focus is on MVP. Besides that, OpenCensus is well-document and used in another projects, so people jumping into it should be more comfortable that our custom plugin.

Daniel-WWU-IT commented 3 years ago

@redblom @ishank011 @labkode I have played around with OpenCensus and rewrote the Prometheus exporter to use that library. Works quite well, so I think we can stick to this solution for now.

The question about how to register the various views still remains, though. A simple loader like for the various services at least doesn't work in my case at it is now, since the system information hasn't been initialized at loading time yet.

Daniel-WWU-IT commented 3 years ago

@redblom @ishank011 I rewrote the system info metrics to use OpenCensus: https://github.com/cs3org/reva/pull/1114. For now, I just register the exporter in the Prometheus service, but as discussed, we should find a better solution for this.

ishank011 commented 3 years ago
ishank011 commented 3 years ago

An important thing is to not pollute the prometheus HTTP service. Each service registering the metrics would need to have the prometheus service running on its host. Adding details to every instance of the service would add a lot of duplication and redundancy.

Daniel-WWU-IT commented 3 years ago

I made yet another change to https://github.com/cs3org/reva/pull/1114 and moved the metrics registration out of the Prometheus package. There is no need for a loader module either, so this solution should be the cleanest way.

redblom commented 3 years ago

@Daniel-WWU-IT : didn't you had an issue somewhere with shared configs not yet being available upon pkg initialization. The metrics pkg is supposed to use sharedconfig only but that is only initialized when other configs are ('explicitly') initialized. Did you find a solution for this?

Daniel-WWU-IT commented 3 years ago

@redblom Not exactly. I somehow need to get the version number etc. into the sysinfo package, and I do this with a simple initialization call in the main.go file of revad. And this call has of course not been done yet during package initialization. My workaround was to register the sysinfo metrics in the very same call I use to initialize sysinfo.

redblom commented 3 years ago

@Daniel-WWU-IT : thank you