SoftInstigate / restheart

Rapid API Development with MongoDB
https://restheart.org
GNU Affero General Public License v3.0

Support for metrics using dropwizard.io tooling #249

Closed (lenalebt closed 6 years ago)

lenalebt commented 6 years ago

Hi all, we're interested in some metrics from RESTHeart (response times etc.) for autoscaling in k8s. I have already seen that there is a ticket in your Jira backlog (https://softinstigate.atlassian.net/browse/RH-23). Do you already have plans to implement it? If not, I'd volunteer to implement it and create a PR, if that is fine with you.

mkjsix commented 6 years ago

Hi @lenalebt,

We are still missing metrics: if you could create a PR for that it would be very helpful, thank you!

lenalebt commented 6 years ago

Hi @mkjsix, nice! I have a few other things to finish before that one, but I am planning to start on it some time next week. Is there anything special you would like for the implementation (other than keeping dependencies low besides dropwizard itself)?

Cheers, Lena

ujibang commented 6 years ago

Hi @lenalebt

It would be great not only to collect the metrics but also to make them retrievable via a GET request.

If metrics are stored in a file, a LogicHandler can serve them; if they are stored in a collection (for instance at DB level), they are automatically available. In the latter case, storing them in a collection whose name starts with _ (e.g. _metrics) makes it automatically read-only (resource names prefixed with _ are reserved).

As for the configuration file, I can think of 3 settings:

metrics:
  enabled: true | false
  dbs:
    - db1
    - db2
  retention: <number_of_hours>

retention is quite important: old data should be deleted automatically to keep the _metrics collection from growing without bound.

PS: more info at https://softinstigate.atlassian.net/browse/RH-23

mkjsix commented 6 years ago

The metrics library already offers metrics-servlets, which exposes everything as JSON via HTTP (http://metrics.dropwizard.io/3.1.0/manual/servlets/). This needs to be handled with care: we don't want a second HTTP server in RESTHeart, so I wonder if we could use the embedded Undertow to serve those. It might require modifying or contributing to the metrics-servlets project (https://github.com/dropwizard/metrics).

BTW, it's worth looking at http://www.jhipster.tech/monitoring/ because they already have an AngularJS console which exposes Dropwizard metrics very nicely.

@lenalebt please share your thoughts about this. However, we should keep the initial scope as small as possible and then add options incrementally.

lenalebt commented 6 years ago

Hi all, I'd definitely provide a REST endpoint. I have already used dropwizard, e.g. in https://github.com/zalando-incubator/markscheider, in a way similar to the requirements here, so I'd integrate it into Undertow and definitely not start a second HTTP server. I did not plan to include a dashboard though, since I think that should be part of the infrastructure.

My idea would be to expose metrics in 2 formats: the "default" dropwizard JSON format and the Prometheus format, since the latter is used in the context of k8s in many cases. Format selection would then be done through standard HTTP content negotiation (Accept header).
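
A minimal sketch of the Accept-header selection described above, written in Python only to keep it short and dependency-free (RESTHeart itself is Java). The media types are assumptions: `application/json` for the dropwizard-style output and `text/plain` for the Prometheus text format. The helper names `parse_accept` and `choose_format` are hypothetical, and the matching is simplified compared to full HTTP content negotiation (it ignores media-range specificity).

```python
# Assumed media types: JSON for the dropwizard-style output,
# text/plain for the Prometheus text exposition format.
SUPPORTED = ("application/json", "text/plain")


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not send it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        ranges.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_format(header, default="application/json"):
    """Pick a supported media type for the header, or None if nothing fits."""
    if not header or not header.strip():
        return default
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return default
        if media_type in SUPPORTED:
            return media_type
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in SUPPORTED:
                if candidate.startswith(prefix):
                    return candidate
    return None  # the caller would answer 406 Not Acceptable
```

With this, a Prometheus scraper sending `Accept: text/plain; version=0.0.4` would get the flattened format, while a client sending no Accept header would fall back to JSON.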

Would that work for you?

I will still need a few days to get to that though, since there is a bunch of other stuff in the pipeline...

Cheers, Lena

mkjsix commented 6 years ago

@lenalebt sounds good to me, thank you!

BTW, we love to hear stories about how people use RESTHeart, so if you have time please let us know. We plan to add some of these to the website. No need to dig into details, only the high-level business or technical case.

lenalebt commented 6 years ago

@mkjsix I can provide that. I think you already had contact with some colleagues from over here in the past (@henczi-espirit and @boesebeck-espirit).

Any chance for a short chat in the next few days? I am working out where to put things and just want to check with you that I have identified the right spots in your code before making the implementation effort, and that it is what you had in mind / what we need / industry-standard :).

  1. Creation of RequestContext -> use it to identify the start time of a request. I am unsure whether that is already too late to measure precisely, since some handlers have already run by then.
  2. RequestDispatcherHandler.putPipedHttpHandler -> use that to instrument metrics. I wanted to use both METHOD and TYPE to group measurements. Example:

All of them would then get a timer looking like this in JSON:

{
  "count" : 26,
  "max" : 97.0,
  "mean" : 7.1769003048531985,
  "min" : 3.0,
  "p50" : 4.0,
  "p75" : 6.0,
  "p95" : 7.0,
  "p98" : 97.0,
  "p99" : 97.0,
  "p999" : 97.0,
  "stddev" : 16.224186027028622,
  "m15_rate" : 0.22549083783430082,
  "m1_rate" : 0.5690742718315064,
  "m5_rate" : 0.27608595098831373,
  "mean_rate" : 1.6052463271940727,
  "duration_units" : "milliseconds",
  "rate_units" : "calls/second"
}
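
The instrumentation idea from the two points above (take a start time, run the handler, record the duration under a TYPE/METHOD group) could be sketched roughly as follows. This is a language-neutral illustration in Python, not RESTHeart or dropwizard code: `TimerStats`, `timers` and `timed` are hypothetical names, and only count/min/max/mean are tracked, with percentiles and rates left out.

```python
import time
from collections import defaultdict


class TimerStats:
    """Running count/min/max/mean for one metric; no per-request storage."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min = None
        self.max = None

    def update(self, millis):
        self.count += 1
        self.total += millis
        self.min = millis if self.min is None else min(self.min, millis)
        self.max = millis if self.max is None else max(self.max, millis)

    def snapshot(self):
        mean = self.total / self.count if self.count else 0.0
        return {"count": self.count, "min": self.min, "max": self.max,
                "mean": mean, "duration_units": "milliseconds"}


timers = defaultdict(TimerStats)  # hypothetical in-memory registry


def timed(resource_type, method, handler, clock=time.monotonic):
    """Wrap handler so each call is timed under '<TYPE>.<METHOD>'."""
    name = f"{resource_type}.{method}"

    def wrapper(request):
        start = clock()
        try:
            return handler(request)
        finally:
            # recorded even when the handler raises
            timers[name].update((clock() - start) * 1000.0)

    return wrapper
```

Wrapping at registration time means the measurement only covers the wrapped handler, which is exactly the precision concern raised in point 1: anything that ran before the wrapper is not included.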

I might also add the response code as a leaf (e.g. DOCUMENT.PUT.201 or at least DOCUMENT.PUT.2xx) to be able to distinguish between (normally slower) 2xxs and (normally faster) 4xxs.
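
A small sketch of how such names could be derived, again in Python for brevity. The helper names are hypothetical, and updating all three levels per request (group, status class, exact status) is an assumption for illustration; the text above only proposes one of the two leaf variants.

```python
def status_class(status_code):
    """Map an HTTP status code to its class, e.g. 201 -> '2xx'."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not an HTTP status code: {status_code}")
    return f"{status_code // 100}xx"


def metric_names(resource_type, method, status_code):
    """Names to update for one request, from most general to most specific."""
    base = f"{resource_type}.{method}"
    return [base, f"{base}.{status_class(status_code)}", f"{base}.{status_code}"]
```

For a successful document creation, `metric_names("DOCUMENT", "PUT", 201)` yields `DOCUMENT.PUT`, `DOCUMENT.PUT.2xx` and `DOCUMENT.PUT.201`.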

The Prometheus format is more or less the same, just flattened.
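
To illustrate what "flattened" could mean here, the sketch below turns a nested timer structure into one `name value` line per numeric field. It is only an approximation of the Prometheus text format: the `restheart` prefix and the function names are assumptions, and a real exporter would probably use labels and add type/help lines.

```python
import re


def sanitize(name):
    """Turn a dotted metric name into a valid Prometheus metric name."""
    cleaned = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # metric names must not start with a digit
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def to_prometheus(metrics, prefix="restheart"):
    """Flatten {metric: {field: value}} into 'name value' lines."""
    lines = []
    for metric, fields in sorted(metrics.items()):
        for field, value in sorted(fields.items()):
            # skip non-numeric fields such as "duration_units"
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            name = sanitize(f"{prefix}.{metric}.{field}")
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Applied to a timer named `DOCUMENT.PUT`, the `count` field would become a line such as `restheart_DOCUMENT_PUT_count 26`, while the unit strings are dropped.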

I would not store time series data in the database. There are databases that are better suited for that (graphite, influx, ...), and dropwizard is designed to auto-aggregate data with very low computation and memory overhead (it doesn't store individual request results). Additionally, in k8s you would need the Prometheus format in the first place (you can't use JSON natively), so the collection does not necessarily help.
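
As an illustration of why such aggregation needs so little memory, here is a sketch of an exponentially weighted moving average, the kind of calculation behind values like m1_rate. It keeps one counter and one rate per metric regardless of traffic. The class name and the 5-second tick are assumptions for this sketch; it is not the dropwizard implementation.

```python
import math


class Ewma:
    """Exponentially weighted moving average of an event rate (events/second)."""

    def __init__(self, window_minutes, tick_seconds=5.0):
        self.tick_seconds = tick_seconds
        # smoothing factor for one tick of the given averaging window
        self.alpha = 1.0 - math.exp(-tick_seconds / 60.0 / window_minutes)
        self.uncounted = 0
        self.rate = 0.0
        self.initialized = False

    def mark(self, n=1):
        """Record n events; only a counter changes, nothing is stored per event."""
        self.uncounted += n

    def tick(self):
        """Fold the events seen since the last tick into the average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.initialized:
            self.rate += self.alpha * (instant_rate - self.rate)
        else:
            # the first tick seeds the average with the observed rate
            self.rate = instant_rate
            self.initialized = True
```

A caller would invoke `mark()` on every request and `tick()` from a periodic timer; the rate then decays towards zero when traffic stops.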

So I think a short chat would be the easiest thing to do :)? You can reach me via mail as well (see gh).

ujibang commented 6 years ago

Hi @lenalebt, I will reach you via email to schedule a chat.

Bye