Closed josevalim closed 4 years ago
/cc @binaryseed @arkgil
I'm not sure why summary replaces the sum? Do you mean that it should be included in summaries provided by reporters?
Yes, that’s my understanding. OpenTelemetry doesn’t provide sum either.
From [the spec] it looks like OpenTelemetry gauge is closer to what we call a sum. Maybe we should wait a bit until OpenTelemetry spec is completed?
Wouldn't gauge be last_value
?
Yes, that's right, but OpenTelemetry also allows increasing and decreasing gauges by given value. So our sum could be OpenTelemetry gauge where we add/subtract, and with last_value we only set it to absolute value.
Just saw this and wanted to mention that I think the OpenTelemetry metrics API defined in the specification repo is fairly complete now. And there is an open PR to add the SDK spec https://github.com/open-telemetry/opentelemetry-specification/pull/347
I should have mentioned this earlier since they've been looking for feedback for a while now, I've been too busy completing the tracing stuff that I hadn't even thought about metrics.
There are also Metrics "office hours" every Thursday at 11am PST https://lightstep.zoom.us/j/872020227
I don't think it has views
yet, like OpenCensus. We might not have the pain and incompatibility between Telemetry.Metrics
and OpenTelemetry like we do with it and OpenCensus if we start considering it now.
Not sure if Telemetry.Metrics
would be at the API or SDK level of OpenTelemetry metrics, I guess that is the first thing I should figure out.
We can discuss more at the next Observability WG meeting which I plan to schedule for next week.
@tsloughter thanks for these links, it's an interesting read. I see that the spec has diverged from OpenCensus histograms and summaries, and provides a way to record arbitrary events. Happy to discuss this at the meeting!
So the current status in OpenTelemetry is that they are not tracking metrics per-se anymore. They are getting closer to Telemetry itself than Telemetry.Metrics. So I guess this is our call to make. @arkgil do you see any use for sum
now that we have summary? Does any of the backends (stats, prometheus, newrelic, etc) have a use for sum? /cc @bryannaegele @deadtrickster @binaryseed
@josevalim hm? why do you say that?
@tsloughter afaik, they removed the "summary" and "distribution" parts in favor of something that just emits the events (they do have something like counter+gauges though). Or have I misunderstood it?
@josevalim I'm pretty sure in the SDK it defines summaries https://github.com/open-telemetry/opentelemetry-specification/blob/6f3fbdbbad4a036f817ab6cb76d0b88e25fd9ae2/specification/sdk-metric.md#quantile-definition
The SDK isn't finalized yet, hopefully soon, so I'm not sure.
We should hop on the metrics call on January 2nd or have a call with Joshua who has written up the SDK.
That said, having to calculate as little as possible in BEAM would be best. If I was concerned about the amount of traffic increase in not aggregating I'd be all for pushing out all measurements and having an agent/collector do all the aggregation.
Oh, I was reading the Metrics API bit and completely ignored the SDK ones. Yes, let's address it in the next meeting.
Sum can be handled by Gauge
in Prometheus, so it should be OK to remove for that use case.
The Summary
metric has sum
and count
as the base requirement for implementation of that view but it's over a sliding time window. That is all I plan to support to start. I have enough concerns about performance of those calculations. I'll likely add support for doing quantiles but with a blaring warning and telemetry events for gauging impact.
You should check out the Observer OTEP as well https://github.com/open-telemetry/oteps/pull/72
Feedback from the people here would be great to help the final product work well for elixir/erlang :)
At this point, New Relic "dimensional metrics" support gauge
(point-in-time), count
(occurrence count of events) and summary
(pre-aggregated min, max, avg, sum, etc)
The sum
seems like it'd map directly to NewRelic's count
and partially to the summary
Side note, From my experience, pre-aggregating is an un-aviodable part of the whole picture. At high throughput, it's nearly impossible to get statistically accurate metrics w/o pre-aggregating because sampling becomes necessary. Also, doing a little math is always faster than a network call :)
StatsD (including DataDog) both have sum
as a part of another metric (DataDog "summary" and vanilla StatsD "timing"), so we should be fine on that end.
One concern I have, though, is that calculating only sum would be probably cheaper than having it computed along with a few other metrics. This might be even more important when we aggregate in process.
Not sure why there is need for separate backends? We should really be able to pool the work instead of duplicating it all over (opentelemetry, telemetry.metrics.prometheus/statsd, new relic).
I don't think breaking stuff out of OpenTelemetry if you find it has changed the API or become an abandoned project would be that bad, so I'd suggest taking a look, @bryannaegele, and we should discuss.
Especially since OpenTelemetry is an open project that isn't finished and looking for more input to guide the project forward. Meaning we can push for changes if they are needed.
.net is going an interested direction with this, sort of doing doing similar to telemetry and telemetry.metrics but based on working with OpenTelemetry. I'll try to find more details on that to better show what I mean.
Question: In light of the conversation in #68, should this be closed?
// @arkgil
@mcrumm not sure about that yet, I want to bring this up at the today's Observability WG meeting.
Just FYI, related to #68 and the follow-up conversation with @arkgil in telemetry_metrics_statsd
, I've just opened beam-telemetry/telemetry_metrics_statsd#30 which leans on the sum metric. If the previous meeting changes that direction, then I can update the PR accordingly
Sorry, forgot to post an update. Let's close this issue for now, we won't be able to remove the sum metric until Prometheus reporter implements summary.
Also, it's not yet clear if we actually want to remove the sum completely.
Thank you all for your input here!
Now that we have added summary, it probably makes sense to remove sum. This will also makes us fully aligned with opentelemetry types.