dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Frequency of Metrics report #108152

Open · Leonardo-Ferreira opened this issue 1 week ago

Leonardo-Ferreira commented 1 week ago

Is there a way to increase the frequency of metrics like "Memory Usage" or "CPU Usage"?

Perhaps not necessarily increase the frequency of reporting, but of capturing, so that one report can carry multiple data points. For example: capture every 5 seconds but report once every 120 seconds, along with the minimum value observed, the maximum, the average, the median, and the standard deviation.

ericstj commented 1 week ago

cc @dotnet/area-system-diagnostics-metric I think you can do that during collection; the metrics themselves are meant to be very fast and low-overhead so as to minimally impact the code being monitored.

You can read more about collection here: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics-collection

tarekgh commented 1 week ago

@Leonardo-Ferreira how are you listening to the metrics? Are you using specific tools or collecting them manually (in-proc or out-of-proc)?

noahfalk commented 1 week ago

> Perhaps not necessarily increase the frequency of reporting but the capturing, so 1 report can have multiple data points?

Nothing does that entire process out-of-the-box that I am aware of, but there are some building blocks you could use to make a custom solution that does that. Here is one possibility:

  1. Define a new Histogram with whatever name you like that will do in-process aggregation of some CPU or memory data. Use multiple histograms if you want to do this for more than one metric.
  2. Create a timer that polls at whatever frequency you like for collecting data in-proc.
  3. When the timer triggers, invoke an API such as `Process.GetCurrentProcess().UserProcessorTime` to read the value you care about, then pass that value to `Histogram.Record()` to store it.
  4. Use your tool of choice (OpenTelemetry, Prometheus.NET, dotnet-counters, dotnet-monitor, etc) to report the Histogram statistics at the lower over-the-network frequency. The reported histogram will contain some statistics about the distribution of values that were observed, but exactly what stats are captured varies by tool.
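The steps above could be sketched roughly as follows. The meter and instrument names here are made up for illustration; `Meter`, `Histogram<T>`, and `System.Threading.Timer` are the standard .NET APIs involved:

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading;

static class CpuSampler
{
    // Step 1: a custom Histogram for in-process aggregation.
    // Meter/instrument names are hypothetical; pick whatever fits your app.
    static readonly Meter s_meter = new("MyApp.ProcessStats");
    static readonly Histogram<double> s_userCpuSeconds =
        s_meter.CreateHistogram<double>("myapp.process.user_cpu_seconds");

    // Steps 2 and 3: poll at the chosen in-proc frequency (e.g. 5 s)
    // and record each sample into the histogram.
    public static Timer Start(TimeSpan interval) =>
        new Timer(_ =>
        {
            double userCpu = Process.GetCurrentProcess().UserProcessorTime.TotalSeconds;
            s_userCpuSeconds.Record(userCpu);
        }, null, TimeSpan.Zero, interval);
}
```

Step 4 is then whatever collector you already run (OpenTelemetry, dotnet-counters, etc.) subscribing to the `MyApp.ProcessStats` meter at its own, slower export interval.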

You may already be aware, but just wanted to mention: this type of in-proc polling + aggregation is doable but atypical. Many folks would instead capture high-fidelity data initially and downsample it at query time, or downsample the raw data within the time-series database's storage. For example, PromQL supports query functions that report the min/max/avg over a time range. There are certainly tradeoffs to the different choices, and nothing is wrong with doing the aggregation in-process if you are OK with the complexity of maintaining the custom metrics. Hope that helps!

Leonardo-Ferreira commented 1 week ago

> @Leonardo-Ferreira how are you listening to the metrics? Are you using specific tools or collecting them manually (in-proc or out-of-proc)?

I have applications using App Insights, Datadog, and the .NET OTel SDK...

tarekgh commented 1 week ago

CC @reyang @cijothomas @CodeBlanch

cijothomas commented 1 week ago

In the OTel spec, the metric export frequency is the same as the observable callback frequency, i.e. if exporting occurs every 60 seconds, then observable callbacks are also triggered every 60 seconds. There was an ask to support a separate interval for the observable callback, but it didn't make it into the OTel spec, though a few workarounds were suggested there. None of them is super straightforward. (Noah's suggestion is equally good, but it is not provided out of the box, so you have to code it yourself.)
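For context, the coupling described above comes from how observable instruments work in `System.Diagnostics.Metrics`: the callback runs only when a listener or exporter collects, so its frequency is tied to the export interval. A minimal sketch (the meter and gauge names are illustrative):

```csharp
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical names for illustration. The lambda below is invoked on each
// collection pass, not on a schedule the instrument controls, so sampling
// more often than the exporter collects requires a workaround like the
// timer + Histogram approach suggested earlier in this thread.
var meter = new Meter("MyApp.ProcessStats");
var workingSet = meter.CreateObservableGauge(
    "myapp.process.working_set_bytes",
    () => (double)Process.GetCurrentProcess().WorkingSet64);
```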

tarekgh commented 1 week ago

@Leonardo-Ferreira do you have any more questions, or is it OK to close the issue?