Open sourabh1007 opened 1 day ago
It's an anti-pattern to emit runtime metrics from client-specific instrumentations. .NET 9 will have a set of built-in runtime metrics (https://github.com/open-telemetry/semantic-conventions/blob/main/docs/runtime/dotnet-metrics.md) that cover these and many other signals.
The interval at which metrics are collected is configured by the user, not by the instrumentation: https://github.com/open-telemetry/opentelemetry-dotnet/blob/0343715f49ac8e121ec39acd92f8d5572b3d036d/src/OpenTelemetry/Metrics/Reader/PeriodicExportingMetricReaderOptions.cs#L47
If Cosmos DB takes measurements more frequently than that, they are simply aggregated across the user-configured interval: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#metricreader-operations
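As a rough illustration of that aggregation, the following simplified Python model (not the OpenTelemetry SDK; `HistogramPoint` and `collect` are hypothetical names) assumes CPU samples are recorded every 10 seconds while the reader exports every 60 seconds. Each window of six samples collapses into one exported data point: a histogram-style aggregation still exposes the spike through its maximum, while a last-value (gauge-style) aggregation keeps only the final sample.

```python
from dataclasses import dataclass


@dataclass
class HistogramPoint:
    """Simplified histogram data point (hypothetical): count, sum, min and max.
    Real OpenTelemetry histogram points also carry per-bucket counts."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def record(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


def collect(samples, sample_interval_s, export_interval_s):
    """Split samples (taken every sample_interval_s seconds) into export
    windows of export_interval_s seconds. For each window, return a
    histogram-style aggregate and the last-value (gauge-style) aggregate."""
    per_window = export_interval_s // sample_interval_s
    if per_window < 1:
        raise ValueError("export interval must be >= sample interval")
    points = []
    for start in range(0, len(samples), per_window):
        window = samples[start:start + per_window]
        hist = HistogramPoint()
        for value in window:
            hist.record(value)
        points.append((hist, window[-1]))  # last value wins for a gauge
    return points


# Hypothetical CPU % samples taken every 10 s, with a brief spike at t=20 s.
cpu_percent = [12.0, 15.0, 95.0, 14.0, 11.0, 13.0]
hist, last = collect(cpu_percent, sample_interval_s=10, export_interval_s=60)[0]
print(hist.count, hist.total, hist.minimum, hist.maximum, last)
# prints: 6 160.0 11.0 95.0 13.0
```

Averaged over the minute, the same samples give only about 26.7%, and the gauge-style value (13.0) hides the spike entirely.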
What do we want to collect? (as implemented in the Java SDK)
Available OpenTelemetry-compatible packages
OpenTelemetry.Instrumentation.Runtime 1.9.0 (NuGet)
  Usage: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/examples/runtime-instrumentation/Program.cs
  Metrics list: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/src/OpenTelemetry.Instrumentation.Runtime/README.md
OpenTelemetry.Instrumentation.Process 0.5.0-beta.6 (NuGet)
  Usage: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/examples/process-instrumentation/Program.cs
  Metrics list: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/src/OpenTelemetry.Instrumentation.Process/README.md#step-2-enable-process-instrumentation
.NET extensions metrics (Microsoft Learn)
  Metrics list: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-diagnostics#microsoftextensionsdiagnosticshealthchecks
Built-in .NET runtime metrics (Microsoft Learn)
  Metrics list: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-runtime
What do we need in the Cosmos DB SDK?
In the past, we have observed brief CPU spikes that degraded the customer experience. Existing libraries can capture CPU usage at a fixed interval, such as once per minute (depending on the exporter's capabilities), but a spike lasting only a few seconds can be smoothed out or missed entirely at that granularity. We therefore need more granular CPU and memory usage data.
Proposal: enhance the SDK with custom CPU and memory usage metrics. These metrics will sample CPU and memory usage every 10 seconds and record the values into a histogram, as outlined above.
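A minimal sketch of the proposal, using Python purely for illustration: a hypothetical `CpuUsageSampler` derives process CPU usage from the change in cumulative CPU time (`time.process_time`) over wall-clock time, normalized by core count, and a hypothetical `ExplicitBucketHistogram` records each 10-second sample into explicit buckets. The clock functions are injectable so the example runs instantly with made-up readings from a 4-core machine; memory usage is not modeled but could be recorded into a second histogram the same way. In the .NET SDK the equivalent would presumably build on `System.Diagnostics.Metrics` (for example `Meter.CreateHistogram<double>`) and `Process.TotalProcessorTime`, but that mapping is an assumption, not an existing Cosmos DB API.

```python
import bisect
import os
import time


class CpuUsageSampler:
    """Hypothetical sampler: process CPU usage (%) between consecutive sample() calls.

    cpu_time_fn returns cumulative CPU seconds used by this process and
    wall_time_fn returns a monotonic clock in seconds; both are injectable
    so the sampler can be exercised without sleeping.
    """

    def __init__(self, cpu_time_fn=time.process_time,
                 wall_time_fn=time.monotonic, cpu_count=None):
        self._cpu_time = cpu_time_fn
        self._wall_time = wall_time_fn
        self._cpus = cpu_count or os.cpu_count() or 1
        self._last_cpu = self._cpu_time()
        self._last_wall = self._wall_time()

    def sample(self) -> float:
        cpu, wall = self._cpu_time(), self._wall_time()
        used = cpu - self._last_cpu
        elapsed = wall - self._last_wall
        self._last_cpu, self._last_wall = cpu, wall
        if elapsed <= 0:
            return 0.0
        # Normalize by core count so 100% means every core was busy.
        return 100.0 * used / (elapsed * self._cpus)


class ExplicitBucketHistogram:
    """Hypothetical histogram: bucket i counts values in (b[i-1], b[i]];
    the final bucket counts values above the largest boundary."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        self.counts = [0] * (len(self.boundaries) + 1)

    def record(self, value: float) -> None:
        # bisect_left finds the first boundary >= value (upper-inclusive bucket).
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1


# Made-up readings for a 4-core machine sampled every 10 s; the third
# interval consumes 38 CPU-seconds (a 95% spike).
cpu_readings = iter([0.0, 4.0, 12.0, 50.0, 56.0, 60.0, 64.0])
wall_readings = iter([0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
sampler = CpuUsageSampler(lambda: next(cpu_readings),
                          lambda: next(wall_readings), cpu_count=4)
histogram = ExplicitBucketHistogram([25, 50, 75, 90])

samples = [sampler.sample() for _ in range(6)]
for s in samples:
    histogram.record(s)

print(samples)           # [10.0, 20.0, 95.0, 15.0, 10.0, 10.0]
print(histogram.counts)  # [5, 0, 0, 0, 1]
```

In a real sampler, a background loop would call `histogram.record(sampler.sample())` and then `time.sleep(10)`. With these readings the 95% spike lands in the top bucket, whereas average CPU usage over the whole minute is only 64 / (60 * 4), roughly 26.7%.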