dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.63k stars 4.57k forks source link

System.Diagnostics.Activity out of proc sampling #102895

Closed CodeBlanch closed 2 days ago

CodeBlanch commented 1 month ago

We have DiagnosticSourceEventSource which enables listening to Activity \ traces out of process. The current design when there is a listener is to create all Activity \ spans and write everything to the EventSource. We feel for out of process monitoring of traces this will create a lot of overhead in the process being monitored.

The request here is to allow the listener to set some kind of sampling algorithm.

Here is the OTel spec for samplers: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#built-in-samplers

What we have today is essentially AlwaysOn.

One suggestion would be to allow the listener to set a ratio. If ratio is supplied we could sample according to these rules:

Always creating the root span is important for log correlation but we only need to create children if the root is sampled. This is how the OTel .NET SDK works today.

/cc @noahfalk @tarekgh @samsp-msft @rajkumar-rangaraj @cijothomas

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti See info in area-owners.md if you want to be subscribed.

julealgon commented 1 month ago

@CodeBlanch isn't there a way to achieve this without basically reimplementing abstractions that the OpenTelemetry packages have already implemented?

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @dotnet/area-system-diagnostics-activity See info in area-owners.md if you want to be subscribed.

CodeBlanch commented 1 month ago

@julealgon Anything is possible! This issue is really to collect ideas. But my gut is no. For this out-of-proc monitoring scenario there is no OTel available. Some external utility (think dotnet-monitor) is run to extract telemetry from a running process which doesn't have the OTel SDK hooked up. We also don't want to use the profiler to inject modify/code so this works for AoT-compiled apps where that isn't possible.

julealgon commented 1 month ago

@julealgon Anything is possible! This issue is really to collect ideas. But my gut is no. For this out-of-proc monitoring scenario there is no OTel available. Some external utility (think dotnet-monitor) is run to extract telemetry from a running process which doesn't have the OTel SDK hooked up. We also don't want to use the profiler to inject modify/code so this works for AoT-compiled apps where that isn't possible.

What if we inverted the problem then?

If there are native .NET scenarios that require sampling, maybe the core System.Diagnostics libraries should contain the sampling abstractions (like they contain Activity, Meter, and other "OTEL-compatible" diagnostics classes) and the OpenTelemetry SDK could just leverage those instead of implementing them from scratch.

Otherwise, this will end up in a ton of potential duplication between .NET and OTEL.

Sure, the easy way out would be to just duplicate the concepts, but that feels like a bad solution long-term to me.

samsp-msft commented 1 month ago

@julealgon Anything is possible! This issue is really to collect ideas. But my gut is no. For this out-of-proc monitoring scenario there is no OTel available. Some external utility (think dotnet-monitor) is run to extract telemetry from a running process which doesn't have the OTel SDK hooked up. We also don't want to use the profiler to inject modify/code so this works for AoT-compiled apps where that isn't possible.

What if we inverted the problem then?

If there are native .NET scenarios that require sampling, maybe the core System.Diagnostics libraries should contain the sampling abstractions (like they contain Activity, Meter, and other "OTEL-compatible" diagnostics classes) and the OpenTelemetry SDK could just leverage those instead of implementing them from scratch.

Otherwise, this will end up in a ton of potential duplication between .NET and OTEL.

Sure, the easy way out would be to just duplicate the concepts, but that feels like a bad solution long-term to me.

This is essentially what we are talking about:

Whether to deprecate the sampling options in OTEL or have that map through to the runtime API is up for debate - what this probably maps to in the OTel Specs is the ParentBasedSampler with a fixed rate sampler for Root spans without a parent.

julealgon commented 1 month ago
  • Add an API or possibly use the existing hook in ASP.NET core when a request comes in and picking the traceid/Recorded properties, to be able to specify a sampling algorithm.

Just keep in mind this should work just as "natively" outside of ASP.NET Core, like cmdline apps, workers, etc.

samsp-msft commented 1 month ago

It's all based around https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivityListener.cs#L48C58-L48C62, so is not ASP.NET specific, but that's one of the main consumers.

samsp-msft commented 1 month ago

dotnet/dotnet-api-docs#9978