Relationship between AspireComponentSettings.Tracing and Azure client options config

dotnet / aspire

Tools, templates, and packages to accelerate building observable, production-ready apps

https://learn.microsoft.com/dotnet/aspire

MIT License

3.89k stars 469 forks source link

Relationship between AspireComponentSettings.Tracing and Azure client options config #1149

Open lmolkova opened 11 months ago

lmolkova commented 11 months ago

The bigger question for me is what we think AspireComponentSettings.Tracing should control. Setting *ClientOptions.Diagnostics.IsDistributedTracingEnabled = false; means we'll never do anything, even if you add a listener. Should the Aspire setting just declare whether to wire up a listener automatically? Or should it mirror the underlying Azure behavior of disabling all tracing activity?

I don't love that two similar looking things behave differently in a subtle way. Is there any scenario where someone setting AspireComponentSettings.Tracing = false; could be upset if other code in their app ended up adding a listener? Like imagine seeing a span for GET https://myaccount.table.core.windows.net/diseases(PartitionKey='T',RowKey='Ted') return a non-404 when I explicitly disabled tracing on my diseases Aspire component. :smile: We've been really militant about respecting all *ClientOptions.Diagnostics settings so far because it's not worth risking customer trust.

So if AspireComponentSettings.Tracing = false; should just mean don't add a listener, then I'm fine leaving things as-is because the resulting path through our pipeline does indeed look cheap enough. If instead it means disable tracing for this component, we still might want to push the setting down another layer as well.

(I don't think we need to block the current PR on this question though because these changes are a major improvement for the happy path of the next Beta.)

_Originally posted by @tg-msft in https://github.com/dotnet/aspire/pull/1084#discussion_r1409997215_

Azure clients have additional tracing flag - ClientOptions.Diagnostics.IsDistributedTracingEnabled and we need to make it work consistently with Azure{Component}Settings.Tracing.

lmolkova commented 11 months ago

I'll happily work on it, please assign me.

eerhardt commented 11 months ago

So if AspireComponentSettings.Tracing = false; should just mean don't add a listener

This is exactly what it means. All our components have a Settings object that controls:

HealthChecks
Tracing
Metrics

These booleans are meant to control whether the component adds the necessary DI services for their corresponding feature. Here's the simplest one that has all 3:

https://github.com/dotnet/aspire/blob/28c2ef056cf92a571741ca0c2f924845117ddb60/src/Components/Aspire.Npgsql/AspirePostgreSqlNpgsqlExtensions.cs#L73-L101

I don't think it should control more than that. If the underlying library has lower-level settings, I don't think we should try to hook them together. If someone disables the lower one, and enables the higher one, it is relatively easy to explain that AspireComponentSettings.Tracing adds the listener, but they disabled the underlying source. (Honestly I'm not sure what someone would expect to have happen in this case). Vice versa is the same - if they set AspireComponentSettings.Tracing = false and set ClientOptions.Diagnostics.IsDistributedTracingEnabled = true, there is no listener so no OTel traces are emitted. But in this case if someone added another listener, that should still work because the Azure SDK library is still creating activities.

As an aside, Activity already does this checking of "is anyone listening?" and if there are no listeners an Activity object isn't created.

https://learn.microsoft.com/dotnet/api/system.diagnostics.activitysource.createactivity#system-diagnostics-activitysource-createactivity(system-string-system-diagnostics-activitykind)

Returns Activity The created Activity object or null if there is no any event listener.

I was a bit surprised to learn that Azure SDK has this same logic higher up.

lmolkova commented 11 months ago

After playing with it, I agree with @eerhardt that I don't see the point in making AspireComponentSettings.Tracing and Azure ClientOptions.Diagnostics.IsDistributedTracingEnabled.

Moreover, they are different properties (thanks @eerhardt for the explanation).

While AspireComponentSettings.Tracing and ActivitySource.HasListeners() are global enablement mechanisms, ClientOptions.Diagnostics.IsDistributedTracingEnabled provides per-client instance control.

If I wanted to trace one client, but not the other (e.g. some requests are particularly noisy or perf-sensitive), I'd create two named configurations:

builder.AddKeyedAzureBlobService("blobsNoTracing",  
 clientBuilder => clientBuilder.ConfigureOptions(options => options.Diagnostics.IsDistributedTracingEnabled = false));
builder.AddKeyedAzureBlobService("blobsWithTracing",  
 clientBuilder => clientBuilder.ConfigureOptions(options => options.Diagnostics.IsDistributedTracingEnabled = true));

But I would not be able to do it with Tracing property, the below code results in tracing for both instances of the component.

builder.AddKeyedAzureBlobService("blobsNoTracing",  settings => settings.Tracing = false);
builder.AddKeyedAzureBlobService("blobsWithTracing",  settings => settings.Tracing = true));

And yes, I think the fact that Tracing option is a global one needs to be documented, but it really belongs on a global OTel configuration rather than per-component-instance one.

@tg-msft thoughts?

eerhardt commented 11 months ago

But I would not be able to do it with Tracing property, the below code results in tracing for both instances of the component.

How would you do this with OpenTelemetry's APIs as they exist today?

            builder.Services.AddOpenTelemetry()
                .WithTracing(traceBuilder => traceBuilder.AddSource(ActivitySourceNames));

That adds the ActivitySource to OTel, so any activity logged there will get traced by OTel.

Should we rethink the settings.Tracing name? Or do you think XML docs on this is enough?

lmolkova commented 11 months ago

How would you do this with OpenTelemetry's APIs as they exist today?

There is no way to do it with otel - any non-global config/behavior has to be controlled by the component code.

Should we rethink the settings.Tracing name? Or do you think XML docs on this is enough?

I think this is OTel setting rather than component setting. Other component setting (e.g. on blob) are instance-specific, but not the Tracing. If we had OTel configuration it could be global and every component could enlist it's sources/meters to be enabled. Then users could configure it like OTel__<component_name>__Tracing__Enabled: false.

It seems to be more difficult to use both in code and in the config files, but maybe there is a better proposal on how it can be done? Static property seems ugly, but it'd be accurate.

But this approach is used in OTel Java agent when users can disable specific instrumentations with OTEL_INSTRUMENTATION_[NAME]_ENABLED env var. Similar approach exist in OTel python. I assume OTel at some point will provide a common configuration story for enabling/disabling specific instrumentation.

If we keep Tracing on component settings, I'd update it's description from

https://github.com/dotnet/aspire/blob/50fb0475444bb99bca458653f120a113aeeca815/src/Components/Aspire.Azure.Storage.Blobs/AzureStorageBlobsSettings.cs#L45-L51

to something like

Gets or sets a boolean value that indicates whether the OpenTelemetry tracing is enabled globally for all instances of this component. If you want to disable tracing, make sure to set it to false on all instances of the component you register.

Which I think is not great.

TL;DR: I think it'd be great to get rid or Tracing property and populate it from some env variable in a similar manner as OTLP config happens. In the long run, otel is likely to standardize global configuration story and it'll be much easier to keep legacy env var around than public API.

lmolkova commented 11 months ago

Also, I we'll have a similar problem for Metrics and Logs, so we should really consider having a global config for telemetry separate from instance config.

eerhardt commented 10 months ago

I'm not excited about introducing an environment variable for Aspire. If we need an environment variable to control this, I think it should be done at a lower layer (either OTel or Azure SDK).

I'd be OK with updating the Tracing docs (probably in a remarks section) to indicate that this is global and would need to be turned off for all instances of a component if you want to turn it off.

Thoughts?