dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
860 stars 288 forks source link

Emit OpenTelemetry tracing #2210

Open roji opened 1 year ago

roji commented 1 year ago

OpenTelemetry has become the de-facto standard for distributed, cross-platform tracing; it is being rapidly adopted both inside the .NET ecosystem and outside. OTel defines semantic conventions for how a database client traces information; these specifications are currently in experimental state, but stabilization is likely start happening soon.

There is a separate instrumentation library, OpenTelemetry.Instrumentation.SqlClient, which AFAIK uses the DiagnosticSource instrumentation already emitted by SqlClient to then emit OpenTelemetry tracing.

However, ideally SqlClient would emit this tracing data directly, without requiring a 3rd-party library.

Wraith2 commented 1 year ago

Why is it better to emit third party specific information rather than have their adapter deal with the diagnostic information we already emit?

roji commented 1 year ago

@Wraith2 this isn't 3rd-party specific information, but rather the emerging standard for how database clients emit tracing information; fundational components in the .NET ecosystem (such as ASP.NET itself, but also Npgsql) are being instrumented to emit OpenTelemetry tracing directly, which is the best user experience and guarantees both accurate tracing and best performance. This really belongs in the driver rather than some 3rd-party adapter which may or may not have access to the needed information in an efficient way.

Wraith2 commented 1 year ago

If there is additional information needed in the diagnostic data that we emit then we should identify and add it so that it can be available to the open telemetry adapter. I've already got an issue open at open telemetry about changing the diagnostic information to be strongly typed so it's more aot friendly so adding any new information would be best done along with that work.

I don't see why it's beneficial to have open telemetry as s direct dependency/implementation inside sqlclient though. It adds complexity and another set of diagnostic logging (we already have 2) and isn't needed by many people. If we want to make things pay-for-play then an external adapter that people can opt into sounds like the best approach to me.

roji commented 1 year ago

I don't see why it's beneficial to have open telemetry as s direct dependency/implementation inside sqlclient though.

There's no OpenTelemetry dependency - everything needed to emit tracing (or metrics) data is already present in the framework (e.g. System.Diagnostics.ActivitySource). In addition, the APIs have been optimized so that their overheads when tracing isn't enabled is negligible.

So there's no real pay-per-play aspect here, and no real downside to implementing this in-the-box for people.

As I wrote above, this is becoming the de-facto standard for distributed tracing - not just in .NET. Note that this is also a different type of tracing: this isn't about adding tracing events to each and every function, generating very large dumps for method-by-method debugging; it's much more about emitting high-level information about what database commands are being executed, how long they're taking, and reporting various aspects about them. It's indeed probably possible to derive that information from SqlClient's existing tracing, but that kind of approach is always less efficient and requires that users discover and add yet another dependency.

I suggest taking a look at how this look for Npgsql; here's the doc page (I know some demos are planned for this in .NET Conf).

vishweshbankwar commented 1 year ago

I don't see why it's beneficial to have open telemetry as s direct dependency/implementation inside sqlclient though.

There's no OpenTelemetry dependency - everything needed to emit tracing (or metrics) data is already present in the framework (e.g. System.Diagnostics.ActivitySource). In addition, the APIs have been optimized so that their overheads when tracing isn't enabled is negligible.

So there's no real pay-per-play aspect here, and no real downside to implementing this in-the-box for people.

As I wrote above, this is becoming the de-facto standard for distributed tracing - not just in .NET. Note that this is also a different type of tracing: this isn't about adding tracing events to each and every function, generating very large dumps for method-by-method debugging; it's much more about emitting high-level information about what database commands are being executed, how long they're taking, and reporting various aspects about them. It's indeed probably possible to derive that information from SqlClient's existing tracing, but that kind of approach is always less efficient and requires that users discover and add yet another dependency.

I suggest taking a look at how this look for Npgsql; here's the doc page (I know some demos are planned for this in .NET Conf).

+1. Having libraries natively instrumented is the ideal case. As an example, HttpClient and ASP.NET Core are also moving towards that change. .NET8.0 has added metrics support in HttpClient and ASP.NET Core. Reference: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-aspnetcore.

Tracing is considered for .NET9.0: see https://github.com/dotnet/runtime/issues/93019.

@roji How does Npgsql handle experimental semantic conventions part?

roji commented 1 year ago

@vishweshbankwar Npgsql implements the current database tracing semantic conventions in their experimental state, and that support is itself documented as experimental in our docs. When the specs mature and reach stable, we'll align to those (users may need to react).

cijothomas commented 1 year ago

Upvoting :) ❤️ to see this!

perlun commented 4 weeks ago

Semi-related to this, but the SqlCommand class (both the deprecated System.Data.SqlClient.SqlCommand and the now-relevant Microsoft.Data.SqlClient.SqlCommand) are a bit hard to use with OpenTelemetry in the sense that it seems virtually impossible to add a 3rd party "query name" tag to the instantiated SqlCommand. 🤔 This kind of information would be incredibly useful to be able to include in the distributed tracing data, to distinguish different queries from each other.

The reasons why I claim this is impossible:

If either one of these options were available, you could then easily use the OpenTelemetry.Instrumentation.SqlClient-provided Enrich() method, to set arbitrary tags in the tracing data being submitted. We tried setting a dummy Parameter with a user-defined name of the query, but unfortunately this can cause undesired breakage when running certain SQL queries.


Sorry for hijacking the issue quite a bit (...) but I guess what I'm stretching for is:

  1. Does anyone (perhaps you, @cijothomas?) have any ideas on how to workaround this short-term? I thought about a Java:ish approach of using a thread-local variable for this, but unsure if we can guarantee that the Enrich() lambda runs on the same thread as the code creating the SqlCommand in the first place, because of the async nature of modern .NET code.
  2. When investigating this issue in the long run, please take this into consideration ("make it possible to provide a user-defined query name") since it'll make the tracing much more valuable and useful.
KalleOlaviNiemitalo commented 4 weeks ago

@perlun, ConditionalWeakTable\<SqlCommand, string> would work, but I don't know how much performance that would cost.

Wraith2 commented 4 weeks ago

It would work but because SqlCommand are short lifetime objects (usually) it would cause a lot of traffic and be perf concern as you point out. Adding an identifier property (call it Tag?) to a command isn't a difficult thing to do and would likely be approved. The problem is getting it into the diagnostic output or waiting for the full implementation of the open telemetry

perlun commented 4 weeks ago

Thanks @KalleOlaviNiemitalo, appreciated! 🙏

It would work but because SqlCommand are short lifetime objects (usually) it would cause a lot of traffic and be perf concern as you point out.

It (ConditionalWeakTable) could actually be an option in our case, despite not a perfect solution. Might give this a try as an interim solution at least.

Adding an identifier property (call it Tag?) to a command isn't a difficult thing to do and would likely be approved.

Thank you for that @Wraith2 - let's give it a try: https://github.com/dotnet/SqlClient/pull/2925

The problem is getting it into the diagnostic output or waiting for the full implementation of the open telemetry

That's actually not a problem, at least with the current approach taken by the OpenTelemetry.Instrumentation.SqlClient-tracing-related code, because it's Enrich-property is defined like this. 👇

    /// <summary>
    /// Gets or sets an action to enrich an <see cref="Activity"/> with the
    /// raw <c>SqlCommand</c> object.
    /// </summary>
    /// <remarks>
    /// <para><b>Enrich is only executed on .NET and .NET Core
    /// runtimes.</b></para>
    /// The parameters passed to the enrich action are:
    /// <list type="number">
    /// <item>The <see cref="Activity"/> being enriched.</item>
    /// <item>The name of the event. Currently only <c>"OnCustom"</c> is
    /// used but more events may be added in the future.</item>
    /// <item>The raw <c>SqlCommand</c> object from which additional
    /// information can be extracted to enrich the <see
    /// cref="Activity"/>.</item>
    /// </list>
    /// </remarks>
    public Action<Activity, string, object>? Enrich { get; set; }

Since we have access to the raw SqlCommand there, it's trivial to then "pass it on" into the Activity object which gets sent to the OpenTelemetry collector:

var cmd = (SqlCommand)command;
activity.SetTag("query.name", cmd.Tag); // Depending on the new property not yet available in SqlCommand

Of course, for the full (SqlClient-"native") OpenTelemetry implementation, things might be a completely different beast, I'm just talking about "how could we make this work at all right now" scenario right now.

vonzshik commented 4 weeks ago

I thought about a Java:ish approach of using a thread-local variable for this, but unsure if we can guarantee that the Enrich() lambda runs on the same thread as the code creating the SqlCommand in the first place, because of the async nature of modern .NET code.

You might as well just use AsyncLocal. It works very similar to ThreadStatic, though it's passed through async context.

roji commented 4 weeks ago

@perlun this is indeed quite off-topic to this issue, would be better to have this conversation in its own issue.

perlun commented 3 weeks ago

Alright @roji, fair point indeed. I created a new issue at https://github.com/dotnet/SqlClient/issues/2931 now, and copied my comments there. Let's continue that thread there instead.

Feel free to flag the comments related to this thread as off-topic to keep the original content more readable. 👍