Service discovery should enrich activity with incoming service name before resolution

davidfowl commented 1 year ago

When looking at traces, it's hard to back track to the original information when trying to track down a potential problem:

This is because HTTP client is the one creating the activity and the service discovery system's information is after resolution of the underlying address information. We should enrich the outgoing telemetry context with tags that say what the service name was for a particular request.

danmoseley commented 1 year ago

@ReubenBond we're assuming you're doing this next, per backlog review

please also update dashboard to display it meaningfully

davidfowl commented 1 year ago

If we invent new activity tags we should follow the https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/.

cc @noahfalk

DamianEdwards commented 1 year ago

Perhaps server.address is what it should be adding (from https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/span-general/#serveraddress).

davidfowl commented 1 year ago

Wouldn't that overwrite the real address?

DamianEdwards commented 1 year ago

It's described as representing the "logical server hostname" and doesn't seem to be populated at all right now.

davidfowl commented 1 year ago

@noahfalk is there a way to enrich the outgoing activity without creating a new one?

noahfalk commented 1 year ago

It's described as representing the "logical server hostname"

I think there are different ways you could interpret 'logical', but OTel's interpretation is request.RequestUri.Host

doesn't seem to be populated at all right now.

It is populated if folks use OpenTelemetry's instrumentation: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Instrumentation.Http/Implementation/HttpHandlerDiagnosticListener.cs#L189

Eventually we probably want to absorb that into HttpClient so I'd be cautious about doing anything that conflicts.

@noahfalk is there a way to enrich the outgoing activity without creating a new one?

Sure.

Activity.Current.SetTag("MyCoolTag", "ReallyImportantData");

This assumes that your service discovery code is running while HttpClient's activity is the current one. I'm not sure where it actually runs?

ReubenBond commented 1 year ago

If the Host header is that, is that enough, or does it have to be the Uri.Host value?

DamianEdwards commented 1 year ago

It is populated if folks use OpenTelemetry's instrumentation: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Instrumentation.Http/Implementation/HttpHandlerDiagnosticListener.cs#L189

Should we be updating our ServiceDefaults project template to use this then?

davidfowl commented 1 year ago

If the Host header is that, is that enough, or does it have to be the Uri.Host value?

Oh with your latest change, will this just work?

davidfowl commented 1 year ago

With this change we get "net.host.name" in the traces.

davidfowl commented 1 year ago

Nope, I was wrong, this is the server span, not the client span. We want to see the outbound activity with this.

davidfowl commented 1 year ago

yea, so I finally re-read this and we need to update our service defaults. I'll get on that.

danmoseley commented 1 year ago

@davidfowl @ReubenBond can you update here, is the work left all on David?

davidfowl commented 1 year ago

Setting the host name seems to break everything (even on ACA) and we turned it off by default so AFAIK, nothing has changed here. We have no trace of what the original service name was once service discovery has run

davidfowl commented 1 year ago

@ReubenBond I updated this issue to only be about the outgoing activity, that is the high order bit. Since the host name plan was a bust, I suggest we attempt to do this finding a way to mutate the outgoing activity (if possible).

davidfowl commented 1 year ago

@noahfalk After a cursory glance, I can't see an obvious way to get in between here and add extra state to the outgoing activity. Worst case scenarios it looks like we would need to do some kinda of otel specific trace enrichment? Is that right?

noahfalk commented 1 year ago

I can't see an obvious way to get in between here ...

You mean the point in time where your code runs doesn't overlap the time period where the Activity you want to modify is active? Various options:

An ActivityListener can hook the ActivityStarted callback and do arbitrary modification to an Activity. You would have to stash the state somewhere earlier so that you can retrieve it, likely in an AsyncLocal.
Similar to the ActivityListener you could also use an OTel specific processor. Downside is it takes an explicit dependency on OTel library, upside is it runs with predictable ordering relative to whatever inspection/modification of the Activity OTel is going to do.
Engineer a way for the component creating the data to flow it to the HttpClient handler component that is creating the Activity. This is similar in principle to the enrichment for metrics, except now it is enrichment for Activity.

Its possible there are more scenario specific options but at this point I only have a very cursory understanding of how service discovery is hooked up. Above are the generic options that could be used for any two components.

davidfowl commented 1 year ago

An ActivityListener can hook the ActivityStarted callback and do arbitrary modification to an Activity. You would have to stash the state somewhere earlier so that you can retrieve it, likely in an AsyncLocal.

The static wire up of this bugs me. I wouldn't want this in library code. It's it ironic that we have to set state on an async local to propagate that state to another async local 😄.

Similar to the ActivityListener you could also use an OTel specific processor. Downside is it takes an explicit dependency on OTel library, upside is it runs with predictable ordering relative to whatever inspection/modification of the Activity OTel is going to do.

This might be OK for now. We can build a ServiceDiscovery.Otel package that could wire this up. It's a little unfortunate we can't do this out of the box yet but by .NET 9, we should hopefully have something more convenient.

Engineer a way for the component creating the data to flow it to the HttpClient handler component that is creating the Activity. This is similar in principle to the enrichment for metrics, except now it is enrichment for Activity.

The code here is pretty much the outermost HttpMessageHandler, AFAIK DiagnosticsHandler is the inner most handler. There's no real easy way to run another handler at that layer. If we could control the creation of it, we could write up it manually in the right order so that we could run while the activity was active. This is much like how the incoming ASP.NET Core activity works.

davidfowl commented 1 year ago

@noahfalk I think the cleanest solution would be to expose the DiagnosticHandler and allow it to be added to the pipeline earlier so that other delegating handlers can see the outgoing activity. We'd also need a local way to disable the inner SocketsHttpHandler creating one... I'll need to think about it some more.

I experimented with an otel processor but I also need to expose an async local that stores the state from the service discovery system... which is not ideal.

JamesNK commented 10 months ago

An alternative solution is a way to add a callback to a HttpRequestMessage that DiaganosticHandler runs. It would be responsible for enriching the activity. This approach is what was added for metrics in .NET 8.

davidfowl commented 10 months ago

@JamesNK did some magic here in the dashboard specifically to reverse engineer server names from the outgoing address:

davidfowl commented 9 months ago

@ReubenBond do you want to punt this until later or do you want to produce a ServiceDiscovery.Otel pacakge that adds data to the outgoing span using enrichment?

davidfowl commented 1 month ago

@noahfalk @tarekgh is this possible in .NET 9?

JamesNK commented 1 month ago

No. Request to support adding tags to HttpClient activities: https://github.com/dotnet/runtime/issues/96263

dotnet / aspire

Service discovery should enrich activity with incoming service name before resolution #293