Open lmolkova opened 6 years ago
@brettsam @paulbatum @fabiocav please review the plan and provide your comments. /cc @SergeyKanzhelev @zakimaksyutov
Great writeup! Here's some comments:
If we're not going to backport this to v1.x, we should remove all those checkmarks so it's clear we're only doing this for v2.x.
I'm all for doing things the right way. Let's try to make this work. Some custom stuff we do today that I'd think we want to include:
client.Context.GetInternalContext().SdkVersion = $"webjobs: {assemblyVersion}";
looks good -- we can discuss the open questions separately
Some questions:
Is there something that the HttpTrigger can do to allow this happen automatically? Which part is preventing that?
@brettsam Thanks!
I updated the table to reflect v2 only.
I agree all current stuff (initializers, filters, SDK version) should be added into the DI and participate in TeletmeryConfiguration creation. Later, when AI SDK provides generic DI config, we'll still be able to adjust this config and add custom things
So these client libraries (EH, SB, etc) are responsible for setting Activity.Current when the message, etc, is received? Or is that something that WebJobs (or AI) will need to do?
Yes, SB and HTTP (and eventually EH) can extract Activity from the message and make it current.
SB do this already today - nothing is expected from WebJobs
EH can only extract, but not assign current, for EH WebJobs part would look like
var message = value.GetSingleEventTriggerInput(i);
var activity = message.Events.SingleOrDefault()?.ExtractActivity();
TriggeredFunctionData input = new TriggeredFunctionData
{
ParentActivity = activity,
TriggerValue = message
};
Task task = _parent._executor.TryExecuteAsync(input, _cts.Token);
ApplicaitonInsights logger would do following (instead of RequestTelemtery creation)
if (parentActivity != null && parentActivity != Activity.Current)
{
stateValues[OperationContext] = _telemetryClient.StartOperation<RequestTelemetry>(parentActivity);
}
For Http, this functionality is enabled by specific listener we cannot use right now (see PR4 below). So in HttpTrigger we'd have similar code to extract Activity from HttpHeaders and pass it to the executor and logger.
Blob/queue trigger code would be similar to EH, except we'll assugn parent explicitly
Guid? parentId = await _causalityReader.GetWriterAsync(blob, cancellationToken);
Activity activity = new Activity("BlobQueueTriggerExecutor");
if (parentId != null)
activity.SetParentId(parentId.ToString());
TriggeredFunctionData input = new TriggeredFunctionData
{
ParentActivity = activity,
TriggerValue = blob
};
return await registration.Executor.TryExecuteAsync(input, cancellationToken);
What do you mean by 'function serialization into the persistent queue"?
Using Activity instead of ParentId leads to changes in FunctionStartedMessage
, FunctionStartedMessage
, InstanceTableEntity
. which are serialized into Azure Table and Queue.
Is it correct it is used for logging only?
I also expect it is somehow serialized and propagated to non .NET functions? I have not think about it yet, but we'll need to ensure correlation would work there as well.
Is there something that the HttpTrigger can do to allow this happen automatically? Which part is preventing that?
Instrumentation is only available in presence of the listener. ApplicationInsights ASP.NET Core SDK implements this listener, and it can be configured from the WebJobs (I was mistakenly thinking that is it not). But:
So all these concerns are solvable, but it seems re-using this auto-collection now may do more harm then good. I'd prefer if we configured it after AspNEt Core 2.2 is released and AppInsights AspNetCore SDK is split into seveal packages (so no unwanted dependencies introduced for WebJobs).
Assuming this, in HttpTrigger, we'd just parse headers and set the on the Activity and pass it to the executor. Executor will pass it to AppInsights and it knows what to do with the Activity.
Note -- we moved this to Active Questions so it falls off of our "Untriaged" query. It's more of a "spec" and not really a direct issue.
Sorry, but I don't really understand the status of things here - the request-id from an incoming http request is still not picked up and tracked, right? Any idea when this might be released?
@lmolkova -- can you update this with the current status?
Do plans exists to add this functionality to OrchestrationTriggers? Right now an HttpTrigger picks up the Request-Id and sets up the Activity correctly. However, when subsequently calling an OrchestrationTrigger from a DurableOrchestrationClient the Activity.Current is a new, unrelated Activity in the OrchestrationTrigger function.
Are there any plans to make Azure Storage Queue auto trackable?
I've tried out a few things to enable end-to-end correlation when Azure Queues are involved, but couldn't find a way to make it work. Even if I manually serialize a ParentID
as part of my Queue payload, I can't find a way to use it to set the Request telemetry ParentID for the Function that consumes the Queue message.
I guess what I'm asking is if there are plans to:
Thanks!
Seconding @marknblueswitch
I guess what I'm asking is if there are plans to:
1. Automagically enable end-to-end tracking for Azure Queue messages across different systems 2. Allow the ability to set the ParentID manually in the Function telemetry, so I can serialize a ParentID in the queue payload and set it myself.
This is something that would be extremely useful - we are looking to use Storage Queues over Service Bus due to the simplicity (Service Bus has a lot of extra features we just don't need, and doesn't have any local development/emulator support) but the lack of distributed tracing is a real bummer. It would be really nice to provide some sort of mechanism to allow setting the ParentID from a property in the JSON serialized message body.
Are there any plans to support something like this or another mechanism of distributed E2E tracing with storage queues?
@fabiocav @lmolkova is distributed tracing supposed to currently work for function to function queue messages? I ran across QueueCausalityManager which sets the $AzureWebJobsParentId
value in the JSON being sent into the queue, but I don't see the app insights request associated with the second function execution being correlated. I was trying to see if adding $AzureWebJobsParentId
with the operation ID of an external operation would also cause correlation, but it doesn't even appear to currently happen for function to function messages in the same app.
In queue storage & blob storage - Function-to-function only E2E logging means what? if it offer distributed log tracing and how to achieve it ?
I created this repository to reproduce this issue.
https://github.com/dlebee/corrolation-function-servicebus
More and more library are trying to implement OpenTelemetry in C# which heavily depends on higher version of System.Diagnostics.DiagnosticSource
This is causing azure function service bus trigger correlation issues.
The following is what I think is causing the issue
<= 4.6.0
public partial class Activity { public static Activity Current { #if ALLOW_PARTIALLY_TRUSTED_CALLERS [System.Security.SecuritySafeCriticalAttribute] #endif get { ObjectHandle activityHandle = (ObjectHandle)CallContext.LogicalGetData(FieldKey); // Unwrap the Activity if it was set in the same AppDomain (as FieldKey is AppDomain-specific). if (activityHandle != null) { return (Activity)activityHandle.Unwrap(); }
return null;
}
}
}
VS
> Since 4.7.1 >= or around there
```csharp
public partial class Activity
{
private static readonly AsyncLocal<Activity> s_current = new AsyncLocal<Activity>();
/// <summary>
/// Gets or sets the current operation (Activity) for the current thread. This flows
/// across async calls.
/// </summary>
public static Activity Current
{
get { return s_current.Value; }
set
{
if (ValidateSetCurrent(value))
{
SetCurrent(value);
}
}
}
}
I'll go through the planned changes (split by several PRs) and describe high-level design and expected outcome.
Here is a messy prototype for all changes: https://github.com/lmolkova/azure-webjobs-sdk/commit/22a7affce96f0c34363b3746d0af5797a1c1c593
I'm open to any feedback and suggestions and want to make sure we are on the same page before I start the implementation
Motivation
We want to enable distributed tracing for Azure Functions using ApplicationInsights users are able to track calls to and from the Azure Functions and correlate it to the rest of telemetry reported by other applications.
As a result, we want to enable experiences like End-to-end transaction viewer and Application Map for all users automagically.
Goal 1: supported auto-tracking (triggers only are tracked now)
ApplicationInsights can track HTTP, ServiceBus, EventHub, SQL calls automagically if configured so. It's a component responsibility to participate in distributed tracing, i.e. when new components (e.g. ServiceGrid) implement specific instrumentation, they just light up - requiring no changes from Azure Functions.
Enabling current tracking capabilities gives us following level of support:
[UPDATE] -v1 and all bindings that are not supported in v2 are removed [UPDATE] -added correlation column for clarity
Goal 2: Correlation.
Within the function: function call should be correlated to any tracked binding call
Beyond the function: if trigger contains some context for correlation (specific http header, ServiceBus message property, blob metadata, etc), we want to use it to correlate function telemetry to the rest. The context should also flow to the output.
Correlation is achieved using .NET
System.Diagnostics.Activity
that holds all tracing context. Anyone can access staticActivity.Current
to get current context.Each component is responsible to start a new Activity and propagate the context to downstream service (Azure Storage does not support it though). I.e. ServiceBus knows how to serialize Activity into the message.
Component is also responsible for extracting Activity from the wire. I.e. ServiceBus knows how to extract Activity from the messsage.
So, we will leverage this in Functions.
The Plan
PR1: Dependency Injection (optional, but highly desirable)
Motivation
Now, configuration of applicationInsights is done via DefaultTelemetryClientFactory.
With enabling dependency collection and correaltion this is going to grow and become complicated.
It's also hard to tune for users.
Another issue is that if AppInsights enables new features, Azure Functions would need to change the code to enable them as well.
Proposal
Leverage Dependency Injection to instantiate and control lifetime of AppInsights instances. Hide it beyond
IHostBuilder.ConfigureApplicationInsights
(Configure/Add/Use) extension. Eventually, ApplicationInsights should provide such extension for non-Web Hosts and maintain it (now it works in AspNetCore SDK for web hosts).PR2: Tracking dependency calls
Proposal
Enable
DependencyTrackingTelemteryModule
and enable it to track HTTP, ServiceBus, EventHubBump ServiceBus and EventHub to the stable versions that support tracing
Enable dependency calls filtering based on log category (user code, bindings, triggers, host):
Set scopes with appropriate Log Categories and level for user code, input bindings, output bindings, triggers - TBD
In FilteringTelemteryProcessor ensure proper filtering
[UPDATE] Enable W3C correaltion protocol for outgoing HTTP requests
As a result, users should be able to see all supported calls, and control verbosity for each category separately.
Host logs would not have any scope and would not be logged for now (which is easy to change).
Open questions:
dependency calls for triggers
store Activity.Id instead of guid in blob metadata: breaking change and have to be done carefully or not done at all
PR3: Enable auto-requests collection (function executions) with parent context
Motivation
ServiceBus (EventHubs in future) processing are instrumented already as well as HTTP Asp.NET Core hosting. I.e. ApplicaitonInsights could track such calls automagically.
Since each component knows how to extract trace context and set it on the
Activity
, we want to leverage it.So, there are 3 types of triggers
auto-trackable: ServiceBus, EventHub (in future), Http (in future, more on this later)
non auto-trackable, but can extract parent context: EventHub (now), Blob, Azure Storage Queue
non trackable and no context: Timer
Proposal
we are going to enable automatic request collection
for triggers that could have parent context (EH, blobs, queues), extract Activity and store it in
IFunctionInstance
.Obsolete
ParentId
and addParentActivity
onIFunctionInstance
and other places where applicable.when request tracking starts, based on the
Activity.Current
presence andParentActivity
presence in the scope, decide whether it's needed to track request and how to do itmove properties assignment to telemetry initializers, so that auto-tracked requests have all functions tags (name, invocation id, trigger reason, etc)
Open questions:
function serialization into the persistent queue (whether Activity have to be serialized)
trigger can have parent context AND AzureFunctions ParentId altogether - both should appear on the request telemetry
multiple EH messages/Blobs in a trigger: how to correlate
PR4: HTTP extension and context extraction
Motivation
HTTP request tracking could be automagical, but unfortunately, we cannot configure the listener for it now - it's an internal ApplicationInsights thing - we will expose it. But we still want to get tracing context from the incoming requests and stamp it on the telemetry.
Proposal
In the HTTP extension, parse headers of the incoming request.
Support W3C and Request-Id protocols
Set parsed context on
IFunctionInstance.ParentActivity
[if possible in this iteration] Support multiple instrumentation keys scenarios (AppInsights sets some headers to possible to correlate telemetry across different apps with different ikeys)
Future work outside of Azure functions: will be done later at some point (not blocking current effort)
Splitting ApplicationInsights AspNetCore SDK into several packages:
Efficient filtering for dependency telemetry (ApplicationInsights SDK)
EventHub Processor instrumentation[EDITED - does not make sense]