Azure / durabletask

Durable Task Framework allows users to write long running persistent workflows in C# using the async/await capabilities.
Apache License 2.0
1.53k stars 296 forks source link

On replay of orchestration, Activity.Current is null, loosing correlation for traces within an orchestration #507

Open joerage opened 3 years ago

joerage commented 3 years ago

I am facing a problem tracing using an ILogger. Things are working fine except for tracing happening on an orchestration (tracing during a activities is fine). Traces done from orchestration are not correlated (no operation_Id, or operation_ParentId). From looking at the code, there is not Activity.Current that is setup during the replay of an orchestration, which is needed to achieve correlation

TsuyoshiUshio commented 3 years ago

The issue happens by the following reason. Thank you for the great feedback. It will be the issue on Durable Functions as well. He is using Distributed Tracing feature.

The reason why it happens on orchestrator is, The host generate the new Activity.Id for each orchestration even if it is the same orchestrator. If you use orchestrator with ILogger, the Logger will be generated on the host side Activity.Id will be different each execution. For mitigate this issue, you can refer to this sample. (This is Durable Functions sample though).

We can fetch the correated id with CorrelationTraceContext.Current on orchestrators.

https://github.com/Azure/azure-functions-durable-extension/blob/dev/samples/correlation-csharp/FunctionAppCorrelation/SimpleCorrelationDemo.cs

As sample represent it, if we want to have correlated log, it might be difficult to use ILogger directly, instead, we might be able to provide logging wrapper on the context object instead.

jviau commented 3 years ago

@TsuyoshiUshio - I am not exactly sure how DTFx plans on handling distributed tracing, but here are some expectations I would like . Maybe you can answer if this is possible or not:

As a consumer of DTFx, I would like to be able to just use the .net Activity system as is:

jviau commented 3 years ago

I found the issue: Activity.Current works in TaskActivities because TaskActivityDispatcher sets it here. TaskOrchestrationDispatcher is missing a similar call.

But I also noticed Activity.Current is null during middleware execution, so it needs to be set in there. I think distributed tracing needs some updates overall. It isn't following the standard activity conventions:

  1. Name the operation with your assembly name + activity name.
  2. Use DiagnosticSource.StartActivity and StopActivity
  3. new in .net5.0 (but available in older versions via System.Diagnostics.DiagnosticSource 5.0 nuget package), use ActivitySource for creating these activities.

The end result is for legacy behavior (DiagnosticSource + Activity operation name), the names should be "Microsoft.Azure.DurableTask.Core.{Activity}.{Start/Stop}" - note that ".Start" and ".Stop" are appended for you via the DiagnosticSource.StartActivity/StopActivity methods. For ActivitySource, "Start/Stop" is not used as they have explicit Start / Stop events you subscribe to.