Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License
711 stars 263 forks source link

Distributed tracing doesn't correlate when using System.Diagnostics.DiagnosticSource 4.7.1 #1615

Closed NickSevens closed 1 year ago

NickSevens commented 3 years ago

Description

I've setup distributed tracing for Application Insights in my durable functions. The DtOrchestrator and DtActivity items are correlated correctly, but my Traces (LogInformation, LogDebug, LogTrace) are not correlated to the Activities anymore. I'm pretty sure this worked a while ago, and also in the early preview NuGet package. The traces are logged to AI, just not linked to any DtActivity or DtOrchestrator instance.

Expected behavior

I expect to see the Traces correlated to a specific DtActivity, so I can filter them by Activity. In this pane for example: image

Actual behavior

The Trace messages are not correlated to the Activity, so I have to search for them separately to browse logs.

Relevant source code snippets

Logging setup has been tried with and without W3CDistributedTracing enabled e.g.:

{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "Default": "Information",
      "DurableTask": "Warning",
      "Function": "Trace",
      "Host": "Warning",
      "Microsoft.Azure.WebJobs": "Warning"
    },
    "applicationInsights": {
      "httpAutoCollectionOptions": {
        "enableW3CDistributedTracing": true
      },
      "samplingExcludedTypes": "Request",
      "samplingSettings": {
        "isEnabled": true
      }
    }
  },
  "extensions": {
    "durableTask": {
      "hubName": "WorkspacesHub",
      "tracing": {
        "distributedTracingEnabled": true,
        "distributedTracingProtocol": "W3CTraceContext"
      }
    }
  }
}

Known workarounds

(None)

App Details

Screenshots

As an example, these are the properties returned for a DtActivity and should-be-correlated Trace message image image

davidmrdavid commented 3 years ago

Hi @NickSevens, thank you for bringing this to our attention!

I'm sorry this isn't working as expected, the feature is still in early development and was only recently merged into the main package release, so it's definitely possible some older features stopped working. I'll take a look at this.

In the meantime, it would help me greatly if you could provide a minimal reproducer that illustrates how you're generating these traces and that I could use to determine if your issue has been fixed. Any chance you could provide that? Thanks!

⚡ ⚡

NickSevens commented 3 years ago

Hi @davidmrdavid, I tried creating a minimal sample, but there it does correlate correctly. So ... I'm now trying to figure out what's different in my app. :) I'll post here if I find anything causing the issue at my end...

mpaul31 commented 3 years ago

i also noticed the operation name is <Empty> in the portal now after the upgrade so unfortunately this feature is not mature enough for our production workloads quiet yet.

davidmrdavid commented 3 years ago

Hi @mpaul31, you're correct. The feature is still in public preview so the experience needs some more work before being ready for production. Just for my reference, where are you seeing this <Empty> operation name. Do you mind clarifying / sharing a screenshot Thanks!

mpaul31 commented 3 years ago

Sure. You can see it from the performance blade:

image

davidmrdavid commented 3 years ago

@mpaul31 , thanks. I'll turn that into a separate ToDo item and prioritize it for an upcoming release of Dist. Tracing

NickSevens commented 3 years ago

@davidmrdavid I've narrowed it down to this: I recently upgraded a NuGet package which has a dependency on System.Diagnostics.DiagnosticSource. The update caused the dependency to go from 4.6.0 to 4.7.1. This causes the correlation to fail.

When the correlation fails, I noticed a difference in Operation ID format:

As a reproduction, you can simply create a new project, and add the NuGet package. <PackageReference Include="System.Diagnostics.DiagnosticSource" Version="4.6.0" /> --> works <PackageReference Include="System.Diagnostics.DiagnosticSource" Version="4.7.1" /> --> fails

davidmrdavid commented 3 years ago

Thanks @NickSevens!. I'll need to dive into the documentation to understand why this would break our expected behavior. Thanks for providing this context, this is really helpful :)

NickSevens commented 3 years ago

Not sure if it's the same, but https://tsuyoshiushio.medium.com/system-diagnostics-activity-update-and-distributed-tracing-in-c-f02428dcd050 explains something similar as well.

lindgrend commented 3 years ago

Having the same issue, also if you in the c# sample use Microsoft.Azure.WebJobs.Extensions.DurableTask Version="2.4.0" it also breaks for me. Maybe this can be used as your minimum repro?

NickSevens commented 3 years ago

Hi @davidmrdavid, do you have any update on the issue? Anything you can share? :) Not that it's a blocking issue, but still curious ;)

santo2 commented 3 years ago

Hi guys, is this being planned to fix? we were already looking forward to this a couple of months before the actual release, it would really help us find bugs in our system. Now it's really hard to correlate all activities together.

davidmrdavid commented 3 years ago

Hi folks! Unfortunately we've been slowed down by prioritizing this major bug but I'll take a different approach during our next release sprint and prioritize this instead. I think the bug I linked above is going to be solved on a much longer timeframe than I originally expected, so I can allow myself to multi-task on the issues moving forward :)

Expect this to be worked on for our upcoming milestone. We haven't gotten to the release planning stage yet (we just had a release) but I understand we should be starting that process soon.

santo2 commented 3 years ago

Hi @davidmrdavid any news on this?

mauve commented 3 years ago

Hi, any update? Or is there a workaround we can do?

davidmrdavid commented 3 years ago

Hi all, sorry for my delayed response, I somehow missed the notification on April 14th.

Thankfully, we do have several updates! Let me list them below, starting with some general news on Distributed Tracing support:

With regards to this specific ticket, unfortunately there seem to be multiple different kinds of issues being reported here, so I'll try to address all concerns below.

(1) With respect to @NickSevens's original issue, which is about Traces not being correlated to Activities, we believe this issue should be fixed with our next release. By this we mean that, in our latest attempt to reproduce this issue, we do see activity traces being correlated. We're still working on correlating orchestration traces, but we describe a workaround to support this scenario here: https://github.com/Azure/azure-functions-durable-extension/discussions/1788#discussioncomment-671121

(2) @mpaul31 reported another issue: the operation name is <Empty> in the portal. Unfortunately, we haven't been able to reproduce this. Since we're nearing another release (one with various distributed tracing improvements), it would be good to validate if the operation name remains empty with that new release. If it does, we should create another issue to track it.

And that's all! Please let me, and @bachuv, know if there's anything more we can do to help!

mauve commented 3 years ago

Hi @davidmrdavid thanks for the detailed update, is v2.5.0 still planned to be released on the 28th of May (or around that date)?

NickSevens commented 3 years ago

Hi @davidmrdavid does this mean that using System.Diagnostics.DiagnosticSource 4.7.* still won't correlate correctly - without using a workaround? Also: does the workaround described work with ILogger? it seems to be working for direct usage of TelemetryClient only?

We're beginning to notice issues updating other packages, which have their dependencies updated as well, which is starting to be blocking.

davidmrdavid commented 3 years ago

Hi @mauve. Yes, the update is still planned to be released around May 28th, most likely to be released a few days after that due to some extra validation steps we're trying for the first time. You can track our release progress here: https://github.com/Azure/azure-functions-durable-extension/issues/1852 .

Relatedly, we always create release checklist-issues like the one above around our release dates, so looking for them is a good way to get up-to-date information on how the release is progressing :) .

@NickSevens, I'll defer those questions to @bachuv, who should have more context on these.

bachuv commented 3 years ago

@NickSevens, using System.Diagnostics.DiagnosticSource 4.7.* doesn't currently correlate correctly. We would need to do more investigation to understand what changes would need to be made to support that version.

The workaround works with TelemetryClient since certain operation id properties have to be set for it to correlate correctly. #1805 tracks this issue.

Can you create a new GitHub issue and give us more information about what package updates are conflicting with Distributed Tracing? That way we can take a look at those packages and see if there's a workaround or solution. Thanks!

NickSevens commented 3 years ago

@bachuv I'm confused. This issue was created specifically since I had an issue when using System.Diagnostics.DiagnosticSource 4.7.*, which was explained as such here. I can surely create a new issue, but what would be the difference with this issue be?

cc: @davidmrdavid

bachuv commented 3 years ago

@NickSevens, sorry for any confusion! I was suggesting creating a new issue since you had mentioned having issues with updating other packages as well. Feel free to give us more information about those other packages here and we will take a look!

mauve commented 3 years ago

I just used/ran a distributed function with System.Diagnostics.DiagnosticSource 5.0.0 and Durablefunctions 2.5.0 and the traces are not being correlated.

DtOrchestrator has:

Activity has (this works):

However my log.logInformation() messages have (logging from inside the above activity):

davidmrdavid commented 3 years ago

@mauve, could you please open up a new ticket describing this issue? Since this reproduction is for a different DiagnosticSource version, it might be best to separate the threads :) . Thank you!

NickSevens commented 2 years ago

@bachuv @davidmrdavid is there any progress on this?

santo2 commented 2 years ago

waiting for a long time ;-) just letting you know we would really appreciate this

bachuv commented 2 years ago

@NickSevens @santo2 - We are actively working on making improvements in this area, but for now, to get traces to correlate in Activity functions, can you try following this method from the SimpleCorrelationDemo.cs to log traces using a TelemetryClient? Logging traces using this method should correlate the traces with the Activity function.

We also upgraded the System.Diagnostics.DiagnosticSource package version in this PR and it should be a part of our next release.

dlebee commented 2 years ago

I created a project to demonstrate the issue in case anyone needs it.

https://github.com/dlebee/corrolation-function-servicebus

JakubMajsniar commented 2 years ago

Do you have any findings about this, guys ? I've tried to upgrade everything (AZ functions version, target framework, all packages) in my incubator project for end-to-end transaction tracing and traces for DF are correlated finally correctly. I found out that also deprecated SDK MS.Azure.ServiceBus may break the correlation... But to make this feature work on bigger project is almost impossible.

dlebee commented 2 years ago

@JakubMajsniar using net6 with version 4 of azure function runtime and system.diagnosticsource v6.0.0 did the trick tor me

JakubMajsniar commented 2 years ago

@dlebee yes, for me too. Also in mentioned bigger project (I was able to upgraded all necessary stuff) and it is working. But with V3 version of runtime the latest working version of DiagnosticSource seems to be 4.7.0. There are 474 packages which are dependent on DiagnosticSource and newer versions of them may need and bring newer version of DiagnosticSource. We have also found solution in our project where we had just downgrade 3 very common packages dependent on DiagnosticSource to use 4.7.0 version of it and project could stay in .netcore 3.1 and V3 runtime. And btw Microsoft.ApplicationInsights 2.20.0 needs DiagnosticSourcce >= 5.0.0. so we had to go with ApplicationInsights 2.14.0. I just wanted to share our experience :)

NickSevens commented 1 year ago

It seems to me correlation now works correctly - when indeed upgrading to the latest packages and Azure Func V4. Which is good enough for me ;)

Thank you for the support guys.