Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License

Isolated Worker: Application Insights Durable Function Distributed Tracing not working #2662

Open RobARichardson opened 12 months ago

RobARichardson commented 12 months ago

I have a .NET 7 Function App (Isolated Worker) that has Application Insights set up using the same instructions documented here. This app is also using Durable Functions.

Application Insights appears to be working just fine for non-durable functions as one would expect given this announcement. Unfortunately, I can't seem to get distributed tracing to work for Durable Functions like it does for In-Process Durable Functions. After reading a lot of the existing issues here in this repo, it's not clear to me whether it's not fully supported yet or I've got something misconfigured in the app.

The closest I've been able to get is by following the README in the Distributed Tracing V2 for Durable Functions which looks to be targeted at In-Process Durable Functions. By updating the host.json config as documented here, I've been able to see Orchestrator & Activity Functions correlated together albeit with an Operation Name value of <Empty>. Furthermore, the ServiceBusTrigger Function that started the orchestrator is still not being correlated with the Orchestrator or Activities.
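For reference, the host.json change described above might look like the following fragment. This is a sketch that assumes the setting names from the Distributed Tracing V2 README (`distributedTracingEnabled` and `Version` under `extensions.durableTask.tracing`). The rest of the file is omitted, and the exact configuration used in this report is not shown in the thread.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "tracing": {
        "distributedTracingEnabled": true,
        "Version": "V2"
      }
    }
  }
}
```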

It's also worth noting that when I took the same project and migrated it to In-Process Functions, the distributed tracing worked in the same way it's documented here in the sample.

Any guidance from Microsoft or the community would be much appreciated.

jviau commented 12 months ago

@RobARichardson only distributed tracing v2 is supported for dotnet isolated. This feature is still in preview and we are exploring ways to improve it. The ServiceBusTrigger (or rather, the parent span) not being correlated would be a feature gap we have and will look into addressing.

Will also look into operation name being empty.

RobARichardson commented 12 months ago

@jviau Thank you for the update!

If you don't mind, I have a quick follow-up question based on my attempts to understand how distributed tracing v2 is working and hopefully implement a temporary workaround for the gaps mentioned above.

It looks like distributed tracing v2 is handled by DurableTelemetryModule which is being provided by Microsoft.Azure.DurableTask.ApplicationInsights.

From what I can see, the only consumer of this package is Microsoft.Azure.WebJobs.Extensions.DurableTask where it registers the module inside of TelemetryActivator.

What I'm struggling to understand is how distributed tracing v2 is partially working for my dotnet isolated function app that isn't referencing either of these packages?

jviau commented 12 months ago

This attribute: https://github.com/Azure/azure-functions-durable-extension/blob/b5c70e5e40d12a1aed281ada1c4d5651608df247/src/Worker.Extensions.DurableTask/AssemblyInfo.cs#L7

Instructs the function SDK to fetch the DurableTask WebJobs nuget as a host extension. So, it is referenced, just through a functions specific way as these assemblies are needed for the host process only - not your own.
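As a rough illustration of the mechanism described above, an assembly-level attribute of this kind looks roughly like the sketch below. The version string is a placeholder, not the value used at the linked commit, and any additional arguments on the real line are not reproduced here.

```csharp
using Microsoft.Azure.Functions.Worker.Extensions.Abstractions;

// Declares which WebJobs (host-side) extension package the Functions SDK
// should pull in for this worker extension at build time.
// "x.y.z" is a placeholder version, not the actual pinned version.
[assembly: ExtensionInformation(
    "Microsoft.Azure.WebJobs.Extensions.DurableTask",
    "x.y.z")]
```

Because the package is resolved for the host process, it does not show up as a direct package reference in the worker project.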

RobARichardson commented 12 months ago

@jviau Ahh, that makes sense. Thank you!

RobARichardson commented 11 months ago

@nytian any details on the resolution of this issue, for example which release and package contains the fix?

RobARichardson commented 10 months ago

Hi @nytian & @jviau, just wanted to follow up again. Do either of you have more context on the resolution for this issue? My team is eagerly awaiting a fix since it would greatly improve observability for us. Thanks!

nytian commented 10 months ago

Hi @RobARichardson, sorry, I thought your question was answered. Tagging @bachuv, who works on distributed tracing, for further answers. Thanks!

bachuv commented 10 months ago

Hi @RobARichardson, thanks for following up! I'll try to repro and investigate the issue so we can find a fix.

CharlesToniolo commented 10 months ago

Hi! I'm having the same issue. The activities are being correlated, but they show up as <Empty> in the Operation Name when using the App Insights Performance tab. The orchestration I'm testing is started by an HTTP trigger, and the start of the orchestration is not being correlated with the suborchestrations, activities, etc. I'm using .NET 8 in Isolated Worker.

faarbaek commented 9 months ago

Don't know if this is related, but I have a similar issue, where the Monitor view in Azure Portal does not show any Orchestration Traces for an orchestration trigger.

Seems like the following three properties are missing from customDimensions: DurableFunctionsInstanceId, DurableFunctionsRuntimeStatus, and DurableFunctionsType.

In-process function image

Isolated worker function image

RobARichardson commented 8 months ago

Hi @bachuv - was just curious if there was any news on this issue. This continues to be a big observability pain point for my team and other teams in my organization.

ngu-khoi commented 7 months ago

To preface, the distributed tracing as it is for durable functions is insanely useful especially when using complex patterns so I appreciate the effort being made here. The others in the thread above have touched on the issues I'm facing, but I'm just going to share my feedback.

Issue

I'm currently grappling with an issue regarding distributed tracing across different components of our architecture, particularly between a Function App utilizing Azure Durable Functions, Queue Storage, and an App Service instance. We have a setup where a client function, responsible for starting an orchestration, is triggered by a queue binding. This setup is supposed to facilitate automatic tracking.

Although tracing picks up the event of reading a message from the queue, this interaction doesn't integrate into the Application Map as expected. I followed the same setup as @RobARichardson, and while we can track the internal workings of the orchestrator and the activity functions it invokes, the tracing falls short of capturing the full path of the queue binding interaction externally, which is a major challenge.

This image shows the client function (QueueTriggeredExtractionOrchestrator) and the orchestrator function / activity functions (all showing up as <Empty>).

(Screenshot: CleanShot 2024-04-05 at 13 55 03)

Examining the client function, we see which queue the message came from, but no extended trace from the queue. I expected the dependency to show information from that. Additionally, it doesn't show the distributed trace when the client function calls the orchestrator function.

(Screenshot: CleanShot 2024-04-05 at 14 09 04)

In the orchestrator, the stack trace between the activity functions and orchestrator is good. I just can't seem to figure out what information implies the relationship between the orchestrator and the client function.

(Screenshot: CleanShot 2024-04-05 at 14 44 15)

Durable functions distributed tracing is a different solution than functions. @RobARichardson I see you already have an issue on the DF extension repo: Azure/azure-functions-durable-extension#2662 - please continue to use that issue there.

@plamber, as @brettsam said we should have Activity.Current hydrated in the worker today. Is it not working for you?

We are also looking to improve this experience with our OTel efforts: Azure/azure-functions-host#9273

I also took a look at the referenced issue thread, but I wasn't able to make use of the advice shared over there i.e. I don't know what I'm looking for to access Activity.Current.
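For anyone unsure what inspecting Activity.Current involves, here is a minimal sketch using only System.Diagnostics. The class name TraceContextProbe and the operation name "demo-operation" are illustrative and not taken from either thread. The sketch shows only how to read the ambient trace context. It does not show whether the worker actually hydrates it for a given trigger.

```csharp
using System;
using System.Diagnostics;

public static class TraceContextProbe
{
    // Summarizes the ambient Activity, if any, so it can be logged
    // and compared against the IDs that appear in Application Insights.
    public static string Describe()
    {
        var current = Activity.Current;
        if (current is null)
        {
            return "no ambient activity";
        }

        return $"traceId={current.TraceId} spanId={current.SpanId} " +
               $"parentSpanId={current.ParentSpanId} operation={current.OperationName}";
    }

    public static void Main()
    {
        // Nothing has started an Activity yet, so there is no ambient context.
        Console.WriteLine(Describe());

        // Starting an Activity makes it the ambient Activity.Current.
        using var activity = new Activity("demo-operation");
        activity.SetIdFormat(ActivityIdFormat.W3C);
        activity.Start();
        Console.WriteLine(Describe());
    }
}
```

Inside a function, a call like Describe() could be logged at the start of the client function. The trace ID it reports can then be compared with the one on the orchestrator and activity telemetry.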

Related Issues

To further this point, when I have an activity function in the orchestrator call another sub-orchestrator by sending a message that the other orchestrator relies on, the distributed trace is lost. For example, if I were to set the logging category to Error and my sub-orchestrator failed as a result of some flaw in the message, I would have no information on how the message was actually formed by the primary orchestrator, since it's not logged at that logging category. If I were to set it to Information, it's generally challenging to isolate the relevant traces. Any advice here would be appreciated, since I'm newer to Application Insights.

On a related note, our backend App Service, which sends messages to this same queue, shows successful connections on the Application Map and I can access the distributed trace. Ideally, I'd like to see the distributed trace linking the function app all the way back to the web app, especially when errors are raised. However, there seems to be a disconnect in the tracing from the web app to the function app, despite both components interacting with the same queue storage.

jviau commented 7 months ago

@ngu-khoi thank you for the write up and details. Happy to hear the distributed tracing for durable is useful for you. But yes, we definitely want to evaluate your scenarios and continue to improve upon the experience.

Can you share a repro of your functions you are using? I am particularly interested in what bindings you use, the flow between triggers and durable, and what context you expect to see.

harleyz commented 6 months ago

Not much to add, but I have the same issue as others in this thread: a .NET 8 C# isolated Durable Functions app with distributed tracing V2 turned on, getting the <empty> operation name and also not able to set the cloud role name.

jviau commented 6 months ago

@harleyz being unable to set those values in dotnet isolated is expected right now due to the design. It is a different process entirely (the host) that is actually emitting the spans, so any modifications done in the worker process will not be reflected in the host process. This includes any ITelemetryInitializer or ITelemetryProcessor you have in the worker - they will never see the emitted span telemetry because it is done in a different process.
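To make the limitation concrete, here is a sketch of the kind of worker-side initializer that is affected. The class name and role name are hypothetical. As described above, an initializer registered in the worker only sees telemetry emitted by the worker process, so it would not change the role name on the Durable spans emitted by the host.

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;

// Hypothetical worker-side initializer that stamps a cloud role name.
// It runs only for telemetry created inside the worker process.
public sealed class WorkerRoleNameInitializer : ITelemetryInitializer
{
    public void Initialize(ITelemetry telemetry)
    {
        // "my-function-worker" is an illustrative value.
        telemetry.Context.Cloud.RoleName = "my-function-worker";
    }
}

// Typical registration in the worker's Program.cs (inside ConfigureServices):
//   services.AddSingleton<ITelemetryInitializer, WorkerRoleNameInitializer>();
```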

This is a known gap that will also exist in the Functions OpenTelemetry support we are adding. We will need to review this scenario and see if we can provide a way to flow mutations on Activity.Current back to the host process.

@bachuv and @RohitRanjanMS for awareness.