DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM
https://docs.datadoghq.com/tracing/
Apache License 2.0

Discrepancies between Application Insights and DataDog for Azure Functions #3757

Open · scott-mcchesney-optum opened this issue 1 year ago

scott-mcchesney-optum commented 1 year ago

All -

I am a developer creating Azure Durable Functions in .NET for my company. We are using the in-process model with DI. I am using a standard logging setup (i.e. no custom logging code), so I'm just leveraging the built-in Application Insights logging that Azure Functions provides, with injected ILogger<T> loggers.

My infrastructure team installed the DataDog plug-in for log capturing. We use DataDog as our log platform, but Function logging has so far required a complicated solution that doesn't provide APM, so they are looking at this plug-in as an alternative. It also seems to solve some logging issues we're having with the current solution around what data gets captured.

In any event, the plug-in seems to be working great - but we're noticing some discrepancies between what we see in the App Insights entries and what we see in DataDog. Here is a trace entry from App Insights:

[Screenshot: App Insights trace entry]

While here is the corresponding entry from DataDog:

[Screenshot: DataDog trace entry]

The screenshots show that the Category value is different, which is the oddest part. The Application Insights entry has the <T> part from my injected logger, while the DataDog entry seems to have the Functions-provided value - as if I were using the generic ILogger injected into the Run method. Not only am I not using the generic logger, it isn't even injected into the Run method. Finally, the DataDog operation context suggests to me that it's piggybacking on the Application Insights stream...
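(For anyone unfamiliar with where these Category values come from, here is a hedged sketch of how the two strings could end up differing under standard Microsoft.Extensions.Logging behaviour. The type name and the "Function.<FunctionName>.User" category below are placeholders for illustration, not values taken from the screenshots.)

using Microsoft.Extensions.Logging;

// Placeholder service type for illustration only.
public class MyService { }

public static class CategoryIllustration
{
    public static void Show(ILoggerFactory factory)
    {
        // An ILogger<T> injected by DI logs under the full type name of T,
        // which is the category Application Insights appears to be showing.
        ILogger<MyService> typedLogger = factory.CreateLogger<MyService>();

        // The ILogger the Functions host passes into Run is created with an
        // explicit category string, commonly of the form
        // "Function.<FunctionName>.User" - which looks like what the DataDog
        // entry contains.
        ILogger hostStyleLogger = factory.CreateLogger("Function.CreateComputeOrchestration.User");
    }
}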

Also, the Host Instance ID doesn't seem to be captured in the DataDog entry - but it is in the App Insights entry.

I will reiterate that I'm doing no custom logging at all - my Startup.cs has no logging-configuration code in it. I have my host.json file configured as follows (the last logLevel entry is a namespace prefix so I can capture logging from all the libraries in the solution):

{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    },
    "logLevel": {
      "default": "Information",
      "Guru.Project": "Information"
    }
  }
}
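(For reference, a rough code equivalent of those logLevel entries, assuming standard Microsoft.Extensions.Logging category-prefix filtering; this is a sketch, not code from our solution.)

using Microsoft.Extensions.Logging;

public static class LoggingFilterSketch
{
    public static void Configure(ILoggingBuilder builder)
    {
        // "default": "Information"
        builder.SetMinimumLevel(LogLevel.Information);

        // "Guru.Project": "Information" - a category prefix, so it applies to
        // any logger whose category starts with "Guru.Project" (for example an
        // ILogger<T> where T lives in a Guru.Project.* namespace).
        builder.AddFilter("Guru.Project", LogLevel.Information);
    }
}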

I was not privy to the DataDog setup, but I can get that from the engineer who did it if it's needed.

I don't know that this is a bug - we might have some sort of setup/configuration wrong. But it's not a Feature Request. :-) Any assistance would be greatly appreciated.

pierotibou commented 1 year ago

Hello @scott-mcchesney-optum, thanks for reaching out. It's hard for us to tell exactly what's going on based on the information provided. Would you mind giving a bit more insight into your code? Seeing where the logger is called and how the function is defined would be a great help here, if you don't mind.

scott-mcchesney-optum commented 1 year ago

Sure.

So in this case the logging happens in a service that's injected into an orchestration (which makes the whole thing stranger, IMO... how is DataDog getting the name of the function this service is being called from?).

The logger is a straight ILogger<T> injected into the service constructor and stored in a class variable:

public ControlPlaneJobPollingService(IOptions<OrchestrationConfigurationOptions> options, ILogger<ControlPlaneJobPollingService> logger)
{
    _logger = logger ?? throw new ArgumentNullException(nameof(logger), "Logger is missing.");
    _options = options?.Value ?? throw new ArgumentNullException(nameof(options), "Orchestration Options are missing.");
}

The method in question gets the IDurableOrchestrationContext passed into it:

public async Task<WorkspaceProcessResponse> PollForJobStatusAsync(IDurableOrchestrationContext context, string jobName, long workspaceId, string eventId, string guruProjectId)

and a replay-safe logger is created from that:

var logger = context.CreateReplaySafeLogger(_logger);

that is then used to make standard logging calls:

logger.LogInformation("Getting Job-Status information");

This service is injected into the orchestration constructor:

public CreateComputeOrchestration(IInstanceRequestHeaderService requestHeaderService, IRequestHeaderMapper<ProjectWorkflowRequest> headerMapper, IControlPlaneJobPollingService controlPlaneJobPollingService)
{
    _requestHeaderService = requestHeaderService ?? throw new ArgumentNullException(nameof(requestHeaderService), "API Header Cache service is missing");
    _headerMapper = headerMapper ?? throw new ArgumentNullException(nameof(headerMapper), "Request Header Mapper is missing");
    _controlPlaneJobPollingService = controlPlaneJobPollingService ?? throw new ArgumentNullException(nameof(controlPlaneJobPollingService), "Job Polling Service is missing");
}

Which is then called at the appropriate points in the orchestration's Run method:

[FunctionName(FunctionConstants.CreateComputeOrchestrationName)]
public async Task<ComputeProvisionResult> Run([OrchestrationTrigger] IDurableOrchestrationContext context, ILogger log)

...

var pollResponse = await _controlPlaneJobPollingService.PollForJobStatusAsync(context, FunctionConstants.CreateComputeOperation, workspaceId, eventId, input.GuruProjectId);

Note that the orchestration gets the ILogger that the infrastructure provides - it's not using ILogger<T> yet. (It will soon.) But that logging instance is not passed into the job-polling service. And while the service creates a replay-safe logger from the context, the code I saw for that extension method always returns a new logger instance.
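For anyone following along, this is roughly the pattern I'd expect that extension method to follow - a hedged sketch of the replay-safe logger idea, not the actual Durable Functions source:

using System;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

// Sketch only: CreateReplaySafeLogger hands back a new wrapper around the
// logger you pass in, and the wrapper suppresses calls while the
// orchestration is replaying.
public class ReplaySafeLoggerSketch : ILogger
{
    private readonly IDurableOrchestrationContext _context;
    private readonly ILogger _inner;

    public ReplaySafeLoggerSketch(IDurableOrchestrationContext context, ILogger inner)
    {
        _context = context;
        _inner = inner;
    }

    public IDisposable BeginScope<TState>(TState state) => _inner.BeginScope(state);

    public bool IsEnabled(LogLevel logLevel) => _inner.IsEnabled(logLevel);

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception exception, Func<TState, Exception, string> formatter)
    {
        // Orchestrator code re-executes on every replay; only forward to the
        // wrapped logger on the live pass so each message is emitted once.
        if (!_context.IsReplaying)
        {
            _inner.Log(logLevel, eventId, state, exception, formatter);
        }
    }
}

Because the wrapper forwards to the logger you give it, the category stays whatever the inner ILogger<T> was created with - which is consistent with what Application Insights shows.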

I hope this helps!

andrewlock commented 1 year ago

Hi @scott-mcchesney-optum,

The short answer is that we haven't looked at what's required to support Durable Functions, so the behaviour is not entirely unexpected (though it's certainly interesting!). We'll investigate how hard it would be to add support, including this logger issue, and will clarify the situation in the docs.

Thanks!

scott-mcchesney-optum commented 1 year ago

@andrewlock Durable Functions are a layer on top of regular Azure Functions, so hopefully it isn't too hard to sort through...

Thanks!

jmelosegui commented 1 month ago

I have a related issue ("An invalid asynchronous invocation was detected"). @andrewlock, are there any plans to support Durable Functions in the Datadog extension for Azure App Service?

Thanks in advance.

bouwkast commented 1 month ago

Hi @jmelosegui,

At the moment we don't have any immediate plans to support tracing Durable Functions for AAS. Could you open a feature request ticket here: https://help.datadoghq.com/hc/requests/new?tf_1260824651490=pt_product_type:apm&tf_1900004146284=pt_apm_language:.net&tf_1260825272270=pt_apm_category_feature_request

This will help us route it to the right team so the feature request can be tracked and prioritized correctly.

Thanks!

jbanschbach commented 23 hours ago

The orchestrators seem to be the only thing this breaks, since there are code restrictions on them. Does DataDog have any way to annotate orchestrator functions so they are excluded from being traced by the extension? As it stands, our only option for getting logs from durable functions into DataDog is to configure a diagnostic setting that forwards logs to an Event Hub, which is less than desirable.