DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM
https://docs.datadoghq.com/tracing/
Apache License 2.0

DD_DBM_PROPAGATION_MODE = full on Azure App Service not working #6044

Open jeremytbrun opened 3 days ago

jeremytbrun commented 3 days ago

Describe the bug

I have updated the Azure App Service extension to the latest version, 3.3.0, in order to try the full DD_DBM_PROPAGATION_MODE for SQL Server. Previously I was using service mode for DD_DBM_PROPAGATION_MODE and it was working fine. Now I'm getting no propagation data whatsoever. I noticed that upon startup the dotnet-tracer-loader-w3wp-.log is getting spammed with the same messages over and over again. See attached: dotnet-tracer-loader-w3wp-6096.log

jeremytbrun commented 3 days ago

These dotnet-tracer-loader log files continue to pile up.
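When loader logs pile up like this, it can help to see which messages actually repeat before attaching them. Below is a minimal Python sketch, not a Datadog tool: it assumes the logs are plain text, and the `dotnet-tracer-loader-*.log` glob pattern is inferred from the file name above. The log directory is whatever you pass in, and the function names are hypothetical. Runs of digits (timestamps, process IDs) are collapsed so that otherwise identical lines are counted together.

```python
import re
from collections import Counter
from pathlib import Path

_DIGITS = re.compile(r"\d+")


def summarize_lines(lines, top=5):
    """Count repeated lines after replacing every run of digits with '#'."""
    counts = Counter(
        _DIGITS.sub("#", line.strip()) for line in lines if line.strip()
    )
    return counts.most_common(top)


def summarize_logs(log_dir, pattern="dotnet-tracer-loader-*.log", top=5):
    """Summarize all matching log files in log_dir (pattern is an assumption)."""
    lines = []
    for path in sorted(Path(log_dir).glob(pattern)):
        lines.extend(path.read_text(errors="replace").splitlines())
    return summarize_lines(lines, top)
```

Calling `summarize_logs` on the directory that holds the loader logs returns the most frequent normalized messages with their counts, which makes a repeating startup error easy to spot.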

jeremytbrun commented 3 days ago

I just did a fresh install of the extension and even with DD_DBM_PROPAGATION_MODE set to "service" with version 3.3.0 of the extension I get the same thing.

bouwkast commented 3 days ago

Hi @jeremytbrun

We are looking into this now. As a first step, could you check whether setting DD_DBM_PROPAGATION_MODE to disabled resolves the issue? (Obviously this is only a temporary workaround for the logs getting spammed.)

I'm unsure based on those logs if the Tracer is even initializing.

jeremytbrun commented 3 days ago

Yes, I'll try this now. Stand by.

jeremytbrun commented 3 days ago

Just did a clean install of v3.3.0 with DD_DBM_PROPAGATION_MODE set to disabled and I'm getting the same behavior. Confirmed that since I first updated from v3.2.0 to v3.3.0 I am not getting ANY trace data within Datadog from this service.

Everything worked fine with v3.2.0 and DD_DBM_PROPAGATION_MODE set to service.

bouwkast commented 3 days ago

Okay thanks for checking and confirming.

I'm following up with other team members at the moment. I'm wondering if there is an issue with the 3.3.000 AAS extension, but I want to see what some others have to say.

jeremytbrun commented 3 days ago

I wondered the same but comparing the two most recent versions there really isn't much there other than the bump to the tracer library version.

https://github.com/DataDog/datadog-aas-extension/compare/dotnet-v3.2.000...dotnet-v3.3.000

bouwkast commented 3 days ago

@jeremytbrun

If possible, could you enable DD_TRACE_DEBUG (set it to 1 or true) and see if we get additional debug logs? I'm also wondering whether we have any managed logs from the Tracer.

We have a test AAS application running on 3.3.000 and, from what we can tell, it is working normally; we don't see those same issues.

I'm wondering if there is a way (maybe via the command line) to pin a specific version of the extension, 3.2.000, so we could at least get it back up and running for you.

Also, please don't hesitate to open a ticket here: https://help.datadoghq.com/hc/requests/new?tf_1260824651490=pt_product_type:apm&tf_1900004146284=pt_apm_language:.net

But I will continue looking into it.

jeremytbrun commented 3 days ago

I do have DD_TRACE_DEBUG set to on. Is there a log file you'd like to see?

bouwkast commented 3 days ago

Would you be willing to share all of those logs with us?

bouwkast commented 3 days ago

To clarify, since I wasn't very clear (sorry):

Would you be willing to share all of those logs with us via that support ticket, uploading them there so it can be done privately? :)

bouwkast commented 3 days ago

Also, if you have profiling enabled, could you try without it to rule out whether it is something within the Profiler part of the code?

Set DD_PROFILING_ENABLED to false for that.
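
For anyone repeating these diagnostic steps, the three settings named in this thread can be applied together as app settings. The Python sketch below only builds an Azure CLI command (`az webapp config appsettings set`) and does not run it. The app name and resource group are placeholders, the function name is hypothetical, and the accepted DD_DBM_PROPAGATION_MODE values are limited to the three mentioned in this thread (disabled, service, full).

```python
import shlex

# Modes mentioned in this thread; treated here as the only valid values.
VALID_DBM_MODES = {"disabled", "service", "full"}


def build_appsettings_command(app_name, resource_group, dbm_mode="disabled",
                              trace_debug=True, profiling=False):
    """Return the argv list for `az webapp config appsettings set`."""
    if dbm_mode not in VALID_DBM_MODES:
        raise ValueError(f"unexpected DD_DBM_PROPAGATION_MODE: {dbm_mode!r}")
    settings = {
        "DD_DBM_PROPAGATION_MODE": dbm_mode,
        "DD_TRACE_DEBUG": "true" if trace_debug else "false",
        "DD_PROFILING_ENABLED": "true" if profiling else "false",
    }
    return [
        "az", "webapp", "config", "appsettings", "set",
        "--name", app_name,
        "--resource-group", resource_group,
        "--settings", *[f"{key}={value}" for key, value in settings.items()],
    ]


if __name__ == "__main__":
    # "my-app" and "my-rg" are placeholders, not names from this thread.
    print(shlex.join(build_appsettings_command("my-app", "my-rg")))
```

Building the argument list rather than a single string keeps quoting unambiguous if the list is later passed to `subprocess.run`.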

bouwkast commented 3 days ago

Hi @jeremytbrun

We think we've identified the cause of this issue and are working on a hotfix, which we will release as soon as we can. We will keep you posted.

jeremytbrun commented 3 days ago

That's wonderful news. I will be able to do some more troubleshooting on this tomorrow. Also once this is working I should be able to see database query explain plans directly from an APM trace right? Like in this documentation.

jeremytbrun commented 3 days ago

I was able to tinker with this more. I tried deleting my Azure App Service fully and redeploying with v3.3.0 of the extension. This time things seem to be working, but I can't explain why. This isn't the first time I've seen flakiness with the installation/startup process of the extension. I know the Datadog documentation explicitly says to stop the app service before making configuration changes to the extension, and I had done that multiple times already. Why does it need to be fully stopped before configuring/installing the extension?

jeremytbrun commented 2 days ago

Things seem to be functioning. I do see a new _contextinfo element in the query samples that are being picked up. Is this a reference to an APM trace? I'm not seeing what I'd expect based on the documentation, which says that if I go to an APM trace I should be able to see a link back to the DBM query information. The documentation I'm talking about is here.