elastic / apm-agent-dotnet

https://www.elastic.co/guide/en/apm/agent/dotnet/current/index.html
Apache License 2.0

[BUG] Broken Distributed Tracing in ASP.NET Core 3.1 Razor Pages with Elastic APM Agent ≥ 1.26.0 #2432

Closed Accudraft closed 1 month ago

Accudraft commented 2 months ago

Environment:

Bug Description: Distributed tracing does not work as expected in ASP.NET Core 3.1 Razor Pages projects using Elastic APM Agent version 1.28.4. MVC and API projects seem unaffected, but Razor Pages projects fail to generate end-to-end traces for requests involving other microservices or external services, even when those services use the same APM agent and server. A second issue that appears to affect only Razor Pages is the transaction name: the route could be something like /customer/details/abc-123, but the transaction name comes through only as /details. Again, this doesn’t seem to affect MVC or API projects.

Steps to Reproduce:

  1. Create a new ASP.NET Core 3.1 Razor Pages application.
  2. Install the Elastic APM Agent version 1.26.0 or greater.
  3. Implement functionality within the Razor Pages application to communicate with other microservices using HttpClient (ensure these microservices also have the Elastic APM agent installed), and external services such as Dropbox API, Twilio, or Mailgun.
  4. Generate traffic to the Razor Pages application, triggering requests to both internal microservices and external services.
  5. Observe the behavior in Elastic Observability, noting the lack of end-to-end traces, incomplete service map, and missing transaction details.

Expected Behavior:

Actual Behavior:

Additional Information:

stevejgordon commented 1 month ago

Thanks for raising this, @Accudraft. This is surprising and certainly something we need to investigate. I can't see any changes in the diff that would immediately explain it. I'll attempt to reproduce locally and identify why this isn't working as expected.

Can you gather trace level logs from two instrumented services that form a distributed trace? If so, could you email us at microsoft@elastic.co? I can then provide a secure link to send the log files. Our logging should reveal why the distributed tracing may not work.

stevejgordon commented 1 month ago

I've successfully reproduced it and now have the logs I need to proceed. I have a theory that this could be due to the inclusion of http.request.cookies, which we updated after implementing the spec around cookie redaction. There have been some issues with mapping this data in older versions of APM server. I'll try to verify that and then look at the best solutions.

Accudraft commented 1 month ago

Great, I'm glad you were able to reproduce it. I wouldn't have been able to get you the required logs until next week otherwise. Thanks for looking into this.

stevejgordon commented 1 month ago

After digging a bit deeper, I see that this is likely due to the cookie handling changes we made in 1.26.0, although strictly speaking the agent is now doing the "right" thing per our APM specification. The issue in my repro is due to dropped transactions when cookie names are prefixed with a period character, which the server fails to index. I'm discussing this with the server team to see what we can/should do about this. @Accudraft, would you be able to check whether you use any cookies with a period character at the start of the name? That may confirm my current theory.
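
One quick way to run that check is sketched below in Python. It is not part of the agent: the function name and the sample header value are hypothetical, and it assumes you can capture the raw Cookie request header from your application (for example from browser dev tools). It simply lists cookie names that begin with a period.

```python
from typing import List


def cookie_names_with_leading_period(cookie_header: str) -> List[str]:
    """Return the cookie names in a raw Cookie header value that start with '.'."""
    names = []
    for pair in cookie_header.split(";"):
        # Each pair looks like "name=value"; keep only the name part.
        name, _, _value = pair.strip().partition("=")
        if name.startswith("."):
            names.append(name)
    return names


# Hypothetical header value; the names only mirror the ".AspNetCore" prefix.
header = ".AspNetCore.Session=abc; theme=dark; .AspNetCore.Antiforgery.xyz=def"
flagged = cookie_names_with_leading_period(header)
print(flagged)
```

Any name that shows up in the printed list is a candidate for the dropped-transaction behavior described above.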

Accudraft commented 1 month ago

Yes, all of my cookies start with ".AspNetCore":

Session, Identity, Antiforgery, etc.

Edit: I should note, these are the only cookies my applications use, and I believe they are just the default cookies created by the Microsoft.AspNetCore.Authentication.OpenIdConnect package we use.

stevejgordon commented 1 month ago

Thanks, @Accudraft. Yeah, those are some of the built-in cookies. Technically, it's possible to configure them with different names, but ultimately, any cookie whose name starts with a . will be a problem. We've decided to partially revert the cookie change from 1.26.0, and I'm starting work on that this week.

stevejgordon commented 1 month ago

Fixed by #2444