dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.15k stars 9.92k forks source link

NRE in Kestrel Telemetry EventSources #44817

Open Tratcher opened 1 year ago

Tratcher commented 1 year ago

Mirror of https://github.com/dotnet/runtime/issues/77434

EventSource.IsEnabled() may start returning true before the OnEventCommand completes. We initialize shared EventCounter instances in OnEventCommand that we then use from instrumented code paths.

If the EventSource is initialized before being enabled, and the instance is accessed from multiple threads, a thread could see IsEnabled() == true and then read a null EventCounter instance. This race condition only appears once per process.

I believe this is the cause of an exception YARP hit in CI:

System.NullReferenceException : Object reference not set to an instance of an object.
   at System.Net.Http.HttpTelemetry.Http11RequestLeftQueue(Double timeOnQueueMilliseconds)
   at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)

We should fix these cases.

cc: @Tratcher aspnetcore has 2 such cases as well: https://github.com/dotnet/aspnetcore/blob/1e8fe7e4ba93935ebc10aee6ef2a4d2b3b0c05d9/src/Middleware/ConcurrencyLimiter/src/ConcurrencyLimiterEventSource.cs#L47 https://github.com/dotnet/aspnetcore/blob/1e8fe7e4ba93935ebc10aee6ef2a4d2b3b0c05d9/src/SignalR/common/Http.Connections/src/Internal/HttpConnectionsEventSource.cs#L46

An internal customer hit something similar: MicrosoftTeams-image (1)

MicrosoftTeams-image (2)

adityamandaleeka commented 1 year ago

cc: @davmason

ghost commented 1 year ago

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

davmason commented 1 year ago

This looks like the same underlying cause as the issue fixed in https://github.com/dotnet/runtime/pull/76965.

EventSource has always called DoCommand in the base EventSouce constructor (before any derived class constructors are run), we've just happened to not run in to it so far because most of our internal EventSources have a static Log field and were being touched before they were enabled.

If you enable an EventSource via the pause on startup feature, then it is enabled before any managed code runs and will always hit this issue. We solved it in MetricsEventSource by moving initialization outside of the constructor.