Closed sandersaares closed 2 years ago
@josalem
I see the exact same behavior, however it does not happen if I disable metrics. I turn metrics back on and every time I get the error. If I just ignore the error the application continues and works fine.
Checking in to confirm I can reproduce this behavior on Win11 with VS2022. Just to rule out this being related to VS changes, what version of VS were you using?
VS 2022 on windows 10 and 11.
Sent from my T-Mobile 5G Device Get Outlook for Androidhttps://aka.ms/AAb9ysg
From: John Salem @.> Sent: Wednesday, March 23, 2022 10:47:54 AM To: dotnet/runtime @.> Cc: Steve Hoff @.>; Manual @.> Subject: Re: [dotnet/runtime] NRE in System.Diagnostics.Tracing.EventPipePayloadDecoder if both GC + Threading event keywords are enabled (Issue #65630)
CAUTION: This message originated externally, please use caution when clicking on links or opening attachments!
Checking in to confirm I can reproduce this behavior on Win11 with VS2022. Just to rule out this being related to VS changes, what version of VS were you using?
— Reply to this email directly, view it on GitHubhttps://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdotnet%2Fruntime%2Fissues%2F65630%23issuecomment-1076503073&data=04%7C01%7Cshoff%40manageamerica.com%7Cb65ee91c99794873b6cc08da0ce47f4b%7Cc018dadddf3246b5a934014aff27b8fe%7C1%7C0%7C637836472782296804%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=0LsXCiWAd8%2By9Zppa3gYcrwlu%2B02skByd3EQK%2Fx96r4%3D&reserved=0, or unsubscribehttps://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FARQY4C5XX3IUECDDUEX62RTVBM4KVANCNFSM5O33GKOQ&data=04%7C01%7Cshoff%40manageamerica.com%7Cb65ee91c99794873b6cc08da0ce47f4b%7Cc018dadddf3246b5a934014aff27b8fe%7C1%7C0%7C637836472782296804%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=64i4aYuh89R659%2Ba5%2FK8hKT8qwkp%2BdmFbUYDsySYRsM%3D&reserved=0. You are receiving this because you are subscribed to this thread.Message ID: @.***>
Okay, I've root caused this. It is happening regardless of the debugger being attached. The lack of a using
statement on the EventListener
in the sample was just causing the exception to go unnoticed because it happens on a task that is only awaited in the teardown of the listening logic. If you add a manual Dispose()
or a using
statement, the NRE is realized without the debugger. This has the added repercussion that you shouldn't be receiving any events on the EventListener
after this happens because the Task
will be in an exception state till it is awaited.
In .net6 we added a managed implementation of the thread pool. To send the same events under the same provider we had to do some duplication of event definitions so that managed code had the ability to send the same events. Every event needs to be represented in managed code for EventListener
to know how to parse the events. To that end, there is a python script that generates managed function stubs with the correct event IDs and parameters. For components that transition from native to managed, these stubs would be filled in by hand. For example, the thread pool had all the events sent from managed code filled out.
After that conversion, it looks like any event with ThreadPool
in its name gets left out of the code generation phase:
https://github.com/dotnet/runtime/blob/0d59046238d36178e8d1f60a6509fbd6b73fc697/src/coreclr/scripts/genRuntimeEventSources.py#L88-L91
This specific event, however, is only sent inside an FCALL related to overlapped IO and wasn't implemented in managed code:
<event value="65" version="0" level="win:Verbose" template="ThreadPoolIOWork"
keywords="ThreadingKeyword"
task="ThreadPool"
opcode="IOPack"
symbol="ThreadPoolIOPack"
message="$(string.RuntimePublisher.ThreadPoolIOPackEventMessage)"/>
As a result, managed code doesn't have any metadata for this event. When the event is dispatched to managed code, EventSource
attempts to read the struct in m_eventData
which was zero initialized, so it NREs.
The reason this was only repro-ing for a web app is that ASP.NET Core on Windows uses a lot of overlapped IO so it sends these events while a basic console app does not.
I have a patch that fixes this and will hopefully have a PR out very soon. We will need to backport this to .net6.
edit: This issue only presents for an in-process EventListener
. Out-of-process listeners will not be causing NREs in the target runtime.
CC @kouvel @dotnet/dotnet-diag - We are going to need a better system of keeping track of what events need to be represented where. Right now, there is no validation ensuring all events are represented. There may be other events getting filtered out by genruntimeeventsource.py
that aren't being represented inNativeRuntimeEventSource.*.cs
. I'm not sure we should add such validation to the initial fix to keep it small for back porting, but we should have something going forward. I'm inclined to say we should check in the outputs of the gen*.py
scripts and have developers manually re-run them if ClrEtwAll.man
is changed.
Reopening to track backporting to .net6
Backported to 6.0 in https://github.com/dotnet/runtime/pull/68876
Description
An unhandled NullReferenceException is thrown in
System.Diagnostics.Tracing.EventPipePayloadDecoder.DecodePayload()
becausemetadata.Parameters
is null.Reproduction Steps
Create a .NET 6 ASP.NET Core web app and replace Program.cs with:
Ensure that no filtering is happening in IDE (e.g. Just My Code) and press F5 to run the app.
Expected behavior
App runs normally, returns a 404 result as the web pipeline is empty. Debugger does not break on an exception.
Actual behavior
Debugger immediately breaks on exception at startup.
The entire metadata object passed to
DecodePayload
looks uninitialized to my eye. The broken event always seems to occupy index 65 in the array of events.Exception only occurs once on startup. If I continue execution to skip past the exception, all seems well.
Original report: https://github.com/djluck/prometheus-net.DotNetRuntime/issues/65
Regression?
Comments in original report (https://github.com/djluck/prometheus-net.DotNetRuntime/issues/65) indicate it worked fine in .NET 5.
Known Workarounds
None.
Configuration
.NET 6, Windows 10 x64
Other information
No response