Closed v-wenyuxu closed 11 months ago
Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.
Author: | v-wenyuxu |
---|---|
Assignees: | - |
Labels: | `arch-arm64`, `os-windows`, `JitStress`, `area-Diagnostics-coreclr`, `untriaged`, `needs-area-label` |
Milestone: | - |
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti See info in area-owners.md if you want to be subscribed.
Author: | v-wenyuxu |
---|---|
Assignees: | - |
Labels: | `arch-arm64`, `area-System.Diagnostics.Tracing`, `os-windows`, `JitStress`, `area-Diagnostics-coreclr`, `needs-area-label` |
Milestone: | 9.0.0 |
Failed again in: runtime-coreclr libraries-jitstress 20230912.1
Failed tests:
net9.0-windows-Release-arm64-CoreCLR_checked-zapdisable-Windows.11.Arm64.Open
- BasicEventSourceTests.TestsWrite.Test_Write_T_ETW
Error message:
System.Runtime.InteropServices.COMException : The instance name passed was not recognized as valid by a WMI data provider. (0x80071069)
Stack trace:
at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.Flush()
at BasicEventSourceTests.EtwListener.Dispose() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/Harness/EtwListener.cs:line 97
at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs:line 27
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 57
I see two problems
new EtwListener()
which means they all use the same file:
public EtwListener(string dataFileName = "EventSourceTestData.etl", string sessionName = "EventSourceTestSession")
for this reason they try hard to run serially, disabling parallelism 3 ways -- https://github.com/dotnet/runtime/blob/e555a0e16635c6ff2d88366f19d0781ac1fef0ee/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/XUnitAssemblyAttributes.cs#L7
this is apparently not working however
Discovering: System.Diagnostics.Tracing.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.Diagnostics.Tracing.Tests (found 45 of 49 test cases)
Starting: System.Diagnostics.Tracing.Tests (parallel test collections = on, max threads = 4) <<<<<<<<<<<<<
and that may be causing this
System.Runtime.InteropServices.COMException : The instance name passed was not recognized as valid by a WMI data provider. (0x80071069)
Stack Trace:
at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.Flush()
/_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/Harness/EtwListener.cs(97,0): at BasicEventSourceTests.EtwListener.Dispose()
/_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs(27,0): at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW()
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
/_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs(57,0): at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
Question is -- why is parallelism happening
hence this output
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.SortedSet`1.Node.get_Is4Node() in /_/src/libraries/System.Collections/src/System/Collections/Generic/SortedSet.cs:line 1622
at System.Collections.Generic.SortedSet`1.AddIfNotPresent(T item) in /_/src/libraries/System.Collections/src/System/Collections/Generic/SortedSet.cs:line 339
at System.Collections.Generic.TreeSet`1.AddIfNotPresent(T item) in /_/src/libraries/System.Collections/src/System/Collections/Generic/SortedDictionary.cs:line 901
at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.get_ProviderNameToGuid()
at Microsoft.Diagnostics.Tracing.Session.TraceEventProviders.GetProviderGuidByName(String name)
at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.EnableProvider(String providerName, TraceEventLevel providerLevel, UInt64 matchAnyKeywords, TraceEventProviderOptions options)
at BasicEventSourceTests.TestsManifestGeneration.<Test_EventSource_EtwManifestGenerationRollover>b__3_0() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsManifestGeneration.Etw.cs:line 98
I first assumed the first issue is simply exposing the second. But these are happening in remote exec processes. Only one thing should be happening in the process. Not clear what is going on. But whatever is causing this to be exposed -- is it a bug that should be fixed in TraceEventSession? (in dotnet/perfview repo -- @brianrob )?
@noahfalk @tarekgh
CC @davmason as it is EventSource issue.
My best guess is that the NullReferenceException
is due to unprotected concurrent calls to TraceEventSession.ProviderNameToGuid
. Once this fails, the session likely doesn't get fully created and calling Flush
results in further failures because the underlying session wasn't actually created.
From what I can see, the fix is to address the unexpected concurrency. TraceEventSession, especially with the same session name and file, is not designed to be used concurrently and can result in all sorts of unexpected behaviors.
TraceEventSession, especially with the same session name and file, is not designed to be used concurrently and can result in all sorts of unexpected behaviors.
@brianrob is constructing two TraceEventSession concurrently (with different file and name parameters) expected to be unsafe?
TraceEventSession, especially with the same session name and file, is not designed to be used concurrently and can result in all sorts of unexpected behaviors.
@brianrob is constructing two TraceEventSession concurrently (with different file and name parameters) expected to be unsafe?
This should be fine. The key is that the name and file parameters are used to create machine-wide constructs, and that's why it's a problem if they are the same (unless you're explicitly attaching to an existing session, but that's another story and not what you're doing here). If you wanted to run tests in parallel, you could change the session names and ETL file paths and that should be OK. You just don't want too much parallelism because each machine has a maximum number of concurrent sessions, and it's not difficult to hit that, especially for Microsoft machines.
@brianrob the reason I'm asking is that it appears that is not safe eg., the static SortedDictionary can get written by two concurrent threads.
I don't think there is a strong statement around these APIs as being thread-safe, but many of them are. It feels to me like you should be able to call EnableProvider
on two different sessions concurrently. I'll make a fix for this.
Posted https://github.com/microsoft/perfview/pull/1934 to address this.
OK, this issue is now tracking why our elaborate protections against any test concurrency in System.Diagnostics.Tracing.Tests are not doing the job...
@danmoseley is there a better person to assign this to since we're now tracking test concurrency issues? I don't believe I'm the right person to investigate issues in the test runner
Failed in: runtime-coreclr libraries-jitstress 20230907.1
Failed tests:
Error message:
Stack trace: