dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.97k stars 4.66k forks source link

profiler\multiple test failed #100114

Closed jkotas closed 3 months ago

jkotas commented 6 months ago

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=655183 Build error leg or test failing: profiler\multiple\multiple\multiple.cmd Pull request: https://github.com/dotnet/runtime/pull/101291

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": ["win-arm64", "profiler\\multiple\\multiple\\multiple.cmd [FAIL]"],
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: :mag_right: Result validation: :warning: Validation could not be done without an Azure DevOps build URL on the issue. Please add it to the "Build: :mag_right:" line. Validation performed at: 4/25/2024 9:12:24 PM UTC

Report

Build Definition Test Pull Request
667308 dotnet/runtime profiler.multiple.WorkItemExecution dotnet/runtime#101925

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 1
jkotas commented 6 months ago

This is crashing in new EH:

0b 00000032`5797bd90 00007ffa`cd1a213c     ntdll!RtlDispatchException+0x200 [minkernel\ntos\rtl\arm64\exdsptch.c @ 720] 
0c 00000032`5797c3d0 00007ffa`c8ef8274     ntdll!RtlRaiseException+0xbc [minkernel\ntos\rtl\arm64\raise.c @ 123] 
0d 00000032`5797c8b0 00007ffa`756d4be8     KERNELBASE!RaiseException+0x54 [minkernel\kernelbase\xcpt.c @ 936] 
0e 00000032`5797c960 00007ffa`742f2160     coreclr!SfiNext+0x308 [D:\a\_work\1\s\src\coreclr\vm\exceptionhandling.cpp @ 8432] 
0f 00000032`5797cad0 00007ffa`742f1678     System_Private_CoreLib!System.Runtime.EH.DispatchEx(System.Runtime.StackFrameIterator ByRef, ExInfo ByRef)+0x6a0 [/_/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs @ 861] 
10 00000032`5797cc30 00007ffa`753b1e54     System_Private_CoreLib!System.Runtime.EH.RhThrowEx(System.Object, ExInfo ByRef)+0x68 [/_/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs @ 642] 
11 00000032`5797cc50 00007ffa`7552f9c8     coreclr!CallDescrWorkerInternal+0x84 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Checked\vm\wks\CallDescrWorkerARM64.asm @ 4989] 
12 00000032`5797cc70 00007ffa`755301e0     coreclr!CallDescrWorkerWithHandler+0x128 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 67] 
13 00000032`5797ccc0 00007ffa`756c9280     coreclr!DispatchCallSimple+0xe0 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 221] 
14 00000032`5797cda0 00007ffa`756c9320     coreclr!DispatchManagedException+0x1e0 [D:\a\_work\1\s\src\coreclr\vm\exceptionhandling.cpp @ 5645] 
15 00000032`5797e0a0 00007ffa`755ba154     coreclr!DispatchManagedException+0x40 [D:\a\_work\1\s\src\coreclr\vm\exceptionhandling.cpp @ 5662] 
16 00000032`5797e460 00007ffa`755abb14     coreclr!ThrowNew+0xe4 [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 4229] 
17 00000032`5797e4c0 00007ffa`15f0e534     coreclr!IL_Throw+0xb4 [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 4259] 
lewing commented 4 months ago

@janvorli has this been fixed?

janvorli commented 4 months ago

@lewing I believe I have looked into the original and it was actually not an issue of the new EH, but rather a regular unhandled exception or something along these lines. Unfortunately, the only two remaining failures from the list above are misclassified, so I cannot double check it now.

jkotas commented 4 months ago

Either way, CI does not hit this anymore, so the issue can be closed.

jkotas commented 4 months ago

Failed again in https://dev.azure.com/dnceng-public/public/_build/results?buildId=655183&view=ms.vss-test-web.build-test-results-tab&runId=16217960&resultId=100465&paneView=debug with the same stacktrace as https://github.com/dotnet/runtime/issues/100114#issuecomment-2014220793

janvorli commented 4 months ago

@jkotas the failure in the referenced run is just an unhandled exception. From the test log:

      Output:
        Unhandled exception. System.Exception: Profiler tests are expected to contain the text 'PROFILER TEST PASSES' in the console output of the profilee app to indicate a passing test. Usually it is printed from the Shutdown() method of the profiler implementation. This text was not found in the output above.
           at Profiler.Tests.ProfilerTestRunner.FailFastWithMessage(String error)
           at Profiler.Tests.ProfilerTestRunner.Run(String profileePath, String testName, Guid profilerClsid, String profileeArguments, ProfileeOptions profileeOptions, Dictionary`2 envVars, String reverseServerName, Boolean loadAsNotification, Int32 notificationCopies)
           at Profiler.Tests.MultiplyLoaded.Main(String[] args)

The last message from the actual test is

       Waiting for profilers to all detach

So that sounds like profiler or test related issue.

jkotas commented 4 months ago

Ah ok, I have not realized that this is the new callstack for unhandled exception. I agree that it looks like a profiler test issue. We would need a crash dump of the hung process to diagnose it.

dotnet-policy-service[bot] commented 4 months ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

jkotas commented 4 months ago

Based on the dump, the child process exit code is 0:

0:000> !do 000001f8`7080afc8 
Name:        System.Diagnostics.Process
MethodTable: 00007ffbf63dca00

...
00007ffbf5c4f900  40000dd       d8         System.Int32  1 instance                0 _exitCode
...

It suggests that the child process exited without crashing. The problem seems to be in the parsing of the child process output.

tommcdon commented 4 months ago

@jkotas do we expect this issue to be fixed by https://github.com/dotnet/runtime/pull/101607?

jkotas commented 4 months ago

Missing volatile that I have added in https://github.com/dotnet/runtime/pull/101607 can explain failures with these symptoms. I suspect that my change is not going to fix the failures. The test has been failing too frequently for missing volatile to be the root cause of the failures.

I would keep this issue open for 2 weeks. If we do not see any new hits in 2 weeks, we can declare it fixed.

v-wenyuxu commented 4 months ago

Failed in: runtime-coreclr r2r 20240429.1

Failed tests:

R2R-CG2 windows arm64 Checked no_tiered_compilation @ Windows.11.Arm64.Open
    - profiler\\multiple\\multiple\\multiple.cmd

Error message:

 Unhandled exception. System.Exception: Profiler tests are expected to contain the text 'PROFILER TEST PASSES' in the console output of the profilee app to indicate a passing test. Usually it is printed from the Shutdown() method of the profiler implementation. This text was not found in the output above. Profilee returned exit code 100.
   at Profiler.Tests.ProfilerTestRunner.FailFastWithMessage(String error)
   at Profiler.Tests.ProfilerTestRunner.Run(String profileePath, String testName, Guid profilerClsid, String profileeArguments, ProfileeOptions profileeOptions, Dictionary`2 envVars, String reverseServerName, Boolean loadAsNotification, Int32 notificationCopies)
   at Profiler.Tests.MultiplyLoaded.Main(String[] args)

Return code:      1
Raw output file:      C:\h\w\C29B0A72\w\AB5E08EF\uploads\Reports\profiler.multiple\multiple\multiple.output.txt
Raw output:
BEGIN EXECUTION
eventpipe_common.dll
Microsoft.Diagnostics.NETCore.Client.dll
multiple.dll
Profiler.dll
profiler_common.dll
TestLibrary.dll
        6 file(s) copied.
 6:55:01.56
Response file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll.rsp
C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\eventpipe_common.dll
-o:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll
--targetarch:arm64
--targetos:windows
--verify-type-and-field-layout
--method-layout:random
-r:C:\h\w\C29B0A72\p\System.*.dll
-r:C:\h\w\C29B0A72\p\Microsoft.*.dll
-r:C:\h\w\C29B0A72\p\xunit.*.dll
-r:C:\h\w\C29B0A72\p\mscorlib.dll
-r:C:\h\w\C29B0A72\p\netstandard.dll
" "dotnet" "C:\h\w\C29B0A72\p\crossgen2\crossgen2.dll" @"C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll.rsp"   -r:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\*.dll"
Emitting R2R PE file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll
" "dotnet" "C:\h\w\C29B0A72\p\r2rdump\r2rdump.dll" --header --sc --in C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll --out C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll.r2rdump --val"
 6:55:06.92
 6:55:06.92
Response file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll.rsp
C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\Microsoft.Diagnostics.NETCore.Client.dll
-o:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll
--targetarch:arm64
--targetos:windows
--verify-type-and-field-layout
--method-layout:random
-r:C:\h\w\C29B0A72\p\System.*.dll
-r:C:\h\w\C29B0A72\p\Microsoft.*.dll
-r:C:\h\w\C29B0A72\p\xunit.*.dll
-r:C:\h\w\C29B0A72\p\mscorlib.dll
-r:C:\h\w\C29B0A72\p\netstandard.dll
" "dotnet" "C:\h\w\C29B0A72\p\crossgen2\crossgen2.dll" @"C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll.rsp"   -r:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\*.dll"
Emitting R2R PE file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll
" "dotnet" "C:\h\w\C29B0A72\p\r2rdump\r2rdump.dll" --header --sc --in C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll --out C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll.r2rdump --val"
 6:55:10.09
 6:55:10.11
Response file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\multiple.dll.rsp
C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\multiple.dll
-o:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\multiple.dll
--targetarch:arm64
--targetos:windows
--verify-type-and-field-layout
--method-layout:random
-r:C:\h\w\C29B0A72\p\System.*.dll
-r:C:\h\w\C29B0A72\p\Microsoft.*.dll
-r:C:\h\w\C29B0A72\p\xunit.*.dll
-r:C:\h\w\C29B0A72\p\mscorlib.dll
-r:C:\h\w\C29B0A72\p\netstandard.dll
" "dotnet" "C:\h\w\C2

Stack trace:

   at profiler_multiple._multiple_multiple_._multiple_multiple_cmd()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
davmason commented 4 months ago

The latest failure doesn't make sense to me

      Profiler path: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Profiler.dll
      Profiler path: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Profiler.dll
      Profiler.dll!DllGetClassObject
      MultiplyLoaded::Initialize
      Profiler.dll!Profiler::Initialize
      Profiler.dll!DllGetClassObject
      MultiplyLoaded::Initialize
      Profiler.dll!Profiler::Initialize
      Profiler path: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Profiler.dll
      Profiler.dll!DllGetClassObject
      MultiplyLoaded::InitializeForAttach
      Profiler.dll!Profiler::Initialize
      Throwing exception
      Exception caught
      Waiting for profilers to all detach

      Test failed: Profiler tests are expected to contain the text 'PROFILER TEST PASSES' in the console output of the profilee app to indicate a passing test. Usually it is printed from the Shutdown() method of the profiler implementation. This text was not found in the output above. Profilee returned exit code 100.
      Expected: 100
      Actual: -532462766
      END EXECUTION - FAILED
      FAILED
      Test failed. Trying to see if dump file was created in C:\cores since 4/30/2024 6:55:01 AM

The profilee returned code 100 which means that the _profilerDone.WaitOne call succeeded, but we should only set the event after seeing all three profilers successfully detach. There are no logs indicating any profiler attached however. My only thought is that somehow we are just missing output from the profiled process, but we call Process.WaitForExit() which says in the docs that all IO is guaranteed to occur before it returns.

jkotas commented 4 months ago

A good question to ask is why this test and why arm64 only. It all points to bug(s) related to arm64 weaker memory model in the test code or in the code specifically exercised by this test.

The missing volatile that I have added in https://github.com/dotnet/runtime/pull/101607/files#diff-667a604f90dafe4391d86c87780a736aa2eb10010f194966110baeb141d05370R198 was a bug like that. My guess is that we have more of them.

davmason commented 4 months ago

I fixed another issue and added a speculative fix in #102066, for now I am going to wait to see if it fails again before investigating any further

tommcdon commented 3 months ago

There have been no issues reported by automation since this issue has been fixed, closing as "fixed"