dotnet/runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Test `System.Runtime.Serialization.Xml.ReflectionOnly.Tests` stack overflowing in Windows x64 release for `libraries-pgo` pipeline #100246

Closed: TIHan closed this issue 1 month ago

TIHan commented 6 months ago

Some tests in `System.Runtime.Serialization.Xml.ReflectionOnly.Tests` are failing due to a stack overflow on Windows x64 in the `libraries-pgo` pipeline.

Link

set DOTNET_TieredCompilation=1
set DOTNET_TC_OnStackReplacement=1
set DOTNET_TC_QuickJitForLoops=1
set DOTNET_TC_OnStackReplacement_InitialCounter=1
set DOTNET_OSR_HitLimit=2
set DOTNET_JitRandomOnStackReplacement=15

C:\h\w\A5B1092D\w\B8060970\e>"C:\h\w\A5B1092D\p\dotnet.exe" exec --runtimeconfig System.Runtime.Serialization.Xml.ReflectionOnly.Tests.runtimeconfig.json --depsfile System.Runtime.Serialization.Xml.ReflectionOnly.Tests.deps.json xunit.console.dll System.Runtime.Serialization.Xml.ReflectionOnly.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Runtime.Serialization.Xml.ReflectionOnly.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Serialization.Xml.ReflectionOnly.Tests (found 260 of 261 test cases)
  Starting:    System.Runtime.Serialization.Xml.ReflectionOnly.Tests (parallel test collections = on [4 threads], stop on fail = off)
Stack overflow.
   at System.Xml.ValueHandle.GetCharsText()
   at System.Xml.ValueHandle.GetString()
AndyAyersMS commented 6 months ago

Likely OSR related, as this stress mode forces a whole lot of methods to go through OSR.

@dotnet/jit-contrib -- possibly a good issue for somebody who has never debugged an OSR problem. Feel free to assign yourself.

amanasifkhalid commented 6 months ago

I haven't been able to reproduce this locally -- @AndyAyersMS is DOTNET_JitRandomOnStackReplacement deterministic?

AndyAyersMS commented 6 months ago

> I haven't been able to reproduce this locally -- @AndyAyersMS is DOTNET_JitRandomOnStackReplacement deterministic?

Yes and no -- the "random" placement of patchpoints is deterministic; whether or how often the patchpoints get hit depends on the determinism of the test. And xunit by default is intentionally randomizing, so it's often the case that you need to run a few hundred times to be sure you don't have a repro.

amanasifkhalid commented 2 months ago

Using a checked JIT built from release/9.0-preview5, I cannot reproduce this after hundreds of runs. libraries-pgo hasn't hit this recently, either.

amanasifkhalid commented 2 months ago

I spoke too soon... The last run of libraries-pgo hit this issue again.

AndyAyersMS commented 2 months ago

Also here: https://dev.azure.com/dnceng-public/public/_build/results?buildId=758261&view=ms.vss-test-web.build-test-results-tab

amanasifkhalid commented 1 month ago

After discussing with @AndyAyersMS, I don't think this is due to an OSR bug. This failure only affects Windows x64, which is noteworthy: the default thread stack size on Windows is 1 MB, much smaller than typical Linux defaults.

My new hypothesis is that this test produces especially deep call stacks, and the additional stack space consumed by OSR patchpoints causes intermittent stack overflows, particularly when methods with patchpoints are hot in a recursive call chain and don't get tiered up before the stack is exhausted.

I tried reproducing these stack overflows on Linux x64, since it's easier to modify the default stack size there. First, I constrained the stack size significantly to see whether these large call chains would become a problem without any OSR stress: with a stack size of ~700 KB, I can reliably reproduce the stack overflow with no OSR stress modes set. Next, I set the stack size to 1 MB to match Windows and set DOTNET_JitRandomOnStackReplacement=100 to maximize frame bloat. I couldn't consistently reproduce the stack overflow at 1 MB, but if I drop the stack size to 980 KB, I hit it every time (I suspect this discrepancy is due to tiering sometimes taking too long).
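To make the constrained-stack experiment concrete, here is a minimal sketch (not the actual harness used above) of running a workload on a thread with an explicitly small stack; the 700 KB figure mirrors the experiment, and RunSerializationWorkload is a placeholder for the deeply recursive deserialization path:

using System.Threading;

class ConstrainedStackRepro
{
    static void Main()
    {
        // Run the recursive workload on a thread with a deliberately small (~700 KB) stack,
        // mirroring the experiment described above.
        var thread = new Thread(RunSerializationWorkload, maxStackSize: 700 * 1024);
        thread.Start();
        thread.Join();
    }

    static void RunSerializationWorkload()
    {
        // Placeholder: invoke the deeply recursive deserialization path here.
    }
}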

In a particular run, System.Runtime.Serialization.ReflectionReader:ReflectionReadValue shows up in the problematic stack trace multiple times. Without any OSR stress modes, we don't insert patchpoints into this method, and the frame info looks like this:

;  V00 this         [V00    ] (  1,  1   )     ref  ->  [rbp-0x08]  do-not-enreg[] this class-hnd <System.Runtime.Serialization.ReflectionReader>
;  V01 arg1         [V01    ] (  1,  1   )     ref  ->  [rbp-0x10]  do-not-enreg[] class-hnd <System.Runtime.Serialization.XmlReaderDelegator>
;  V02 arg2         [V02    ] (  1,  1   )     ref  ->  [rbp-0x18]  do-not-enreg[] class-hnd <System.Runtime.Serialization.XmlObjectSerializerReadContext>
;  V03 arg3         [V03    ] (  1,  1   )     ref  ->  [rbp-0x20]  do-not-enreg[] class-hnd <System.Runtime.Serialization.DataContracts.DataMember>
;  V04 arg4         [V04    ] (  1,  1   )     ref  ->  [rbp-0x28]  do-not-enreg[] class-hnd <System.String>
;  V05 loc0         [V05    ] (  1,  1   )     ref  ->  [rbp-0x30]  do-not-enreg[] must-init class-hnd <System.Type>
;  V06 loc1         [V06    ] (  1,  1   )     ref  ->  [rbp-0x38]  do-not-enreg[] must-init class-hnd exact <System.String>
;  V07 OutArgs      [V07    ] (  1,  1   )  struct ( 8) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V08 tmp1         [V08    ] (  1,  1   )     ref  ->  [rbp-0x40]  do-not-enreg[] must-init "argument with side effect"
;
; Lcl frame size = 80

And with DOTNET_JitRandomOnStackReplacement=100 set, we see the patchpoint bloating the frame:

;  V00 this         [V00    ] (  1,  1   )     ref  ->  [rbp-0x30]  do-not-enreg[] this class-hnd <System.Runtime.Serialization.ReflectionReader>
;  V01 arg1         [V01    ] (  1,  1   )     ref  ->  [rbp-0x38]  do-not-enreg[] class-hnd <System.Runtime.Serialization.XmlReaderDelegator>
;  V02 arg2         [V02    ] (  1,  1   )     ref  ->  [rbp-0x40]  do-not-enreg[] class-hnd <System.Runtime.Serialization.XmlObjectSerializerReadContext>
;  V03 arg3         [V03    ] (  1,  1   )     ref  ->  [rbp-0x48]  do-not-enreg[] class-hnd <System.Runtime.Serialization.DataContracts.DataMember>
;  V04 arg4         [V04    ] (  1,  1   )     ref  ->  [rbp-0x50]  do-not-enreg[] class-hnd <System.String>
;  V05 loc0         [V05    ] (  1,  1   )     ref  ->  [rbp-0x58]  do-not-enreg[] must-init class-hnd <System.Type>
;  V06 loc1         [V06    ] (  1,  1   )     ref  ->  [rbp-0x60]  do-not-enreg[] must-init class-hnd exact <System.String>
;  V07 OutArgs      [V07    ] (  1,  1   )  struct ( 8) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V08 tmp1         [V08    ] (  1,  1   )     int  ->  [rbp-0x68]  do-not-enreg[X] addr-exposed "patchpoint counter"
;  V09 tmp2         [V09    ] (  1,  1   )     ref  ->  [rbp-0x70]  do-not-enreg[] must-init "argument with side effect"
;
; Lcl frame size = 128

Thus, I don't think this is an actual OSR bug, but rather the stress mode exhausting the thread's stack space. @AndyAyersMS thanks for your help in investigating this! Do you suggest we just exclude this test from OSR stress? I imagine we can do that similarly to how we exclude tests from GCStress?

AndyAyersMS commented 1 month ago

There is no OSR stress exclude mechanism right now, and I don't know if we have a way of exempting libraries tests from gc stress either...?

I wouldn't expect Tier0+patchpoint frames to get all that much bigger than normal Tier0 frames. Though I suppose there is extra register save space (x64 only) and the slot for the counter. Maybe that's enough to tip this case over. Would be good to look at the mixture of Tier0 vs OSR methods on the stack at the point of overflow.

Also, in the CI run above, it looks like the thread that hits the overflow is in the patchpoint helper at the point of overflow.

        Child SP               IP Call Site
0000005C092C6FE0 00007FFB4A7E6AE3 coreclr!EEContract::DoChecks + 131
0000005C092C7060 00007FFB4A540969 coreclr!GetIP + 537
0000005C092C71A0 00007FFB4A6E26B7 coreclr!Thread::VirtualUnwindCallFrame + 663
0000005C092C7320 00007FFB4A6E2C40 coreclr!Thread::VirtualUnwindToFirstManagedCallFrame + 816
0000005C092C74E0 00007FFB4A87CDD8 coreclr!JIT_Patchpoint + 1176
0000005C092C7B10 00007FFAEE9693EA 
0000005C092C7B18 00000249F7D82790 
0000005C092C7B20 0000020965EC1F88 
0000005C092C7B28 0000020965EC13A0 
0000005C092C7B30 00000249F851AC50 
0000005C0943AEF8                  [HelperMethodFrame_PROTECTOBJ: 0000005c0943aef8] System.Private.CoreLib.dll!System.RuntimeMethodHandle.InvokeMethod(System.Object, Void**, System.Signature, Boolean)
0000005C0943B050 00007FFAEEAB46E1 System.Private.CoreLib.dll!System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(System.Object, System.Reflection.BindingFlags) + 145
0000005C0943B0C0 00007FFAEEAB4361 System.Private.CoreLib.dll!System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo) + 241

It would be good to get a more accurate picture of this stack (you should hopefully be able to do this from the dump). It might make sense for the patchpoint helper to try to minimize the work it does (e.g. return without trying to trigger jitting and/or transitioning) if there is very limited stack available.

Or if this is really mostly managed code, we might want to ask the libraries folks if it is normal that we see such deep recursion -- generally this is just asking for trouble.

amanasifkhalid commented 1 month ago

> Would be good to look at the mixture of Tier0 vs OSR methods on the stack at the point of overflow.

Sure. Looking at the following group of recursive calls in the offending call stack, every method was in Tier0, and had patchpoints inserted (I tweaked JitDisasmSummary to indicate if Compiler::doesMethodHavePatchpoints is true for a method):

at System.Runtime.Serialization.ReflectionXmlReader.ReflectionReadMembers(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext, System.Xml.XmlDictionaryString[], System.Xml.XmlDictionaryString[], System.Runtime.Serialization.DataContracts.ClassDataContract, System.Object ByRef)
at System.Runtime.Serialization.ReflectionReader.ReflectionReadClass(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext, System.Xml.XmlDictionaryString[], System.Xml.XmlDictionaryString[], System.Runtime.Serialization.DataContracts.ClassDataContract)
at System.Runtime.Serialization.ReflectionXmlClassReader.ReflectionReadClass(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext, System.Xml.XmlDictionaryString[], System.Xml.XmlDictionaryString[])
at System.Runtime.Serialization.DataContracts.ClassDataContract.ReadXmlValue(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext)
at System.Runtime.Serialization.XmlObjectSerializerReadContext.ReadDataContractValue(System.Runtime.Serialization.DataContracts.DataContract, System.Runtime.Serialization.XmlReaderDelegator)
at System.Runtime.Serialization.XmlObjectSerializerReadContext.InternalDeserialize(System.Runtime.Serialization.XmlReaderDelegator, System.String, System.String, System.Type, System.Runtime.Serialization.DataContracts.DataContract ByRef)
at System.Runtime.Serialization.XmlObjectSerializerReadContext.InternalDeserialize(System.Runtime.Serialization.XmlReaderDelegator, Int32, System.RuntimeTypeHandle, System.String, System.String)
at System.Runtime.Serialization.ReflectionReader.ReflectionInternalDeserialize(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext, System.Runtime.Serialization.DataContracts.CollectionDataContract, System.Type, System.String, System.String)
at System.Runtime.Serialization.ReflectionReader.ReflectionReadValue(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext, System.Type, System.String, System.String, System.Runtime.Serialization.DataContracts.PrimitiveDataContract)
at System.Runtime.Serialization.ReflectionReader.ReflectionReadValue(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext, System.Runtime.Serialization.DataContracts.DataMember, System.String)
at System.Runtime.Serialization.ReflectionReader.ReflectionReadMember(System.Runtime.Serialization.XmlReaderDelegator, System.Runtime.Serialization.XmlObjectSerializerReadContext, System.Runtime.Serialization.DataContracts.ClassDataContract, System.Object ByRef, Int32, System.Runtime.Serialization.DataContracts.DataMember[])

> Also, in the CI run above, it looks like the thread that hits the overflow is in the patchpoint helper at the point of overflow.

Looking at this under WinDbg, execution seems to have been halted here. There is an inlining hint in the signature of BaseContract::DoChecks, but I'm guessing it wasn't inlined, and this function call exhausted any remaining stack space, triggering the stack overflow.

From offline discussion, the above code shape seems unlikely to use OSR outside of stress modes, and modifying the patchpoint helper to handle low-stack scenarios doesn't seem like the right solution. We don't think there's anything to "fix" here, though we don't currently seem to have an obvious way to opt this test out of OSR stress.

jkoritzinsky commented 1 month ago

@amanasifkhalid you can use [SkipOnCoreCLR("https://github.com/dotnet/runtime/issues/100246", RuntimeTestModes.AnyJitOptimizationStress)] on the test method to get the same behavior as JitOptimizationSensitive. Otherwise you could add an ActiveIssue attribute and have a property on the type that checks the relevant combination of environment variables for when to skip the test.
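For illustration, a minimal sketch of how that might look on a test method (the class and method names are hypothetical, and the exact attribute name/casing should be checked against the test utilities before use):

using Xunit;

public class ReflectionOnlySerializationTests
{
    [Fact]
    [SkipOnCoreCLR("https://github.com/dotnet/runtime/issues/100246", RuntimeTestModes.AnyJitOptimizationStress)]
    public void DeeplyNestedGraph_RoundTrips()
    {
        // Existing test body that builds the deep recursive call chain.
    }
}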

amanasifkhalid commented 1 month ago

@jkoritzinsky Thank you! SkipOnCoreCLR looks like it will work for our purposes. I don't think we need to be as heavy-handed as JitOptimizationSensitive -- maybe I can add a new RuntimeTestMode for random OSR, and exclude the offending test on that?

jkoritzinsky commented 1 month ago

We can add another mode, but that needs to be added in Arcade and in our XUnitWrapperGenerator. I think it would be easier/faster to add your own environment variable check with ConditionalFact.
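A minimal sketch of that ConditionalFact suggestion, with a hypothetical helper class and test name, checking the stress variable shown at the top of this issue:

using System;
using Xunit;

public static class OsrStressDetection
{
    // Hypothetical helper: true when the random-OSR stress variable is not set.
    public static bool IsNotRandomOsrStress =>
        string.IsNullOrEmpty(Environment.GetEnvironmentVariable("DOTNET_JitRandomOnStackReplacement"));
}

public class ReflectionOnlySerializationTests
{
    // ConditionalFact comes from Microsoft.DotNet.XUnitExtensions (Xunit namespace).
    [ConditionalFact(typeof(OsrStressDetection), nameof(OsrStressDetection.IsNotRandomOsrStress))]
    public void DeeplyNestedGraph_RoundTrips()
    {
        // Existing test body that builds the deep recursive call chain.
    }
}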

JulieLeeMSFT commented 1 month ago

Failed in runtime-coreclr libraries-pgo/20240811.1

net9.0-windows-Release-x64-jitosr_stress_random-Windows.10.Amd64.Open