dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

ARM64 GCStress=3 failures #9391

Closed BruceForstall closed 4 years ago

BruceForstall commented 6 years ago

There are many failures in current Windows ARM64 GCStress=3 runs:

Tests.lst=try-fault01.cmd_54, , , Smrt00000001, , # baseservices\exceptions\generics\try-fault01\try-fault01.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=EnterExit08.cmd_178, , , Smrt00000001, , # baseservices\threading\generics\Monitor\EnterExit08\EnterExit08.cmd  CATS: Pri1;LONG_RUNNING;EXPECTED_PASS
Tests.lst=TryEnter05.cmd_188, , , Smrt00000001, , # baseservices\threading\generics\Monitor\TryEnter05\TryEnter05.cmd  CATS: Pri1;EXPECTED_PASS
Tests.lst=thread24.cmd_304, , , Smrt00000001, , # baseservices\threading\generics\WaitCallback\thread24\thread24.cmd  CATS: Pri1;EXPECTED_PASS
Tests.lst=CompareExchangeTClass.cmd_344, , , Smrt00000001, , # baseservices\threading\interlocked\compareexchange\CompareExchangeTClass\CompareExchangeTClass.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=CompareExchangeTClass_1.cmd_345, , , Smrt00000001, , # baseservices\threading\interlocked\compareexchange\CompareExchangeTClass_1\CompareExchangeTClass_1.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=threadstatic07.cmd_517, , , Smrt00000001, , # baseservices\threading\threadstatic\threadstatic07\threadstatic07.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=13662-a.cmd_472, , , Smrt00000001, , # baseservices\threading\regressions\13662\13662-a\13662-a.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=13662-b.cmd_473, , , Smrt00000001, , # baseservices\threading\regressions\13662\13662-b\13662-b.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=437044.cmd_481, , , Smrt00000001, , # baseservices\threading\regressions\beta2\437044\437044.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=ArraySort3.cmd_661, , , Smrt00000001, , # CoreMangLib\cti\system\array\ArraySort3\ArraySort3.cmd  CATS: Pri1;RT;LONG_RUNNING;EXPECTED_PASS
Tests.lst=ArraySort3b.cmd_662, , , Smrt00000001, , # CoreMangLib\cti\system\array\ArraySort3b\ArraySort3b.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=OpCodesConv_Ovf_U1.cmd_2003, , , Smrt00000001, , # CoreMangLib\cti\system\reflection\emit\opcodes\OpCodesConv_Ovf_U1\OpCodesConv_Ovf_U1.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=StringCompare9.cmd_2463, , , Smrt00000001, , # CoreMangLib\cti\system\string\StringCompare9\StringCompare9.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=IntConv.cmd_3595, , , Smrt00000001, , # JIT\CodeGenBringUpTests\IntConv\IntConv.cmd  CATS: JIT;EXPECTED_PASS;Pri1
Tests.lst=StringEquals6.cmd_2481, , , Smrt00000001, , # CoreMangLib\cti\system\string\StringEquals6\StringEquals6.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=castclass-generics045.cmd_5730, , , Smrt00000001, , # JIT\jit64\valuetypes\nullable\castclass\generics\castclass-generics045\castclass-generics045.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=_il_dbginitializearray_enum.cmd_5848, , , Smrt00000001, , # JIT\Methodical\Arrays\misc\_il_dbginitializearray_enum\_il_dbginitializearray_enum.cmd  CATS: JIT;EXPECTED_PASS;Pri1
Tests.lst=_dbglcs2.cmd_5809, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_dbglcs2\_dbglcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=_rellcs2.cmd_5819, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_rellcs2\_rellcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=_speed_dbglcs2.cmd_5827, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_speed_dbglcs2\_speed_dbglcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=_speed_rellcs2.cmd_5835, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_speed_rellcs2\_speed_rellcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=Adams.cmd_7971, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Adams\Adams\Adams.cmd  CATS: EXPECTED_PASS
Tests.lst=BenchMk2.cmd_7972, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\BenchMk2\BenchMk2\BenchMk2.cmd  CATS: EXPECTED_PASS
Tests.lst=BenchMrk.cmd_7973, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\BenchMrk\BenchMrk\BenchMrk.cmd  CATS: EXPECTED_PASS
Tests.lst=Bisect.cmd_7974, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Bisect\Bisect\Bisect.cmd  CATS: EXPECTED_PASS
Tests.lst=DMath.cmd_7975, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\DMath\DMath\DMath.cmd  CATS: EXPECTED_PASS
Tests.lst=FFT.cmd_7976, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\FFT\FFT\FFT.cmd  CATS: EXPECTED_PASS
Tests.lst=InProd.cmd_7977, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\InProd\InProd\InProd.cmd  CATS: EXPECTED_PASS
Tests.lst=InvMt.cmd_7978, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\InvMt\InvMt\InvMt.cmd  CATS: EXPECTED_PASS
Tests.lst=LLoops.cmd_7979, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\LLoops\LLoops\LLoops.cmd  CATS: EXPECTED_PASS
Tests.lst=Lorenz.cmd_7980, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Lorenz\Lorenz\Lorenz.cmd  CATS: EXPECTED_PASS
Tests.lst=MatInv4.cmd_7981, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\MatInv4\MatInv4\MatInv4.cmd  CATS: EXPECTED_PASS
Tests.lst=NewtE.cmd_7982, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\NewtE\NewtE\NewtE.cmd  CATS: EXPECTED_PASS
Tests.lst=NewtR.cmd_7983, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\NewtR\NewtR\NewtR.cmd  CATS: EXPECTED_PASS
Tests.lst=Regula.cmd_7984, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Regula\Regula\Regula.cmd  CATS: EXPECTED_PASS
Tests.lst=Romber.cmd_7985, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Romber\Romber\Romber.cmd  CATS: EXPECTED_PASS
Tests.lst=Secant.cmd_7986, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Secant\Secant\Secant.cmd  CATS: EXPECTED_PASS
Tests.lst=Simpsn.cmd_7987, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Simpsn\Simpsn\Simpsn.cmd  CATS: EXPECTED_PASS
Tests.lst=SqMtx.cmd_7988, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\SqMtx\SqMtx\SqMtx.cmd  CATS: EXPECTED_PASS
Tests.lst=Trap.cmd_7989, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Trap\Trap\Trap.cmd  CATS: EXPECTED_PASS
Tests.lst=Whetsto.cmd_7990, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchF\Whetsto\Whetsto\Whetsto.cmd  CATS: EXPECTED_PASS
Tests.lst=8Queens.cmd_7991, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\8Queens\8Queens\8Queens.cmd  CATS: EXPECTED_PASS
Tests.lst=Ackermann.cmd_7992, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Ackermann\Ackermann\Ackermann.cmd  CATS: EXPECTED_PASS
Tests.lst=AddArray.cmd_7993, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\AddArray\AddArray\AddArray.cmd  CATS: EXPECTED_PASS
Tests.lst=AddArray2.cmd_7994, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\AddArray2\AddArray2\AddArray2.cmd  CATS: EXPECTED_PASS
Tests.lst=Array1.cmd_7995, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Array1\Array1\Array1.cmd  CATS: EXPECTED_PASS
Tests.lst=Array2.cmd_7996, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Array2\Array2\Array2.cmd  CATS: EXPECTED_PASS
Tests.lst=BenchE.cmd_7997, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\BenchE\BenchE\BenchE.cmd  CATS: EXPECTED_PASS
Tests.lst=BubbleSort.cmd_7998, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\BubbleSort\BubbleSort\BubbleSort.cmd  CATS: EXPECTED_PASS
Tests.lst=BubbleSort2.cmd_7999, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\BubbleSort2\BubbleSort2\BubbleSort2.cmd  CATS: EXPECTED_PASS
Tests.lst=CSieve.cmd_8000, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\CSieve\CSieve\CSieve.cmd  CATS: EXPECTED_PASS
Tests.lst=Fib.cmd_8001, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Fib\Fib\Fib.cmd  CATS: EXPECTED_PASS
Tests.lst=HeapSort.cmd_8002, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\HeapSort\HeapSort\HeapSort.cmd  CATS: EXPECTED_PASS
Tests.lst=IniArray.cmd_8003, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\IniArray\IniArray\IniArray.cmd  CATS: EXPECTED_PASS
Tests.lst=LogicArray.cmd_8004, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\LogicArray\LogicArray\LogicArray.cmd  CATS: EXPECTED_PASS
Tests.lst=Midpoint.cmd_8005, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Midpoint\Midpoint\Midpoint.cmd  CATS: EXPECTED_PASS
Tests.lst=MulMatrix.cmd_8006, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\MulMatrix\MulMatrix\MulMatrix.cmd  CATS: EXPECTED_PASS
Tests.lst=NDhrystone.cmd_8007, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\NDhrystone\NDhrystone\NDhrystone.cmd  CATS: EXPECTED_PASS
Tests.lst=Permutate.cmd_8008, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Permutate\Permutate\Permutate.cmd  CATS: EXPECTED_PASS
Tests.lst=Pi.cmd_8009, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Pi\Pi\Pi.cmd  CATS: EXPECTED_PASS
Tests.lst=Puzzle.cmd_8010, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\Puzzle\Puzzle\Puzzle.cmd  CATS: EXPECTED_PASS
Tests.lst=QuickSort.cmd_8011, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\QuickSort\QuickSort\QuickSort.cmd  CATS: EXPECTED_PASS
Tests.lst=TreeInsert.cmd_8012, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\TreeInsert\TreeInsert\TreeInsert.cmd  CATS: EXPECTED_PASS
Tests.lst=TreeSort.cmd_8013, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\TreeSort\TreeSort\TreeSort.cmd  CATS: EXPECTED_PASS
Tests.lst=XposMatrix.cmd_8014, , , Smrt00000001, , # JIT\Performance\CodeQuality\BenchI\XposMatrix\XposMatrix\XposMatrix.cmd  CATS: EXPECTED_PASS
Tests.lst=b25701.cmd_8192, , , Smrt00000001, , # JIT\Regression\CLR-x86-JIT\V1-M09.5-PDC\b25701\b25701\b25701.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=b41990.cmd_8396, , , Smrt00000001, , # JIT\Regression\CLR-x86-JIT\V1-M11-Beta1\b41990\b41990\b41990.cmd  CATS: EXPECTED_PASS
Tests.lst=_speed_dbgstress1.cmd_7562, , , Smrt00000001, , # JIT\Methodical\refany\_speed_dbgstress1\_speed_dbgstress1.cmd  CATS: EXPECTED_PASS
Tests.lst=_speed_relvirtcall.cmd_7569, , , Smrt00000001, , # JIT\Methodical\refany\_speed_relvirtcall\_speed_relvirtcall.cmd  CATS: EXPECTED_PASS
Tests.lst=b80764.cmd_8762, , , Smrt00000001, , # JIT\Regression\CLR-x86-JIT\V1-M12-Beta2\b80764\b80764\b80764.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=Generated1272.cmd_11380, , , Smrt00000001, , # Loader\classloader\TypeGeneratorTests\TypeGeneratorTest1272\Generated1272\Generated1272.cmd  CATS: EXPECTED_PASS;NEW;Pri1
Tests.lst=NoGC.cmd_11700, , , Smrt00000001, , # GC\API\NoGCRegion\NoGC\NoGC.cmd  CATS: EXPECTED_PASS;NEW

https://ci.dot.net/job/dotnet_coreclr/job/master/job/jitstress/job/arm64_cross_checked_windows_nt_gcstress0x3_tst/23/artifact/bin/tests/Windows_NT.arm64.Checked/Smarty.run.0/Smarty.0.fail.smrt/*view*/
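For comparing runs, the failure lines above have a regular shape: a `Tests.lst=<name>_<index>` key, a few comma-separated fields, then a `#` comment carrying the test path and a `CATS:` category list. A minimal Python sketch of a parser, assuming every line follows that shape; the names `FailedTest` and `parse_fail_line` and the field names are illustrative and are not part of Smarty:

```python
import re
from typing import NamedTuple, Optional


class FailedTest(NamedTuple):
    name: str              # e.g. "try-fault01.cmd"
    index: int             # numeric suffix after the last underscore
    path: str              # test path taken from the trailing comment
    categories: frozenset  # entries of the CATS: list


# Assumed line shape, inferred only from the lines quoted in this thread.
_LINE = re.compile(
    r"^Tests\.lst=(?P<name>[^,]+)_(?P<index>\d+),"  # key up to the first comma
    r".*?#\s*(?P<path>\S+)"                          # path after the '#'
    r"\s+CATS:\s*(?P<cats>\S+)\s*$"                  # semicolon-separated categories
)


def parse_fail_line(line: str) -> Optional[FailedTest]:
    """Parse one failure line; return None for lines that do not match."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    return FailedTest(
        name=m["name"],
        index=int(m["index"]),
        path=m["path"],
        categories=frozenset(m["cats"].split(";")),
    )
```

Parsing two runs into sets of `name` values and taking the set difference would show which failures are new and which have gone away.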

While it appears it has never been clean, it also appears there has been a recent regression.

The CI job is here:

https://ci.dot.net/job/dotnet_coreclr/job/master/job/jitstress/job/arm64_cross_checked_windows_nt_gcstress0x3_flow/

BruceForstall commented 6 years ago

cc @dotnet/arm64-contrib

@sdmaclea Do you see GCStress=3 failures on Linux?

sdmaclea commented 6 years ago

@BruceForstall I'll trigger a run

sdmaclea commented 6 years ago

@BruceForstall I ran cced4d7.

A lot of the failing tests you listed are passing on Arm64 Ubuntu.

I see a few failures I haven't seen before (although I typically run gcStress==0xf).

baseservices/threading/generics/WaitCallback/thread10/thread10.sh
JIT/Performance/CodeQuality/BenchmarksGame/regex-redux/regex-redux-1/regex-redux-1.sh

I run on QDT's Amberwing with a 300" timeout.

I generally do not run the tests in testUnsupportedOutsideWindows or tests which fail on x64 Ubuntu.

I also have a few Arm64 failures disabled for various reasons:

// Optimization failure, No exception thrown
// https://github.com/dotnet/coreclr/issues/8648
// https://github.com/dotnet/coreclr/issues/10111
JIT/Regression/JitBlue/DevDiv_359736/DevDiv_359736_do/DevDiv_359736_do.sh
JIT/Regression/JitBlue/DevDiv_359736/DevDiv_359736_ro/DevDiv_359736_ro.sh

//
// Intermittent failures
//

JIT/CheckProjects/CheckProjects/CheckProjects.sh
JIT/Performance/CodeQuality/Roslyn/CscBench/CscBench.sh
JIT/Performance/CodeQuality/Serialization/Serialize/Serialize.sh
Regressions/coreclr/0080/delete_next_card_table/delete_next_card_table.sh

// Segfault
JIT/Generics/Coverage/chaos56200037cs/chaos56200037cs.sh
JIT/Generics/Coverage/chaos65204782cs_o/chaos65204782cs_o.sh
JIT/Generics/Coverage/chaos65204782cs/chaos65204782cs.sh

// m_alignpad == 0
baseservices/threading/regressions/beta2/437017/437017.sh

//
// GC Stress Fails
//

// Heap contamination detected!
// https://github.com/dotnet/coreclr/issues/14299
tracing/eventpipesmoke/eventpipesmoke/eventpipesmoke.sh
tracing/eventsourcesmoke/eventsourcesmoke/eventsourcesmoke.sh

// Unhandled Exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
Regressions/coreclr/1514/InterlockExchange/InterlockExchange.sh

// !CREATE_CHECK_STRING(pMT && pMT->Validate())
baseservices/threading/regressions/13662/13662-simple/13662-simple.sh
JIT/Generics/Exceptions/general_struct_instance01/general_struct_instance01.sh

// Segfault
baseservices/threading/paramthreadstart/ThreadStartNeg1/ThreadStartNeg1.sh
JIT/Performance/CodeQuality/Span/SpanBench/SpanBench.sh

I also do not run a few long-running tests by default:

// Long running
GC/Features/LOHCompaction/lohcompactapi/lohcompactapi.sh
GC/Scenarios/muldimjagary/muldimjagary/muldimjagary.sh
JIT/jit64/opt/cse/HugeArray1/HugeArray1.sh
JIT/jit64/opt/cse/HugeField2/HugeField2.sh
JIT/jit64/opt/cse/hugeSimpleExpr1/hugeSimpleExpr1.sh
JIT/jit64/opt/rngchk/RngchkStress3/RngchkStress3.sh
JIT/Methodical/tailcall_v4/hijacking/hijacking.sh
CoreMangLib/cti/system/array/ArraySort12/ArraySort12.sh

// Long running (debug)
CoreMangLib/cti/system/string/StringConcat8/StringConcat8.sh
GC/Features/HeapExpansion/bestfit-threaded/bestfit-threaded.sh
GC/Features/HeapExpansion/expandheap/expandheap.sh
GC/Regressions/v2.0-rtm/494226/494226/494226.sh
GC/Scenarios/ServerModel/servermodel/servermodel.sh
JIT/jit64/opt/cse/HugeArray1/HugeArray1.sh
JIT/jit64/opt/cse/HugeField2/HugeField2.sh
JIT/Regression/VS-ia64-JIT/V1.2-M02/b28158/b28158/b28158.sh

// Medium running (checked)
baseservices/threading/generics/TimerCallback/tighttimercallback/tighttimercallback.sh
baseservices/threading/interlocked/exchange/ExchangeTClass/ExchangeTClass.sh
baseservices/threading/interlocked/compareexchange/CompareExchangeTClass_1/CompareExchangeTClass_1.sh
baseservices/threading/monitor/tryenter/longtimeout/longtimeout.sh
CoreMangLib/cti/system/string/StringConcat4/StringConcat4.sh
GC/Features/HeapExpansion/bestfit-finalize/bestfit-finalize.sh
JIT/Performance/CodeQuality/Bytemark/Bytemark/Bytemark.sh

// Long running GC tests
baseservices/compilerservices/dynamicobjectproperties/Dev10_535767/Dev10_535767.sh
baseservices/threading/generics/Monitor/EnterExit12/EnterExit12.sh
baseservices/threading/generics/Monitor/EnterExit14/EnterExit14.sh
baseservices/threading/generics/Monitor/TryEnter03/TryEnter03.sh
baseservices/threading/generics/Monitor/TryEnter06/TryEnter06.sh
baseservices/threading/interlocked/compareexchange/CompareExchangeTClass/CompareExchangeTClass.sh
baseservices/threading/interlocked/compareexchange/CompareExchangeTClass_1/CompareExchangeTClass_1.sh
baseservices/threading/regressions/13662/13662-a/13662-a.sh
baseservices/threading/regressions/13662/13662-b/13662-b.sh
baseservices/threading/regressions/269336/objmonhelper/objmonhelper.sh
baseservices/threading/regressions/beta2/437044/437044.sh
CoreMangLib/cti/system/array/ArrayIndexOf1b/ArrayIndexOf1b.sh
CoreMangLib/cti/system/array/ArrayLastIndexOf1b/ArrayLastIndexOf1b.sh
CoreMangLib/cti/system/array/ArraySort11/ArraySort11.sh
CoreMangLib/cti/system/array/ArraySort12/ArraySort12.sh
CoreMangLib/cti/system/array/ArraySort2/ArraySort2.sh
CoreMangLib/cti/system/array/ArraySort3/ArraySort3.sh
CoreMangLib/cti/system/array/ArraySort3b/ArraySort3b.sh
CoreMangLib/cti/system/array/ArraySort4/ArraySort4.sh
CoreMangLib/cti/system/array/ArraySort5/ArraySort5.sh
CoreMangLib/cti/system/convert/ConvertFromBase64String/ConvertFromBase64String.sh
CoreMangLib/cti/system/gc/GCCollect/GCCollect.sh
CoreMangLib/cti/system/string/StringCompare9/StringCompare9.sh
CoreMangLib/cti/system/string/StringConcat4/StringConcat4.sh
CoreMangLib/cti/system/string/StringEquals6/StringEquals6.sh
GC/API/NoGCRegion/NoGC/NoGC.sh
GC/Coverage/LargeObjectAlloc2/LargeObjectAlloc2.sh
JIT/Directed/tailcall/tailcall/tailcall.sh
JIT/jit64/opt/cse/HugeArray/HugeArray.sh
JIT/jit64/opt/cse/hugeSimpleExpr1/hugeSimpleExpr1.sh
JIT/jit64/regress/vsw/539509/test1/test1.sh
JIT/Methodical/Arrays/lcs/_dbglcs2/_dbglcs2.sh
JIT/Methodical/Arrays/lcs/_rellcs2/_rellcs2.sh
JIT/Methodical/Arrays/lcs/_speed_dbglcs2/_speed_dbglcs2.sh
JIT/Methodical/Arrays/lcs/_speed_rellcs2/_speed_rellcs2.sh
JIT/Methodical/refany/_speed_dbgstress1/_speed_dbgstress1.sh
JIT/Methodical/refany/_speed_relvirtcall/_speed_relvirtcall.sh
JIT/Methodical/tailcall/_il_dbgreference_i/_il_dbgreference_i.sh
JIT/Methodical/tailcall/_il_relreference_i/_il_relreference_i.sh
JIT/Regression/VS-ia64-JIT/V2.0-Beta2/b311420/b311420/b311420.sh

sdmaclea commented 6 years ago

@BruceForstall Looks like dotnet/coreclr#15231 renamed a bunch of tests, so maybe the ARM64 windows test lists need to be regenerated?

BruceForstall commented 6 years ago

Yes, that's it (or, that's the big regression). In fact, all arm/arm64 testing is broken because of that.

sdmaclea commented 6 years ago

@BruceForstall Should this be closed? While there are gcStress failures, this issue does not seem particularly useful.

BruceForstall commented 6 years ago

I'd still like to track failures in the GCStress=3 job. It seems like this is as good a place as any.

The last run: https://ci.dot.net/job/dotnet_coreclr/job/master/job/jitstress/job/arm64_cross_checked_windows_nt_gcstress0x3_tst/40/artifact/bin/tests/Windows_NT.arm64.Checked/Smarty.run.0/Smarty.0.fail.smrt/*view*/

The last failures:

[TESTS]
Tests.lst=CompareExchangeTClass_1.cmd_345, , , Smrt00000001, , # baseservices\threading\interlocked\compareexchange\CompareExchangeTClass_1\CompareExchangeTClass_1.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=CompareExchangeTClass.cmd_344, , , Smrt00000001, , # baseservices\threading\interlocked\compareexchange\CompareExchangeTClass\CompareExchangeTClass.cmd  CATS: EXPECTED_PASS;Pri1
Tests.lst=ArraySort3b.cmd_662, , , Smrt00000001, , # CoreMangLib\cti\system\array\ArraySort3b\ArraySort3b.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=StringCompare9.cmd_2463, , , Smrt00000001, , # CoreMangLib\cti\system\string\StringCompare9\StringCompare9.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=StringEquals6.cmd_2481, , , Smrt00000001, , # CoreMangLib\cti\system\string\StringEquals6\StringEquals6.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=_dbglcs2.cmd_5809, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_dbglcs2\_dbglcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=_rellcs2.cmd_5819, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_rellcs2\_rellcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=_speed_dbglcs2.cmd_5827, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_speed_dbglcs2\_speed_dbglcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=_speed_rellcs2.cmd_5835, , , Smrt00000001, , # JIT\Methodical\Arrays\lcs\_speed_rellcs2\_speed_rellcs2.cmd  CATS: JIT;EXPECTED_PASS
Tests.lst=_speed_dbgstress1.cmd_7562, , , Smrt00000001, , # JIT\Methodical\refany\_speed_dbgstress1\_speed_dbgstress1.cmd  CATS: EXPECTED_PASS
Tests.lst=_speed_relvirtcall.cmd_7569, , , Smrt00000001, , # JIT\Methodical\refany\_speed_relvirtcall\_speed_relvirtcall.cmd  CATS: EXPECTED_PASS

I haven't investigated these at all. Some may be timeouts, in which case some tests (maybe all?) should be marked "long running" or simply disabled under GCStress.

janvorli commented 6 years ago

I've just checked: all of these failures are timeouts. The tests took more than 10 minutes to run. Exactly the same set of tests fails for the same reason on ARM. I ran _dbglcs2.cmd on ARM yesterday and it did complete, after a long time.
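When re-running a test by hand, a small wrapper can separate timeouts from real failures. The following is a hedged sketch and not the CI harness: `run_with_timeout` and its result labels are made-up names, the 10-minute default only mirrors the limit mentioned above, and treating exit code 0 as a pass is an assumption about the test wrapper script.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=600.0, expected_exit_code=0):
    """Run cmd and classify the outcome as 'pass', 'fail', or 'timeout'."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return "timeout"
    # Assumption: the wrapper script signals success with expected_exit_code.
    return "pass" if completed.returncode == expected_exit_code else "fail"


# Usage: a trivial child process standing in for a real test script.
print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=60.0))
```

Raising `timeout_s` for a single test then shows whether it eventually completes, as _dbglcs2.cmd did, or fails for another reason.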

sdmaclea commented 6 years ago

@janvorli I see a few sporadic/intermittent gcStress failures on tests like

baseservices/threading/interlocked/compareexchange/CompareExchangeTClass_1/CompareExchangeTClass_1.sh
baseservices/threading/interlocked/compareexchange/compareexchange1_cti/compareexchange1_cti.sh
baseservices/threading/generics/Monitor/TryEnter06/TryEnter06.sh

I have tried to pin them down, but haven't been able to.

sdmaclea commented 6 years ago

Any advice on how to find the GC hole?

sdmaclea commented 6 years ago

// Assert failure(PID 42224 [0x0000a4f0], Thread: 42648 [0xa698]): Consistency check failed: hit privileged instruction!FAILED: !ExecutionManager::IsManagedCode(GetIP(pContext))
// 2242 GCStress=0x4
baseservices/threading/generics/Monitor/EnterExit02/EnterExit02.sh
// 2246 GCStress=0x4
baseservices/threading/generics/Monitor/EnterExit04/EnterExit04.sh
// 2250 GCStress=0x4
baseservices/threading/generics/Monitor/EnterExit06/EnterExit06.sh
// 2241 GCStress=0x4
baseservices/threading/generics/Monitor/EnterExit08/EnterExit08.sh

// Assert failure(PID 10936 [0x00002ab8], Thread: 10936 [0x2ab8]): Consistency check failed: Crst Level violation: Can't take level 0 lock CrstExecuteManRangeLock because you already holding level 0 lock CrstLeafLock
// 2232, 2234-2236 GCStress=0x2
baseservices/exceptions/regressions/Dev11/154243/dynamicmethodliveness/dynamicmethodliveness.sh

janvorli commented 6 years ago

Consistency check failed: Crst Level violation: Can't take level 0 lock CrstExecuteManRangeLock because you already holding level 0 lock CrstLeafLock

I can see this one happening on ARM in the lab tests too, but in a different test: JIT\Performance\CodeQuality\BenchmarksGame\regex-redux\regex-redux-1\regex-redux-1.cmd

I wasn't able to repro it locally yet. The culprit should be visible from the callstack if we can get it. But it seems strange. The CrstExecuteManRangeLock is taken in a single place, in ExecutionManager::Init(). That is called only from EEStartupHelper, which in turn is called only from EEStartup, which in turn is called only from EnsureEEStarted. EnsureEEStarted is called from three different places, so maybe GC stress somehow causes it to be called with the CrstLeafLock taken.
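The level rule behind the Crst assert quoted above can be modelled in a few lines. This is only a rough sketch, assuming a thread may take a lock only when its level is strictly lower than the level of every lock it already holds; the names `LeveledLockTracker`, `take`, and `release` are illustrative and are not the runtime's Crst API.

```python
class LockLevelViolation(Exception):
    """Raised when a lock is requested out of level order."""


class LeveledLockTracker:
    """Tracks the locks one thread holds and checks level ordering."""

    def __init__(self):
        self._held = []  # (name, level) pairs in acquisition order

    def take(self, name, level):
        # Assumed rule: the new lock must be strictly below every held lock.
        for held_name, held_level in self._held:
            if held_level <= level:
                raise LockLevelViolation(
                    f"cannot take level {level} lock {name} while holding "
                    f"level {held_level} lock {held_name}"
                )
        self._held.append((name, level))

    def release(self, name):
        self._held = [(n, lvl) for n, lvl in self._held if n != name]


tracker = LeveledLockTracker()
tracker.take("CrstLeafLock", 0)
try:
    tracker.take("CrstExecuteManRangeLock", 0)
except LockLevelViolation as exc:
    print(exc)
```

Under that assumed rule, two level 0 locks can never be nested on one thread, which matches the shape of the reported violation.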

Consistency check failed: hit privileged instruction!FAILED: !ExecutionManager::IsManagedCode(GetIP(pContext))

This means that we've hit an illegal instruction and so we assume it is the illegal instruction that we inject for triggering GC at the specific location, but the OnGcCoverageInterrupt returned FALSE. That can happen in case the failure address was not in managed code or the MethodDesc of the managed function in which we've hit this didn't have m_GcCover set.

janvorli commented 6 years ago

Any advice on how to find the GC hole?

The best tool is the stress log. You'll need lldb with libsosplugin to dump and analyze the stress log, since it is stored in an in-memory buffer. These settings enable the stress log with the largest buffer possible:

COMPlus_StressLog=1 COMPlus_LogLevel=7 COMPlus_LogFacility=80103 COMPlus_StressLogSize=2000000 COMPlus_TotalStressLogSize=40000000

I'd also disable concurrent GC and server GC:

CORECLR_CONCURRENT_GC=0 CORECLR_SERVER_GC=0
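If the test is launched from a script, the settings above can be collected in one place. A minimal sketch, assuming the values are passed as plain environment variables to the child process; the helper name `stress_log_environment` and its `extra` parameter are hypothetical:

```python
import os

# Values copied from the settings listed above.
STRESS_LOG_ENV = {
    "COMPlus_StressLog": "1",
    "COMPlus_LogLevel": "7",
    "COMPlus_LogFacility": "80103",
    "COMPlus_StressLogSize": "2000000",
    "COMPlus_TotalStressLogSize": "40000000",
    "CORECLR_CONCURRENT_GC": "0",
    "CORECLR_SERVER_GC": "0",
}


def stress_log_environment(base=None, extra=None):
    """Return a copy of the base environment with the stress log settings applied.

    base defaults to the current process environment; extra lets the caller
    add or override individual variables.
    """
    env = dict(os.environ if base is None else base)
    env.update(STRESS_LOG_ENV)
    if extra:
        env.update(extra)
    return env
```

The resulting dictionary can be handed to `subprocess.run(..., env=stress_log_environment())` when starting the test.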

janvorli commented 6 years ago

The libsosplugin then provides the following commands:

Please note that the history is limited by the size of the stresslog buffer, so it may happen that the evidence for the issue is gone from it.

janvorli commented 6 years ago

Let me give you a quick example. You hit a spot where you get an assert indicating that an object address is incorrect, e.g. the method table check fails. You run sos HistObjFind on that object address and you get the roots that were pointing to that object in the past. Then you run the sos HistRoot command for each of the root addresses that sos HistObjFind has dumped. That will show the history of object addresses the root was pointing to, the first one being the current location of the object. You can also use sos HistObj to see how the specific object was moving through memory during recent GCs.

The GC hole is caused by a root the GC didn't know about. So now the tedious part of the work comes in. You need to look at the call stack to see where the references to this object were stored and which one of them was not dumped as a root by the previous commands. Then you'd check the GC info for that location and see if there is a problem. The hole can also be in native code where we would hold a reference without GC protecting it.

And finally, I've seen cases where the GC hole didn't result in a wrong object pointer, but in an object pointer pointing to a completely different object that by chance got relocated to the original location of the object that the affected reference was pointing to. Then it depends on the code of the application how soon, if at all, such an issue surfaces. For example, say you have code that prints an object using Console.WriteLine. If that object reference is incorrect due to a GC hole, but still points to a valid object, WriteLine will happily call ToString on it and write it. No crash would occur, but the output would be wrong.
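The comparison step in the procedure above amounts to a set difference: any location that holds a reference to the object but was never reported as a root is a candidate for the hole. A minimal sketch, assuming the addresses have been collected by hand (the roots from the sos HistObjFind output, the locations from inspecting the call stack); the function name and the sample addresses are hypothetical:

```python
def unreported_reference_locations(reported_roots, observed_locations):
    """Return locations that hold a reference but were never reported as roots."""
    return sorted(set(observed_locations) - set(reported_roots))


# Hypothetical addresses, for illustration only.
roots_from_histobjfind = [0x7FFF0010, 0x7FFF0028]
locations_from_stack = [0x7FFF0010, 0x7FFF0028, 0x7FFF0040]

suspects = unreported_reference_locations(roots_from_histobjfind, locations_from_stack)
print([hex(address) for address in suspects])
```

Each suspect is then the place to check the GC info, or the native code that holds the reference.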

Also sos VerifyHeap is useful to find GC heap corruptions.

sdmaclea commented 6 years ago

@janvorli Glad I asked. Last time SOS wasn't working so I was wading through GC log files manually. Very tedious. Hopefully this will be better.

BruceForstall commented 6 years ago

The failures in the latest run (Windows arm64 GCStress=3) (https://ci.dot.net/job/dotnet_coreclr/job/master/view/arm64/job/jitstress/job/arm64_cross_checked_windows_nt_gcstress0x3_tst/54/):

Tests.lst=ArraySort3b.cmd_662, , , Smrt00000001, , # CoreMangLib\cti\system\array\ArraySort3b\ArraySort3b.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=StringCompare9.cmd_2463, , , Smrt00000001, , # CoreMangLib\cti\system\string\StringCompare9\StringCompare9.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=StringEquals6.cmd_2481, , , Smrt00000001, , # CoreMangLib\cti\system\string\StringEquals6\StringEquals6.cmd  CATS: Pri1;RT;EXPECTED_PASS
Tests.lst=_speed_dbgstress1.cmd_7562, , , Smrt00000001, , # JIT\Methodical\refany\_speed_dbgstress1\_speed_dbgstress1.cmd  CATS: EXPECTED_PASS
Tests.lst=b426654.cmd_8926, , , Smrt00000001, , # JIT\Regression\CLR-x86-JIT\V2.0-Beta2\b426654\b426654\b426654.cmd  CATS: EXPECTED_PASS

are all timeouts.