dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Many x64 checked Ubuntu GCStress=0xC test failures #10719

Closed BruceForstall closed 4 years ago

BruceForstall commented 6 years ago

https://ci.dot.net/job/dotnet_coreclr/job/master/view/x64/job/jitstress/job/x64_checked_ubuntu_gcstress0xc_flow/91/

FAILED   - CoreMangLib/cti/system/collections/generic/dictionary/DictionaryICollectionCopyTo/DictionaryICollectionCopyTo.sh
FAILED   - CoreMangLib/cti/system/collections/generic/dictionary/DictionaryICollectionCopyTo2/DictionaryICollectionCopyTo2.sh
FAILED   - CoreMangLib/cti/system/collections/generic/dictionaryenumerator/DictEnumIDictEnumget_Key/DictEnumIDictEnumget_Key.sh
FAILED   - JIT/HardwareIntrinsics/X86/Avx2/ConvertToVector256_r/ConvertToVector256_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Avx/ConvertToVector_r/ConvertToVector_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Avx2/Avx2_ro/Avx2_ro.sh
FAILED   - JIT/HardwareIntrinsics/X86/Avx/Avx_ro/Avx_ro.sh
FAILED   - JIT/HardwareIntrinsics/X86/Avx/Permute2x128.Avx_r/Permute2x128.Avx_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Bmi1/Bmi1_r/Bmi1_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Fma_Vector256/Fma_r/Fma_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Fma_Vector128/Fma_r/Fma_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Sse2/Sse2_ro/Sse2_ro.sh
FAILED   - JIT/HardwareIntrinsics/X86/Sse41/ConvertToVector128_r/ConvertToVector128_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Sse41/Sse41_ro/Sse41_ro.sh
FAILED   - JIT/HardwareIntrinsics/X86/Avx/Avx_r/Avx_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Ssse3/Ssse3_r/Ssse3_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Sse/Sse_r/Sse_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Avx2/Avx2_r/Avx2_r.sh
FAILED   - JIT/HardwareIntrinsics/X86/Sse2/Sse2_r/Sse2_r.sh
FAILED   - JIT/Methodical/ELEMENT_TYPE_IU/_il_dbgu_fld/_il_dbgu_fld.sh
FAILED   - JIT/Methodical/fp/exgen/1000w1d_cs_d/1000w1d_cs_d.sh
FAILED   - JIT/Methodical/fp/exgen/10w5d_cs_d/10w5d_cs_d.sh
FAILED   - JIT/Methodical/fp/exgen/10w5d_cs_r/10w5d_cs_r.sh
FAILED   - JIT/Methodical/fp/exgen/1000w1d_cs_r/1000w1d_cs_r.sh
FAILED   - JIT/Methodical/xxobj/ldobj/_il_relldobj_V/_il_relldobj_V.sh
FAILED   - JIT/Regression/JitBlue/DevDiv_461649/DevDiv_461649/DevDiv_461649.sh
FAILED   - tracing/runtimeeventsource/runtimeeventsource/runtimeeventsource.sh

tannergooding commented 6 years ago

FYI. @fiigii, @CarolEidt for the HWIntrinsic failures...

tannergooding commented 6 years ago

The HardwareIntrinsics failures should be resolved by https://github.com/dotnet/coreclr/pull/19141

tannergooding commented 6 years ago

I was trying to debug this locally and I get: Consistency check failed: Crst Level violation: Can't take level 9 lock CrstReJITSharedDomainTable because you already holding level 3 lock CrstGCCover

I would guess that either my machine is configured differently or I am doing something fundamentally wrong, given that this doesn't happen on the CI machines. Could building the tests on the Linux box itself (rather than restoring them from a zip) make a difference?

tannergooding commented 6 years ago

However, it looks like the Ubuntu failures are actually caused by "Error: Handle is not initialized."

The failure seems to be fairly sporadic. For example:

JIT/HardwareIntrinsics/X86/Fma_Vector256/Fma_r/Fma_r.sh
               BEGIN EXECUTION
               /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_gcstress0xc_tst/bin/tests/Linux.x64.Checked/Tests/Core_Root/corerun Fma_r.exe
               Running MultiplyAdd.Double test...
               Running MultiplyAdd.Single test...
               Error: Handle is not initialized.
               Running MultiplyAddNegated.Double test...
               Running MultiplyAddNegated.Single test...
               Running MultiplyAddSubtract.Double test...
               Running MultiplyAddSubtract.Single test...
               Running MultiplySubtract.Double test...
               Error: Handle is not initialized.
               Running MultiplySubtract.Single test...
               Running MultiplySubtractAdd.Double test...
               Running MultiplySubtractAdd.Single test...
               Running MultiplySubtractNegated.Double test...
               Running MultiplySubtractNegated.Single test...
               Expected: 100
               Actual: 0
               END EXECUTION - FAILED

The following tests all use the same template, SimpleTernOpTest.template:

MultiplyAdd.Double
MultiplyAdd.Single
MultiplyAddNegated.Double
MultiplyAddNegated.Single
MultiplySubtract.Double
MultiplySubtract.Single
MultiplySubtractNegated.Double
MultiplySubtractNegated.Single

However, only two of them (MultiplyAdd.Single and MultiplySubtract.Double) actually hit the "Handle is not initialized" error in this run.

That particular error message is only thrown by GCHandle if the handle was never initialized or if it was freed. However, the GCHandles are created/pinned once in the constructor and are only freed on Dispose.

However, given that these tests aren't also failing on Windows, I don't think the problem is in the managed code.

BruceForstall commented 6 years ago

FYI, I see that Crst failure in one lab run: https://ci.dot.net/job/dotnet_coreclr/job/master/view/x64/job/jitstress/job/x64_checked_ubuntu_gcstress0xc_flow/95/

BruceForstall commented 6 years ago

@jkotas @kouvel Who from the VM side should be responsible for making GCStress "clean"?

jkotas commented 6 years ago

I do not think we have any catch-all person for GCStress. Since the crash is about CrstReJITSharedDomainTable, it should go to the folks who are working on ReJIT/tiered JIT.

AndyAyersMS commented 6 years ago

Seems like pretty much all GC stress is broken now... I'll dig in, but @noahfalk @kouvel can you help?

BruceForstall commented 6 years ago

On basic x86 GCStress=0xc: https://ci.dot.net/job/dotnet_coreclr/job/master/job/jitstress/job/x86_checked_windows_nt_gcstress0xc/

From the list of commits that contributed to that run, this looks suspiciously related: https://github.com/dotnet/coreclr/pull/19054

AndyAyersMS commented 6 years ago

Wonder if we should have a gc stress smoketest in the innerloop CI? Given that it requires a special package restore and also exercises otherwise uncovered paths in the runtime...

BruceForstall commented 6 years ago

> Wonder if we should have a gc stress smoketest in the innerloop CI? Given that it requires a special package restore and also exercises otherwise uncovered paths in the runtime...

Seems like a great idea. (Care to open an issue?)

AndyAyersMS commented 6 years ago

dotnet/coreclr#19411

RussKeldorph commented 5 years ago

None of these tests are failing in the latest run: https://ci.dot.net/job/dotnet_coreclr/job/master/job/jitstress/job/x64_checked_ubuntu_gcstress0xc_tst/102/

I assume existing failures are (or will be) tracked by other issues.