dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

CI failure in the test System.IO.Tests.FileStream_DeleteOnClose.OpenOrCreate_DeleteOnClose_UsableAsMutex #60147

Closed: tarekgh closed this issue 3 years ago

tarekgh commented 3 years ago

Description

Sorry if this has been reported before; I couldn't find an existing issue.

https://dev.azure.com/dnceng/public/_build/results?buildId=1408804&view=logs&j=9d8498d2-c5b7-54d8-6df7-a2ce7e14e68c&t=34b459a3-e64e-5ccc-0adc-77ebca2601a8&l=76

https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-60140-merge-4e7f21a19bc8454287/System.IO.FileSystem.Tests/1/console.9ded26c7.log?sv=2019-07-07&se=2021-10-27T21%3A23%3A04Z&sr=c&sp=rl&sig=RgndSl%2BsYm6%2BKHG%2BoPaY29nuRCwkAjsZ04p3ZJTxQ58%3D

    System.IO.Tests.FileStream_DeleteOnClose.OpenOrCreate_DeleteOnClose_UsableAsMutex [FAIL]
      Test cancelled
      Expected: False
      Actual:   True
      Stack Trace:
        /_/src/libraries/System.IO.FileSystem/tests/FileStream/DeleteOnClose.cs(77,0): at System.IO.Tests.FileStream_DeleteOnClose.OpenOrCreate_DeleteOnClose_UsableAsMutex()
        --- End of stack trace from previous location ---

Reproduction Steps

This is a CI failure.

Expected behavior

The System.IO.Tests.FileStream_DeleteOnClose.OpenOrCreate_DeleteOnClose_UsableAsMutex test succeeds in CI runs.

Actual behavior

The test fails.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

Runfo Tracking Issue: System.IO.Tests.FileStream_DeleteOnClose.OpenOrCreate_DeleteOnClose_UsableAsMutex

| Build | Definition | Kind | Run Name | Console | Core Dump | Test Results | Run Client |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1418769 | runtime | PR 60357 | net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open | console.log | | | runclient.py |
| 1418060 | runtime | PR 58434 | net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open | console.log | | | runclient.py |
| 1417939 | runtime | PR 60075 | net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open | console.log | | | runclient.py |
| 1417474 | runtime | PR 57434 | net7.0-Linux-Debug-x64-Mono_release-(Centos.7.Amd64.Open)Ubuntu.1604.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:centos-7-mlnet-helix-20210714125435-dde38af | console.log | | | runclient.py |
| 1416382 | runtime | PR 58674 | net7.0-Linux-Debug-x64-mono_interpreter_release-(Debian.10.Amd64.Open)Ubuntu.1804.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:debian-10-helix-amd64-20210304164434-56c6673 | console.log | | | runclient.py |
| 1415888 | runtime | PR 58932 | net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open | console.log | | | runclient.py |
| 1412266 | runtime | PR 59672 | net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open | console.log | | | runclient.py |
| 1411894 | runtime | PR 60216 | net7.0-windows-Debug-x86-CoreCLR_checked-Windows.10.Amd64.Open | console.log | | | runclient.py |
| 1409599 | runtime | Rolling | net7.0-windows-Release-x86-CoreCLR_release-Windows.7.Amd64.Open | console.log | | | runclient.py |
| 1408011 | runtime | PR 59930 | net7.0-windows-Debug-x86-CoreCLR_release-Windows.7.Amd64.Open | console.log | | | runclient.py |

Build Result Summary

| Day Hit Count | Week Hit Count | Month Hit Count |
| --- | --- | --- |
| 4 | 10 | 10 |

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/area-system-io. See info in area-owners.md if you want to be subscribed.

Issue Details
Author: tarekgh
Assignees: -
Labels: `area-System.IO`, `untriaged`
Milestone: -
danmoseley commented 3 years ago

Seems to be an outcome of https://github.com/dotnet/runtime/issues/55327? @tmds @carlossanlop

danmoseley commented 3 years ago

This is Windows 7, if it matters: "Console log: 'System.IO.FileSystem.Tests' from job 4e7f21a1-9bc8-4542-872a-418edd6d138c workitem f8066dad-180e-4fdc-befd-f4ae375379fa (windows.7.amd64.open.rt) executed on machine a0019WJ"

carlossanlop commented 3 years ago

The code fix in PR https://github.com/dotnet/runtime/pull/55327 was intended for Unix specifically. The unit test is brand new, and it should be limited to only Unix platforms. @tmds, correct me if I'm wrong, but is the test supposed to run and pass on Windows as well?

To help unblock, I'll submit a PR to limit the test to run only on Unix. @tmds if the test is supposed to run on Windows too, please let me know, and we can discuss the appropriate code fix.

cc @adamsitnik @Jozkee

jozkee commented 3 years ago

@carlossanlop I think PR https://github.com/dotnet/runtime/pull/55327 is trying to emulate Windows behavior, so the test should run on both platforms.

tmds commented 3 years ago

Yes, it emulates Windows behavior and should pass there.

Regarding the "Test cancelled" message: I should have made this say "Test timed out".

In the test, 50 threads try to exclusively open a file. The test ends when the file has been opened 1000 times, or when it has run for longer than 30 seconds; in the latter case it fails with the message above. It's a concurrency stress test.
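The shape of that stress loop can be sketched as follows. This is a Python stand-in, not the C# test from the repository: `run_stress` and `StressResult` are hypothetical names, an `O_CREAT | O_EXCL` create is used as a substitute for opening with `FileShare.None`, and an explicit unlink after close stands in for `FileOptions.DeleteOnClose`. A run that reaches the deadline before the target count is reported as timed out, which corresponds to the "Test cancelled" failure in the log.

```python
import os
import threading
import time
from dataclasses import dataclass


@dataclass
class StressResult:
    acquired: int    # number of successful exclusive opens
    exclusive: bool  # False if two workers ever held the file at once
    timed_out: bool  # True if the deadline passed before the target was reached


def run_stress(path, workers=50, target=1000, timeout_s=30.0):
    """Hypothetical helper: many threads compete to create `path` exclusively."""
    deadline = time.monotonic() + timeout_s
    state_lock = threading.Lock()  # guards the counters only, not the file
    state = {"holders": 0, "acquired": 0, "exclusive": True}

    def worker():
        while time.monotonic() < deadline:
            with state_lock:
                if state["acquired"] >= target:
                    return
            try:
                # Assumption: exclusive create stands in for FileShare.None.
                fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            except (FileExistsError, PermissionError):
                time.sleep(0.0005)  # another worker holds the file; retry
                continue
            with state_lock:
                state["holders"] += 1
                if state["holders"] != 1:
                    state["exclusive"] = False  # mutual exclusion was violated
                if state["acquired"] < target:
                    state["acquired"] += 1
            time.sleep(0.001)  # hold the "mutex" briefly
            with state_lock:
                state["holders"] -= 1
            os.close(fd)
            os.unlink(path)  # assumption: emulates delete-on-close

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return StressResult(
        acquired=state["acquired"],
        exclusive=state["exclusive"],
        timed_out=state["acquired"] < target,
    )
```

Calling `run_stress(path)` with the defaults mirrors the numbers quoted above (50 threads, 1000 opens, 30 seconds). On a slow or heavily loaded machine the deadline can be reached first even though exclusivity never broke, which is why a timeout alone does not indicate a locking bug.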

I'm fine if we skip this on Windows because it is validating a specific part of the Unix implementation (to be compatible with Windows).

Alternatively, we can increase the timeout, and maybe move it to outerloop if it is considered to take too much time.

danmoseley commented 3 years ago

Time taken is mostly relevant for local test runs. If a test takes a couple of minutes in the cloud, it’s not noticeable. Perhaps this is a case where the test is normally fast, but we run it so often in the cloud that we end up hitting the slow tail of the run-time distribution. If that’s the problem, then just increasing the timeout is what we normally do.