Open jakobbotsch opened 1 year ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
| Author: | jakobbotsch |
|---|---|
| Assignees: | davidwrighton |
| Labels: | `GCStress`, `area-CodeGen-coreclr`, `untriaged` |
| Milestone: | - |
@jakobbotsch From looking at the logs, this appears to be caused by an actual deadlock, not a test failure. In particular, the test makes good progress for a short period of time and then stops making progress. I don't have easy access to the appropriate hardware to test this, but I think it needs to be investigated as a product failure rather than worked around by increasing the parallelism of the testing.
One thing to be aware of is that in the past, before my change, most hardware intrinsic tests were never run under GCStress. (The tests would mostly be skipped during GCStress execution.)
Another set of failures: https://dev.azure.com/dnceng-public/public/_build/results?buildId=94375&view=ms.vss-test-web.build-test-results-tab
Note that these are GCStress=3 failures, so they are generally due to the VM, not the JIT (or are timeouts caused by GCStress=3 being very slow).
> One thing to be aware of is that in the past, before my change, most hardware intrinsic tests were never run under GCStress. (The tests would mostly be skipped during GCStress execution.)
Perhaps we should once again disable them under GCStress.
We currently have GCStressIncompatible. Perhaps we should also have GCStressIncompatible_3/GCStressIncompatible_C for more granularity. Actually, this should already be possible when using the xunit attributes. See https://github.com/dotnet/runtime/blob/main/docs/workflow/ci/disabling-tests.md and https://github.com/dotnet/arcade/blob/main/src/Microsoft.DotNet.XUnitExtensions/src/RuntimeTestModes.cs.
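For reference, a minimal sketch of the coarse-grained opt-out mentioned above, as it would appear in a test project file. The surrounding project layout is an assumption for illustration; only the `GCStressIncompatible` property name comes from this thread, and it applies to GCStress runs as a whole rather than to GCStress=3 or GCStress=C individually.

```xml
<!-- Hypothetical test .csproj fragment; only the property name is taken from the discussion. -->
<PropertyGroup>
  <!-- Opts this test project out of GCStress runs (no per-mode granularity). -->
  <GCStressIncompatible>true</GCStressIncompatible>
</PropertyGroup>
```

Per-mode exclusion (GCStress=3 only, or GCStress=C only) would instead go through the xunit attributes described in the two links above.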
I don't really see what David was describing above. The log shows that good progress is indeed being made, but we hit the 4 hour Helix timeout specified here, after which the Helix job gets killed. I will disable it under GCStress again.
The hw intrinsics tests seem to time out on the aforementioned platforms after #74886. Test run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=81893&view=ms.vss-test-web.build-test-results-tab&runId=1708824&resultId=220189&paneView=debug
Maybe the stripe count needs to be increased @davidwrighton?