Closed sandreenko closed 1 month ago
PTAL @echesakovMSFT I believe you were working with this test.
No, @janvorli created this test
I had no idea the test was disabled. @sandreenko where have you seen it failing with timeout?
@janvorli it was in the same job, here is the log: https://dev.azure.com/dnceng/public/_build/results?buildId=923829&view=ms.vss-test-web.build-test-results-tab&runId=29324470&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab&resultId=102790
Note: the test is disabled again, I have disabled it in #46162
After I fixed the lookup for Main in the stack trace, the ARM64 legs are failing because we don't have the probing helper change in yet. So for large frames, the failure point is too far from the SP, the failure is not recognized as a stack overflow, and the SIGSEGV alternate stack is not large enough to execute the full stack overflow reporting. The alternate stack is about two pages large, while the stack overflow reporting needs about 8 pages. We need to wait for the stack probing helper change to reenable the tests. The OSX / Linux x64 legs are failing due to timeouts, most likely caused by our test infra generating core dumps for the processes that the test launches and that are expected to fail with the stack overflow. I'll look into a way to prevent dump generation for the secondary processes.
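To illustrate the ARM64 failure mode described above, here is a minimal sketch in Python. It is not the runtime's actual code: the function names, the 4 KiB page size, and the one-page detection window are all assumptions chosen for illustration. Only the "about two pages" and "about 8 pages" figures come from the comment above.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def looks_like_stack_overflow(fault_addr, sp, window_pages=1, page_size=PAGE_SIZE):
    """Hypothetical heuristic: classify a fault as a stack overflow only if
    the faulting address lies within `window_pages` pages of the stack pointer."""
    return abs(sp - fault_addr) <= window_pages * page_size


def alt_stack_is_sufficient(alt_stack_pages, needed_pages):
    """True if the alternate signal stack can hold the reporting code's stack usage."""
    return alt_stack_pages >= needed_pages


sp = 0x7FFF0000  # hypothetical stack pointer at the time of the fault

# Small frame: the fault lands close to SP, so it is recognized.
small_frame_fault = sp - 512

# Large frame without probing: the first touched address is far below SP,
# so the same heuristic no longer recognizes the fault as a stack overflow.
large_frame_fault = sp - 64 * 1024

print(looks_like_stack_overflow(small_frame_fault, sp))
print(looks_like_stack_overflow(large_frame_fault, sp))

# Figures from the comment: roughly 2 pages available, roughly 8 pages needed.
print(alt_stack_is_sufficient(alt_stack_pages=2, needed_pages=8))
```

With a stack probing helper, a large frame would be touched page by page, so the fault would land near the SP and fall inside the detection window, which is why the tests depend on that change.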
is the stack probing helper change noted above merged, or is it still pending?
The change was postponed to 7.0.0 - we need to fix https://github.com/dotnet/runtime/issues/47810 first. Otherwise, enabling the stack probing helper introduces regressions.
Ok thanks for the update. Moving this to 7 as well.
Another set of tests has started to fail on CoreCLR Pri0 Runtime Tests Run Linux arm64 checked.
Starting: profiler.eventpipe.XUnitWrapper (parallel test collections = on, max threads = 4)
profiler/eventpipe/eventpipe/eventpipe.sh [FAIL]
Unhandled exception. System.Exception: Profilee returned exit code 255 instead of expected exit code 100.
at Profiler.Tests.ProfilerTestRunner.FailFastWithMessage(String error)
at Profiler.Tests.ProfilerTestRunner.Run(String profileePath, String testName, Guid profilerClsid, String profileeArguments, ProfileeOptions profileeOptions, Dictionary`2 envVars, String reverseServerName, Boolean loadAsNotification, Int32 notificationCopies)
at EventPipeTests.EventPipe.Main(String[] args)
apply_reg_state: ip and cfa unchanged; stopping here (ip=0x7fb6cd6024)
/root/helix/work/workitem/e/profiler/eventpipe/eventpipe/eventpipe.sh: line 384: 47 Aborted (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
Return code: 1
Should we disable all these tests until https://github.com/dotnet/runtime/issues/47810 and this issue are resolved?
@am11 I am not sure I understand connection between the failing profiler test and the issue with stack probing. Can you please elaborate?
@echesakovMSFT, ah ok. The error from libunwind is "apply_reg_state: ip and cfa unchanged;", so I thought this issue is tracking that from the logs in the top post. Is that error unrelated and do we need to track it?
@am11 Yes, it looks unrelated.
@JulieLeeMSFT, Egor had pointed to this https://github.com/dotnet/runtime/issues/47810 which needs to be merged before rechecking whether this test would pass. Is it planned for 7 (it's currently marked as future)?
moving this to 8.
Looks like https://github.com/dotnet/runtime/issues/47810 is still not merged. @JulieLeeMSFT @BruceForstall assume this is not planned for 8?
@mangod9 Note that this test is disabled for all Linux, as well as for win-x86 (https://github.com/dotnet/runtime/issues/84911). Issue #47810 is an optimization for arm64 only. The arm64 stack probing issue is https://github.com/dotnet/runtime/issues/13519. There is no current plan to implement it. (cc @kunalspathak)
But, as mentioned, that should only affect arm64. All the other test failures of this test (non-arm64 Linux and win-x86) could be independently investigated.
@janvorli, would your recent exceptions work handle this case? If so we can move to 9
Looks like the disabled test was enabled as part of JanV's fix. Closing now.
@mangod9 my PR was closed, not merged, and the tests are still disabled. Based on @jkotas's feedback, I wanted to make the fix more bulletproof, but then it fell off my radar with all the EH work. I am reopening the issue. I'll try to get back to fixing it soon.
oh sorry, missed that the PR was closed before merging. Assuming we can enable again in 9
It was disabled so we have not seen it, the log is:
note that on some archs it fails with a timeout.
AzDo example.