dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Dump file not created when process crashes #104256

Open AArnott opened 2 weeks ago

AArnott commented 2 weeks ago

Description

I have a particularly annoying bug that frequently crashes the test runner in Azure Pipelines but thus far hasn't repro'd locally on a dev box. I need a dump of the process to investigate, but no dump is produced.

Here is a sample crash.

In the cases I have, the crashes happen on Linux. What additional steps must I take in my pipeline to get a dump collected?

.NET SDK 8.0.300 .NET 8.0.6

Reproduction Steps

The command that runs the test includes --blame-crash and other switches. On Windows agents I also acquire procdump and set the PROCDUMP_PATH environment variable.
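
For readers following along, here is a minimal sketch of what such an invocation might look like. The issue does not show the actual command, so the project name, results directory, and dump type below are assumptions chosen for illustration. Only `--blame-crash`, `--blame-crash-dump-type`, and `--results-directory` are taken from the documented `dotnet test` options.

```python
import shlex


def build_dotnet_test_command(project, results_dir, dump_type="full"):
    """Return an argv list for `dotnet test` with crash-dump collection on.

    `project` and `results_dir` are hypothetical placeholders; the real
    pipeline's values are not shown in the issue.
    """
    if dump_type not in ("mini", "full"):
        raise ValueError("dump_type must be 'mini' or 'full'")
    return [
        "dotnet", "test", project,
        "--blame-crash",                         # collect a dump if the test host crashes
        "--blame-crash-dump-type", dump_type,    # size of the dump to request
        "--results-directory", results_dir,      # where blame output is placed
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works without the SDK.
    print(shlex.join(build_dotnet_test_command("MyTests.csproj", "TestResults")))
```

Publishing the results directory as a pipeline artifact is then what would make any collected dump available after the run.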

Expected behavior

A dump file that Azure Pipelines artifacts can collect.

Actual behavior

The dump fails to be collected.

[createdump] Gathering state for process 9534 dotnet
[createdump] Crashing thread 2616 signal 6 (0006)
[createdump] Target process is alive
The active test run was aborted. Reason: Test host process crashed : Unhandled exception. System.InvalidOperationException: The message pump in 'JsonRpcMessagePackLengthTests.JoinableTaskFactory_IntegrationBothSides_IntraProcess' isn't running any more.
   at Xunit.Sdk.UISynchronizationContext.Post(SendOrPostCallback d, Object state)
   at Microsoft.VisualStudio.Threading.JoinableTaskFactory.PostToUnderlyingSynchronizationContext(SendOrPostCallback callback, Object state)
   at Microsoft.VisualStudio.Threading.JoinableTaskFactory.PostToUnderlyingSynchronizationContextOrThreadPool(SingleExecuteProtector callback)
   at Microsoft.VisualStudio.Threading.JoinableTask.Post(SendOrPostCallback d, Object state, Boolean mainThreadAffinitized)
   at Microsoft.VisualStudio.Threading.JoinableTask.JoinableTaskSynchronizationContext.Post(SendOrPostCallback d, Object state)
   at System.Threading.Tasks.AwaitTaskContinuation.RunCallback(ContextCallback callback, Object state, Task& currentTask)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallback.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
[createdump] Problem suspending threads: ptrace(ATTACH, 13331) FAILED No such process (3)
[createdump] Failure took 9ms

Regression?

Pretty sure, yes, since I have verified multiple times in the past that dump collection works. The last time I verified this was probably on .NET 6.

Known Workarounds

No response

Configuration

No response

Other information

No response

dotnet-policy-service[bot] commented 2 weeks ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

tommcdon commented 2 weeks ago

@AArnott do you know how the dumps are being collected? From the log it seems that the app was not suspended when createdump was launched, and by the time createdump attempted to collect a dump the process had already exited. Are the .NET crash dump environment variables set, and if so, do you know how they are configured?
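
As context for the question above, here is a minimal sketch of the runtime's documented crash dump environment variables, built as an environment for a child process. It is an assumption that these are the variables being asked about. The dump directory and file name template are hypothetical, and the values vstest actually sets may differ. Older runtimes use the `COMPlus_` prefix instead of `DOTNET_`.

```python
import os


def crash_dump_env(dump_dir, dump_type=4, base_env=None):
    """Return a copy of the environment with .NET crash dump collection enabled.

    dump_type follows the documented values: 1 = Mini, 2 = Heap,
    3 = Triage, 4 = Full.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DOTNET_DbgEnableMiniDump"] = "1"            # turn on dump generation at crash
    env["DOTNET_DbgMiniDumpType"] = str(dump_type)   # which kind of dump to write
    # %p is expanded by the runtime to the PID of the dumped process.
    env["DOTNET_DbgMiniDumpName"] = os.path.join(dump_dir, "crash_%p.dmp")
    env["DOTNET_CreateDumpDiagnostics"] = "1"        # emit [createdump] diagnostic lines
    return env


if __name__ == "__main__":
    # "/tmp/dumps" is a hypothetical location used only for this demonstration.
    for key, value in sorted(crash_dump_env("/tmp/dumps", base_env={}).items()):
        print(f"{key}={value}")
```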

AArnott commented 2 weeks ago

I don't know how the dump is triggered. @nohwnd should know more. I just know that we request crash dumps when we spawn the test runner via dotnet test, and that a dump is triggered (for a failure where I would expect, and need, a dump to be collected). As for the process continuing to execute during the dump collection and how to stop that, I hope @nohwnd can comment.

nohwnd commented 2 weeks ago

Yes, we set the variables here, then we wait for the process to finish, and we grab any dump files that were collected. https://github.com/microsoft/vstest/blob/main/src/Microsoft.TestPlatform.Extensions.BlameDataCollector/BlameCollector.cs#L157-L166
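
The flow described above (launch with an environment, wait for exit, gather dump files) can be sketched roughly as follows. This is an illustration of the general pattern only, not the vstest implementation. The function name, the `*.dmp` pattern, and the stand-in child process are all assumptions.

```python
import glob
import os
import subprocess
import sys
import tempfile


def run_and_collect_dumps(argv, dump_dir, env=None, pattern="*.dmp"):
    """Run argv to completion, then return (exit_code, sorted dump paths).

    Dumps are only looked for after the process has exited, so a dump that
    createdump failed to write simply does not appear in the result.
    """
    os.makedirs(dump_dir, exist_ok=True)
    completed = subprocess.run(argv, env=env)   # blocks until the child exits
    dumps = sorted(glob.glob(os.path.join(dump_dir, pattern)))
    return completed.returncode, dumps


if __name__ == "__main__":
    # Stand-in child: a Python process that writes a fake dump file and exits
    # with a nonzero code, so the sketch runs without a .NET test host.
    dump_dir = tempfile.mkdtemp()
    fake_dump = os.path.join(dump_dir, "fake.dmp")
    child = [sys.executable, "-c",
             "import sys; open(sys.argv[1], 'w').close(); sys.exit(3)",
             fake_dump]
    print(run_and_collect_dumps(child, dump_dir))
```

In the failure reported here, the equivalent of the directory scan would come back empty, since createdump reported that the process was gone before it could attach.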

tommcdon commented 2 weeks ago

Thanks @nohwnd and @AArnott! This might be related to https://github.com/dotnet/runtime/issues/103000.
@mikem8361, assigning over to you for further investigation