dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.36k stars 4.74k forks source link

Process.WaitForExit() deadlock if there are childprocess still running. #51277

Open Kuinox opened 3 years ago

Kuinox commented 3 years ago

Description

This bug was first reported in the msbuild repo but I believe it's a bug in the Process class itself and not in msbuild.
The issue is that WaitForExit() will synchronously wait for the stderr/stdout to complete, the summoned process has exited, but it has childs process still running (the nodereuse processes in msbuild case). In this case, it looks like that the stderr/stdout pipes wont close. So we are stuck in the waits there:
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Windows.cs#L186

We hit this code path because the timeout is Timeout.Infinite. There is hope for a beginning of a workaround: We can avoid the deadlock if we call WaitForExit(int milliseconds). But there is an issue in this case

When standard output has been redirected to asynchronous event handlers, it is possible that output processing will not have completed when this method returns. To ensure that asynchronous event handling has been completed, call the WaitForExit() overload that takes no parameter after receiving a true from this overload.

Now there is no way to retrieve the data in the stdout/stderr without a concurrency issue that would cause data to be lost in case of a race condition.

Other information

This bug happens on Windows but by looking at the *nix implementation of WaitForExitCore, it may be affected too (I have no idea how the lifecycle of the pipes are supposed to work on linux/windows, so it's probably wrong)

Edit: I tested on WSL with ubuntu and it doesn't have a problem, so it's a Windows only problem.

ghost commented 3 years ago

Tagging subscribers to this area: @carlossanlop See info in area-owners.md if you want to be subscribed.

Issue Details
### Description [This bug](https://github.com/dotnet/msbuild/issues/2981) was first reported in the msbuild repo but I believe it's a bug in the Process class itself and not in msbuild. The issue is that `WaitForExit()` will synchronously wait for the stderr/stdout to complete, the summoned process has exited, but it has childs process still running (the nodereuse processes in msbuild case). In this case, it looks like that the stderr/stdout pipes wont close. So we are stuck in the waits there: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Windows.cs#L186 We hit this code path because the timeout is `Timeout.Infinite`. There is hope for a beginning of a workaround: We can avoid the deadlock if we call `WaitForExit(int milliseconds)`. But there is an issue in this case > When standard output has been redirected to asynchronous event handlers, it is possible that output processing will not have completed when this method returns. To ensure that asynchronous event handling has been completed, call the WaitForExit() overload that takes no parameter after receiving a true from this overload. Now there is no way to retrieve the data in the stdout/stderr without a concurrency issue that would cause data to be lost in case of a race condition. ### Other information This bug happens on Windows but by looking at the [*nix implementation](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Unix.cs#L213) of `WaitForExitCore`, it may be affected too (I have no idea how the lifecycle of the pipes are supposed to work on linux/windows, so it's probably wrong)
Author: Kuinox
Assignees: -
Labels: `area-System.Diagnostics.Process`, `untriaged`
Milestone: -
Kuinox commented 3 years ago

I wrote a workaround thats allow to retrieve the stdout/stderr of a process that may cause a deadlock like this.
Only time will tell if it really fixed the deadlock without dataloss (I don't know what will happen if we receive a lot of data when the program exit , and the CTS of the AsyncStreamReader is triggered.).

process.WaitForExit( int.MaxValue ); //First we need for the program to exit.
// Here the program exited, but background may still pump messages
ReflectionHack( process ); // This will cancel the reads in the pipes background loop.
process.WaitForExit(); // This allow to wait for the 2 pipes async to finish looping and flushing the last messages.
// Here you shouldn't receive any message.

/// <summary>
/// This method shut down things in <see cref="Process"/>. Call it when you know that the process has exited.
/// Sometimes the <see cref="Process"/> class deadlock itself.
/// See this for more info: https://github.com/dotnet/runtime/issues/51277
/// </summary>
/// <param name="process"></param>
static void ReflectionHack( Process process )
{
    const BindingFlags bindingFlags = BindingFlags.NonPublic | BindingFlags.Instance;
    FieldInfo? outputField = typeof( Process ).GetField( "_output", bindingFlags );
    FieldInfo? errorField = typeof( Process ).GetField( "_error", bindingFlags );
    ((IDisposable)outputField!.GetValue( process )!).Dispose();
    ((IDisposable)errorField!.GetValue( process )!).Dispose();
}

Sadly the cleanest way I found to work around this issue is with this little reflection hack.

Edit: I now think that this reflection hack may lead to data loss at the end of the process.

Kuinox commented 2 years ago

Hello, By lookin up for the code of the new WaitForExitAsync, I think it will be prone to the same issue. Can this issue be looked up ?

Kuinox commented 2 years ago

I made a reproduction here: https://github.com/Kuinox/DotnetProcessBugRepro (Run ProcessStarter)

Kuinox commented 2 years ago

I found a SO post from 2016 about this bug: https://stackoverflow.com/questions/26713373/process-waitforexit-doesnt-return-even-though-process-hasexited-is-true

Evengard commented 9 months ago

https://github.com/dotnet/runtime/pull/79817 Did that solve the issue I wonder?

Kuinox commented 9 months ago

79817 Did that solve the issue I wonder?

I just tested on WSL-ubuntu and it doesn't have the problem at all, so unix was just unaffected.
The patch is also Unix specific so I don't think it solved the issue.
I also upgraded my reproduction repo from 5.0 to 8.0 and it still deadlock itself in 8.0
https://github.com/Kuinox/DotnetProcessBugRepro

MichalPavlik commented 4 months ago

Could you please investigate this issue as it seems to be impacting our team (MSBuild)? Your attention to this matter would be greatly appreciated.

cc:@baronfel

Kuinox commented 2 months ago

@MichalPavlik hey, this issue is a bit in a limbo state due to the "need-further-triage" tag, none in the runtime team looked at this issue for the past 3 years.

MichalPavlik commented 2 months ago

@Kuinox we are still trying to resurrect this issue :) @baronfel was you able to find someone who would be willing to take a second look?

philasmar commented 1 week ago

I am running into this issue as well! Any updates?