dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

System.Diagnostics.Process.Kill(entireProcessTree: true) doesn't kill entire tree if an intermediate child process is in job object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE #107992

Open 13thirteen opened 4 weeks ago

13thirteen commented 4 weeks ago

### Description

Consider the following process tree:

Process 1 creates process 2 and assigns it to a new Windows job object configured with `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE`, so that process 2 is killed automatically when process 1 terminates. Process 2 then creates process 3, which is not part of that job object.

In this scenario, calling `Kill(entireProcessTree: true)` on process 1 terminates process 1 and process 2, but NOT process 3.

This scenario occurs, for example, with the Windows Python Launcher `py.exe`. `py.exe` starts `python.exe` in a job object, but, by design, subprocesses started by the Python script are not in that job object:

> the launcher will execute its command in a child process, remaining alive while the child process is executing
>
> the Win32 Job API will be used to arrange so that the child process is automatically killed when the parent is terminated (although children of that child process will continue as is the case now.)

### Reproduction Steps

1. Create `DotnetKillTree.exe`:

       dotnet new console -n DotnetKillTree -o .
       echo System.Diagnostics.Process.GetProcessById(int.Parse(args[0])).Kill(entireProcessTree: true); > Program.cs
       dotnet publish -r win-x64 --sc -p:PublishSingleFile=true -p:PublishTrimmed=true -p:DebugType=None -p:DebugSymbols=false -o .

2. Download and install Python for Windows from https://www.python.org/downloads/
3. Download PsList from https://learn.microsoft.com/en-us/sysinternals/downloads/pslist
4. Start the process tree in a new console:

       py.exe -c "import subprocess;subprocess.run(['ping.exe', '-t', 'localhost'])"

5. In another console, show the process tree:

       C:\Users\WDAGUtilityAccount\Desktop\Test>pslist -t

       PsList v1.41 - Process information lister
       Copyright (C) 2000-2023 Mark Russinovich
       Sysinternals - www.sysinternals.com

       Process information for A1ACB1A2-C520-4:

       Name           Pid Pri Thd   Hnd       VM      WS   Priv
       Idle             0   0  16     0        8       8     60
       ...
       explorer      4244   8  59  2497  4194303  164184  64388
       cmd           1356   8   4    81  4194303    4800   5468
       pslist        4596  13   4   223    63620    7752   2436
       conhost       5788   8   7   201  4194303   18800   7236
       cmd           4796   8   1    84  4194303    4860   2272
       conhost        720   8   4   202  4194303   18880   7148
       py            5416   8   4   144    67540    7012   1596
       python        4724   8   4    90  4194303   10936   7216
       PING          4268   8   6    94  4194303    4528    996

6. Kill the `py` process subtree:

       C:\Users\WDAGUtilityAccount\Desktop\Test>DotnetKillTree.exe 5416

7. Check the process tree:

       C:\Users\WDAGUtilityAccount\Desktop\Test>pslist -t

       Name           Pid Pri Thd   Hnd       VM      WS   Priv
       Idle             0   0  16     0        8       8     60
       ...
       explorer      4244   8  55  2483  4194303  164800  64732
       cmd           1356   8   2    84  4194303    4896   3892
       pslist        3608  13   4   223    63620    7748   2428
       conhost       5788   8   8   204  4194303   18880   7276
       cmd           4796   8   1    83  4194303    4860   2532
       conhost        720   8   4   200  4194303   18880   7148
       PING          4268   8   6    94  4194303    4536   1004


### Expected behavior

The `PING` process `4268` is also killed.

### Actual behavior

The `PING` process `4268` remains running.

### Regression?

I don't know. Probably not.

### Known Workarounds

Avoid creating processes in job objects.
For example, start `python.exe` directly instead of `py.exe`:

    python.exe -c "import subprocess;subprocess.run(['ping.exe', '-t', 'localhost'])"

### Configuration

.NET 8.0.202
Windows 10 Pro 22H2
x64
It's only an issue on Windows (because of job objects).
But it's probably not specific to this configuration.

### Other information

I'm not sure, but the problem might be that the [`KillTree(SafeProcessHandle handle)`](https://github.com/dotnet/runtime/blob/v8.0.8/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Win32.cs#L386) method first [kills a process](https://github.com/dotnet/runtime/blob/v8.0.8/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Win32.cs#L397) (so that no further children can be created), then [lists its children](https://github.com/dotnet/runtime/blob/v8.0.8/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Win32.cs#L404), and finally [recursively kills them](https://github.com/dotnet/runtime/blob/v8.0.8/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Win32.cs#L409).

I suspect that, in the scenario above, killing process 1 closes the job object, which in turn kills process 2 (but not process 3). By the time the children of process 1 are listed, process 2 has already exited and is therefore no longer found as a child of process 1 (and process 3 never was one), so the recursion ends and process 3 keeps running.
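The hypothesis above can be checked against a toy model. The Python sketch below is not the .NET implementation and does not call any Windows API; `ToyOS`, `kill_tree`, and `survivors_after_kill` are hypothetical names. It assumes that termination is instantaneous, that process 1 exiting closes the job and kills every process assigned to it synchronously, and that an exited process no longer shows up when children are listed. `kill_tree` follows the order described above (kill first, then list children and recurse), and the PIDs are taken from the pslist output in the reproduction steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Proc:
    pid: int
    name: str
    parent: Optional[int]
    alive: bool = True


class ToyOS:
    """Hypothetical process table; not a real Windows or .NET API."""

    def __init__(self) -> None:
        self.procs: Dict[int, Proc] = {}
        # pid of the process holding the job handle -> pids assigned to a
        # job configured with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
        self.jobs: Dict[int, Set[int]] = {}

    def spawn(self, pid: int, name: str, parent: Optional[int] = None,
              job_owner: Optional[int] = None) -> None:
        self.procs[pid] = Proc(pid, name, parent)
        if job_owner is not None:
            self.jobs.setdefault(job_owner, set()).add(pid)

    def terminate(self, pid: int) -> None:
        proc = self.procs[pid]
        if not proc.alive:
            return
        proc.alive = False
        # Assumption: the owner's exit closes the last job handle, which kills
        # every job member immediately (but not the members' own children).
        for member in sorted(self.jobs.pop(pid, set())):
            self.terminate(member)

    def running_children(self, pid: int) -> List[int]:
        # Assumption: processes that already exited are not listed.
        return [p.pid for p in self.procs.values()
                if p.alive and p.parent == pid]


def kill_tree(system: ToyOS, pid: int) -> None:
    """Order described in the issue: kill first, then list children and recurse."""
    system.terminate(pid)
    for child in system.running_children(pid):
        kill_tree(system, child)


def survivors_after_kill(use_job: bool) -> List[str]:
    system = ToyOS()
    system.spawn(5416, "py")
    system.spawn(4724, "python", parent=5416,
                 job_owner=5416 if use_job else None)
    system.spawn(4268, "PING", parent=4724)  # not assigned to the job
    kill_tree(system, 5416)
    return [p.name for p in system.procs.values() if p.alive]


print(survivors_after_kill(use_job=True))   # ['PING']
print(survivors_after_kill(use_job=False))  # []
```

With the job object, only `PING` survives, matching the actual behavior reported above. Without the job object, the same traversal still finds `python` as a running child of `py` and reaches `PING` through it, which is consistent with the workaround of avoiding job objects.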

I used [Process Explorer](https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer) to inspect the job object of each process (the screenshots come from a different test run, so the PIDs don't match):

![repro_2_py](https://github.com/user-attachments/assets/e9acc127-729e-418a-9745-a23b5270ee63)
![repro_3_python](https://github.com/user-attachments/assets/9a5bda44-8458-48a7-add5-b34e252e2131)
![repro_5_ping](https://github.com/user-attachments/assets/74e7e3da-7958-4daf-a0ec-c5770c9d17e3)

dotnet-policy-service[bot] commented 4 weeks ago

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
See info in area-owners.md if you want to be subscribed.