docker / for-win

Bug reports for Docker Desktop for Windows
https://www.docker.com/products/docker#/windows

System Process High CPU use #12939

Closed: hernandp closed this issue 2 years ago.

hernandp commented 2 years ago

Actual behavior

System process CPU use is above 15%. Process Explorer (ProcExp) shows a thread at NTOSKRNL!Ordinal25+0x1730 continuously consuming cycles.
No containers are running. Docker does not exit immediately when I request it from the UI.

Expected behavior

I expect low CPU use when no containers are running.

Information

Output of & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

[2022-09-01T03:24:50.490070700Z][com.docker.diagnose.exe][I] set path configuration to OnHost
Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0031: does the Docker API work?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0001: is the application running?
[SKIP] DD0018: does the host support virtualization?
[PASS] DD0002: does the bootloader have virtualization enabled?
[PASS] DD0017: can a VM be started?
[PASS] DD0024: is WSL installed?
[PASS] DD0021: is the WSL 2 Windows Feature enabled?
[PASS] DD0022: is the Virtual Machine Platform Windows Feature enabled?
[PASS] DD0025: are WSL distros installed?
[PASS] DD0026: is the WSL LxssManager service running?
[PASS] DD0029: is the WSL 2 Linux filesystem corrupt?
[PASS] DD0035: is the VM time synchronized?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0005: is the user in the docker-users group?
[PASS] DD0007: is the backend responding?
[FAIL] DD0014: are the backend processes running? 1 error occurred:
        * vpnkit.exe is not running

[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[PASS] DD0006: is the Docker Desktop Service responding?
[FAIL] DD0012: is the VM networking working? network checks failed: failed to ping host: exit status 1
[2022-09-01T03:25:05.314677400Z][com.docker.diagnose.exe][I] ipc.NewClient: c3802608-diagnose-network -> \\.\pipe\dockerDiagnosticd diagnosticsd
[common/pkg/diagkit/gather/diagnose.runIsVMNetworkingOK()
[       common/pkg/diagkit/gather/diagnose/network.go:34 +0xd9
[common/pkg/diagkit/gather/diagnose.(*test).GetResult(0xdecfe0)
[       common/pkg/diagkit/gather/diagnose/test.go:46 +0x43
[common/pkg/diagkit/gather/diagnose.Run.func1(0xdecfe0)
[       common/pkg/diagkit/gather/diagnose/run.go:17 +0x5a
[common/pkg/diagkit/gather/diagnose.walkOnce.func1(0x2?, 0xdecfe0)
[       common/pkg/diagkit/gather/diagnose/run.go:140 +0x77
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x1, 0xdecfe0, 0xc000479728)
[       common/pkg/diagkit/gather/diagnose/run.go:146 +0x36
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x0, 0xcb?, 0xc000479728)
[       common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkOnce(0x7a3a40?, 0xc0002df890)
[       common/pkg/diagkit/gather/diagnose/run.go:135 +0xcc
[common/pkg/diagkit/gather/diagnose.Run(0xded5e0, 0x82b4fed00000010?, {0xc0002dfb20, 0x1, 0x1})
[       common/pkg/diagkit/gather/diagnose/run.go:16 +0x1ca
[main.checkCmd({0xc00007a3d0?, 0xc00007a3d0?, 0x4?}, {0x0, 0x0})
[       common/cmd/com.docker.diagnose/main.go:133 +0x105
[main.main()
[       common/cmd/com.docker.diagnose/main.go:99 +0x288
[2022-09-01T03:25:05.315334700Z][com.docker.diagnose.exe][I] (1cce39e7) c3802608-diagnose-network C->S diagnosticsd POST /check-network-connectivity: {"ips":["172.25.160.1","192.168.56.1","192.168.0.30","172.19.80.1","172.29.176.1","172.17.224.1","172.22.224.1"]}
[2022-09-01T03:25:05.883559100Z][com.docker.diagnose.exe][W] (1cce39e7) c3802608-diagnose-network C<-S dca7b135-diagnosticsd POST /check-network-connectivity (568.272ms): failed to ping host: exit status 1

[FAIL] DD0032: do Docker networks overlap with host IPs? network bridge has subnet 172.17.0.0/16 which overlaps with host IP 172.17.224.1
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?

Please investigate the following 3 issues:

1 : The test: are the backend processes running?
    Failed with: 1 error occurred:
        * vpnkit.exe is not running

Not all of the backend processes are running.

2 : The test: is the VM networking working?
    Failed with: network checks failed: failed to ping host: exit status 1

VM seems to have a network connectivity issue. Please check your host firewall and anti-virus settings in case they are blocking the VM.

3 : The test: do Docker networks overlap with host IPs?
    Failed with: network bridge has subnet 172.17.0.0/16 which overlaps with host IP 172.17.224.1

If the subnet used by a Docker network overlaps with an IP used by the host, then containers
won't be able to contact the overlapping IP addresses.

Please try configuring the IP address range used by networks: in your docker-compose.yml.
See https://docs.docker.com/compose/compose-file/compose-file-v2/#ipv4_address-ipv6_address
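The DD0032 check reports an overlap when a host IP falls inside a Docker network's subnet. A minimal sketch of that membership test, using Python's standard `ipaddress` module with the subnet and host IPs from the output above; the function name `overlapping_host_ips` is hypothetical and is not part of Docker's diagnostics:

```python
import ipaddress

def overlapping_host_ips(subnet: str, host_ips: list[str]) -> list[str]:
    """Return the host IPs that fall inside the given Docker network subnet."""
    network = ipaddress.ip_network(subnet)
    return [ip for ip in host_ips if ipaddress.ip_address(ip) in network]

# Host IPs taken from the /check-network-connectivity request in the log above.
host_ips = ["172.25.160.1", "192.168.56.1", "192.168.0.30",
            "172.19.80.1", "172.29.176.1", "172.17.224.1", "172.22.224.1"]

print(overlapping_host_ips("172.17.0.0/16", host_ips))  # ['172.17.224.1']
```

Calling it with a candidate subnet that avoids the host ranges, e.g. `10.200.0.0/16`, returns an empty list for the same host IPs.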

Steps to reproduce the behavior

Run a Docker/WSL 2 session (e.g. in Terminal) and wait until the System process starts consuming CPU.

hernandp commented 2 years ago

Kernel thread 108 is consuming CPU by repeatedly calling KeBalanceSetManager, which can be caused by memory pressure. I analyzed a memory dump of my system and found a large number (over 30,000) of seemingly dead WSL.EXE processes, each still holding 80 KB of allocated memory.
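As a rough check of the memory-pressure theory, the figures above can be multiplied out. This sketch assumes exactly 30,000 processes (the stated lower bound) at 80 KB each, and ignores any kernel overhead not included in that per-process figure, so treat the result as an order-of-magnitude estimate only:

```python
leaked_processes = 30_000  # lower bound reported above ("over 30000")
kb_per_process = 80        # per-process allocation reported above

total_kb = leaked_processes * kb_per_process
total_mib = total_kb / 1024
print(f"{total_kb} KB ~ {total_mib:.0f} MiB")  # 2400000 KB ~ 2344 MiB
```

That is on the order of 2.3 GiB held by processes that should already have gone away.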

They were all created by the Docker backend process, e.g.:

0: kd> !process 688
Searching for Process with Cid == 688
PROCESS ffffe784296c5080
    SessionId: 1 Cid: 0688 Peb: 4683bf000 ParentCid: 4498
    DirBase: 4fa68e000 ObjectTable: 00000000 HandleCount: 0.
    Image: wsl.exe
    VadRoot 0000000000000000 Vads 0 Clone 0 Private 15. Modified 5. Locked 0.
    DeviceMap ffffa90f59d5b1f0
    Token ffffa90f72351060
    ElapsedTime 20:36:27.640
    UserTime 00:00:00.000
    KernelTime 00:00:00.000
    QuotaPoolUsage[PagedPool] 0
    QuotaPoolUsage[NonPagedPool] 0
    Working Set Sizes (now,min,max) (16, 50, 345) (64KB, 200KB, 1380KB)
    PeakWorkingSetSize 1902
    VirtualSize 0 Mb
    PeakVirtualSize 2101308 Mb
    PageFaultCount 2021
    MemoryPriority BACKGROUND
    BasePriority 8
    CommitCharge 19
    Job ffffe784332c20a0

Parent of that process?

0: kd> !process 4498
Searching for Process with Cid == 4498
PROCESS ffffe7842c84b300
    SessionId: 1 Cid: 4498 Peb: ad5622d000 ParentCid: 2d74
    DirBase: 513da3000 ObjectTable: ffffa90f63765800 HandleCount: 41579.
    Image: com.docker.backend.exe
    VadRoot ffffe7842c6bbd20 Vads 179 Clone 0 Private 6125. Modified 85188. Locked 5.
    DeviceMap ffffa90f59d5b1f0
    Token ffffa90f633525f0
    ElapsedTime 20:51:33.928
    UserTime 00:00:49.046
    KernelTime 00:02:00.531
    QuotaPoolUsage[PagedPool] 929952
    QuotaPoolUsage[NonPagedPool] 35440
    Working Set Sizes (now,min,max) (10980, 50, 345) (43920KB, 200KB, 1380KB)
    PeakWorkingSetSize 14367
    VirtualSize 4947 Mb
    PeakVirtualSize 4959 Mb
    PageFaultCount 872947
    MemoryPriority BACKGROUND
    BasePriority 8
    CommitCharge 10038
    Job ffffe784332c20a0

Look at the handle count (41579 in the output above):


This may be caused by a problem where the Docker backend creates and kills many dummy WSL.EXE processes in rapid succession.
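One way to test this theory against a whole dump is to list every process (for example with `!process 0 0` in the kernel debugger), save that text, and count WSL.EXE entries per parent Cid. The sketch below assumes each process entry follows the `Cid:`/`ParentCid:`/`Image:` layout shown in the outputs above; the function name `children_by_parent` and the two-entry sample (abbreviated from those outputs) are illustrative only:

```python
import re
from collections import Counter

# One process entry: Cid and ParentCid (hex), then the image name.
# \b keeps "Cid:" from matching inside "ParentCid:".
ENTRY = re.compile(
    r"\bCid:\s*([0-9a-fA-F]+).*?ParentCid:\s*([0-9a-fA-F]+).*?Image:\s*(\S+)",
    re.DOTALL,
)

def children_by_parent(dump_text: str, image: str) -> Counter:
    """Count processes with the given image name, grouped by parent Cid (hex)."""
    counts = Counter()
    for _cid, parent_cid, name in ENTRY.findall(dump_text):
        if name.lower() == image.lower():
            counts[parent_cid.lower()] += 1
    return counts

sample = """
PROCESS ffffe784296c5080
    SessionId: 1 Cid: 0688 Peb: 4683bf000 ParentCid: 4498
    DirBase: 4fa68e000 ObjectTable: 00000000 HandleCount: 0.
    Image: wsl.exe
PROCESS ffffe7842c84b300
    SessionId: 1 Cid: 4498 Peb: ad5622d000 ParentCid: 2d74
    DirBase: 513da3000 ObjectTable: ffffa90f63765800 HandleCount: 41579.
    Image: com.docker.backend.exe
"""

print(children_by_parent(sample, "wsl.exe"))  # Counter({'4498': 1})
```

Run against a full listing, a single parent Cid (here 4498, `com.docker.backend.exe`) with a count in the tens of thousands would support the theory.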

hernandp commented 2 years ago

Related issue https://github.com/docker/for-win/issues/12916

nicks commented 2 years ago

Does the problem persist if you upgrade to Docker Desktop v4.12? This sounds a lot like https://github.com/docker/for-win/issues/12877

hernandp commented 2 years ago

Upgraded. Several hours without any issue. Let's hope 4.12 fixes it, thank you. I'll check again next week, as I'll be using Docker + WSL a lot.

nicks commented 2 years ago

Hooray! Closing as resolved.

docker-robott commented 1 year ago

Closed issues are locked after 30 days of inactivity. This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle locked