docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac
2.43k stars 118 forks source link

com.docker.supervisor uses 100% CPU when sending NTP requests to launchdns #3565

Open mistydemeo opened 5 years ago

mistydemeo commented 5 years ago

Apologies for the lack of diagnostics - I haven't yet determined the precise trigger for the bug, even though it happens frequently, but I will add diagnostics once it repros again for me.

Expected behavior

Docker should not use 100% CPU.

Actual behavior

Docker consistently uses 100% CPU even when doing nothing.

Information

References this comment earlier: https://github.com/docker/for-mac/issues/2923#issuecomment-443321345

That issue was closed, but this issue still exists. As noted in that comment:

The supervisor is sending 100,000s of (NTP?) packages per second to my local launchdns daemon

launchdns is a launchd-based local development DNS server which is popular for setting up domains for local development.

The stacktrace they provided was the following:

69279  com.docker.supervisor (18169) :0
 7472  <Unnamed Thread> 0x275557 :0
 6700  github.com/docker/pinata/mac/backend/ntp.NewServer.func1
 6684  github.com/docker/pinata/mac/backend/ntp.(*Server).Serve
 2910  net.(*UDPConn).WriteToUDP
 2882  net.(*UDPConn).writeTo
 2822  net.(*netFD).writeTo
 2787  internal/poll.(*FD).WriteTo
 2621  syscall.Sendto
 2589  syscall.Syscall6
   65  runtime.exitsyscall
   17  runtime.exitsyscallfast
    1  runtime.exitsyscallfast_reacquired

Diagnostic logs

Docker for Mac: version...

Steps to reproduce the behavior

  1. Install Docker for Mac.
  2. Install launchdns and start the service
  3. Wait. Eventually, Docker will start using 100% CPU.
mistydemeo commented 5 years ago

@ltm You wouldn't happen to have a diagnostic ID from when this has happened to you, would you?

ltm commented 5 years ago

@mistydemeo Sorry, I don't. I looked at the diagnostics myself but didn't submit it and I've since uninstalled launchdns to avoid the issue. I'll reinstall launchdns and submit the Docker diagnostics if the issue occurs again.

josh commented 5 years ago

I'd be curious to hear if the issue does/doesn't occur when launchdns is configured not to use launchd's socket activation feature. If it's the presence of any TLD DNS configuration file that triggers the issue.

christianclinton commented 5 years ago

I've captured a diagnostic report. Some interesting excerpts:

2019-03-13 16:23:07.593059-0700  localhost com.docker.vpnkit[62030]: DNS lookup docker-for-desktop.localdomain A: NoSuchRecord
2019-03-13 16:23:17.604396-0700  localhost com.docker.vpnkit[62030]: DNS lookup docker-for-desktop.localdomain A: NoSuchRecord
2019-03-13 16:23:27.617615-0700  localhost com.docker.vpnkit[62030]: DNS lookup docker-for-desktop.localdomain A: NoSuchRecord
2019-03-13 16:24:00.439940-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:01.662947-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:01.760843-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:02.110746-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:02.548039-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:02.982140-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:03.112082-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:03.849373-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:03.959842-0700  localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected

tcpdump:

16:43:50.058653 pid launchdns.31199 svc BE IP localhost.55353 > localhost.61595: UDP, length 48
    0x0000:  4500 004c 1f0f 0000 4011 0000 7f00 0001  E..L....@.......
    0x0010:  7f00 0001 d839 f09b 0038 fe4b 2201 8103  .....9...8.K"...
    0x0020:  0000 0000 0000 0000 4c4f 434c e034 1436  ........LOCL.4.6
    0x0030:  0f00 79a2 e034 1436 0ef9 a8cd e034 1436  ..y..4.6.....4.6
    0x0040:  0f00 79a2 e034 1436 0f00 79a2            ..y..4.6..y.
16:43:50.058684 pid launchdns.31199 svc BE IP localhost.55353 > localhost.61595: UDP, length 48
    0x0000:  4500 004c e2db 0000 4011 0000 7f00 0001  E..L....@.......
    0x0010:  7f00 0001 d839 f09b 0038 fe4b 2201 8103  .....9...8.K"...
    0x0020:  0000 0000 0000 0000 4c4f 434c e034 1436  ........LOCL.4.6
    0x0030:  0f02 a348 e034 1436 0efc dee3 e034 1436  ...H.4.6.....4.6
    0x0040:  0f02 a348 e034 1436 0f02 a348            ...H.4.6...H
16:43:50.058690 pid com.docker.supe.62028 svc BE IP localhost.61595 > localhost.55353: UDP, length 48
    0x0000:  4500 004c 09e8 0000 4011 0000 7f00 0001  E..L....@.......
    0x0010:  7f00 0001 f09b d839 0038 fe4b 2201 81ec  .......9.8.K"...
    0x0020:  0000 0000 0000 0000 4c4f 434c e034 1436  ........LOCL.4.6
    0x0030:  0f05 100a e034 1436 0efe c56d e034 1436  .....4.6...m.4.6
    0x0040:  0f05 100a e034 1436 0f05 100a            .....4.6....
16:43:50.058717 pid com.docker.supe.62028 svc BE IP localhost.61595 > localhost.55353: UDP, length 48
    0x0000:  4500 004c 805d 0000 4011 0000 7f00 0001  E..L.]..@.......
    0x0010:  7f00 0001 f09b d839 0038 fe4b 2201 81ec  .......9.8.K"...
    0x0020:  0000 0000 0000 0000 4c4f 434c e034 1436  ........LOCL.4.6
    0x0030:  0f07 1822 e034 1436 0f00 79a2 e034 1436  ...".4.6..y..4.6
    0x0040:  0f07 1822 e034 1436 0f07 1822            ...".4.6..."

The diag report contains a lot of data that i'm not comfortable uploading (lots to vet), but I've saved it and happy to upload specific pieces, or pair with someone to review it.

mistydemeo commented 5 years ago

I've also been able to reproduce and have logs. Like @christianclinton, my logs contain some private data so I haven't uploaded everything, but I'm glad to privately share whatever might be useful.

ltm commented 5 years ago

Screen Shot 2019-03-18 at 1 12 09 PM

This happened to me again this morning. I'm guessing that it could be related to switching from one network to another network (from my home network to my office network).

Diagnostics ID: CC9F1ADF-9A8A-4DF9-B333-0963EDDF6FA4/20190318170117

ltm commented 5 years ago

I'm staring to have suspicion about what's going on. I've tried to illustrate it below. The components in play are:

The sntpc client will send an NTP request to gateway.docker.internal. Since vpnkit is configured to forward NTP traffic, it will bind an ephemeral port and relay the NTP request to the supervisor. This ephemeral port will be the range from 49152 to 65535 and thus there's a chance it could be 55353*.

The supervisor now receives an NTP request from localhost:55353. If at this point launchdns has managed to bind port 55353, it will receive the NTP response from the supervisor. Since launchdns simply echos invalid DNS requests (see https://github.com/docker/for-mac/issues/2923#issuecomment-443328091) it will send the NTP response back to the supervisor and the snowball has started rolling.

Therefore I believe one workaround would be to configure launchdns to use a port outside the ephemeral range (49152-65535).

sntpc                   vpnkit                supervisor
  |                 udpv4-forwards                | UDP:50000
  |              123:127.0.0.1:50000              |
  |                       |                       |
  |     NTPv4, Client     |                       |
  | --------------------> |                       |
  | src:40000    dst:123  |                       |
  |                     bind()                    |
  |                       |     NTPv4, Client     |
  |                       | --------------------> |
  |                       | src:55353   dst:50000 |
  |                       ?                       |
  |                                               |
  |                   launchdns                   |
  |             UDP:55353 |     NTPv4, Server     |
  |                       | <-------------------- |
  |                       | dst:55353   src:50000 |
  |                       |                       |
  |                       |     NTPv4, Server     |
  |                       | --------------------> |
  |                       | src:55353   dst:50000 |

* I'm not sure how this would happen while launchdns is bound to that port

pedrocesarti commented 5 years ago

Same here, I only notice the behavior on my home network and I'm not using launchdns particularly, I think.

Screen Shot 2019-04-05 at 09 28 25

default 09:19:40.840060 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:19:50.932793 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:00.980572 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:11.210931 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:21.337961 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:31.471990 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:41.514779 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:51.554500 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:21:02.090590 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:21:13.580025 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:21:23.680790 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord

Also saved the diagnostic logs if something needed later I'm happy to have a chat about it!

mistydemeo commented 5 years ago

Checking in again - haven't heart back on this yet.

dannystaple commented 5 years ago

I am not 100% sure, but a factor in this may be that I am switching network. I am on a MacBook, which I dock in one location with physical ethernet and then switch to WiFi when not docked. I am seeing it frequently.

docker-robott commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale comment. Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale

mistydemeo commented 5 years ago

/lifecycle frozen

mistydemeo commented 5 years ago

This issue is not stale. We're still waiting for any kind of response from a Docker for Mac maintainer or Docker employee.

mistydemeo commented 5 years ago

/remove-lifecycle stale

bmusat commented 4 years ago

any update on this? I just had this scenario happen on my Mac with Docker ### 2019-11-14 2.1.0.5. I noticed two processes running com.docker.supervisor using 100% cpu for two users that are logged into my system. One the active user, and the other a user logged in but in the background.

mistydemeo commented 4 years ago

It's been nine months without a response from a Docker employee. I'd really appreciate having some kind of acknowledgment of this bug. 💖

BERRAMOU commented 4 years ago

Same issue still persist in 19.03.5 version

$ docker --version
Docker version 19.03.5, build 633a0ea
Screenshot 2019-12-26 at 10 08 17