Open mistydemeo opened 5 years ago
@ltm You wouldn't happen to have a diagnostic ID from when this has happened to you, would you?
@mistydemeo Sorry, I don't. I looked at the diagnostics myself but didn't submit it and I've since uninstalled launchdns to avoid the issue. I'll reinstall launchdns and submit the Docker diagnostics if the issue occurs again.
I'd be curious to hear if the issue does/doesn't occur when launchdns is configured not to use launchd's socket activation feature. If it's the presence of any TLD DNS configuration file that triggers the issue.
I've captured a diagnostic report. Some interesting excerpts:
2019-03-13 16:23:07.593059-0700 localhost com.docker.vpnkit[62030]: DNS lookup docker-for-desktop.localdomain A: NoSuchRecord
2019-03-13 16:23:17.604396-0700 localhost com.docker.vpnkit[62030]: DNS lookup docker-for-desktop.localdomain A: NoSuchRecord
2019-03-13 16:23:27.617615-0700 localhost com.docker.vpnkit[62030]: DNS lookup docker-for-desktop.localdomain A: NoSuchRecord
2019-03-13 16:24:00.439940-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:01.662947-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:01.760843-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:02.110746-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:02.548039-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:02.982140-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:03.112082-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:03.849373-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
2019-03-13 16:24:03.959842-0700 localhost com.docker.vpnkit[62030]: Socket.Stream: caught Socket is not connected
tcpdump:
16:43:50.058653 pid launchdns.31199 svc BE IP localhost.55353 > localhost.61595: UDP, length 48
0x0000: 4500 004c 1f0f 0000 4011 0000 7f00 0001 E..L....@.......
0x0010: 7f00 0001 d839 f09b 0038 fe4b 2201 8103 .....9...8.K"...
0x0020: 0000 0000 0000 0000 4c4f 434c e034 1436 ........LOCL.4.6
0x0030: 0f00 79a2 e034 1436 0ef9 a8cd e034 1436 ..y..4.6.....4.6
0x0040: 0f00 79a2 e034 1436 0f00 79a2 ..y..4.6..y.
16:43:50.058684 pid launchdns.31199 svc BE IP localhost.55353 > localhost.61595: UDP, length 48
0x0000: 4500 004c e2db 0000 4011 0000 7f00 0001 E..L....@.......
0x0010: 7f00 0001 d839 f09b 0038 fe4b 2201 8103 .....9...8.K"...
0x0020: 0000 0000 0000 0000 4c4f 434c e034 1436 ........LOCL.4.6
0x0030: 0f02 a348 e034 1436 0efc dee3 e034 1436 ...H.4.6.....4.6
0x0040: 0f02 a348 e034 1436 0f02 a348 ...H.4.6...H
16:43:50.058690 pid com.docker.supe.62028 svc BE IP localhost.61595 > localhost.55353: UDP, length 48
0x0000: 4500 004c 09e8 0000 4011 0000 7f00 0001 E..L....@.......
0x0010: 7f00 0001 f09b d839 0038 fe4b 2201 81ec .......9.8.K"...
0x0020: 0000 0000 0000 0000 4c4f 434c e034 1436 ........LOCL.4.6
0x0030: 0f05 100a e034 1436 0efe c56d e034 1436 .....4.6...m.4.6
0x0040: 0f05 100a e034 1436 0f05 100a .....4.6....
16:43:50.058717 pid com.docker.supe.62028 svc BE IP localhost.61595 > localhost.55353: UDP, length 48
0x0000: 4500 004c 805d 0000 4011 0000 7f00 0001 E..L.]..@.......
0x0010: 7f00 0001 f09b d839 0038 fe4b 2201 81ec .......9.8.K"...
0x0020: 0000 0000 0000 0000 4c4f 434c e034 1436 ........LOCL.4.6
0x0030: 0f07 1822 e034 1436 0f00 79a2 e034 1436 ...".4.6..y..4.6
0x0040: 0f07 1822 e034 1436 0f07 1822 ...".4.6..."
The diag report contains a lot of data that i'm not comfortable uploading (lots to vet), but I've saved it and happy to upload specific pieces, or pair with someone to review it.
I've also been able to reproduce and have logs. Like @christianclinton, my logs contain some private data so I haven't uploaded everything, but I'm glad to privately share whatever might be useful.
This happened to me again this morning. I'm guessing that it could be related to switching from one network to another network (from my home network to my office network).
Diagnostics ID: CC9F1ADF-9A8A-4DF9-B333-0963EDDF6FA4/20190318170117
I'm staring to have suspicion about what's going on. I've tried to illustrate it below. The components in play are:
The sntpc client will send an NTP request to gateway.docker.internal. Since vpnkit is configured to forward NTP traffic, it will bind an ephemeral port and relay the NTP request to the supervisor. This ephemeral port will be the range from 49152 to 65535 and thus there's a chance it could be 55353*.
The supervisor now receives an NTP request from localhost:55353. If at this point launchdns has managed to bind port 55353, it will receive the NTP response from the supervisor. Since launchdns simply echos invalid DNS requests (see https://github.com/docker/for-mac/issues/2923#issuecomment-443328091) it will send the NTP response back to the supervisor and the snowball has started rolling.
Therefore I believe one workaround would be to configure launchdns to use a port outside the ephemeral range (49152-65535).
sntpc vpnkit supervisor
| udpv4-forwards | UDP:50000
| 123:127.0.0.1:50000 |
| | |
| NTPv4, Client | |
| --------------------> | |
| src:40000 dst:123 | |
| bind() |
| | NTPv4, Client |
| | --------------------> |
| | src:55353 dst:50000 |
| ? |
| |
| launchdns |
| UDP:55353 | NTPv4, Server |
| | <-------------------- |
| | dst:55353 src:50000 |
| | |
| | NTPv4, Server |
| | --------------------> |
| | src:55353 dst:50000 |
* I'm not sure how this would happen while launchdns is bound to that port
Same here, I only notice the behavior on my home network and I'm not using launchdns particularly, I think.
default 09:19:40.840060 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:19:50.932793 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:00.980572 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:11.210931 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:21.337961 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:31.471990 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:41.514779 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:20:51.554500 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:21:02.090590 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:21:13.580025 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord default 09:21:23.680790 +1300 com.docker.vpnkit DNS lookup docker-for-desktop.fritz.box A: NoSuchRecord
Also saved the diagnostic logs if something needed later I'm happy to have a chat about it!
Checking in again - haven't heart back on this yet.
I am not 100% sure, but a factor in this may be that I am switching network. I am on a MacBook, which I dock in one location with physical ethernet and then switch to WiFi when not docked. I am seeing it frequently.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
comment.
Stale issues will be closed after an additional 30d of inactivity.
Prevent issues from auto-closing with an /lifecycle frozen
comment.
If this issue is safe to close now please do so.
Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale
/lifecycle frozen
This issue is not stale. We're still waiting for any kind of response from a Docker for Mac maintainer or Docker employee.
/remove-lifecycle stale
any update on this? I just had this scenario happen on my Mac with Docker ### 2019-11-14 2.1.0.5. I noticed two processes running com.docker.supervisor using 100% cpu for two users that are logged into my system. One the active user, and the other a user logged in but in the background.
It's been nine months without a response from a Docker employee. I'd really appreciate having some kind of acknowledgment of this bug. 💖
Same issue still persist in 19.03.5 version
$ docker --version
Docker version 19.03.5, build 633a0ea
Apologies for the lack of diagnostics - I haven't yet determined the precise trigger for the bug, even though it happens frequently, but I will add diagnostics once it repros again for me.
Expected behavior
Docker should not use 100% CPU.
Actual behavior
Docker consistently uses 100% CPU even when doing nothing.
Information
References this comment earlier: https://github.com/docker/for-mac/issues/2923#issuecomment-443321345
That issue was closed, but this issue still exists. As noted in that comment:
launchdns is a launchd-based local development DNS server which is popular for setting up domains for local development.
The stacktrace they provided was the following:
Diagnostic logs
Steps to reproduce the behavior