docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac
Docker Desktop 4.16.2 stopped #6700

Open wslawski-printify opened 1 year ago

wslawski-printify commented 1 year ago

Expected behavior

Docker Desktop should be running all the time, not randomly stop.

Actual behavior

Docker Desktop has stopped, and I need to reboot before I can use it properly again.

Information

Output of /Applications/Docker.app/Contents/MacOS/com.docker.diagnose check

Please note the following 4 warnings:

1 : The check: are the LinuxKit services running?
    Produced the following warning: failed to ping VM diagnosticsd with error: Get "http://ipc/ping": dial unix diagnosticd.sock: connect: no such file or directory

The Docker engine runs inside a Linux VM as a service. Therefore the services must have started.

2 : The check: is the Docker engine running?
    Produced the following warning: Get "http://ipc/docker": dial unix lifecycle-server.sock: connect: no such file or directory

The Docker engine manages all containers and images on the host. Check the dockerd.log to see why it failed to start.

3 : The check: does the Docker API work?
    Produced the following warning: Cannot connect to the Docker daemon at unix://docker.raw.sock. Is the docker daemon running?

If the Docker API is not available from the host then Docker Desktop will not work correctly.

4 : The check: do Docker networks overlap with host IPs?
    Produced the following warning: Cannot connect to the Docker daemon at unix://docker.raw.sock. Is the docker daemon running?

If the subnet used by a Docker network overlaps with an IP used by the host, then containers
won't be able to contact the overlapping IP addresses.

Try configuring the IP address range used by networks: in your docker-compose.yml.
See https://docs.docker.com/compose/compose-file/compose-file-v2/#ipv4_address-ipv6_address

Please investigate the following 2 issues:

1 : The test: are the LinuxKit services running?
    Failed with: failed to ping VM diagnosticsd with error: Get "http://ipc/ping": dial unix diagnosticd.sock: connect: no such file or directory

The Docker engine runs inside a Linux VM as a service. Therefore the services must have started.

2 : The test: are the backend processes running?
    Failed with: 4 errors occurred:
    * vpnkit-bridge is not running
    * com.docker.vpnkit is not running
    * com.docker.driver.amd64-linux is not running
    * com.docker.virtualization is not running

Steps to reproduce the behavior

None; it happens pretty randomly, most often when my Mac has been inactive for a while. At first, disabling automatic updates of Docker Desktop seemed to improve things for a bit, but after some time it happened anyway.

This is pretty much a continuation of https://github.com/docker/for-mac/issues/6472, which was closed as completed even though it is not: the issue is still happening, at least for me. This is a pretty new Mac, one month old, and the issue has been happening since the first day I started working on it.

jeffadavidson commented 1 year ago

I am also still seeing this issue across multiple machines and users, same as in #6472

roger6106 commented 1 year ago

I just had this occur again as well. I'm also on an M1 Pro with Docker Desktop 4.16.2 and macOS 13.1, and it happens even after disabling experimental features. I first recall seeing this issue around a month ago.

My Diagnostic ID is B2221360-2C19-483C-859C-58BDF87D201D/20230123193345.

I'm not sure if it's relevant, but I'm using Virtualization Framework and VirtioFS. I believe I was already experiencing this before enabling VirtioFS.

For anyone wanting to avoid the computer restart, it's also possible to Quit Docker from the Activity Monitor:

  1. Open Activity Monitor.
  2. Search for Docker.
  3. Highlight all processes.
  4. Click the Stop button.
  5. Choose "Quit". It is not necessary to force it.
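
For a command-line equivalent of the Activity Monitor steps above, here is a minimal sketch. It assumes every relevant process has "docker" somewhere in its executable path (which would also match the docker CLI itself), and the function names docker_pids and quit_docker are made up for illustration. It sends SIGTERM, which asks a process to exit and is not a force kill.

```shell
#!/bin/sh

# Print the PIDs of processes whose executable path mentions "docker".
# Reads "PID COMMAND" lines on stdin, e.g. from: ps -axo pid=,comm=
# Assumption: matching on the word "docker" is good enough to find them.
docker_pids() {
  awk 'tolower($0) ~ /docker/ { print $1 }'
}

# Send SIGTERM (a polite quit, not a force kill) to each matching process.
quit_docker() {
  ps -axo pid=,comm= | docker_pids | while read -r pid; do
    kill -TERM "$pid" 2>/dev/null || true
  done
}
```

Running "ps -axo pid=,comm= | docker_pids" first shows which PIDs would be signalled; calling quit_docker then sends the signal.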

Palwisha-18 commented 1 year ago

After updating macOS to Ventura 13.0, my Docker Desktop stopped working. I tried uninstalling the Docker app and reinstalling it both from the Docker website and from the command line (brew), but I keep getting a Docker daemon error and the Docker Desktop GUI doesn't open either. Activity Monitor shows that the Docker app is not responding.

roger6106 commented 1 year ago

@Palwisha-18: That's something else. This is for when Docker runs, but after a while stops automatically. With this issue, however, it gets stuck in the state of trying to stop.

gondalez commented 1 year ago

In case it helps diagnosis I have been having this issue ever since 4.13.0.

I have tried every release up to and including 4.16.2 and the issue persists. I have tried factory reset also but the issue persists.

I'm running MacOS Ventura/13.1 on an m1 max mbp.

Installing 4.12.0 stops the crashes, but I am hoping to see it resolved in a new build so I benefit from the nice new features offered :)

nuvolapl commented 1 year ago

It seems that running softwareupdate --install-rosetta helps (even if you executed it earlier). At least in my case.

gondalez commented 1 year ago

Edit: this was not the cause, see subsequent comments.

I may have a lead on this!

Since changing my gc setting from true to false, I have not had the crash under 4.16.2 (95914):

image

This setting was false by default for a teammate.

I did not actively set this setting to true. However, I have used a couple of docker mac dev builds from other github threads, so my hunch is that the setting was changed to true by one of those builds and it survived the upgrade to 4.16.2. 🤷

GC being the cause seems to fit since my crashes always seemed to happen when I left docker idle for a while. (My assumption is that GC happens on idle.)

So far docker mac 4.16.2 has survived an afternoon of dev, overnight and a morning of dev without crashing, which it had not previously 😌

wslawski-printify commented 1 year ago

@gondalez I tried this, but it doesn't seem to work; it stopped again for me this morning.

domstubbs commented 1 year ago

Disabling GC doesn’t seem to have helped here either. I’m running VirtioFS as well, in case that makes any difference.

gondalez commented 1 year ago

@wslawski-printify @domstubbs I can report the crash has also returned for me, a couple of days after disabling GC. It must have been a coincidence. I got a bit excited - sorry about that 😅 😞

gondalez commented 1 year ago

Adding my diagnostic id too: 656B9598-A8FA-495B-92E4-81F699FB2D75/20230127070943 Maybe @djs55 can help? 🙏

beau-ronin commented 1 year ago

Same issue for me. Need a fix ASAP.

roger6106 commented 1 year ago

I have not had this issue reoccur since updating to macOS 13.2, although that may just be a coincidence.

beau-ronin commented 1 year ago

I'll give that a shot and report back.

domstubbs commented 1 year ago

It’s also been fine for me all day following a full shut down over the weekend. If you’re in the habit of just hitting sleep then a full reboot might also be worth a go. I’m still on 12.6 for now.

muldos commented 1 year ago

> In case it helps diagnosis I have been having this issue ever since 4.13.0.
>
> I have tried every release up to and including 4.16.2 and the issue persists. I have tried factory reset also but the issue persists.
>
> I'm running MacOS Ventura/13.1 on an m1 max mbp.
>
> Installing 4.12.0 stops the crashes, but I am hoping to see it resolved in a new build so I benefit from the nice new features offered :)

Exactly the same here

domstubbs commented 1 year ago

Well I spoke too soon – nearly 2 days without crashing, but it’s just fallen over again. Docker has been a huge boost to my dev workflow in many respects, but it’s hard not to feel a bit wary when its reliability is so frequently an issue.

beau-ronin commented 1 year ago

It's been 24 hours since I upgraded MacOS, and docker is still running. Good news so far.

roger6106 commented 1 year ago

I just had this issue occur on MacOS 13.2, so upgrading to that is not a fix.

joshriverscambia2019 commented 1 year ago

Just reverted to 4.10.1 after losing another half-day to this bug. Really hoping that the team can express some concern about this.

gondalez commented 1 year ago

My workflow is to sleep my mac at the end of the work day, and throughout the day.

Since rebooting for the first time in some months a couple days ago, the issue has not recurred for two full days of fairly heavy use. A new PB 😅

Anecdotal of course. Curious if there are any daily rebooters that are still experiencing the issue, just to rule sleeping out as the cause.

wslawski-printify commented 1 year ago

@gondalez

I reboot pretty much every day. I am also using a Mac for the first time in my life, for about 2 months now (it's a totally new Mac, fresh hardware), so yeah, great working experience so far.

delfuego commented 1 year ago

I'm seeing this exact issue as of today, running 4.16.2 on macOS 13.2. I too have VirtioFS enabled; I will try to move to gRPC FUSE to see if this fixes anything, but I'm curious if anyone thinks that this is truly related or just a red herring.

djs55 commented 1 year ago

There are some stability fixes for Mac in the latest developer builds if you'd like to try them:

joshriverscambia2019 commented 1 year ago

Any details or changelogs available here? I ask because this set of issues is pretty frustrating to troubleshoot/replicate/resolve from an end-user perspective. The issue is intermittent, can go into remission for days, and presents to the user as 'stopped working' with no real detail of what has failed. Knowing what is being made more stable may help us troubleshoot and get a better picture of what is wrong.

delfuego commented 1 year ago

Yeah, I turned off VirtioFS but just encountered the issue again, so it's not VirtioFS. The diagnostics ID for the issue I just ran into is F92C7CF8-E93F-46D6-992C-12D3F8EFC6CB/20230201175414, if that's worth anything.

And I'm with @joshriverscambia2019: I can't really justify installing developer builds on my primary work machine unless I have some understanding of what I'm getting into.

djs55 commented 1 year ago

@joshriverscambia2019 @delfuego I totally understand-- if you're not blocked then feel free to stick with the released builds. The particular fix I'm thinking of will be in the 4.17 update anyway.

I suspect the "stopped" state is coming from the engine shutting down after the com.docker.vpnkit component segfaults. If so there would be evidence in the log ~/Library/Containers/com.docker.docker/Data/log/host/com.docker.backend.log.

delfuego commented 1 year ago

@djs55 I don't see any evidence of a com.docker.vpnkit segfault in that file, but my diagnostics ID is above, so I presume Docker has that file (and more) from my installation when Docker had decided to stop on its own, and was failing to do so with Docker Desktop spinning and saying "Docker Desktop is stopping" interminably.

I'd say, though, that yes, I'm blocked: I can't keep Docker running properly on my Mac right now, meaning that my dev cycles keep getting interrupted by having to note that Docker has crashed, then kill all active Docker processes, restart Docker, wait for all my containers to start back up, and restart my dev work. But again, without knowing what else might be in those dev builds, it's hard to feel like I can just give them a try. We don't even know whether moving to one of those dev builds would prevent us from then being able to move back to 4.16.2 if we experience even more instability...

roger6106 commented 1 year ago

I do have a com.docker.vpnkit segfault occurring at the time I had this issue:

[com.docker.backend][I] com.docker.vpnkit with pid: 17644 shutdown by signal: segmentation fault

In my case, I had to look at com.docker.backend.log.0 to find the log for that time.
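
Since the relevant line may sit in a rotated file, searching all the backend logs at once can help. This is a minimal sketch: the default directory is the one given earlier in this thread, the glob for rotated files is inferred from the com.docker.backend.log.0 name, and find_vpnkit_segfaults is a made-up helper name.

```shell
#!/bin/sh

# Search the current and rotated backend logs for vpnkit segfault lines.
# $1 is the log directory; the default is the host log path from this thread.
find_vpnkit_segfaults() {
  log_dir="${1:-$HOME/Library/Containers/com.docker.docker/Data/log/host}"
  # The glob covers com.docker.backend.log and rotated copies such as .log.0.
  # -H prefixes every match with the file it came from.
  grep -H "com.docker.vpnkit.*segmentation fault" \
    "$log_dir"/com.docker.backend.log* 2>/dev/null
}
```

Calling find_vpnkit_segfaults with no argument searches the default directory; an empty result means no matching line was found in those files.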

delfuego commented 1 year ago

@roger6106 and @djs55 Yep, if I go back to the com.docker.backend.log.0 file, I see the com.docker.vpnkit segfaults. (But I again presume that Docker has all that info in the diagnostics dumps that we've provided.)

djs55 commented 1 year ago

Since you have the segfaults the development build should help. It's the same build I've been sharing on other tickets with other users. It will upgrade as normal to 4.17 when that is released in a few weeks. Downgrade to 4.16.3 is not tested but usually works. From time to time when downgrade from a development build fails it's necessary to delete the "settings.json" file because some change there confuses older builds. In the worst case it would require a "reset to factory defaults" which would lose current containers and images requiring them to be rebuilt.
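
If settings.json does have to go after a failed downgrade, moving it aside instead of deleting it keeps a copy that can be restored. A minimal sketch: the default path below is an assumption (it is not given in this thread), backup_docker_settings is a made-up helper name, and Docker Desktop should be quit before running it.

```shell
#!/bin/sh

# Move settings.json aside (instead of deleting it) so it can be restored.
# $1 is the settings file; the default location is an assumption.
backup_docker_settings() {
  settings="${1:-$HOME/Library/Group Containers/group.com.docker/settings.json}"
  if [ ! -f "$settings" ]; then
    echo "no settings file at $settings" >&2
    return 1
  fi
  # Timestamped name so repeated runs never overwrite an older backup.
  backup="$settings.bak.$(date +%Y%m%d%H%M%S)"
  mv "$settings" "$backup" && echo "$backup"
}
```

The function prints the path of the backup; moving that file back to its original name restores the previous settings.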

gondalez commented 1 year ago

Thanks @djs55 🙏

I can confirm I have the segmentation fault also:

» grep segmentation ~/Library/Containers/com.docker.docker/Data/log/host/com.docker.backend.log
[2023-02-01T09:00:28.672796000Z][com.docker.backend][I] com.docker.vpnkit with pid: 1701 shutdown by signal: segmentation fault

I will try the development build, thanks!

Also I have to say that the recent performance gains (virtio, rosetta, ...) have made my dev work so much more enjoyable. Even with this little crash it feels streets ahead. So thank you to the docker team for your work on that 🙏

wslawski-printify commented 1 year ago

Well, that's interesting. I, on the other hand, don't have a segmentation fault. I only have this one that looks suspicious:

Docker.log:time="2023-01-26T08:53:12Z" level="info" msg="IPCSession: (5849f6b0) d3370cdf-SwiftAPI S<-C 7208b735-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 8614 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log:time="2023-01-27T13:11:53Z" level="info" msg="IPCSession: (b8db985b) 310f1b8a-SwiftAPI S<-C 1f8a1fe4-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 3559 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log:time="2023-01-30T09:58:35Z" level="info" msg="IPCSession: (e77e3b6e) 03e7d843-SwiftAPI S<-C 3fc9ae28-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 6006 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log:time="2023-02-01T07:56:38Z" level="info" msg="IPCSession: (00933b29) 2b8551dd-SwiftAPI S<-C 16aff56b-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 10288 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2022-12-29T13:43:20Z" level="info" msg="IPCSession: (946b3a08) 382b4c70-SwiftAPI S<-C d1665204-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 32468 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-03T11:42:48Z" level="info" msg="IPCSession: (f4561583) e9b556dc-SwiftAPI S<-C e5f1ce96-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 6430 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-04T16:06:44Z" level="info" msg="IPCSession: (74ff5b64) e49c4aa9-SwiftAPI S<-C b780037b-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 3189 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-05T12:14:03Z" level="info" msg="IPCSession: (f55c1ca1) e6dec1e5-SwiftAPI S<-C 5a64cf31-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 3986 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-09T09:21:34Z" level="info" msg="IPCSession: (b36f0c99) 068799a9-SwiftAPI S<-C 78f390f8-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 6195 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-11T14:49:06Z" level="info" msg="IPCSession: (cf5863b8) dc41b720-SwiftAPI S<-C 97e8cf19-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 11066 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-19T14:00:55Z" level="info" msg="IPCSession: (ffd0c9b0) a7e05205-SwiftAPI S<-C e239740e-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 4511 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-20T16:30:35Z" level="info" msg="IPCSession: (9c36693f) d8f5c0d4-SwiftAPI S<-C b16a8246-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 4570 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-23T09:37:33Z" level="info" msg="IPCSession: (4123a364) bed69888-SwiftAPI S<-C 0ccfd530-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 10095 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-24T08:41:11Z" level="info" msg="IPCSession: (343107fd) b7a9f9bd-SwiftAPI S<-C d73c8a69-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 5356 failed"", diagnose: Optional(""failed to run backend processes""))"
Docker.log.0:time="2023-01-25T09:10:02Z" level="info" msg="IPCSession: (8eb2a3bb) 1a94e21b-SwiftAPI S<-C 0149cf7d-BackendCMD bind: FatalRequest(message: ""supervising tasks: task com.docker.vpnkit with pid: 6126 failed"", diagnose: Optional(""failed to run backend processes""))"

I was grepping in all log files.

ianjukes commented 1 year ago

@djs55 Huge relief here. I have been running 4.17 on my Mac Studio M1 Max for over 24 hours and absolutely no crashing. It has been perfectly stable.

verluci commented 1 year ago

Sticking to 4.12 works for exactly 4 weeks and then starts crashing. I've had to come back to this thread and https://github.com/docker/for-mac/issues/6472 three times now, last time was Dec 5th and yesterday my containers started crashing again.

wslawski-printify commented 1 year ago

I had 4.17 running through the whole weekend, and coming back on Monday it's still running, so it seems good for now.

gondalez commented 1 year ago

4.17 has had no crashes for me too 😌 5 days and counting.

flamedmg commented 1 year ago

Guys, where did you get 4.17? According to https://docs.docker.com/desktop/release-notes/ the latest is 4.16.3.

wslawski-printify commented 1 year ago

I got it from https://desktop-stage.docker.com/mac/main/arm64/96578/Docker.dmg (it seems to be a beta version?)

djs55 commented 1 year ago

@flamedmg @wslawski-printify yes, that's a good link. It's a development build which I've been testing. The fix will be released in 4.17 in around 1-2 weeks. Feel free to use that build in the meantime. It should upgrade ok to 4.17 when that is released.

delfuego commented 1 year ago

I too have now been running the dev build of 4.17 and haven't seen a crash. FWIW.

webuniverseio commented 1 year ago

I have an M1 chip, so the above link should work, but it always says that Docker is damaged. Has anyone else run into this?

Update: Restart fixed that

webuniverseio commented 1 year ago

> I may have a lead on this!
>
> Since changing my gc setting from true to false, I have not had the crash under 4.16.2 (95914):
>
> image
>
> This setting was false by default for a teammate.
>
> I did not actively set this setting to true. However, I have used a couple of docker mac dev builds from other github threads, so my hunch is that the setting was changed to true by one of those builds and it survived the upgrade to 4.16.2. 🤷
>
> GC being the cause seems to fit since my crashes always seemed to happen when I left docker idle for a while. (My assumption is that GC happens on idle.)
>
> So far docker mac 4.16.2 has survived an afternoon of dev, overnight and a morning of dev without crashing, which it had not previously 😌

"enabled": true is the default setting for me after hitting reset to factory defaults.

bkielbasa commented 1 year ago

I have a similar problem, here's my diag ID: FF8E0C25-ACCA-41D8-86BA-C68C68AA2AFB/20230215075429

I tried downgrading, removing all docker-related files, restarting, and reinstalling but no luck. The CLI diagnostic says

Please note the following 7 warnings:

1 : The check: can a VM be started?
    Produced the following warning: vm has not started: failed to open kmsg.log: open log/vm/kmsg.log: no such file or directory

The Docker engine runs inside a Linux VM. Therefore we must be able to start Virtual Machines.

2 : The check: is the LinuxKit VM running?
    Produced the following warning: vm is not running: failed to open kmsg.log: open log/vm/kmsg.log: no such file or directory

The Docker engine runs inside a Linux VM. Therefore the VM must be running.

3 : The check: are the LinuxKit services running?
    Produced the following warning: failed to ping VM diagnosticsd with error: Get "http://ipc/ping": dial unix diagnosticd.sock: connect: no such file or directory

The Docker engine runs inside a Linux VM as a service. Therefore the services must have started.

4 : The check: is the Docker engine running?
    Produced the following warning: Get "http://ipc/docker": dial unix lifecycle-server.sock: connect: no such file or directory

The Docker engine manages all containers and images on the host. Check the dockerd.log to see why it failed to start.

5 : The check: are the binary symlinks installed?
    Produced the following warning: looking for /usr/local/bin/docker: lstat /usr/local/bin/docker: no such file or directory

The symlinks to the docker CLI etc are needed for docker commands to work.

6 : The check: does the Docker API work?
    Produced the following warning: Cannot connect to the Docker daemon at unix://docker.raw.sock. Is the docker daemon running?

If the Docker API is not available from the host then Docker Desktop will not work correctly.

7 : The check: do Docker networks overlap with host IPs?
    Produced the following warning: Cannot connect to the Docker daemon at unix://docker.raw.sock. Is the docker daemon running?

If the subnet used by a Docker network overlaps with an IP used by the host, then containers
won't be able to contact the overlapping IP addresses.

Try configuring the IP address range used by networks: in your docker-compose.yml.
See https://docs.docker.com/compose/compose-file/compose-file-v2/#ipv4_address-ipv6_address

Please investigate the following 3 issues:

1 : The test: can a VM be started?
    Failed with: vm has not started: failed to open kmsg.log: open log/vm/kmsg.log: no such file or directory

The Docker engine runs inside a Linux VM. Therefore we must be able to start Virtual Machines.

2 : The test: are the binary symlinks installed?
    Failed with: looking for /usr/local/bin/docker: lstat /usr/local/bin/docker: no such file or directory

The symlinks to the docker CLI etc are needed for docker commands to work.

3 : The test: is the $PATH ok?
    Failed with: unable to find docker executable on PATH

It looks like I have problems with the VM, but I have no idea why. Docker just stopped working one day. I did not change anything lately.

wslawski-printify commented 1 year ago

@bkielbasa Docker 4.17 works for me: it has run for a few days in a row so far without stopping. The link is above.

bkielbasa commented 1 year ago

@wslawski-printify thanks for your message! I installed the version you suggested, but it didn't help.

image

wslawski-printify commented 1 year ago

But did you restart your machine? I think I had to do it first, then after starting it works without any issues for me.

bkielbasa commented 1 year ago

I restarted a few times. Downgrading to 4.12 helped, lol :)

joshriverscambia2019 commented 1 year ago

> @joshriverscambia2019 @delfuego I totally understand-- if you're not blocked then feel free to stick with the released builds. The particular fix I'm thinking of will be in the 4.17 update anyway.
>
> I suspect the "stopped" state is coming from the engine shutting down after the com.docker.vpnkit component segfaults. If so there would be evidence in the log ~/Library/Containers/com.docker.docker/Data/log/host/com.docker.backend.log.

After a week, this fix has resolved my set of issues. There may be others, but I've had a stable desktop daemon on my company VPN since updating. (Downgrading to a much lower version also seemed to work, but I love living in the future).

joshriverscambia2019 commented 1 year ago

Sadly I spoke too soon, perhaps. After a reboot I'm just spinning waiting for the dashboard to connect. The backend log has a lot of statements like: fetching disk stats: Get "http://ipc/vm/disk-usage": dial unix lifecycle-server.sock: connect: no such file or directory

Killing the backend process and restarting seems to have restored function.