docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac

Docker not releasing files not in use #6783

Open Smith8154 opened 1 year ago

Smith8154 commented 1 year ago

Expected behavior

Docker should release the file after it is no longer needed.

Actual behavior

Docker appears to be holding on to files even after the container accessing the file is stopped. The only way to release the file is to restart the Docker engine.

I am passing a network share through to my Plex Docker container, but shortly after the container starts running I begin to see this error on my macOS host: fts_read: Too many open files. I have increased the open file limit by following this guide. When I check the file limits using launchctl limit maxfiles, the output is: maxfiles 524288 524288. Using Activity Monitor to check which files the Virtual Machine Service has open, this is what I see when no containers have been started:

/
/System/Library/Frameworks/Virtualization.framework/Versions/A/XPCServices/com.apple.Virtualization.VirtualMachine.xpc/Contents/MacOS/com.apple.Virtualization.VirtualMachine
/Library/Preferences/Logging/.plist-cache.fB3OjRRy
/usr/share/icu/icudt70l.dat
/private/var/db/timezone/tz/2022g.1.0/icutz/icutz44l.dat
/dev/null
/dev/null
/dev/null
/Applications/Docker.app/Contents/Resources/linuxkit/kernel
/Applications/Docker.app/Contents/Resources/linuxkit/initrd.img
->0x4756f2816747024e
->0x9366f745c4a5cead
/Users/wsmith/Library/Containers/com.docker.docker/Data/vms/0/data/Docker.raw
/Users
/Volumes
/private
/private/tmp
/private/var/folders
->0xdf417ba25c1bf191
->0xdf417ba25c1c1a31
->0xdf417ba25c1b82c1
->(none)
/Users
->(none)
/Volumes
->(none)
/private
->(none)
/private/tmp
->(none)
/private/var/folders
->0xdf417ba25c1b85e1

After running my Plex container for a few minutes and then stopping it, I see that the Virtual Machine Service has 1,010 files open, all of them Plex configuration files and media files on the network volume, despite no containers running. Below is a snippet of lines 855-906 of the open files. Again, no containers are running. The only way to release these files is to restart the Docker service.

/Volumes/plex-data/Movies/Guardians of the Galaxy/Guardians of the Galaxy.mp4
/Volumes/plex-data/Movies/Ready Player One/Ready Player One.mkv
/Volumes/plex-data/Movies/Iron Man 3/Iron Man 3.mp4
/Volumes/plex-data/Movies/The Dark Knight Rises/The Dark Knight Rises.mp4
/Volumes/plex-data/Movies/Futurama Benders Game/Futurama Benders Game.mkv
/Volumes/plex-data/Movies/Now You See Me 2/Now You See Me 2.mkv
/Volumes/plex-data/Movies/Futurama Into The Wild Green Yonder/Futurama Into The Wild Green Yonder.mkv
/Volumes/plex-data/Movies/Star Wars_ Revenge of the Sith/Star Wars_ Revenge of the Sith.mp4
/Volumes/plex-data/Movies/Toy Story 3/Toy Story 3.mkv
/Volumes/plex-data/Movies/X-Men_ Days of Future Past/X-Men_ Days of Future Past.mp4
/Volumes/plex-data/Movies/X-Men_ The Last Stand/X-Men_ The Last Stand.mp4
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Scanners
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Cache/CloudAccessV2.dat
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Cache/CloudUsersV2.dat
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Cache/CloudUsersSubscriptionsV2.dat
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Plug-in Support/Data/tv.plex.agents.movie
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Plug-in Support/Data/tv.plex.agents.movie/DataItems
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.1
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.5
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.4
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.3
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.2

Information

Output of /Applications/Docker.app/Contents/MacOS/com.docker.diagnose check

[2023-03-28T21:27:09.866482000Z][com.docker.diagnose][I] set path configuration to OnHost
Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0038: is the connection to Docker working?
[PASS] DD0014: are the backend processes running?
[PASS] DD0007: is the backend responding?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0032: do Docker networks overlap with host IPs?
segment 2023/03/28 17:27:13 ERROR: sending request - Post "https://api.segment.io/v1/batch": dial tcp [::]:443: connect: connection refused
segment 2023/03/28 17:27:13 ERROR: 1 messages dropped because they failed to be sent and the client was closed
No fatal errors detected.

Steps to reproduce the behavior

  1. Pass a volume through to a container.
  2. Start the container, and access files from the volume inside the container.
  3. Stop the container and check Activity Monitor open files for the Virtual Machine Service.
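
For step 3, the handle count can also be read from the command line instead of Activity Monitor. This is a minimal sketch, assuming the Virtual Machine Service appears in lsof with a command name starting with com.apple.Virtualization, as in the lsof output quoted elsewhere in this thread. The name count_vm_handles is a hypothetical helper, not part of Docker or macOS:

```shell
# count_vm_handles (hypothetical helper): reads `lsof +c0 -n` output on stdin
# and prints how many rows belong to the Virtualization framework process.
# Assumption: the COMMAND column starts with "com.apple.Virtualization".
count_vm_handles() {
  awk '$1 ~ /^com\.apple\.Virtualization/ { n++ } END { print n + 0 }'
}
```

Running lsof +c0 -n | count_vm_handles once before starting the container and once after stopping it should show whether the count drops back to its starting value.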

martinml commented 1 year ago

Information

I'm seeing the same behavior here.

After several days of normal dev work inside Docker, things start to fail because the host runs out of file handles. Even native macOS apps start crashing.

Comparing the output of lsof -Pn inside the Docker VM (using this) with the output on the macOS host shows tens of thousands of files that the Virtual Machine Service still holds open on the host but that are no longer open in the Docker VM.
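
A rough sketch of that comparison, assuming each lsof listing has already been reduced to one path per line and that VM-side paths have been rewritten to their host equivalents (the mount prefix inside the VM is not stated here, so that step is left out). The name leaked_paths and both file names are hypothetical:

```shell
# leaked_paths HOST_LIST VM_LIST (hypothetical helper)
# Prints, sorted and de-duplicated, every path that appears in HOST_LIST
# but not in VM_LIST. Both arguments are text files with one path per line.
leaked_paths() {
  # First pass (VM_LIST): remember every path still open inside the VM.
  # Second pass (HOST_LIST): print only paths the VM no longer has open.
  awk 'FILENAME == ARGV[1] { in_vm[$0] = 1; next }
       !($0 in in_vm)' "$2" "$1" | sort -u
}
```

For example, leaked_paths host_paths.txt vm_paths.txt | wc -l gives the number of files held open only on the host side.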

martinml commented 1 year ago

As a workaround, I just found that using gRPC FUSE doesn't trigger this behavior. Files remain open in the Virtual Machine Service only with VirtioFS.

martinml commented 1 year ago

Docker Desktop 4.21.1 (which now uses VirtioFS as default) shows the same behavior:

  1. Do some file-heavy work with Docker containers in a directory shared with the host. For example, npm install with a shared node_modules.
  2. Stop and delete the containers. docker ps -a shows 0 containers.
  3. Use Sloth to see that the Virtual Machine Service keeps a handle to every file opened in step 1, and it will until the Docker VM is restarted.

mattmacleod commented 1 year ago

This has been hitting me too with VirtioFS enabled. Minimal case for recreation just involves touching or creating loads of files:

→ lsof +c0 -n | awk '{print $1}' | sort | uniq -c | grep com.apple.Virtualization
36 com.apple.Virtualization.Virtua

→ mkdir -p testfiles && docker run -v ./testfiles:/testfiles --rm -it ubuntu bash
root@8eb8a09639f2:/# seq 1 100000 | split -l 1 -a 5 -d - testfiles/file
split: testfiles/file57256: Too many open files in system
root@8eb8a09639f2:/# exit
exit

→ lsof +c0 -n | awk '{print $1}' | sort | uniq -c | grep com.apple.Virtualization
51569 com.apple.Virtualization.Virtua

This results in lots of very random broken behaviour on the host.

BHSPitMonkey commented 1 year ago

I've gotten reports that this is a problem in 4.22.1 and 4.23.0 as well.

bwalendz commented 1 year ago

Makes VirtioFS practically unusable and actually negatively impacts the host machine after some time.

ucyo commented 9 months ago

Just for those who stumble upon this: the current workaround is to switch from VirtioFS to gRPC FUSE.

ryancurrah commented 7 months ago

I've submitted feedback to Apple and reported it to Apple support. https://developer.apple.com/forums/thread/741572

nem75 commented 7 months ago

Reporting in with Docker 25.0.3 on M1 Pro Mac with Sonoma 14.2.1. This is still a problem. Anything on this from the Docker Mac maintainers?

bmmass commented 7 months ago

Same issue with Docker@4.27.2

Macbook M1 Pro Sonoma 14.3.1

lawxen commented 4 months ago

I'm facing the same problem with
m1 max/ mac os 14.4.1 (23E224)
docker version 4.30.0 (149282) with VirtioFS settings

lawxen commented 2 months ago

Docker 4.32.0 on m1 max macbook pro still has this problem

chris-miaskowski commented 2 months ago

Hi, I'm on

Still having the issue. Switched to gRPC. Solves the problem but is way slower.

Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: Mac15,11
      Model Number: MRW33ZE/A
      Chip: Apple M3 Max
      Total Number of Cores: 14 (10 performance and 4 efficiency)
      Memory: 36 GB
      System Firmware Version: 10151.121.1
      OS Loader Version: 10151.121.1

ryancurrah commented 2 months ago

There's only one way to fix this at the moment: take your IT dollars and switch to Linux desktops. It's what we are doing. Docker on Linux does not suffer from this issue, and you don't need Docker Desktop to boot! If we stop giving Docker and Apple our money, they will eventually listen and fix Docker on Mac once and for all.

nem75 commented 2 months ago

Pretty sure this is not a problem of Docker but of VirtioFS, in combination with the comically low default file descriptor limit in macOS.

If you don't want to switch your whole dev platform just because of this issue you can always up the file limit manually, e.g.

sudo launchctl limit maxfiles 65536 1048576

Been running with this for nearly half a year now without any problems.

Needs System Integrity Protection to be disabled though, so maybe not everyone's cup of tea.

To persist you can edit the values in /Library/LaunchDaemons/limit.maxfiles.plist.

You can check the currently effective limit with launchctl limit maxfiles.
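
To see how close a process is to that limit, the two numbers can be compared in a script. This is a minimal sketch, assuming the launchctl output has the three-column form quoted earlier in this thread (maxfiles 524288 524288). The names soft_maxfiles and over_threshold are hypothetical helpers, and it is not established here that the soft value is the limit actually being exhausted:

```shell
# soft_maxfiles (hypothetical helper): reads `launchctl limit maxfiles` output
# on stdin and prints the second column (the soft limit).
soft_maxfiles() {
  awk '$1 == "maxfiles" { print $2 }'
}

# over_threshold COUNT LIMIT PERCENT (hypothetical helper)
# Succeeds (exit status 0) when COUNT is at least PERCENT% of LIMIT.
# All three arguments must be non-negative integers.
over_threshold() {
  [ $(( $1 * 100 )) -ge $(( $2 * $3 )) ]
}
```

For example, over_threshold "$count" "$(launchctl limit maxfiles | soft_maxfiles)" 80 && echo "handle count is above 80% of the soft limit", where count holds an open-file count obtained from lsof.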

Smith8154 commented 2 months ago

That is a bandaid at best, and depending on what you are running, this will only buy you a bit of time before you run into the limit once again. In my case, upping the file limit took it from breaking within 2 minutes, to breaking in about 10 minutes. Not saying it's not worth pointing out, but this really needs to be addressed by the Docker team. At this point, I have given up hope that the Docker team cares about this issue at all, considering they haven't replied to this issue since it was opened over a year ago.

nem75 commented 2 months ago

Of course it's a workaround, but it's running stable for me for months with multiple heavy yarn/npm operations in Docker volumes daily.

And of course having a real solution would be preferable. Until that happens we can give up, use bandaids or even sledgehammers (like using a whole different platform altogether). Endless possibilities. 😁

ryancurrah commented 2 months ago

It's not Docker's problem, but have they been using their relationship with Apple to fix it? Has it been a topic of conversation in any of their meetings? They are charging us for software that doesn't work well, so they should at least try to work with Apple to fix it.

chris-miaskowski commented 2 months ago

@bsousaa any updates on this issue?