docker / roadmap

Welcome to the Public Roadmap for All Things Docker! We welcome your ideas.
https://github.com/docker/roadmap/projects/1
Creative Commons Zero v1.0 Universal

[Docker Desktop] Improve Mac File system performance #7

Closed nebuk89 closed 3 months ago

nebuk89 commented 4 years ago

Update Feb 6, 2024 - Released as part of Docker Desktop 4.27 - https://www.docker.com/blog/announcing-synchronized-file-shares/

Update Nov 9, 2023 - As announced in June, Docker has acquired Mutagen IO, Inc. We are hard at work integrating it into Docker Desktop and working to roll it out as part of a limited early access program.

Update: we are now looking at using GRPCFuse rather than mutagen as a simpler path for perf improvement.

Tell us about your request Integrate the Mutagen plugin within Docker Desktop to provide users with a file caching option to improve performance on modern web frameworks like PHP Symfony

Which service(s) is this request for? Desktop

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? File system performance is a big issue on Mac; our goal is to improve web page refresh times for web languages like PHP from 2.8 seconds to 0.2 seconds

Are you currently working around this issue? N/A Additional context N/A Attachments https://github.com/docker/for-mac/issues/77

airs0urce commented 3 years ago

I got 300% CPU usage all the time, for no apparent reason:

[screenshot, 2020-10-07 2:36:48 AM]

This is v2.4.0.0.

And I have no idea how to troubleshoot it, as the containers don't eat CPU, only Docker itself. A macOS restart and Docker's "Clean / Purge Data" don't help. It makes Docker impossible to use. I had to disable gRPC-FUSE.

baohx2000 commented 3 years ago

Curious why the switch from mutagen to grpc? It sounded like mutagen was a clear winner.

metaskills commented 3 years ago

First, I've run non-trivial projects on GitHub and I appreciate everyone's time here, thank you. The v2.3.5.0 release notes state:

Docker Desktop now uses gRPC-FUSE for file sharing by default. This has much faster file sharing and uses much less CPU than osxfs, especially when there are lots of file events on the host.

I did a very basic test; here are my numbers for a very simple Rails application with a small JavaScript frontend, so it has both vendor and node modules. The timing is for loading the Rails console; results are in seconds.

Stable (osxfs): 125s
Edge 2.4.1.0 (gRPC-FUSE): 123s
Edge 2.3.4.0 (mutagen): 4s
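
For anyone who wants to reproduce a comparable number, the boot-time measurement can be scripted directly; this is a minimal sketch (not the exact benchmark used above), assuming a standard Rails layout with config/environment.rb:

```ruby
# boot_benchmark.rb -- run from the root of a Rails app, inside the container,
# e.g. `docker-compose exec app ruby boot_benchmark.rb`.
# Loading the full Rails environment is dominated by file lookups on the
# shared volume, so this is a rough proxy for "time to load the console".
require "benchmark"

elapsed = Benchmark.realtime do
  require File.expand_path("config/environment", __dir__)
end

puts format("Rails environment loaded in %.1f seconds", elapsed)
```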

So at this time I would like to echo @itaylor's and @cweagans' comments: I'm not sure why gRPC-FUSE is in the mix here. I mean, who tested the switch and made the statement "much faster file sharing" in the release notes? The consensus seems to be that gRPC-FUSE is either a flop or not implemented correctly.

There are a lot of more technical folks here than myself. But at the end of the day I just want to focus on my work and producing value for the company I work for. To do that I need a fast file system integrated into Docker for Mac or some very basic yaml/config files I can add to my project. Please. Pretty please.

coredumperror commented 3 years ago

Thank you @metaskills !! By switching to v2.3.4.0, I was able to bring my page request times down from 6+ seconds with osxfs (and 12+ with gRPC) to barely 1s with mutagen.

Why the hell the Docker devs switched to gRPC, when they had a fantastically performant mutagen-based release already, is a real mystery...

stephen-turner commented 3 years ago

Apologies, I realise I updated the tickets in the for-mac repo with our reasons for not graduating Mutagen from the Edge channel, but I never put a pointer here. The long explanation is at https://github.com/docker/for-mac/issues/1592#issuecomment-678397258, but in summary: although it improved performance for a lot of people, there were too many cases where it made performance worse, and too many issues we weren't able to solve with the cache not keeping up and leading to bugs in people's scripts. So in the end we decided we couldn't release it out of experimental. We were not happy to make that decision, but we felt it wasn't of a quality we could stand behind.

gRPC-FUSE is not faster in all cases, but it is faster in cases where the file sharing is CPU bound, and it has solved most of the cases where people were seeing 100% CPU. So overall it's a clear improvement over osxfs. We are still committed to improving file sharing speed on Mac, and we are still working on other approaches. But Mutagen, at least in the way we had implemented it, wasn't the answer.

GrahamCampbell commented 3 years ago

Why not ship all 3 drivers and let people choose?

mtibben commented 3 years ago

@stephen-turner will the osxfs or gRPC-FUSE drivers be open sourced? There are many talented developers frustrated by this issue who want to help solve it.

pikeas commented 3 years ago

Why not ship all 3 drivers and let people choose?

Strong agree! Mutagen is working well for many people, why take it away? Is there a downside to providing gRPC Fuse by default and Mutagen behind a flag with no stability guarantees?

metaskills commented 3 years ago

gRPC-FUSE is not faster in all cases, but it is faster in cases where the file sharing is CPU bound

What does that mean? Asked another way, how can I get a basic Rails project integrated into the tests that the Docker for Mac team measures, with some "this is what we see" benchmarks? Maybe I can test against that? Maybe all my Rails v5, v6, simple API projects, etc. are just borked in some way 🤷‍♂️ and over the years Docker for Mac has not been usable for me, for some reason. I just feel disconnected from the statements above; Mutagen was the only thing that "just worked" for me. It was amazing where nothing was even usable prior, and I'm sensing that is not going to change and I have no way to share that information technically.

jaequery commented 3 years ago

@metaskills I felt the same, it was kind of confusing and it sounded to me like the Docker team is putting a higher priority on CPU-bound use cases (Services? ML?), rather than Disk IO (web/app developers). Makes you wonder who they are really targeting with Docker for Mac. I think Docker for Mac should be just targeting developers. For production use cases, no one is even going to be running it on their Macs.

jaequery commented 3 years ago

I just hope we can at least have some sort of a build around NFS or perhaps even a comeback of Mutagen in the next couple months. Or like the other poster said, how about just giving us an option to choose a FS method?

leehambley commented 3 years ago

@pikeas to your question:

Strong agree! Mutagen is working well for many people, why take it away?

The answer is simple, it didn't work (in general) correctly, although it was very, very fast for some people. There's extensive info in the other thread, but a common issue is that mutagen itself needs to be "in sync" in order to be fast. That meant that one-shot stuff like docker exec .... <my tests, for example> was slow and often didn't correctly sync the filesystem, as Docker was exiting before Mutagen finished, or some other issue.


To those asking for Docker to ship all the filesystems and let people choose: how do you propose to do that? The Docker Compose YAML format is shared across a dozen projects that would have to agree on the meaning of any new annotations; they would also have to rebuild the UI and then handle a multiplying mess of configuration options. This isn't as simple as "mutagen is fast for Node" or "osxfuse is fast for Python": the profile of how a certain (configuration of a) driver performs for a workload is really, really complicated.


To those asking about being "CPU bound": I think what they are referring to is that, in the absence of a "real" filesystem with real events (even between Linuxes that isn't an entirely portable concept), you have to come up with something else. macOS and Linux both have a concept of filesystem events (FSEvents and inotify, respectively), but they have different trade-offs. I don't recall which way around, but one implementation has problems observing files in newly created directories (a simple race condition) and struggles with large numbers of watches (I think the limit is something like 10k before you hit warnings/issues).

In lieu of a proper "native" solution, there are a number of stand-in projects which try to emulate filesystem events by recursively scanning and hashing files, and emitting faux events when they notice a checksum has changed. That in turn is CPU intensive, because it means hashing thousands of files; on top of that, the osxfs filesystem implementation itself is in Go (not renowned for low latency), which could be another factor.
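
To make that concrete, the stand-in approach amounts to something like the following sketch (illustration only, not any particular project's implementation), which is exactly why it burns CPU on large trees:

```ruby
# poll_watcher.rb -- crude emulation of filesystem events by periodically
# re-scanning and checksumming every file, then emitting a "faux event"
# when a checksum changes. Hashing thousands of files per pass is where
# the CPU goes.
require "digest"

root      = ARGV.fetch(0, ".")
snapshots = {}

loop do
  Dir.glob(File.join(root, "**", "*")).each do |path|
    next unless File.file?(path)
    digest = Digest::SHA256.file(path).hexdigest
    puts "changed: #{path}" if snapshots[path] && snapshots[path] != digest
    snapshots[path] = digest
  end
  sleep 1
end
```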


I think it's profoundly important for us to understand the three pillars of performance here.

Filesystem events are used extensively by development-mode code hot-reloaders in Node and Ruby, for example. Because of the typical Python and PHP execution models, I believe filesystem events are less important there, so maybe people from those ecosystems are more willing to compromise on this point. I personally consider NFS (the "high performance" configurations most people talk about) to be broken, because you don't get filesystem events. That is, if you touch a file on the host, the container doesn't get notified.

The performance cost of syscalls in general: osxfs is something like 8-10× slower on raw syscalls (stat, open, etc.) than a native filesystem. stat is how filesystems let you check whether a file exists, or what its permissions are; a call in Ruby such as File.exists?("some path") is literally a stat syscall a few lines of code under the hood. Native Linux filesystems answer stat calls in 5-8 μsec; osxfs is more like 130-190 μsec, with far more deviation than the native one.

I mentioned this in the other thread, but in a Rails app (< Rails 6.x) a single require "something" will make a number of calls to open, approximately one per directory on the $LOAD_PATH, which is roughly 10 + half the number of gems in the bundle (my app has 477 paths on the load path, and most open calls finally succeed after an average of ~250 look-ups). For us that means a single require can easily cost 250 (directories) × 0.000100 (seconds), so about 25 milliseconds, for a single require (which Rails triggers implicitly when constants are missing, for example). On a native filesystem that would be more like 2 ms or less.
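
Those per-call numbers are easy to sanity-check from inside a container with a few lines of Ruby; a rough sketch (the target directory is a placeholder — run it once against a bind-mounted path and once against a container-local path like /tmp):

```ruby
# syscall_latency.rb -- rough per-call latency for stat and open.
require "benchmark"

target = ARGV.fetch(0, "/tmp")   # point this at a shared-volume path, then a local one
files  = Dir.glob(File.join(target, "**", "*")).select { |f| File.file?(f) }.first(1000)
abort "no files under #{target}" if files.empty?

stat_time = Benchmark.realtime { files.each { |f| File.stat(f) } }
open_time = Benchmark.realtime { files.each { |f| File.open(f) { } } }

puts format("stat: %.1f us/call", stat_time / files.size * 1_000_000)
puts format("open: %.1f us/call", open_time / files.size * 1_000_000)
```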

Note, I'm still talking about the cost of opening a file, I haven't even touched on what reading and writing syscalls actually take.

I noticed that on a native filesystem I can read from a file into a buffer in about 8 μsec, and it doesn't matter that much how much I'm reading; I would imagine that up to a certain page size it doesn't matter. See:

[pid   138] read(8, "\2\0\0\0P2Qe\322\362\344<q\376\0\0\330\6\0\0\0\0\0\0\274\366\36_\0\0\0\0"..., 64) = 64 <0.000013>
[pid   138] read(8, "YARB\2\0\0\0\3\0\0\0\221+\0\0\0\0\0\0\r\0\0\0+\0\0\0B\0\0\0"..., 11153) = 11153 <0.000012>
[pid   138] read(8, "\2\0\0\0P2Qe\322\362\344<q\376\0\0*\23\0\0\0\0\0\0\274\366\36_\0\0\0\0"..., 64) = 64 <0.000069>
[pid   138] read(8, "YARB\2\0\0\0\3\0\0\0\336t\0\0\0\0\0\0\35\0\0\0N\0\0\0\252\0\0\0"..., 29918) = 29918 <0.000032>
[pid   138] read(8, "\2\0\0\0P2Qe\322\362\344<q\376\0\0f\4\0\0\0\0\0\0\274\366\36_\0\0\0\0"..., 64) = 64 <0.000010>
[pid   138] read(8, "YARB\2\0\0\0\3\0\0\0006$\0\0\0\0\0\0\f\0\0\0\33\0\0\0:\0\0\0"..., 9270) = 9270 <0.000010>
[pid   138] read(8, "\2\0\0\0P2Qe\322\362\344<q\376\0\0\36\3\0\0\0\0\0\0\274\366\36_\0\0\0\0"..., 64) = 64 <0.000012>
[pid   138] read(8, "YARB\2\0\0\0\3\0\0\0z\31\0\0\0\0\0\0\t\0\0\0\23\0\0\0(\0\0\0"..., 6522) = 6522 <0.000010>

(The = 1234 shows how many bytes were read; the 8 at the beginning is the file descriptor ID. In this case those are files on the volume shared between my host (Linux) and the container. Adding -y to strace will print file names rather than FD ids.)

All in all, I suspect not many of us have a very similar set-up.

If you run a MySQL instance in a container with the volume shared to your host, I would imagine you get bound on CPU (filesystem events) for things you really, really don't care about.

If your framework/language caches constants, and hot-reload files in development mode (e.g PHP) then probably you don't care about the performance of open and stat that much, just read.

Ruby's constant look-up (especially in Rails, prior to v6) spends 96% of its time, in my app, trying to open files which don't exist, and even natively on Linux<>Linux Docker it still takes 8 seconds to boot the app.
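
The "failed look-ups" behaviour is easy to see directly: require walks $LOAD_PATH in order until a candidate exists. A small sketch that mirrors (roughly, not exactly) what the interpreter does, so you can see how many probes a single feature costs in your own app:

```ruby
# load_path_probes.rb -- estimate how many load-path entries are probed
# before a feature is found. This only approximates Ruby's real lookup,
# which happens in C and checks several suffixes per entry.
require "rbconfig"

feature = ARGV.fetch(0, "json")

probes = 0
found = $LOAD_PATH.find do |dir|
  probes += 1
  File.exist?(File.join(dir, "#{feature}.rb")) ||
    File.exist?(File.join(dir, "#{feature}.#{RbConfig::CONFIG['DLEXT']}"))
end

puts "#{$LOAD_PATH.size} entries on $LOAD_PATH"
puts(found ? "#{feature} found after #{probes} probes" : "#{feature} not found")
```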

I'm not sure how well this post contributes to the discussion. On another related thread I posted some hints on how people can generate their own strace reports of what their apps/frameworks are doing with the filesystem, with those things in mind, and with an appreciation for how each of the three pillars (I hope a filesystem/Docker expert would agree with me on dividing things that way) affects their experience.

I don't believe there is a good trade-off to be made in general here. NFS is screaming fast for many use-cases because it's in the kernel, and it has long-supported local caching (of attributes, so that stat/lstat syscalls are mostly fast); open is also REALLY fast on NFS because you're staying in the kernel. I believe the gRPC system probably benefits from the HTTP/2 transport (reuse of connections), and that osxfs (written in Go) probably suffers from unpredictable latency; plus it's in userspace, which means expensive context switches to the kernel whenever it needs to do any work.

I've not found much info on how Mutagen works, but from what little I could glean, it seems to be more like NFS than the others; though it runs in userspace as a daemon, I believe it essentially manipulates the filesystem in the container based on the stream of activity it is getting from the host (or alpha/beta, as they call them). So there's no shared filesystem per se, but the host and container get the initial state, and then changes from outside are applied inside. You get filesystem events for free because something is really editing your files.

But I believe it's illogical to stop the investigation at "… seconds to boot my app" (without examining the strace profiles), or to discuss "CPU or I/O bound", or to make people pay the price of drivers with filesystem-event support if they don't need it (e.g. in a "real" filesystem you could turn those off if you wanted).

metaskills commented 3 years ago

I think it's profoundly important for us to understand the three pillars of performance here.

I tend to think it is more profoundly important to understand who your users are and the average use case before working on a problem. The fact this issue is on GitHub shows me that most Mac users downright cannot work with Docker at all unless they resort to extreme workarounds. I could be wrong on this, but I've seen it with my peers at work and on Twitter for many years. My sense is there is a fundamental disconnect between the release notes saying "... gRPC-FUSE... has much faster file sharing and uses much less CPU than osxfs..." and who that is supposed to help. It certainly did not help me.

The ticket states "GRPCFuse rather than mutagen"; that was done, and it does not work for me. So I want to echo that the spirit and description of this issue have had no effect for me, and maybe for many others. I would also like to know how I can run a benchmark to back up @nebuk89's statement of "like PHP from 2.8 seconds to 0.2 seconds", or how I can add a test case of "like Rails from 120 seconds to something sensible".

leehambley commented 3 years ago

I think I'd like to work with some of you to get profiles of syscalls doing "normal stuff" with Docker, give me a :+1: on this and I'll reach you by your profile information, and see if we can equip the Docker team with arguments about which syscalls/etc we need to perform, and how.

I'd love to find time myself to implement at least a "no frills" driver based on libfuse, and the same as a kernel module for Linux, so I can take a guess with my own cruddy skills at what the lower bound on performance could be. I imagine it's a rocky road to land those in Docker, in any case. For both of those options there are not actually that many syscalls you have to implement, something like 15, and if you can eliminate a lot of overhead (I'm not sure what the gRPC overhead is, but it's not nothing compared to a raw socket) then I'm sure you can approach the performance of the native filesystem. Translating fsevents in a way that makes sense is also tricky in general, but a development-focused filesystem that maybe forgoes some of the usual guarantees can't be that hard to build.

The reason Mutagen is so fast for people is that it's not a shared filesystem. I spent the morning reading into it, and it's basically rsync with significant smarts on top: the filesystem "shared" using Mutagen is entirely local to the Docker4Mac managed VM, with changes to that directory pushed in over a unix socket so that edits made outside the VM are reflected in the filesystem inside the VM. With that in mind, no wonder it performs so much better; it's basically as fast as the IO on xhyve.

0x3333 commented 3 years ago

Just my 1 cent: I'm on version 2.4.0.0 (48506) and gRPC-FUSE is much worse than osxfs... I don't know why, but just by disabling it I saw a huge improvement. It may be my setup; I can't check, and I doubt it will make any difference 🤣

metaskills commented 3 years ago

I'll share a sample on GitHub for a typical Rails on Lambda Docker project later today and some benchmarks. Hopefully that can help too. Thanks y'all.

metaskills commented 3 years ago

OK, I think I have my head around this. Here is what I have and am asking.

Mutagen (Good, Bad, Feature Request)

Test Project

https://github.com/customink/docker-rails-lambda

I do some open source work for Rails on AWS Lambda which also uses the lambda container for development. Here is a test project I made from our quickstart guide and the timings.

Edge 2.3.4.0 (mutagen): 3s
Edge 2.4.1.0 (gRPC-FUSE): 34s

It would be nice if the Docker team made some headway on those numbers and/or allowed some easy way to opt into a filesystem that meets a lot of other users' expectations.

stephen-turner commented 3 years ago

@metaskills That was not the only problem with our Mutagen implementation. For one thing, as @leehambley explained (https://github.com/docker/roadmap/issues/7#issuecomment-706032439), it broke docker exec. Also, it caused problems even in large project directories, not just top level directories. Yes, we know it was faster for many people, but it also broke too many use cases. We might return to Mutagen one day, but for now we are concentrating on other approaches.

leehambley commented 3 years ago

I spent an hour and a half on the phone last night with the Mutagen author. He's hoping to provide a few easier shortcuts to using Mutagen without needing first-class support from Docker, which should be a significant quality-of-life improvement for many of us.

I'm also looking at writing a benchmarking utility to focus on the overhead of the pure system calls themselves (read and write performance once you have a file open is actually pretty OK), and also on whether, and with what latency, filesystem events actually propagate.

That would give us all more data to work with, whether using a "real" shared filesystem or something like Mutagen which is syncing.

coredumperror commented 3 years ago

To those asking for Docker to ship all the filesystems and let people choose, how do you propose to do that?

How about they do that in exactly the same way that they do the gRPC-FUSE / osxfs choice? A checkbox/radio button set in the Docker Desktop preferences.

I literally never do docker run, using only docker-compose up -d and docker exec -ti IMAGE_NAME /bin/bash. So Mutagen works fantastically for me, as I apparently do not do anything that it has issues with.

Thus, my situation sounds like a perfect use case for a system-level preference.

stephen-turner commented 3 years ago

We have no intention to ship several filesystems because we cannot support them: we'd just end up with three buggy ones. The grpcfuse/osxfs checkbox is only an interim measure until we're sure that grpcfuse is working well: we plan to remove osxfs completely after a transition period.

leehambley commented 3 years ago

@stephen-turner thanks for the clear statement on that. Can you say much about if/when, or why not, the osxfs and/or gRPC implementations may be open sourced? I'm sure there's enough expertise here in lower-level coding that we may be able to do a better job as a community improving these filesystems than Docker can do on its limited resources.

stephen-turner commented 3 years ago

That's a business question not a technical one, which I'm less qualified to answer. But my understanding is that we regard Docker Desktop as "value add" on top of the open source engine and CLI, so we don't have any plans to open source it.

mtibben commented 3 years ago

@stephen-turner from your docs

We plan to eventually open source all of our shared file system components. At that time, we would be very happy to collaborate with you on improving the implementation of osxfs and related software.

stephen-turner commented 3 years ago

That was written four years ago and is not the intent now. In fact, that page should be completely removed now we've moved away from osxfs.

mtibben commented 3 years ago

That's really disappointing news 😥

What changed @stephen-turner?

stephen-turner commented 3 years ago

I can't speak to what the plan was at that time as I wasn't in Docker then and the original authors of osxfs left Docker some years ago. But in any case, we are removing osxfs because of its excessive CPU consumption (#12) so it's irrelevant now.

xenoscopic commented 3 years ago

Mutagen author here. I fully support Docker's decision and direction here, but I also appreciate the near-term pain that some developers are facing during this transition. To help out, I'd like to offer a workaround to (a) allow developers continue to use Mutagen with Docker for Mac and (b) allow this thread to focus on discussion of gRPC-FUSE development.

I've created a discussion on the Mutagen issue tracker that details how to use Mutagen manually to replicate the previous Docker for Mac caching functionality. This isn't quite as elegant as Docker's UI, but it will give power users more granular control over file synchronization and allow for continued discussion/experimentation.

In any case, I'd like to emphasize that any discussion on this workaround should take place on the Mutagen issue tracker—this isn't Docker's burden (though I'm happy if Docker developers want to join that discussion!).

I'm hoping this helps all parties involved.

doublesharp commented 3 years ago

Only supporting gRPC-FUSE is a bit concerning for me, as it currently makes Docker unusable on my machine, which is less than a year old - I had to go back to osxfs, again. Are there at least plans to make sure it's usable before switching over entirely? It's not just slower; my containers won't even start with gRPC at the moment. The number of new issues on GitHub and comments here makes me think it's not an insignificant number of people who are having a worse experience.

TBH at the end of the day I don't care which filesystem is used, I just want it to work.


stephen-turner commented 3 years ago

Thank you, @havoc-io, we appreciate that.

stephen-turner commented 3 years ago

@doublesharp We know there are some bugs with gRPC FUSE in a few cases. We believe they're rare based on the number of reports we've had, and also because the same gRPC FUSE code has been in the Windows version of Docker Desktop for nine months now, but of course that doesn't help you if you're encountering one of them. We will not remove osxfs until we've fixed them: correctness is even more important than performance.

cweagans commented 3 years ago

@stephen-turner I'd like to again reiterate that Docker's insistence on rolling a custom filesystem is really problematic here. Neither of the options that you've included in the product has been unreliable, but both osxfs and grpc-fuse/nfs have been horrendously slow.

Plain NFS (which is built into macOS, battle-tested in probably millions of configurations, and widely understood from a performance/troubleshooting/configuration standpoint) would be a great option here. The engineering effort would be minimal (seeing as the problems to solve are 1. updating /etc/exports to match the Docker config, 2. forwarding FS events, and 3. deciding on a default behavior for the attributes) and it would be a marked improvement over the non-Mutagen solutions that have already been tried. Can you please help me understand why this is not an option? Both of the built-in solutions that have come to the stable channel have been weird, proprietary things that nobody can troubleshoot or improve, and many people cannot use them in a realistic day-to-day work environment (every team that I've worked on or with has had some kind of file-related workaround for Docker for Mac users. I checked in with people who are still on each of those teams, and every one of them ended up using NFS or Mutagen. n=12, so take that as you will).

I was among the earliest adopters of Docker for Mac when it first came out, but it still doesn't even approach the level of performance that I could get out of docker-machine with https://github.com/machine-drivers/docker-machine-driver-xhyve and https://github.com/adlogix/docker-machine-nfs back in 2016 on an even older machine (it worked great on a 2013 MBP, to my recollection).

Can we please -- pretty please -- not commit Docker for Mac to another opaque, proprietary solution for another 4 years?

leehambley commented 3 years ago

Plain NFS (which is built in to mac os, battle tested in probably millions of configurations, and is widely understood from a performance/troubleshooting/configuration standpoint) would be a great option here.

Except it doesn't support filesystem events, so languages such as Ruby, which rely on filesystem events and prompt syncing of filesystem attributes, do not work correctly, as development mode doesn't reload code properly. For languages such as PHP and maybe even JavaScript (if not using a bundler, which watches the filesystem) maybe this is fine, but it's distinctly not fine for Ruby users, and therefore cannot be the default.

You touched on "forwarding fs events", but I've never seen a working solution, can you recommend anything where I can start researching? Half my team (of Rubyists) is severely affected by this performance, and we maintain two parallel development environments (everything in Docker, and just some things in Docker) and expect people to pick and choose whichever works best for them.

cweagans commented 3 years ago

@leehambley NFS doesn't support it natively, but that doesn't mean that FS events cannot work. Forward the FS events and decide how/if you're going to sync attributes and you're good to go. It worked just fine in docker-machine for the same kind of use case (FS watching). The teams I've worked with have primarily been PHP/JS teams with a couple of exceptions, and webpack in particular has been a pain point until we started doing FS event forwarding.

There are various solutions out there for forwarding FS events (https://github.com/mhallin/notify-forwarder, https://github.com/codekitchen/dinghy has it built in somewhere), but the premise is usually to have something watching for FS events on the host, and then mimic the corresponding event inside of the VM/container/whatever (notify-forwarder works in multiple configurations) by manually emitting events. Even if notify-forwarder wouldn't work OOTB for the docker team, the lift for building some reliable FS event forwarding system is way less than building an entire filesystem.
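
The forwarding idea is simple enough to sketch in a few dozen lines; this is only an illustration of the shape of it (the port, the mtime polling, and the assumption that the project is mounted at the same path on both sides are all stand-ins), not how notify-forwarder or dinghy actually implement it:

```ruby
# event_forwarder.rb -- toy FS-event forwarding: a host process notices
# changes and a container process "replays" them by touching the same path
# inside the mount, so in-container watchers (webpack, listen, etc.) fire.
require "socket"
require "fileutils"

PORT = 8022   # assumes the container publishes this port to the host

case ARGV[0]
when "host"
  root = ARGV.fetch(1, ".")
  sock = TCPSocket.new("127.0.0.1", PORT)
  seen = {}
  loop do                                   # mtime polling stands in for FSEvents
    Dir.glob(File.join(root, "**", "*")).each do |path|
      next unless File.file?(path)
      mtime = File.mtime(path)
      sock.puts(path) if seen[path] && seen[path] != mtime
      seen[path] = mtime
    end
    sleep 0.5
  end
when "container"
  server = TCPServer.new(PORT)
  client = server.accept
  while (path = client.gets&.strip)
    FileUtils.touch(path) if File.exist?(path)   # re-emit an event inside the mount
    puts "replayed: #{path}"
  end
else
  abort "usage: ruby event_forwarder.rb host DIR | container"
end
```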

I'd be curious if https://github.com/codekitchen/dinghy works for you, since that's essentially the setup that I'm suggesting here: NFS with FS event forwarding. Could you give it a try and let us know if it works for your Ruby projects?

jaequery commented 3 years ago

Some insight on that: Dinghy worked fine for me up until I tried out Docker for Mac a few months ago. Now that I try to go back and install Dinghy, it's not working for some reason; during creation of the xhyve VM, I get an error about obtaining an IP from docker-machine. When I have time, I'll try it out on my other MacBook.

But when it did work, I would like to elaborate and say that I had zero problems with hot reloads for development purposes; it would pick up changes immediately. BTW, I am a Ruby developer (although I've been moving to Node lately).

leehambley commented 3 years ago

I can speak only for my team, but our docs, guidelines, shell aliases and 3 years of muscle memory have us assuming we type $ docker-compose ..., so anything that is not that is going to be a non-starter for our team.

When I chatted with @havoc-io, I suggested that an easier way to integrate Mutagen would actually be just a shell alias that checks Mutagen and then starts docker-compose.

cweagans commented 3 years ago

@leehambley I'm not suggesting that you do anything different. Dinghy is just a different way to run a VM where Docker lives. It sets up all the shell variables and such so that docker-compose just works. I'm also not suggesting you roll it out to your team -- I'm suggesting that you try it out to see if it works for your Ruby app (as a way to validate the idea of using plain NFS with an event forwarder). Sounds like @jaequery had it working at one point, so I imagine it'll work for your app too (assuming ruby apps don't vary too much from one another).

There would be nothing stopping you from using Mutagen in lieu of NFS -- it's significantly faster by almost every measure, but the sync delay doesn't work too well for some use cases. You'd just not mount files into the directory (and use https://mutagen.io/documentation/orchestration/compose instead, presumably).

jaequery commented 3 years ago

@leehambley I'd say, at least for me, Dinghy was as close as it gets to providing a native Docker experience. There was no need to add anything on top like :delegated or :cached, or the GUI stuff you see with Mutagen; you'd just docker-compose up, like you normally do on native Docker. Similar to you, I also dislike adding anything on top of the traditional docker-compose workflow, which is why docker-sync (or other rsync solutions) was never an option for me. Mutagen was close to being a deal breaker as well due to having to add :delegated, etc., which cluttered up the docker-compose.yml files, but I learned to live with it.

My only issue with Dinghy is that it creates its own separate network space, so instead of localhost you get a separate network. I recall they were going to investigate forwarding to localhost, but that was a long time ago and they never got around to it. It shouldn't be a problem for most people, but it could be a problem for those who explicitly need their containers to be on localhost for some reason. With Dinghy's built-in proxy + dnsmasq you can just set your project to use a virtual host (some-project.docker instead of an IP) by setting VIRTUAL_HOST=some-project.docker in the environment. That latter point is the biggest selling point of Dinghy for me, since I hate having to deal with ports all the time when you have many different projects running and have to manage port conflicts.

I've tried all kinds of solutions prior to this and everything pales in comparison to the ease of setup Dinghy had. But it looks like the maintainer of the project ran out of steam; I think he was under the assumption that Docker for Mac would eventually provide performance equal to or better than NFS, judging from the past GitHub issues.


leehambley commented 3 years ago

@jaequery thanks for taking the time to change my view, when I was being outright dismissive of a technology I hadn't properly understood. That shouldn't have been your responsibility, but I'm thankful for the time you took.

I think for me, as a Linux user, my main contribution can be writing this benchmarking tool, and helping us understand the trade-offs. For Ruby particularly (on the shared filesystems) there might be a 3× speed-up on the table by changing Ruby itself to use stat or access to check for a file, rather than open, which is something like a 5 line change that I can even put behind a Ruby compile-time feature flag.

I've not found any time this week to begin to work on any kind of benchmark that we could then run on the 4/5 common configurations, but I don't expect it to be that much work, I just need to break some ground, and dust off my C skills.

jgonera commented 3 years ago

I'd like to report that gRPC FUSE is completely unusable for me too. I have a project that runs 5 containers, 3 of them have mounted volumes. It's all in TypeScript and using filesystem events (CRA, Gatsby, ts-node-dev).

When using gRPC-FUSE, performance is noticeably slower when restarting any of the servers, but most importantly, file syncing and filesystem events randomly stop working. Very often it happens after modifying one file just a few times.

My apps either stop picking up file changes or crash because somehow they get incomplete content for that modified file (and compilation fails). The only remedy seems to be restarting the container (docker-compose down && docker-compose up). Sometimes even that doesn't help and I have to restart the Docker daemon.

Let me know if I can help debug this somehow. Please don't remove osxfs until this is resolved.

I'm using macOS Catalina 10.15.7 and Docker Desktop 2.4.0.0.

stephen-turner commented 3 years ago

@jgonera Functional bugs are off topic for this roadmap item. Please file a bug report at docker/for-mac, including a diagnostics id. Thanks.

metaskills commented 3 years ago

Given the 20.10.0 beta1 release notes today, should we assume that gRPC FUSE is the direction forward?

coredumperror commented 3 years ago

Yeah... So if that's the case, my team is going to be stuck on 2.3.4.0 for the rest of time. Or until someone comes up with a Mac Docker solution that doesn't have horrible file sharing performance. gRPC FUSE is twice as slow as osxfs for us, and osxfs was already pretty freaking bad.

leehambley commented 3 years ago

Checking in to say I found a few hours today to begin work on a filesystem benchmarking tool, as promised a couple of weeks ago. I've got nothing worth sharing yet, but I have a client (in Alpine Docker) and a server; both parts are written in C, and the server creates about 10,000 files across something like 3-15 levels of nesting in the hierarchy. The two parts communicate over a socket, and the client receives filesystem events using fanotify.

All this means the groundwork is laid, and I only need to come up with a sensible way to report the results. Bootstrapping this test takes about 1.5 seconds, and the server part will need some rewriting for Mac. (I'm on Linux, and just doing what works for me; my C is rusty, so I'll need to be on my macOS machine to write the macOS port of that, and many of the GNU/BSD filesystem functions are quite different.)

My plan is to "race' the host and the client against each-other, by having one thread do some action on a file, and both the host (and the client via the socket to the host) will capture the file-system event they have seen, and compare timestamps to check the relative latency.

  1. Server sets up 10,000 files
  2. Server waits for client
  3. Client connects, listening to filesystem events and forwarding to server on a socket
  4. Server starts listening to filesystem events, sending to its own socket
  5. Server starts modifying files, and waiting for the responses from self and client.

Both sides are listening for filesystem events, and will timestamp the stuff they send to the socket, so we can see the relative time differences. I'm currently just researching how to do high precision clocks/etc across macOS and Linux, it's a bit of a topic, given how many microseconds a filesystem operation usually takes.

Alongside this latency check for filesystem events (a check I expect NFS in the "fast" configuration to fail...), which should help people at least choose between Unison and Mutagen, I plan to also do a general exercise of the filesystem and produce a comparison report (relative hardware, after all) of the performance of the operations on both sides.

I plan to reuse some of the fsevent architecture for this: since the client knows what was done on the filesystem, I can repeat the same operation on the client and time it. Here, dealing with timers is easy, as it's not about absolute wall-time comparison but the number of cycles it takes to run a block of code.

This is all pretty experimental for me so far, but I think the overall design makes sense. I hope to post back later this week with a build I could invite people to run to get some relative results.
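
Until that tool exists, the core latency question can be approximated with two tiny scripts using polling instead of fanotify; this is only a rough stand-in for the design above (it assumes host and VM clocks are closely in sync, and polling can't resolve sub-millisecond latencies):

```ruby
# latency_probe.rb -- rough propagation latency for a shared volume.
# Host:      ruby latency_probe.rb write /shared/probe
# Container: ruby latency_probe.rb watch /shared/probe
mode, path = ARGV

case mode
when "write"
  10.times do
    File.write(path, Time.now.to_f.to_s)   # embed the write timestamp in the file
    sleep 2
  end
when "watch"
  last = nil
  loop do
    content = begin
      File.read(path)
    rescue Errno::ENOENT
      nil
    end
    if content && content != last
      puts format("change visible after %.3f s", Time.now.to_f - content.to_f)
      last = content
    end
    sleep 0.01
  end
else
  abort "usage: ruby latency_probe.rb write|watch PATH"
end
```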

renanwilliam commented 3 years ago

I have updated my Docker to the latest Edge version (2.4.2) and it automatically activated the gRPC-FUSE option. It has fully degraded a PHP project (using Laradock as the base), running on macOS Catalina 10.15.7 with 4 CPUs, 6 GB of memory and 120 GB of disk dedicated to the Docker machine. It can't process parallel requests, and the time of each request is multiplied by 3x or 4x. Deactivating the gRPC-FUSE option, everything goes back to working.

osteel commented 3 years ago

Same experience here, my PHP (Laravel) projects are noticeably slower with gRPC-FUSE. Please don't make this the only option; allow your users to switch back to osxfs.

doublesharp commented 3 years ago

@renanwilliam @osteel I was also having issues with 2.4.2 but seem to be having more luck with the 2.4.3 installer that was posted here https://desktop-stage.docker.com/mac/edge/49130/Docker.dmg (from https://github.com/docker/for-mac/issues/4953).

metaskills commented 3 years ago

Saw that Edge v2.5.1.0 was released a few days ago and decided to benchmark my basic Rails app (https://github.com/customink/docker-rails-lambda#benchmark) and things are getting worse again.

Edge releases:

Edge 2.3.4.0 (mutagen): 3s
Edge 2.4.1.0 (gRPC-FUSE): 34s
Edge 2.5.1.0 (gRPC-FUSE): 52s

So the fastest filesystem for Rails projects on Mac seems to be the stable release with osxfs. What's happening, y'all?

Saphyel commented 3 years ago

@metaskills can you benchmark it on the new MacBook Air/MacBook Pro?

metaskills commented 3 years ago

Uh, no, I do not have one. Is that the play? Everyone has to buy an M1 chip to work with Docker because they won't implement an additional filesystem that is better than osxfs? Or are you just curious?