Closed ms-ati closed 6 years ago
Forthcoming: we are going to experiment with running on regular Docker on Vagrant and report back if that suffers the same problem.
Is it possible that it has anything to do with this aspect of how APFS behaves, or any other aspect of APFS in particular?
@ms-ati We (myself and coworkers) are all running High Sierra with dinghy and we haven't seen this issue across vim, macvim, neovim, atom, or vs code. Note: we all did a clean install instead of an upgrade.
@ryansch We are seeing this universally -- every developer. We all did the upgrades of existing fairly late-model Macbook Pros, rather than clean installs. We all had our drives converted to APFS.
Can you specify: In Disk Utility, is the disk where your code lives an APFS volume?
@ms-ati Yes all of my macos volumes are inside of an APFS container.
@ryansch I'm in the same org as @ms-ati , one thing I just thought of, are your APFS volumes encrypted?
@mizhi They are!
To add a bit more mystery to this, I setup a vagrant machine with docker and ran our setup in that, and did not experience the issue described above. To make sure I wasn't going crazy, I shut that down and started up everything using dinghy and immediately started getting the file modification issue.
Going to dig some more on the dinghy side to see if I can suss out more answers (or better observations).
I haven't seen this myself, but it's almost surely something to do with https://github.com/codekitchen/fsevents_to_vm which Dinghy uses to make filesystem events work across the NFS mount. So I'm not surprised that it doesn't happen with vagrant-flavored docker, since vagrant doesn't make any attempt to make fsevents work.
...to make the above even more specific. Is it reasonable @codekitchen to hypothesize the issue is with
https://github.com/codekitchen/fsevents_to_vm/blob/master/lib/fsevents_to_vm/ssh_emit.rb#L14
...specifically, that then unfsd
is observing the touch
inside the vm, and propagating back to the APFS file system a new value? Implying a hypothesis that the change to APFS in High Sierra, in some cases, is causing an unexpected mtime to propagate back out via NFS?
That does seem the most likely culprit, though I'm not up to speed on APFS and I haven't seen this myself, yet.
A quick glance through the unfs3 source code shows it uses utime for setting timestamps, which only has second-level precision, so it doesn't look likely that fsevents_to_vm can just switch to replicating over higher-resolution timestamps as a fix.
@codekitchen I've investigated a bit and this definitely makes sense -- but it doesn't compute how this could affect some, but not all users. How is it possible that not everyone using Dinghy on APFS is experiencing this?
Also, in principle a solution to this issue should be possible and in fact not too hard, right? I ask because, I observe that the filesystem inside dinghy is ext4
, which DOES itself support nanosecond times, ultimately.
Strawman of the steps one could take here:
fsevents_to_vm
, if host is on APFS, to propagate the full sub-second timestamp (this is probably just modifying the ruby code to read the fractional timestamp, and the touch
command that we run inside the VM to write the fractional timestampunfsd
to stat and to write nanosecond timestamps where supportedOf the above 3 steps, I feel least confident about (3).
Thoughts?
cc @michaeldiscala
Yeah that's why I dug into the unfs3 source first -- patching that would be the only difficult part, though hopefully not more than a few hours work. I don't know how much work on top of that it'd be to get the patch accepted upstream, or whether they'd require parity in the Windows back-end. It does look like NFSv3 supports nanosecond-precision timestamps, so it should hopefully be possible. It's not something I'll be able to tackle myself in the near future unfortunately.
@codekitchen Ok, lines up with my thinking as well. Our first step is to see if we can isolate and identify what is different in the setups of developers who are seeing this impact vs those who are not. You and I are discussing a reasonable hypothesis for how to fix a problem, but which doesn't seem to affect everyone.
Do you have any hypothesis on how the above could possibly not impact all users?
@ms-ati @codekitchen i suggest you to just stick with standard nfs server implementation (es: https://github.com/adlogix/docker-machine-nfs) we are using that with Dinghy and it works fairly well (https://github.com/sparkfabrik/sparkdock)
@paolomainardi Thanks for those links, it looks like similar issues on High Sierra were faced in docker-machine-nfs
in https://github.com/adlogix/docker-machine-nfs/issues/79 , and they fixed it with parameters to the NFS daemon?
It was another kind of problem, mainly related to an operating system bug then fixed
One thing that still confuses me: why aren't we all seeing this bug? What's different about your workflow/environment (etc)?
@ryansch Yes! We are furiously invalidating hypotheses as quickly as we think of them. We are (un-)lucky enough to have multiple developers who see, and who are not seeing, this issue. So we are working hard to prove any hypothesis about what is different between their setups.
I am using APFS without the encryption + native nfs server and i can't face any issues here.
@ms-ati That's quite interesting that you have devs on both sides of the fence there. Feel free to ask for help testing any hypotheses that pan out!
Hey all! I wanted to chime in here with some info and reproduction as well. I similarly am on an APFS volume, with encryption resulting from an in-place upgrade to High Sierra.
When dinghy is not running, I do not see any file modified errors while saving files.
When dinghy is running, I do see file modified errors, if I am saving while typing.
This seems to lend credence to the fact that it's related to the timestamp differences between modified timestamps mentioned above. If I save a file while I am not actively typing, my editor does not prompt with any errors:
Hope that helps, happy to provide more info as needed!
Wow @seckenrode that's excellent. There's no bug quite like a race condition!
To clarify an additional thing that we're finding interesting:
The test_file.rb
is in my /Users/seckenrode/Desktop
directory. I do not have any docker containers actively running at the time of this test, and I do not have any docker volumes mounted AFAIK. The only difference between the first and second/third GIFs here is that I ran dinghy up
between the two.
I'm not sure what that means, w.r.t. the fsevents_to_vm and the unsfd server here.
Thanks!
Another clarification: I thought that I wasn't seeing the same issue as @seckenrode. But in fact I am! It's just that I wasn't typing and saving as quickly together as he was.
Steps to reproduce on my machine:
dinghy up
Cmd-S
to save, like "HelloWhat I observe is, if I type quickly the Cmd-S
, I see the same as @seckenrode gif above
It appears that the crucial reproduction step is: typing in the editor within 1 second after saving.
This appears to trigger the error in Sublime Text 3 every time. It appears the difference between our developers with and without the issue was more in their typing and saving speed and habits than in their machine configurations?
...it appears that Emacs has this worse, as by default it asks the user every time a file modification time is changed, not just when the buffer has changes that differ from disk.
This behavior can be changed by the user to do what Sublime Text 3 does (compare the buffer contents to the disk contents to avoid warning when no changes) through the use of Emacs lisp, I think: https://emacs.stackexchange.com/questions/21137/fix-changed-on-disk-reread-from-disk-when-file-has-not-changed
Nice. Right now Dinghy replicates all FS events in your home dir to the docker-machine VM, it doesn't try and work out which folders are currently mounted as volumes and only replicate those. So it makes sense that all you have to do is dinghy up
to see the issue.
I think I've covered in the FAQ why using the native NFS server isn't possible for our use cases, but I bet it works well for those who don't also need NFS for other purposes such as Vagrant shares. How much did you have to modify Dinghy to make that setup work?
Using the native NFS server might be a good way to confirm that modifying fsevents_to_vm to touch with sub-second timestamps does fix the issue, before trying to modify unfs3, assuming the native NFS server does do sub-second timestamps.
Makes a lot of sense @codekitchen. We are evaluating on our side whether we have the time and appetite to push this further, will let you know.
It seems that you are aware of the problem. Just to say that we encounter the same issue in applications like Paw or Coderunner. But everything works fine with Visual Studio Code…
If it can help we also have strange behaviour with Dropbox. Some files, like *.key (Keynote), are modified without any user actions, can not be opened by others or generates conflicted versions…
If you want more informations, just tell me.
And, last but not least, thanks for the hard work! :+1:
Regarding a path forward for fixing UNFS3 on High Sierra to support the nanosecond timestamps: I've filed an "Is this project alive?" bug with UNFS3 on SourceForge: https://sourceforge.net/p/unfs3/bugs/10/
If no response in one week, what would that mean? Well, it would mean that in addition to making and testing a High Sierra fix for UNFS3, we'd also have to become the de-facto upstream for the version published for Mac OS via Homebrew.
Please let me know ^ if you have thoughts on the above @codekitchen ?
Unfortunately, as you can see in the discussion with UNFS3 authors on SourceForge, that project appears to be effectively dead with no upstream authors to collaborate with.
Unless someone else emerges who wants to take on development of UNFS3 to the extent that they can collaborate with us in adding support for nanosecond timestamps on APFS on High Sierra, and take on responsibility for an upstream repository and Homebrew recipe, then we are probably not going to pursue this any further.
We are investigating whether Docker for Mac has improved sufficiently in recent months to be a viable replacement for Dinghy for our use cases.
Despite @ms-ati's claims that there are no upstream developers to collaborate with, I spent some time this weekend digging into this issue and came up with something. I've got a git branch of unfs3 with some code that should allow for microsecond or nanosecond resolution access/modify timestamps, on platforms where this is supported. I haven't really tested the code much yet, so beware of bugs.
Hi @derfian this is fantastic news! We're so happy to hear about this. What sort of collaboration might be most helpful to you to get your branch towards something ready for production? Is there a test suite we could adapt for it, for example?
During my lunch break, what I am trying today: Following the Homebrew instructions to make a proper pull request against Dinghy in order to use @derfian 's new branch for unfs3
.
I'll report back here with
Any thoughts?
Ok. To see my progress on testing @derfian UNFS3 changes in Dinghy thus far, please see this gist: https://gist.github.com/ms-ati/aaafdf091e1b04c4621aeec18cbfab4c
I will update as I progress.
Based on dtruss -n unfsd
output, I am seeing calls to lstat64
, fstat64
, and setattrlistat
. I am not seeing any calls to utime
, utimes
, utimesat
, nor clock_gettime
, when I edit a file and watch the dtruss output.
In order to view the local MacOS nanosecond timestamps, I did brew install coreutils
and gls --full-time
.
Here's my branch of the Dinghy Homebrew Tap where I am developing, pulling in @derfian 's branch of UNFS3, and soon to pull in my own fork of Dinghy to get the necessary support on the fsevents_to_vm
side:
https://github.com/ms-ati/homebrew-dinghy/tree/use-unfs3-branch-higher-resolution-setattr
Thoughts? Comments?
Update: It appears that the above-referenced branch of unfs3
by @derfian (Thanks! 👍) is
a. Correctly setting the nanosecond timestamps to the Mac OS server host, but b. Incorrectly reading the nanosecond timestamps from the Mac OS server host -- they appear as zeros to the client Ubuntu Linux system.
Details reported on SourceForge for UNFS3 here: https://sourceforge.net/p/unfs3/bugs/10/#f004
Hi @derfian -- just wondering if you thought you might have a chance to look at the above comments on your nanosecond-timestamps branch of UNFS3?
Have a great weekend and thanks again!
Great news everyone!
I've verified that @derfian latest code on his branch of unfs3 works, please see my latest comment on sourceforge for details and steps to reproduce correct nanosecond behavior with Dinghy.
What is left to do? Well, the correct behavior is only observed when fsevents_to_vm
is disabled. When it is running, the timestamps are set to zero nanoseconds again.
So, the next step is to get fixes into the fsevents_to_vm
stack so that nanosecond timestamps are preserved.
HUGE THANK YOU to @derfian -- do you have plans to cut a new official release of unfs3 from your branch code?
@codekitchen Is there anyone who can help investigate the fsevents_to_vm
side with me?
That's awesome. I should have time to poke at fsevents_to_vm
later next week. If anybody else wants to look before then, I think all that may need to change is the code that reads the mtime on this line. Unless I'm forgetting something, which is possible as it's been a long time.
Hi @codekitchen there's an additional wrinkle. I have a branch of fsevents_to_vm
which correctly formats the mtime
using Time#iso8601(9)
which is the right way to get nanoseconds, and on a Ubuntu container setting using touch -d
works correctly.
However, the boot2docker
host that fsevents_to_vm
is actually ssh'ing into is actually based on BusyBox, and so the touch
command is just BusyBox rather than Gnu Coreutils touch. The point of this digression is that the touch command there does NOT support nanoseconds.
One path forward that I was going to try was to have dinghy run an additional container which does contain a nanoseconds-enabled touch command, just for the purpose of replacing the ssh-to-touch with docker exec
to touch. Any thoughts on that?
Could the existing Proxy container be extended to (a) mount the NFS-share of the host home directory, and (b) support docker exec
from fsevents_to_vm
to run the touch command?
@codekitchen To restate more concisely what I was saying in previous comment. First, disable your fsevent_to_vm in .dinghy/preferences.yml
or just kill it. Then try this:
A. WORKING: Ubuntu's touch
(from gnu coreutils) accepts nanoseconds:
# Outside on host Mac OS
$ touch ~/test_file && gls --full-time ~/test_file
-rw-r--r-- 1 username staff 0 2018-05-17 10:39:00.592204127 -0400 /Users/username/test_file
$ docker run -it -v $HOME:/fstest ubuntu:16.04 bash
# Inside Ubuntu container
root@91e1e49f6bf2:/# touch -m -c -d 2018-05-17T14:41:04.730369000Z /fstest/test_file
root@91e1e49f6bf2:/# ls --full-time /fstest/test_file
-rw-r--r-- 1 501 dialout 0 2018-05-17 14:41:04.730369000 +0000 /fstest/test_file
# Outside on host Mac OS again
gls --full-time ~/test_file
-rw-r--r-- 1 username staff 0 2018-05-17 10:41:04.730369000 -0400 /Users/username/test_file
B: BROKEN: Boot2Docker's touch
(from BusyBox) rejects nanoseconds:
# Outside on host Mac OS
$ dinghy ssh
# Inside Dinghy's Boot2Docker VM
docker@dinghy:~$ touch -m -c -d 2018-05-17T14:44:12.503968000Z /Users/username/test_file
touch: invalid date '2018-05-17T14:44:12.503968000Z'
So, given that we've replaced the format_time in fsevents_to_vm
with time.utc.8601(9)
, which gives you a nanosecond representation in the right format:
In the solution space that I was referencing above, I'm aware of two main shapes to get this timestamp set correctly inside the VM to trigger the inotify events in Linux:
If we are continuing to ssh into the VM and run a command, we need something other than BusyBox's touch
. Possibly we could download a static binary here or otherwise add the command we need to the BusyBox in the VM?
Alternatively, we could use docker exec
to run a command inside a Docker container which does have the gnu coreutils version of touch. Since dinghy already runs the http proxy container, I suggested that as one alternative; however we could just as easily run a separate small container just for this purpose.
Thoughts @codekitchen and others?
Hi @codekitchen I know you're busy and hate to be a bother, but would really appreciate your thoughts on which direction above so I don't waste my time on a PR that you wouldn't accept. Make sense?
Per agreement with @derfian on the unfs3 sourceforge issue here:
I have a few portability concerns about the new code that I'd like to review before merging to master. Making a new release isn't at the top of my priority list at the moment.
No worries, I understand. In the meantime, would you be willing to commit to keeping your existing Github branch available at the same path? Could we temporarily make dinghy depend on that branch to build, to get Mac OS docker users back into good shape, and later change the dependency to be your next release?
Sure, I will do that. I don't mind having the dinghy team depend on this branch until everything else settles into place.
I will be submitting a PR to homebrew-dinghy
to officially switch the unfs3 dependency to his branch implementing nanosecond support, until such a time in the future as @derfian is able to finish testing cross-platform support and make an official release.
Hi all, I can't claim to know exactly what is causing it, but our team has identified a major issue impeding development on Dinghy since we upgraded to Mac OS X "High Sierra" recently.
When Dinghy is running, and a file in the shared folder is edited externally (e.g. in Emacs) and saved to disk. When user attempts to edit the file again, something unexpected happens:
It should be noted that the expected thing also happens: the app inside Dinghy observes the change.
We've verified this behavior occurring in all of: Emacs, Sublime Text, and Rubymine, for example. However, for Rubymine, there is a workaround by changing to nio2 file api.
Hypothesis: Is this due to an interaction between the Dinghy NFS server and the new Apple APFS file system, which has nanosecond modification times?
We've verified that
dinghy stop
allows the editors to save files unimpeded by this issue.Any ideas? Anyone else seeing this?