git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/
Other
8.31k stars 2.52k forks source link

GfW mangles symlinks on UNC paths #1195

Closed corford closed 7 years ago

corford commented 7 years ago

Setup

$ git --version --build-options

git version 2.13.0.windows.1
built from commit: eba7af3dbb4c846c6303c5f64102acee696c9ab0
sizeof-long: 4
machine: x86_64
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.14393]
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Path Option: Cmd
Plink Path: C:\Program Files (x86)\PuTTY\plink.exe
SSH Option: Plink
CURL Option: WinSSL
CRLF Option: CRLFCommitAsIs
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Enabled

Windows is running with Developer mode enabled, user account is in the Administrator group and UAC is disabled.

User account's global .gitconfig (C:\Users\username\.gitconfig) contains 'symlinks = true'

Details

Issue manifests regardless of terminal/shell used (I see it using bash.exe directly, using the git integration of vscode and using the git integration of Webstorm)

# Note: repo must contain some symlinked files and/or directories (all symlinks
# should point to other files and directories within the same repo, not to external targets).

$ C:\Program Files\Git\bin\bash.exe
$ cd /z/myrepo # [ where z is a mapped network drive to the samba share containing the repo ]
$ cd to a directory within the repo that contains either a symlinked file or a symlinked directory
$ ls -l will wrongly show the symlinks as real files and directories (which also casues git to incorrectly
mark them as changed/modified). Editing a symlink does not propagate the change to the original
target of the symlink. The symlink has in effect become its own distinct file.

# Checkout the same repository but this time to a local drive

$ C:\Program Files\Git\bin\bash.exe
$ cd /c/myrepo/directory [ containing either a symlinked file or a symlink to a directory ]
$ ls -l will correctly show symlinks pointing to targets (git does not incorrectly flag these files
as changed or modified). Editing a symlink will propagate the change to the original target file/dir.

# Note:
# The problem is with GfW (not Windows or Samba). 
# Symlinked files and directories behave correctly when browsing the network share in
# Windows Explorer. Right clicking and opening a symlink in a text editor, altering
# some text and saving it propagates the change to the target as expected.

Symlinks located on UNC paths should be treated in the same way as symlinks on local media

When working on a repository located on a UNC path (e.g. samba network share), GfW treats symlinks as copies of their target, essentially converting them in to real files.

This is the smb.conf file used on the linux guest (a Vagarant Ubuntu xenial box with private network enabled on eth1):

[global]

#### Browsing/Identification ####

workgroup = WORKGROUP
server string = %h server
dns proxy = no

#### Networking ####

interfaces = lo eth1

bind interfaces only = yes
hosts deny = all

# You may need to change the following line to correspond to your network address scheme
hosts allow = 127.0.0.1 192.168.56.0/24

#### Debugging/Accounting ####

log file = /var/log/samba/log.%m
max log size = 1000
syslog only = no
syslog = 0

#### Authentication ####

security = user
encrypt passwords = true
passdb backend = tdbsam
unix password sync = no
pam password change = no

#### Misc ####

unix extensions = no
socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
winbind enum groups = yes
winbind enum users = yes
printcap name = /etc/printcap
load printers = no
printing = cups
printcap name = /dev/null
disable spoolss = yes
dos charset = CP850
unix charset = UTF-8

#### Share Definitions ####

[git_share]
comment = git repositories
path = /home/linus/gitrepos
valid users = linus
browsable = yes
guest ok = no
read only = no
directory mask = 0755
create mask = 0644
follow symlinks = yes
wide links = no

Samba version:

$ samba --version Version 4.3.11-Ubuntu

Create samba user by running:

$ pdbedit -a -u linus (note: the linus account must already exist in the system)

Why?

Using samba to export the repo from a guest vagrant machine is a nice way to ensure 'hot reload' works when working on frontend code (since virtualbox shared folders don't support mtime). It also makes it easy to keep file permissions correct since you can set filemode=false on the Windows host and then use git on the vagrant guest whenever you need to tweak a file's perms (e,g. make it executable). Vagrant shared folders don't let you change permissions on files or dirs after the folder is mounted in the guest.

dscho commented 7 years ago

When working on a repository located on a samba originated network share, GfW wrongly converts symlinks to real files and directories causing git to incorrectly mark them as changed/modified and making it impossible to work with the repo.

This is typically triggered by the absence of symlink support. I.e. when core.symlinks = false.

corford commented 7 years ago

Thanks @dscho but I'm not sure that's the case here since core.symlinks is set to true.

Symlinks work without issue if the repo is checked out to a local path on the Windows machine (e.g. C:\gitrepos\myrepo). They only confuse GfW when the repo is checked out on a linux guest and that location is made available to GfW via a samba share + mapped drive combo e.g. Z:\gitrepos\myrepo)

Z:\gitrepos\myrepo> git config --list
core.symlinks=true
core.autocrlf=false
core.fscache=true
color.diff=auto
color.status=auto
color.branch=auto
color.interactive=true
help.format=html
diff.astextplain.textconv=astextplain
rebase.autosquash=true
http.sslcainfo=C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt
diff.astextplain.textconv=astextplain
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.required=true
filter.lfs.process=git-lfs filter-process
credential.helper=manager
core.symlinks=true
core.filemode=false
user.name=Charlie
user.email=charlie@localhost
alias.co=checkout
alias.br=branch
alias.ci=commit
alias.st=status
dscho commented 7 years ago

They only confuse GfW when the repo is checked out on a linux guest and that location is made available to GfW via a samba share + mapped drive combo e.g. Z:\gitrepos\myrepo)

Maybe it is a permission issue, where somehow the permission to create symlinks does not permeate through the Samba connection? Are other symlink-creating commands working?

corford commented 7 years ago

Yes, other symlink commands performed outside of GfW work fine. For example, on the Windows host I can successfully:

1. Open Windows Explorer
2. Navigate to Z:\gitrepos\myrepo
3. Drill down further to a child directory that contains a symlink to a file
4. Right click on that symlink and open it with a text editor (e.g. Sublime)
5. Change a line within the file and save the changes
6. Return to Windows Explorer, locate the target file the symlink was pointing to in the same repo
7. Right click and open the target file with a text editor and see the changes I made in step 5 above

So, outside of GfW, symlinks are working and behaving as they should, even across a samba share. I suspect the bug is with how GfW internally handles symlinks for resources on mapped network drives.

EDIT: I just tried creating a symlink using git-bash.exe (bundled with GfW) and hit the same problem. Rather than creating a symlink, it creats a new file (containing the contents of the target). There seems to be a bug with the type of resource GfW (or the bash shell bundled with it) creates when asked to make a symbolic link to a file on a network share. Instead of creating a link, it creates a copy of the file.

cd /z/gitrepos/myrepo
ln -s README.md test.md
ls -l
-rw-r--r-- 1 Charlie 197121 24 Jun  3 23:54 README.md
-rw-r--r-- 1 Charlie 197121 25 Jun 13 23:51 test.md
corford commented 7 years ago

@dscho I finally had some time last night to try and debug a little further. Instead of samba, I setup an NFS server on a vagrant Ubuntu guest and exported a gitrepo back to the Windows 10 host. On the windows host, I mounted the NFS share and assigned it a drive letter. As with samba, everything worked fine (opening, editing, deleting, creating files). The only issue was symlinks. GfW exhibited the same bug as described above (i.e. it clobbered them in to real files).

So it looks like GfW (or the bash shell bundled with it) has a bug with how it handles symlinks on UNC paths. Basically, it doesn't and just treats them as a copy of whatever the original symlink target was.

It would be awesome if this bug could be tracked down and fixed as it's the only thing stopping (imho) a seamless Win/Linux dev setup. By keeping the source code on a linux guest vm and exporting it back to windows via a network share, you completely bypass the usual issues of file permission mismatches and lack of inode notifications for hot reload/compilation while at the same time not needing to access your editor (e.g. vscode) over a slow, flakey X tunnel. You literally get the best of both worlds.

dscho commented 7 years ago

I just tried creating a symlink using git-bash.exe (bundled with GfW) and hit the same problem. Rather than creating a symlink, it creats a new file (containing the contents of the target)

Please note that the Bash and Git have very different code paths to handle symlinks.

The best way forward may be for you to install the Git for Windows SDK and instrument the code in /usr/src/git/compat/mingw.c to see what flags are obtained by mingw_lstat() (and how the file_attr_to_st_mode() function handles them).

Of course, it is altogether possible that the has_symlinks variable is somehow forced to zero.

corford commented 7 years ago

The last time I touched C code was 17 years ago at university so instrumenting it myself is not going to be easy :)

If you can give me a hacked up version of mingw_lstat() to copy and paste over the original in /usr/src/git/compat/mingw.c I'm happy to compile, test and report back for you (I've just grabbed and setup the SDK and I should be comfortable following https://github.com/git-for-windows/git/wiki/Technical-overview to make a new build).

I guess all that might be needed is a few "print to log file (or stderr?)" lines somewhere in the function so that we can get a readout of the flags and what file_attr_to_st_mode() returns?

dscho commented 7 years ago

The last time I touched C code was 17 years ago at university

So it would not take much to brush up your C fu.

dscho commented 7 years ago

I guess all that might be needed is a few "print to log file (or stderr?)" lines somewhere in the function so that we can get a readout of the flags and what file_attr_to_st_mode() returns?

Yep, exactly. something like

error("this is my flag: %x", (int)flag);

(The error() function is defined by Git and it is shorter to type than fprintf(stderr, ...), plus: error() appends a newline itself.)

Then just call make -j5 install (using up to 5 parallel processes) and after that you can call git ....

Alternatively, you can build without installing (make -j5) and use in-place: /usr/src/git/git --exec-path=/usr/src/git ....

corford commented 7 years ago

Thanks for the hints 👍 I've successfully re-built without installing and can get debug lines printed (here's the hacked version of mingw_lstat() I used: https://gist.github.com/corford/7a9c078fff8a8e130215f2d0568c185f ).

Now I'm not sure what git command to use so it will iterate over the files and dirs in a repo and print the attributes? I tried this /usr/src/git/git --exec-path=/usr/src/git ls-files /z/myrepo/ but it only printed debug lines for .git/HEAD and nothing else...

error: getting attribs of: Z:/myrepo/.git/HEAD
error: this is my flag: 8180
error: getting attribs of: Z:/myrepo/.git/commondir
error: getting attribs of: .git/commondir
error: getting attribs of: .git/commondir
.gitconfig
.gitignore
.vscode/settings.json
README.md
...
corford commented 7 years ago

Ok got it. I used /usr/src/git/git --exec-path=/usr/src/git log myrepo/path/to/a/symlink to get the print outs.

When issued against a repo on a local disk, the attribute for a symlink that points to another directory in the repo is: a180 and the attribute for a symlink that points to another file within the repo is: a180

When issued against the same repo (but this time located on a mapped UNC path), the attribute for a symlink that points to a directory is: 4180 and the attribute for a symlink that points to a file is: 8180

corford commented 7 years ago

So it looks like either file_attr_to_st_mode() or fdata.dwFileAttributes is wrongly reporting symlinks as real files and directories when the symlink (and its target?) is located on a mapped UNC path.

dscho commented 7 years ago

When issued against a repo on a local disk, the attribute for a symlink that points to another directory in the repo is: a180 and the attribute for a symlink that points to another file within the repo is: a180

Well, 0xa180 = 0x8000 + 0x2000 + 0x100 + 0x80. Whereas 0x4180 = 0x4000 + 0x100 + 0x80, and 0x8180 = 0x8000 + 0x100 + 0x80.

So the difference is only in 0x8000, 0x4000 and 0x8000 + 0x2000.

That is funny. According to https://msdn.microsoft.com/en-us/library/windows/desktop/gg258117(v=vs.85).aspx, 0x8000 is a flag used only on ReFS, and 0x2000 means that the file is not to be indexed. The 0x4000 flag says it is encrypted...

However, the flag that is expected is 0x400 (FILE_ATTRIBUTE_REPARSE_POINT) which seems not to be present in your examples.

dscho commented 7 years ago
  error("this is my flag: %x", (int)file_attr_to_st_mode(fdata.dwFileAttributes, findbuf.dwReserved0));

Ah, I see. Your output shows the already translated attributes. Maybe you could show the untranslated attributes, i.e. the raw fdata.dwFileAttributes and findbuf.dwReserved0.

corford commented 7 years ago

Ok, here you go.

This is the output when run against the repo on a local disk (both paths are symlinks, first is pointing to a file and second is pointing to a dir):

error: getting attribs for: devops/provisioning/terraform/test/argent/common.tf
error: dwFileAttributes: 1056
error: dwReserved0: 2684354572
error: translated attribs: a180

error: getting attribs for: devops/provisioning/ansible/envs/infra/prod/group_vars/app_dbc0/00_vars
error: dwFileAttributes: 1040
error: dwReserved0: 2684354572
error: translated attribs: a180

And this is the output when run against the same symlinks (but this time the repo is on a UNC mapped drive rather than local disk):

error: getting attribs for: devops/provisioning/terraform/test/argent/common.tf
error: dwFileAttributes: 128
error: dwReserved0: 0
error: translated attribs: 8180

error: getting attribs for: devops/provisioning/ansible/envs/infra/prod/group_vars/app_dbc0/00_vars
error: dwFileAttributes: 16
error: dwReserved0: 0
error: translated attribs: 4180

Gist of updated hacked mingw.c I used to produce the above: https://gist.github.com/corford/5d67e808ec11e1fab44dd0796e12eb60 (not sure if %lu was the correct formatting choice for printing dwFileAttributes and dwReserved0?)

dscho commented 7 years ago
error: getting attribs for: devops/provisioning/terraform/test/argent/common.tf
error: dwFileAttributes: 1056

As you can see for yourself by comparing the numbers to the bits listed in https://msdn.microsoft.com/en-us/library/windows/desktop/gg258117(v=vs.85).aspx, 1056 (=0x420) corresponds to FILE_ATTRIBUTE_REPARSE_POINT | FILE_ATTRIBUTE_ARCHIVE.

error: getting attribs for: devops/provisioning/ansible/envs/infra/prod/group_vars/app_dbc0/00_vars
error: dwFileAttributes: 1040

And 1040 (= 0x410) corresponds to FILE_ATTRIBUTE_REPARSE_POINT | FILE_ATTRIBUTE_DIRECTORY.

So far, so good.


error: getting attribs for: devops/provisioning/terraform/test/argent/common.tf
error: dwFileAttributes: 128
[...]
error: getting attribs for: devops/provisioning/ansible/envs/infra/prod/group_vars/app_dbc0/00_vars
error: dwFileAttributes: 16

And here the trouble starts: 128 (=0x80) is FILE_ATTRIBUTE_NORMAL and 16 (=0x10) is FILE_ATTRIBUTE_DIRECTORY.

That means that when Git asks the Win32 API to get the file attributes, it does not receive any information at all that the reported file/directory is, in fact, a symlink!

Obviously, there is nothing Git can do to second-guess those answers.

I could imagine that Samba mistakes your "follow symlinks = yes" to imply that you want symbolic links to be transparently resolved on the server side already.

But then, even studying Samba resources for over one hour (which I hope you did, too, as it is your desire to export symlinks via Samba to Windows clients) did not offer any insights how Samba wants to be configured to represent symbolic links as reparse points.

However, when i look at the source code of Samba with git grep (or using GitHub's search functionality: https://github.com/samba-team/samba/search?utf8=%E2%9C%93&q=file_attribute_reparse_point&type=), I see no hint at any such support.

A web search turned up this link where somebody offered a patch to implement the support to represent symlinks as reparse points: http://samba.2283325.n4.nabble.com/PATCH-smbd-add-reparse-points-at-symlinks-td4695178.html. But nothing seems to have come of it.

Please note that I used open resources to figure all of this out, no magic required (not even house-elf magic).

Also, at this point I had to realize that this ticket is not about Git at all, but about Samba. Much as I like to help Open Source, there is a lot more that I have to help the Open Source project Git, so I cannot afford to spend more time on this ticket. But then, it is not even my responsibility: you're a grown-up, too :smile:

corford commented 7 years ago

Thanks for spending some time on this (and yes, believe me I have spent countless hours researching it myself too).

My goal with the issue was to try and determine where the bug lay (i.e. with git for windows or somewhere else). Sadly I'm not sure we have conclusively figured this out.

As mentioned before, GfW exhibits the same issue with NFS shares i.e. it is nothing to do with samba.

Also, in both cases (samba share or NFS share), Windows itself has no issue understanding the symlinks. If I navigate to the repo (via a UNC path) using Windows File Explorer, open a symlink, edit the contents and save it, the target of the symlink is correctly updated. This simply does not happen with G4W (since it treats the symlink as a real file and so any changes made do not propagate to the target of the symlink).

It's very hard to figure out where the problem lies. If it was the Win32 API, one would imagine Windows itself would also have the same problem with symlinks on UNC paths as GfW does. Yet this is not the case.

corford commented 7 years ago

I will continue looking for an answer and if I find anything relevant will come back to this ticket and share it. Your help with debugging mingw.c and eliminating that as the cause has been much appreciated :)

corford commented 7 years ago

Ok, borrowing some inspiration from https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/ (and factoring in what has been confirmed with the mingw.c testing) it seems the issue is definitely not with GfW.

Opening an unprivileged windows command prompt and issuing dir in a directory of the repo containing a symlink correctly identifies it as such when the repo is on a local drive.

Doing the same (but this time in the repo located on a UNC path) wrongly shows the symlink as a normal file.

So the attribs returned by the Win32 API are consistent with what we saw with GfW.

Despite this, opening and editing a symlink on a UNC path (e.g. with Notepad) correctly propagates changes to the target. Clearly Windows isn't doing this (as it thinks the symlink is a real file) so I'm back to looking at samba and NFS. If I've understood http://samba.2283325.n4.nabble.com/PATCH-smbd-add-reparse-points-at-symlinks-td4695178.html it seems samba (and I guess NFS) currently present symlinks to Windows as normal files/dirs and then handle change propagation to the targets themselves. Shame that patch never got any take up from the Samba team :(

Closing this issue. Thanks for the help (and patience!)

dscho commented 7 years ago

yes, believe me I have spent countless hours researching it myself too

Good, thanks for correcting my wrong assessment.

If I've understood http://samba.2283325.n4.nabble.com/PATCH-smbd-add-reparse-points-at-symlinks-td4695178.html it seems samba (and I guess NFS) currently present symlinks to Windows as normal files/dirs and then handle change propagation to the targets themselves.

Have you seen any information beyond that patch? I only did a quick web search (on the hunch that "reparse point symlink samba" would make for good search terms in this case)...

Shame that patch never got any take up from the Samba team :(

It looked to me as if they just lost track of it, probably due to lack of time on the contributor's side. It would seem as if Jeremy's hints how to improve the patch so it can be included are the last thing in the thread.

I could imagine that it would help Samba enormously if an enthusiastic developer with the need to see this fixed (such as yourself :smile:) would reply to Jeremy's mail, asking for assistance to improve that patch.

It can be quite intimidating to work on an unfamiliar code base, on a patch that is not your own, and my experience shows that typically the active maintainers/developers of Open Source projects are quite understanding and patient. If I were you, I'd give it a try. Who knows, maybe you can get what you want (symlinks via Samba) and get to know a few Samba experts in an exciting adventure?

corford commented 7 years ago

Have you seen any information beyond that patch?

I've done a google deep dive but haven't turned anything up beyond that mailing list discussion. It seems the issue is too esoteric to attract much interest (although I suspect there are many people out there that would benefit from real symlink support being added to Samba).

I think you're right that Jeremy lost track of it despite being enthusiastic. Samba is no different from most OS projects. They need to be ruthless on where they place their priorities and I don't blame him at all for not focusing on this.

I could imagine that it would help Samba enormously if an enthusiastic developer with the need to see this fixed

I'd love to have a crack at getting a patch accepted but the sad truth is my C skills are simply not up to the task and I haven't got enough spare time (at the moment at least) to remedy this. I would feel guilty posting to the samba list asking for a feature and not being able to meaningfully contribute to make it happen.

However, if I find enough time to improve my C fu or are am able to sponsor the dev work, I'll definitely pick this up again and see if I can get a patch accepted!