git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/

File replaced by other repository file does not show as changed #5132

Open Blackclaws opened 3 weeks ago

Blackclaws commented 3 weeks ago

The issue at hand is extremely puzzling to me. We're running multiple CI builds of one repository in parallel on a Windows machine.

During the CI build we are copying a different file (also part of the git repository) as a replacement over a file that is being tracked by git. When I do this the index shows no changes, even though the file contents are indeed different (verified by opening the file in a text editor).

The issue also seems to occur when only a single CI build is running, once a build has been triggered at least once before.

Multiple files are being copied the same way, but there is only one file for which this issue happens.

Running:

git reset

does not show the changes.

Running

git checkout FILE

shows that 0 paths were updated from the index.

Running

git reset --hard

does not restore the original unchanged version of the file.

Deleting the file and running either the checkout or the reset --hard command restores the original version as expected.

Deleting the index and recreating it shows the file as changed:

rm -r .git/index
git reset

Even after restarting the entire system, the file does not show as changed. Changing core.fscache does nothing to mitigate this issue.

Copying the directory to a different location shows the file as changed.

I am utterly lost on how to proceed on debugging this issue further. Any help would be very appreciated.

Setup

$ git --version --build-options

git version 2.46.0.windows.1
cpu: x86_64
built from commit: 2e6a859ffc0471f60f79c1256f766042b0d5d17d
sizeof-long: 4
sizeof-size_t: 8
shell-path: D:/git-sdk-64-build-installers/usr/bin/sh
feature: fsmonitor--daemon
libcurl: 8.9.0
OpenSSL: OpenSSL 3.2.2 4 Jun 2024
zlib: 1.3.1
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.19044.1706]
(c) Microsoft Corporation. All rights reserved.
$ cat /etc/install-options.txt

Editor Option: Notepad++
Custom Editor Path:
Default Branch Option:
Path Option: Cmd
SSH Option: ExternalOpenSSH
Tortoise Option: false
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Git Pull Behavior Option: Merge
Use Credential Manager: Enabled
Performance Tweaks FSCache: Disabled
Enable Symlinks: Disabled
Enable Pseudo Console Support: Disabled
Enable FSMonitor: Disabled

The machine in question is a CI runner that runs multiple builds in parallel on multiple copies of the same remote repository.

Details

Bash

dscho commented 3 weeks ago

When Git verifies that a file is up to date, it uses a couple of indicators for that, one of them being the "modified time". If it differs from what is recorded in the Git index, Git assumes that it needs to be refreshed, i.e. re-hashed.

There are a couple of other bits and pieces of the metadata that are compared, and if all of them match, Git won't even look at the contents of the file but think that the Git index contains up to date information.
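
To illustrate the shortcut described above, here is a minimal sketch in Bash. It is not Git's actual implementation, which compares more fields than shown here. `stat_signature` and `looks_unchanged` are hypothetical helper names, and `stat -c` assumes GNU coreutils, as available in Git Bash.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mimicking a metadata-only "is it up to date?" check.
# Assumption: GNU stat (the -c option) is available.

# Print the mtime (seconds since the epoch) and the size in bytes of a file.
stat_signature() {
    stat -c '%Y %s' -- "$1"
}

# Succeed if the file's current signature equals the recorded one.
# The file contents are never read, which is the point of the shortcut.
looks_unchanged() {
    local recorded="$1" file="$2"
    [ "$(stat_signature "$file")" = "$recorded" ]
}
```

If a file is overwritten with content of the same size and ends up with the same mtime, `looks_unchanged` still succeeds even though the contents differ.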

I expect that this is where things go awry for you. The mtime probably matches, as does the ctime, as does the size (I suspect, but this is an educated guess).
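
One way to check this suspicion, sketched here under assumptions, is to hash the working-tree file directly and compare the result with the blob ID recorded in the index. `content_differs_from_index` is a hypothetical helper name; it must be run from inside the repository, and it only uses `git ls-files -s` and `git hash-object`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the working-tree content of a tracked file
# hashes to a different blob ID than the one recorded in the index.
content_differs_from_index() {
    local file="$1" index_id worktree_id
    # `git ls-files -s` prints "<mode> <blob-id> <stage><TAB><path>".
    index_id=$(git ls-files -s -- "$file" | cut -d' ' -f2)
    # `git hash-object` reads and hashes the file itself, so the cached
    # stat data in the index plays no role in this value.
    worktree_id=$(git hash-object -- "$file")
    [ -n "$index_id" ] && [ "$index_id" != "$worktree_id" ]
}
```

If this helper succeeds for a file that `git status` reports as clean, the stat data matched while the contents did not.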

The only information that would help is the inode number, but Git for Windows does not use that information, for performance reasons (and nowadays also for backwards compatibility).

Does the work-around to call touch <target-file> in a Bash scriptlet after copying that file work for you?
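
A minimal sketch of such a scriptlet, assuming the copy step can be wrapped in a Bash function. `replace_tracked_file` is a hypothetical name, not part of Git.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy SRC over DST, then update DST's mtime so that
# the stat data recorded in the index no longer matches the file.
replace_tracked_file() {
    local src="$1" dst="$2"
    # touch only runs if the copy succeeded.
    cp -- "$src" "$dst" && touch -- "$dst"
}
```

A CI step would call it as `replace_tracked_file <source-file> <target-file>` in place of the plain copy.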

Blackclaws commented 3 weeks ago

Ok this is very interesting. And thanks a lot for enlightening me here.

The workaround of calling touch does indeed seem to work, but the behaviour that git displays here by checking only the metadata and assuming the contents are the same is indeed problematic in this case.

Indeed, the files differ only by a number 1 being changed to a number 2, which is why the file size doesn't change either. But I would still consider this a bug, even if it is the end result of performance enhancements.

dscho commented 3 weeks ago

> Indeed, the files differ only by a number 1 being changed to a number 2, which is why the file size doesn't change either.

It looks like even the mtime and ctime are identical, which is what makes this a problem. Granted, NTFS stores mtime and ctime with only 100-nanosecond granularity, but still: it is a very narrow use case in which both of these and the file size are identical.

In https://github.com/git-for-windows/git/issues/3707, I detailed a way to fix this: if we manage to get the equivalent of Linux's inode numbers, then we can tell the files apart even in your use case.

Blackclaws commented 3 weeks ago

Would be great if we could get a fix like that in. For now the workaround at least provides a solution, but a proper fix would definitely make sense to keep others from falling into this arguably very specific pit of failure.