danny0838 / git-store-meta

Simple file metadata storing and applying for git.
MIT License
124 stars 19 forks source link

`gitk` shows "local uncommitted changes" after applying metadata #37

Closed DominikFickenwirth closed 2 years ago

DominikFickenwirth commented 2 years ago

After applying metadata with git-store-meta.pl --apply, the GUI tool gitk detects "local uncommitted changes" for all files with changed metadata. After reading this comment on StackOverflow, I added

git update-index --refresh

to my post-checkout and post-merge hooks, and it seems to work now. Unfortunately I don't know Git well enough to determine whether this is the right command here.

Are the hooks even the right place to do this, or should this be done inside --apply?

danny0838 commented 2 years ago

What fields do you save with git-store-meta?

DominikFickenwirth commented 2 years ago

only mtime

danny0838 commented 2 years ago

It seems that gitk is checking changed files in a lazy way that is related with the index file. Although git update-index --refresh seems to solve the issue, a further investigation is required to make sure what is the implication of refreshing the index and whether it's safe enough to do that...

DominikFickenwirth commented 2 years ago

TL;DR: We should use git add --refresh -- <filename>. It basically does the same as git update-index --refresh, but is more high level. I suggest to call it inside of GSM --apply for every file where metadata is applied.

The problem

To quickly determine which files changed (and which didn't), git stores file metadata in the cache/index. Git calls this metadata "stat information". One can view cached stat info with git ls-files --debug. It contains mtime and ctime.

When comparing the index and worktree versions of a file, git first compares stat info. Now, gitk seems to do this comparison, but lazily: If cached stat info and worktree stat info are the same, gitk assumes that the file content didn't change. Otherwise, gitk just assumes that the file content did change. So if GSM applies metadata to the worktree, gitk thinks that they are different.

In contrast to gitk, git status does the check better. If cached stat info and worktree stat info are the same, git assumes that the file content didn't change (same as gitk). Otherwise, the file contents get compared (index vs worktree). If they are different, the file appears in the output (as it should). But if they aren't different, git status now knows that the file is the same, but the stat info isn't. So, git status does a background refresh of the cached stat info for that file, in order to improve performance for future git commands. In our case, if GSM applied metadata to the worktree, doing git status refreshes the cached stat info with our new metadata. Note that as a side effect, gitk works just fine after that. Doing git status to fix gitk would be overkill, though.

The solution

In order to work around this gitk bug, we manually need to refresh the chached stat info. The right command for this should be

git add --refresh -- <filename>

As the documentation states, the refresh option "doesn't add the file(s), but only refreshes their stat() information in the index". git add --refresh has the advantage that you can specify the file(s). (Note that specifying paths with git update-index --refresh is possible, but does something different). Also, I checked the git source code, and git add --refresh, git update-index --refresh as well as git status all call the same function refresh_index. This implies that the refreshing logic is the same as described above: The stat info of a file only gets refreshed, if the file's stat infos are different, but the file's contents actually are identical (index vs worktree).

I would therefore suggest to call git add --refresh -- <filename> inside of GSM --apply for every file where metadata is applied.

danny0838 commented 2 years ago

Thank you for the information. Some further issues:

1. Have you made sure that those commands call refresh_index with same arguments?

2. What is the difference of git update-index --refresh with paths? We may want to update the index using a single call of an external command to prevent a performance issue, and git update-index --refresh seems to fit the need better as it supports --stdin and -z parameters than git add --refresh.

DominikFickenwirth commented 2 years ago
  1. The arguments are different: git add passes paths and the REFRESH_IGNORE_SKIP_WORKTREE flag. git update-index does not pass paths at all, and can pass several flags (see below).
  2. git update-index --refresh -- pathspec seems to do
    git update-index --refresh
    git add -- pathspec

refresh_index gets called...

DominikFickenwirth commented 2 years ago

Since git add --refresh uses REFRESH_IGNORE_SKIP_WORKTREE, we should probably not use that command after all, since that may mess up the index for users who use the skip-worktree flag.

That leaves us with

git update-index -q --ignore-submodules --ignore-missing --refresh

Note that --ignore-missing must be specified before --refresh, according to the docs.

danny0838 commented 2 years ago

This approach seems more reasonable. Thank you.

Which is the doc you have seen to mention that --ignore-missing must be specified before --refresh? The doc on SCM only says that --ignore-submodules must be passed before --refresh, but -q and --ignore-missing do seem to work only when passed before --refresh according to my test. Maybe we'd have to populate the issue to the Git developers.

DominikFickenwirth commented 2 years ago

Oh whoops, I meant ignore-submodules. sorry!