Open DominikFickenwirth opened 2 years ago
I don't use git-store-meta
yet, but I plan to.
- Adding some kind of check inside the --store and --update procedures, whether we are in the process of merging, rebasing, or just committing.
Is there any reason not to employ the above solution? eg: is it harder to implement, harder for the user to execute and/or understand, or does said (above) solution present some other downside or undesirable outcome?
Originally, I thought that having a separate command would make it easier to use on rebases, but on second thought, I don't think that's the case. Also, there might be special cases where we don't want this mechanism to kick in (but I can't name any right now...)
Originally, I thought that having a separate command would make it easier to use on rebases, but on second thought, I don't think that's the case.
Making the user remember an additional step (manual command) in a merge-like (rebase or otherwise) procedure only for git-store-meta
and requiring them to remember to do this: that does indeed seem to be asking a lot of the user.
Also, there might be special cases where we don't want this mechanism to kick in (but I can't name any right now...)
For any special cases that arise in the the future: consider making an additional --switch
parameter to handle said special cases? This approach seems reasonable to me.
I agree with both of your statements. 2. Seems to be the reasonable approach here.
Making the user remember an additional step (manual command) in a merge-like (rebase or otherwise) procedure only for git-store-meta and requiring them to remember to do this: that does indeed seem to be asking a lot of the user.
As I understand it, git-store-meta
cannot be automated during a rebase, as a rebase commit cannot be hooked. So the user needs to remember an additional step anyways.
Do you see another way of completely automating this during a rebase?
As all the issue is about when the hook should or should not run, it should make more sense to do the checking in the hooks directly.
The proposed change is available in the devel branch. In the first commit we refactored the hooks to make them easier to be manually edited or merged with other possible hook systems. The second commit adds a check to skip the hook if a merge (normal or squashing) or revert is in progress.
Rebasing (and similarly, cherry-pick
ing and am
ing) is a different story. It's about reapplying patch(es) and does not involve committing unless the user explicitly runs git commit
rather than git rebase --continue
when resolving a conflict, or for an advanced operation like splitting or inserting commits, in which case running the hook are probably desired. As a result, we don't tend to skip for rebasing.
Also note that the user can always skip the pre-commit hook by adding --no-verify
. So we basically have two options:
--no-verify
.Although it seems mostly reasonable to skip the pre-commit hook for a merge (or revert), it still adds a memory burden to the user for when the hook will run, and we may need more evaluation about whether to implement it.
Technically it's also possible to implement a more dedicated mechanism for a manual skip, such as checking for a specific environmental variable like GIT_STORE_META_SKIP_UPDPATE
or a Git config like gsm.skipupdate
or even branch.<branch_name>.gsmskipupdate
. As we currently can't think of sufficient practical use cases, we probably won't dig into it so deeply. The user can simply customize the hooks on their own if they really need such feature anyway.
First of all, thank you for your fast response! Your changes to the hooks in the devel branch work as expected in my local environment. It may be useful to echo the reason why the hook won't be executed, e.g. "Not applying metadata after squash merge".
Rebasing (and similarly,
cherry-pick
ing andam
ing) is a different story. It's about reapplying patch(es) and does not involve committing unless the user explicitly runsgit commit
rather thangit rebase --continue
when resolving a conflict, or for an advanced operation like splitting or inserting commits, in which case running the hook are probably desired. As a result, we don't tend to skip for rebasing.
I agree. I didn't think of running git commit
during a rebase. I think this is enough for the rebase case.
However, the issue is not really about when the hook should run or not. My original post wasn't clear at all - I apologize for that. I actually want the pre-commit
hook to run during a merge. My goal is to make merging metadata as automated as possible, such that the user doesn't have to worry about metadata during every single merge.
When committing a merge, I want git-store-meta.pl --update
to result in the following behavior:
.git_store_meta
.The user could then do something like:
git switch our-branch
git merge their-branch
[resolve conflicts]
git checkout --ours -- .git_store_meta (my hack for resolving the file)
git merge --continue (or git commit)
and the pre-commit
hook would do the rest. There is probably a better way to resolve conflicts in .git_store_meta
itself. I'm still fairly new to git
and lack an overview of what's possible and what's not.
Also note that the user can always skip the pre-commit hook by adding
--no-verify
. So we basically have two options:
- Do nothing and let the pre-commit hook run as now. If the user wants to skip GSM, add
--no-verify
.- Skip the pre-commit hook during a merge. If the user wants to update GSM, run manually.
Since I do want the hook to run during merges, I would prefer option 1.
Technically it's also possible to implement a more dedicated mechanism for a manual skip, such as checking for a specific environmental variable like
GIT_STORE_META_SKIP_UPDPATE
or a Git config likegsm.skipupdate
or evenbranch.<branch_name>.gsmskipupdate
. As we currently can't think of sufficient practical use cases, we probably won't dig into it so deeply. The user can simply customize the hooks on their own if they really need such feature anyway.
I agree. That seems like overkill.
I've just stumbled across merge drivers. Those seem to be an option for resolving the .git_store_meta
file automatically. We could implement a custom merge driver through git-store-meta.pl --merge baseGSM ourGSM theirGSM prmL prmP
using a behavior like 1. to 5. above.
mtime
(or atime
) was changed both in our-branch and their-branch, we use the current system datetime. If any other fields differ, mark it as a conflict..git_store_meta
itself.mtime
, whereas files that changed during the rebase would get the current system datetime.By doing this, maybe we don't even need the changes I described for --update
- Although conflicting files resolved by "use ours/theirs" would get a new mtime
then, and the user would have to --apply
the new values manually again.
It may be useful to echo the reason why the hook won't be executed, e.g. "Not applying metadata after squash merge".
Good idea. Added.
I actually want the
pre-commit
hook to run during a merge. My goal is to make merging metadata as automated as possible, such that the user doesn't have to worry about metadata during every single merge.
I get it. However this shouldn't be the work of the pre-commit hook, which is run during the user initiated commit AFTER the merge conflicts have been manually resolved and can't get the essential clues about what the conflicts are.
As you have noticed, this is more likely the job of a merge driver. However there would be many potential algorithms the user might want, such as "take the later mtime/atime if both modified". Also I'm not the expert about diff and merge algorithm, which is a nontrivial computer science problem. We probably won't—maybe actually can't—do it until some skilled body contributes on it or we've got time to research on it.
However this shouldn't be the work of the pre-commit hook, which is run during the user initiated commit AFTER the merge conflicts have been manually resolved and can't get the essential clues about what the conflicts are.
I agree. The only thing we would need the pre-commit hook for, is if we want to store new metadata for the files manually changed by the user during the merge. Also, it's the only hook where we could apply metadata to the worktree (even though I would prefer applying metadata after the commit). But this seems like a second step. The merge driver is more important I think. Also note that we can still get information about conflicts through MERGE_HEAD.
However there would be many potential algorithms the user might want, such as \"take the later mtime/atime if both modified\".
That seems reasonable as well. At least for file dates where the user resolved a conflict manually, I would prefer the date of the merge. But that's just personal taste, I guess. Maybe make both possible via an option?
Also I'm not the expert about diff and merge algorithm, which is a nontrivial computer science problem.
I would focus on the merge driver for the .git_store_meta file right now. I think diff
ing our GSM file works well enough with the standard diff algorithm.
Unfortunately, I have never written anything in perl, but our merge algorithm for .git_store_meta shouldn't be too difficult, since lines are independent and sorted. I'm working on pseudo code right now, which I will post as soon as I'm done.
Assuming that lines inside .git_store_meta
are sorted alphabetically, we could do
git-store-meta.pl --resolve <basefile> <ourfile> <theirfile> <markerlength> <destfilename>
baselines = readLines(basefile); //read files as array of strings
ourlines = readLines(ourfile);
theirlines = readLines(theirfile);
result = empty array or list of strings;
NOW = getCurrentDateTime();
i = 0; //index for baseline
j = 0; //index for ourlines
k = 0; //index for theirlines
conflicts = false;
while (j < length(ourlines) or (k < length(theirlines)) do {
// 1. Get smallest unhandled filename
//====================================
basename = getFilename(baselines, i); //should be empty string if i = array length
ourname = getFilename(ourlines, j); //should be empty string if j = array length
theirname = getFilename(theirlines,k); //should be empty string if k = array length
filename = alphabeticallySmallest(basename, ourname, theirname); //ignores empty strings
// 2. Extract stored fields from base, ours and theirs
//=====================================================
basefields = empty;
ourfields = empty;
theirfields = empty;
if basename = filename then { basefields = getFields(baselines[i]); i = i+1; };
if ourname = filename then { ourfields = getFields(ourlines[j]); j = j+1; };
if theirname = filename then { theirfields = getFields(theirlines[k]); k = k+1; };
// 3. Handle simple cases
//========================
// 3.a. equal in our branch and their branch?
if ourfields = theirfields then {
result.add(ourfields); //"add" should ignore the value "empty"
}
// 3.b. no change in theirs => take ours
else if theirfields = basefields then {
result.add(ourfields);
}
// 3.c. no change in ours => take theirs
else if ourfields = basefields then {
result.add(theirfields);
}
// --> If not handled yet, then BOTH branches changed the metadata of filename.
// 4. SOLVE conflicts only affecting atime and/or mtime
//======================================================
else if withoutAtimeMtime(ourfields) = withoutAtimeMtime(theirfields) then {
resultfields = ourfields;
if ourfields.atime != theirfields.atime then { resultfields.atime = NOW; }; //or maximum
if ourfields.mtime != theirfields.mtime then { resultfields.mtime = NOW; }; //or maximum
result.add(resultfields);
}
// 5. MARK conflicts affecting other fields
//==========================================
else {
conflicts = true
result.add("<<<<<< HEAD"); //use <markerlength> for number of "<" characters
result.add(ourfields);
result.add("======"); //use <markerlength> again
result.add(theirfields);
result.add(">>>>>> "+ getShortSha1(MERGE_HEAD)); //use <markerlength> again
}
}
writeLines(result, ourfile); //write result back to <ourfile>
if conflicts then exit(1); else exit(0);
I hope this code is understandable. If not, feel free to ask!
||||||
markers.$GIT_DIR/config
or $HOME/.gitconfig
:
[merge "git_store_meta_merger"]
name = merge driver for metadata stored with git-store-meta.pl
driver = $GIT_DIR/hooks/git-store-meta.pl --resolve %O %A %B %L %P
recursive = ?
.gitattributes
:
/.git_store_meta merge=git_store_meta_merger
Finally we decided to skip commits during all special status as it's more consistent and error-proof (the user can easily perform an update if not run; but cannot easily revert an update if it has run unexpectedly).
My current method
During merges, I want to preserve the
mtime
for all files that only changed in one branch. In this issue I will focus on themtime
field, but this issue might affect other fields as well.Right now, I do
git checkout --ours -- .git_store_meta
during the merge. After manually completing the merge,git-store-meta.pl --update
will do the rest. This preserves the datetime of files which only changed in "our" branch, but not those of "their" branch. I elaborate below.How do you guys handle merges of file dates at the moment? Just do it manually?
Merging
Let's say, I'm merging their-branch into our-branch. When the merge succeeds automatically, everything works as expected. In this case,
.git_store_meta
got successfully merged like any other file, which is what we want. The pre-commit hook does not get called in this case, so the merged version of.git_store_meta
will be committed.Problems only arise, when the merge stops before creating the merge commit. After completing the merge manually with
git merge --continue
orgit commit
, the pre-commit hook is called. Generally, this is a good thing: Files which were changed in both branches should get the current datetime, since the merge actually changed the contents of the file.There are a few cases however, where we want to preserve the datetime of their-branch:
git checkout --theirs -- filename
.Note, that the datetime of our-branch already get preserved:
git checkout --ours -- filename
. For this to work, use "my version" above.Rebasing
Let's say, I'm rebasing branch-A on branch-B. If I understand rebasing correctly, this is kind of equivalent to the following:
git-store-meta.pl
ourselves (as rebasing does not call our hooks).Thus, rebasing internally uses the same mechanism as merging. Therefore, solving this issue for merges should solve it for rebases as well.
Solution suggestion
I see two options, how we could make this work:
--store
and--update
procedures, whether we are in the process of merging, rebasing, or just committing.