danny0838 / git-store-meta

Simple file metadata storing and applying for git.
MIT License
124 stars 19 forks source link

automatically preserve mtime during merge #36

Open DominikFickenwirth opened 2 years ago

DominikFickenwirth commented 2 years ago

My current method

During merges, I want to preserve the mtime for all files that only changed in one branch. In this issue I will focus on the mtime field, but this issue might affect other fields as well.

Right now, I do git checkout --ours -- .git_store_meta during the merge. After manually completing the merge, git-store-meta.pl --update will do the rest. This preserves the datetime of files which only changed in "our" branch, but not those of "their" branch. I elaborate below.

How do you guys handle merges of file dates at the moment? Just do it manually?

Merging

Let's say, I'm merging their-branch into our-branch. When the merge succeeds automatically, everything works as expected. In this case, .git_store_meta got successfully merged like any other file, which is what we want. The pre-commit hook does not get called in this case, so the merged version of .git_store_meta will be committed.

Problems only arise, when the merge stops before creating the merge commit. After completing the merge manually with git merge --continue or git commit, the pre-commit hook is called. Generally, this is a good thing: Files which were changed in both branches should get the current datetime, since the merge actually changed the contents of the file.

There are a few cases however, where we want to preserve the datetime of their-branch:

Note, that the datetime of our-branch already get preserved:

Rebasing

Let's say, I'm rebasing branch-A on branch-B. If I understand rebasing correctly, this is kind of equivalent to the following:

  1. create a new branch branch-A2 based on branch-B
  2. merge every commit of branch-A into branch-A2 one by one. So, having a solution to the merging problem should solve rebasing as well, even though we'd have to call git-store-meta.pl ourselves (as rebasing does not call our hooks).
  3. reset branch-A to branch-A2 and delete the latter

Thus, rebasing internally uses the same mechanism as merging. Therefore, solving this issue for merges should solve it for rebases as well.

Solution suggestion

I see two options, how we could make this work:

  1. A separate command that we need to run manually during a merge or rebase
  2. Adding some kind of check inside the --store and --update procedures, whether we are in the process of merging, rebasing, or just committing.
johnnyutahh commented 2 years ago

I don't use git-store-meta yet, but I plan to.

  1. Adding some kind of check inside the --store and --update procedures, whether we are in the process of merging, rebasing, or just committing.

Is there any reason not to employ the above solution? eg: is it harder to implement, harder for the user to execute and/or understand, or does said (above) solution present some other downside or undesirable outcome?

DominikFickenwirth commented 2 years ago

Originally, I thought that having a separate command would make it easier to use on rebases, but on second thought, I don't think that's the case. Also, there might be special cases where we don't want this mechanism to kick in (but I can't name any right now...)

johnnyutahh commented 2 years ago

Originally, I thought that having a separate command would make it easier to use on rebases, but on second thought, I don't think that's the case.

Making the user remember an additional step (manual command) in a merge-like (rebase or otherwise) procedure only for git-store-meta and requiring them to remember to do this: that does indeed seem to be asking a lot of the user.

Also, there might be special cases where we don't want this mechanism to kick in (but I can't name any right now...)

For any special cases that arise in the the future: consider making an additional --switch parameter to handle said special cases? This approach seems reasonable to me.

DominikFickenwirth commented 2 years ago

I agree with both of your statements. 2. Seems to be the reasonable approach here.

Making the user remember an additional step (manual command) in a merge-like (rebase or otherwise) procedure only for git-store-meta and requiring them to remember to do this: that does indeed seem to be asking a lot of the user.

As I understand it, git-store-meta cannot be automated during a rebase, as a rebase commit cannot be hooked. So the user needs to remember an additional step anyways. Do you see another way of completely automating this during a rebase?

danny0838 commented 2 years ago

As all the issue is about when the hook should or should not run, it should make more sense to do the checking in the hooks directly.

The proposed change is available in the devel branch. In the first commit we refactored the hooks to make them easier to be manually edited or merged with other possible hook systems. The second commit adds a check to skip the hook if a merge (normal or squashing) or revert is in progress.

Rebasing (and similarly, cherry-picking and aming) is a different story. It's about reapplying patch(es) and does not involve committing unless the user explicitly runs git commit rather than git rebase --continue when resolving a conflict, or for an advanced operation like splitting or inserting commits, in which case running the hook are probably desired. As a result, we don't tend to skip for rebasing.

Also note that the user can always skip the pre-commit hook by adding --no-verify. So we basically have two options:

  1. Do nothing and let the pre-commit hook run as now. If the user wants to skip GSM, add --no-verify.
  2. Skip the pre-commit hook during a merge. If the user wants to update GSM, run manually.

Although it seems mostly reasonable to skip the pre-commit hook for a merge (or revert), it still adds a memory burden to the user for when the hook will run, and we may need more evaluation about whether to implement it.

Technically it's also possible to implement a more dedicated mechanism for a manual skip, such as checking for a specific environmental variable like GIT_STORE_META_SKIP_UPDPATE or a Git config like gsm.skipupdate or even branch.<branch_name>.gsmskipupdate. As we currently can't think of sufficient practical use cases, we probably won't dig into it so deeply. The user can simply customize the hooks on their own if they really need such feature anyway.

DominikFickenwirth commented 2 years ago

First of all, thank you for your fast response! Your changes to the hooks in the devel branch work as expected in my local environment. It may be useful to echo the reason why the hook won't be executed, e.g. "Not applying metadata after squash merge".

Rebasing (and similarly, cherry-picking and aming) is a different story. It's about reapplying patch(es) and does not involve committing unless the user explicitly runs git commit rather than git rebase --continue when resolving a conflict, or for an advanced operation like splitting or inserting commits, in which case running the hook are probably desired. As a result, we don't tend to skip for rebasing.

I agree. I didn't think of running git commit during a rebase. I think this is enough for the rebase case.

However, the issue is not really about when the hook should run or not. My original post wasn't clear at all - I apologize for that. I actually want the pre-commit hook to run during a merge. My goal is to make merging metadata as automated as possible, such that the user doesn't have to worry about metadata during every single merge.

When committing a merge, I want git-store-meta.pl --update to result in the following behavior:

  1. Metadata for files that changed in our-branch and their-branch should be extracted from the worktree
  2. Metadata for files that changed only in our-branch, but not in their-branch, should be extracted from our-branch
  3. Metadata for files that changed only in their-branch, but not in our-branch, should be extracted from their-branch
  4. The metadata extracted in 2. and 3. should be applied to the files in the worktree
  5. The metadata extracted in 1., 2. and 3. should be stored in .git_store_meta.

The user could then do something like:

git switch our-branch
git merge their-branch
[resolve conflicts]
git checkout --ours -- .git_store_meta  (my hack for resolving the file)
git merge --continue  (or git commit)

and the pre-commit hook would do the rest. There is probably a better way to resolve conflicts in .git_store_meta itself. I'm still fairly new to git and lack an overview of what's possible and what's not.

Also note that the user can always skip the pre-commit hook by adding --no-verify. So we basically have two options:

  1. Do nothing and let the pre-commit hook run as now. If the user wants to skip GSM, add --no-verify.
  2. Skip the pre-commit hook during a merge. If the user wants to update GSM, run manually.

Since I do want the hook to run during merges, I would prefer option 1.

Technically it's also possible to implement a more dedicated mechanism for a manual skip, such as checking for a specific environmental variable like GIT_STORE_META_SKIP_UPDPATE or a Git config like gsm.skipupdate or even branch.<branch_name>.gsmskipupdate. As we currently can't think of sufficient practical use cases, we probably won't dig into it so deeply. The user can simply customize the hooks on their own if they really need such feature anyway.

I agree. That seems like overkill.

DominikFickenwirth commented 2 years ago

I've just stumbled across merge drivers. Those seem to be an option for resolving the .git_store_meta file automatically. We could implement a custom merge driver through git-store-meta.pl --merge baseGSM ourGSM theirGSM prmL prmP using a behavior like 1. to 5. above.

By doing this, maybe we don't even need the changes I described for --update - Although conflicting files resolved by "use ours/theirs" would get a new mtime then, and the user would have to --apply the new values manually again.

danny0838 commented 2 years ago

It may be useful to echo the reason why the hook won't be executed, e.g. "Not applying metadata after squash merge".

Good idea. Added.

I actually want the pre-commit hook to run during a merge. My goal is to make merging metadata as automated as possible, such that the user doesn't have to worry about metadata during every single merge.

I get it. However this shouldn't be the work of the pre-commit hook, which is run during the user initiated commit AFTER the merge conflicts have been manually resolved and can't get the essential clues about what the conflicts are.

As you have noticed, this is more likely the job of a merge driver. However there would be many potential algorithms the user might want, such as "take the later mtime/atime if both modified". Also I'm not the expert about diff and merge algorithm, which is a nontrivial computer science problem. We probably won't—maybe actually can't—do it until some skilled body contributes on it or we've got time to research on it.

DominikFickenwirth commented 2 years ago

However this shouldn't be the work of the pre-commit hook, which is run during the user initiated commit AFTER the merge conflicts have been manually resolved and can't get the essential clues about what the conflicts are.

I agree. The only thing we would need the pre-commit hook for, is if we want to store new metadata for the files manually changed by the user during the merge. Also, it's the only hook where we could apply metadata to the worktree (even though I would prefer applying metadata after the commit). But this seems like a second step. The merge driver is more important I think. Also note that we can still get information about conflicts through MERGE_HEAD.

However there would be many potential algorithms the user might want, such as \"take the later mtime/atime if both modified\".

That seems reasonable as well. At least for file dates where the user resolved a conflict manually, I would prefer the date of the merge. But that's just personal taste, I guess. Maybe make both possible via an option?

Also I'm not the expert about diff and merge algorithm, which is a nontrivial computer science problem.

I would focus on the merge driver for the .git_store_meta file right now. I think diffing our GSM file works well enough with the standard diff algorithm.

Unfortunately, I have never written anything in perl, but our merge algorithm for .git_store_meta shouldn't be too difficult, since lines are independent and sorted. I'm working on pseudo code right now, which I will post as soon as I'm done.

DominikFickenwirth commented 2 years ago

Assuming that lines inside .git_store_meta are sorted alphabetically, we could do

git-store-meta.pl --resolve <basefile> <ourfile> <theirfile> <markerlength> <destfilename>

baselines  = readLines(basefile);  //read files as array of strings
ourlines   = readLines(ourfile);
theirlines = readLines(theirfile);
result = empty array or list of strings;
NOW = getCurrentDateTime();

i = 0; //index for baseline
j = 0; //index for ourlines
k = 0; //index for theirlines

conflicts = false;
while (j < length(ourlines) or (k < length(theirlines)) do {

  // 1. Get smallest unhandled filename
  //====================================
  basename  = getFilename(baselines, i); //should be empty string if i = array length
  ourname   = getFilename(ourlines,  j); //should be empty string if j = array length
  theirname = getFilename(theirlines,k); //should be empty string if k = array length
  filename = alphabeticallySmallest(basename, ourname, theirname); //ignores empty strings

  // 2. Extract stored fields from base, ours and theirs
  //=====================================================
  basefields  = empty;
  ourfields   = empty;
  theirfields = empty;
  if basename  = filename then { basefields  = getFields(baselines[i]);  i = i+1; };
  if ourname   = filename then { ourfields   = getFields(ourlines[j]);   j = j+1; };
  if theirname = filename then { theirfields = getFields(theirlines[k]); k = k+1; };

  // 3. Handle simple cases
  //========================
  // 3.a. equal in our branch and their branch?
  if ourfields = theirfields then {
    result.add(ourfields); //"add" should ignore the value "empty"
  }
  // 3.b. no change in theirs => take ours
  else if theirfields = basefields then {
    result.add(ourfields);
  }
  // 3.c. no change in ours => take theirs
  else if ourfields = basefields then {
    result.add(theirfields);
  }

  // --> If not handled yet, then BOTH branches changed the metadata of filename.

  // 4. SOLVE conflicts only affecting atime and/or mtime
  //======================================================
  else if withoutAtimeMtime(ourfields) = withoutAtimeMtime(theirfields) then {
    resultfields = ourfields;
    if ourfields.atime != theirfields.atime then { resultfields.atime = NOW; }; //or maximum
    if ourfields.mtime != theirfields.mtime then { resultfields.mtime = NOW; }; //or maximum
    result.add(resultfields);
  }

  // 5. MARK conflicts affecting other fields
  //==========================================
  else {
    conflicts = true
    result.add("<<<<<< HEAD"); //use <markerlength> for number of "<" characters
    result.add(ourfields);
    result.add("======");      //use <markerlength> again
    result.add(theirfields);
    result.add(">>>>>> "+ getShortSha1(MERGE_HEAD)); //use <markerlength> again
  }
}
writeLines(result, ourfile); //write result back to <ourfile>
if conflicts then exit(1); else exit(0);

I hope this code is understandable. If not, feel free to ask!

Notes:

danny0838 commented 2 years ago

Finally we decided to skip commits during all special status as it's more consistent and error-proof (the user can easily perform an update if not run; but cannot easily revert an update if it has run unexpectedly).