dusty-phillips / gitifyhg

Tools for using git as a client to mercurial repositories
GNU General Public License v3.0
62 stars 17 forks source link

Error trying to push when remote has more commits #6

Closed fingolfin closed 11 years ago

fingolfin commented 11 years ago

Suppose you are cloning a hg repository A into a git repository B. Suppose now that there were some commits on both A and B since you did "git pull" last. When you now try to push, this results in an exception (in both gitifyh an git-remote-hg). Here is a script to reproduce this:

#!/bin/sh -ex
rm -rf bug
mkdir bug
cd bug

mkdir hgrepo
cd hgrepo
hg init
echo a > a
hg add a
hg commit -m a
cd ..

git clone gitifyhg::`pwd`/hgrepo gitrepo

cd hgrepo
echo hg > b
hg add b
hg commit -m hg
cd ..

cd gitrepo
echo git > b
git add b
git commit -m git
git push

And this is the traceback of the resulting error:

Traceback (most recent call last):
  File "/sw/bin/git-remote-gitifyhg", line 9, in <module>
    load_entry_point('gitifyhg==0.6.1', 'console_scripts', 'git-remote-gitifyhg')()
  File "/Users/mhorn/Projekte/foreign/gitifyhg/gitifyhg.py", line 668, in main
Everything up-to-date
    HGRemote(*[x.decode('utf-8') for x in sys.argv[1:3]]).process()
  File "/Users/mhorn/Projekte/foreign/gitifyhg/gitifyhg.py", line 271, in process
    getattr(self, 'do_%s' % command)(parser)
  File "/Users/mhorn/Projekte/foreign/gitifyhg/gitifyhg.py", line 345, in do_export
    GitExporter(self, parser).process()
  File "/Users/mhorn/Projekte/foreign/gitifyhg/gitifyhg.py", line 550, in process
    self.repo.push(self.hgremote.peer, force=False, newbranch=new_branch)
  File "/sw/lib/python2.7/site-packages/mercurial/localrepo.py", line 1890, in push
    bool(inc))
  File "/sw/lib/python2.7/site-packages/mercurial/discovery.py", line 333, in checkheads
    raise util.Abort(error, hint=hint)
mercurial.error.Abort: push creates new remote head d3701f655e68!

To fix this, I can now "git pull", which will trigger a merge. So a simple fix would be to catch the exception, abort and tell the user that "the remote is newer, please pull & merge and then try again". Ideally the message should be close or identical to the corresponding message when a push to a git remote fails.

But this is not quite a complete solution... because I should also be able to "git fetch", which would only update "origin/branches/default", and then merge via that. But it seems that "git fetch" in this case will not do that (and I am not sure whether it is possible within the remote-helper framework).

fingolfin commented 11 years ago

See also https://github.com/felipec/git/issues/15

dusty-phillips commented 11 years ago

I just realized that I was subconsciously deliberately avoiding testing this situation. grins It's definitely a high priority to fix.

dusty-phillips commented 11 years ago

I added https://github.com/buchuki/gitifyhg/commit/b781c21a8dfb1076083911f45169bc7dd72fd7fb which should cause an error message if there are upstream commits. I'm leaving this issue open because:

fingolfin commented 11 years ago

Behavior is better now. However, it is prone to a race condition (at least if I read the code correctly): If the remote hg repository is modified after the check whether pushing is safe, but before the actual push, we still have a problem.

Now you could argue that this is unlikely to happen. But given enough use, this will happen (according to Murphy's law ;-).

An alternative approach might be to attempt to do a rollback: First, record the current state of the hg repo. Then, attempt the push. If any error occurs, rollback to the previous state (e.g. using "hg strip", though this requires the "mq" extension; hopefully not a problem). Of course, it is then also important that any changes to the marks are also undone.

Moreover, the current code seems to be flawed: A second "git push" still runs into an error. Here is a test case to reproduce this:

#!/bin/sh -ex
NAME=gitifyhg-push-conflict
rm -rf $NAME
mkdir $NAME
cd $NAME

mkdir repo_orig
cd repo_orig
hg init
echo a > a
hg add a
hg commit -m a
cd ..

git clone gitifyhg::`pwd`/repo_orig repo_clone

cd repo_orig
echo b > b
hg add b
hg commit -m b
cd ..

cd repo_clone
echo B > b
git add b
git commit -m B
git push || echo "try again"
git push
fingolfin commented 11 years ago

And yes, it is possible to signal errors to git. But this is currently badly underdocumented :-(. I will submit a pull request here which addresses this, and will also try to enhance the git remote helper docs to mention how to do this.

dusty-phillips commented 11 years ago

Thanks for the pull request, I much prefer letting git provide the error interface. I had to fix a couple tests to make it pass.

I agree that the race condition issue must be addressed. I'm not sure if judicious use of hg strip would work or not (though I have no problem with requiring gitifyhg to depend on the mq extension.) I worry about edge cases where changes have happened on multiple branches, but only one of them has failed. The only other solution I can think of right now is to make a 'backup' clone of the hg repository to revert to if anything goes wrong. I'm not sure if this is a good idea for extremely large repos. Hg is known to be fast but maybe not fast enough to do a clone on every push!

I have committed an expected failure test for the double push problem. I'm pretty sure this is happening because marks are being updated but not reverted, so perhaps it will be solved by the same patches to solve the other problem.

alexsydell commented 11 years ago

I spent a few days hacking on this last week, using the exact strategy you suggested - hg strip (don't really need the extension enabled because it ships with mercurial) as well as some hackery to remove the marks that git thinks it has exported. A rollback mechanism of sorts. I've also made some other changes that I think make git behave more correctly (for example, moving branches to heads/[branch] instead of heads/branches/[branch], giving git correct hashes in response to 'list' so it can figure out which commits were pushed or pulled, etc.). It's in a completely haphazard state with no new tests written yet and no old tests modified to pass, but feel free to poke around and see if there's anything useful: https://github.com/alexsydell/gitifyhg/tree/work_in_progress. The code is still crappy but I'm hoping to save you guys a bit of effort.

dusty-phillips commented 11 years ago

Alex: I do like some of those commits. Are you planning to take time to clean the commit history up for a merge into my master? If not, I'll probably sort through them myself sometime when I am less busy, but I wanted to make sure I wasn't duplicating effort. Either way, it'll probably be next week sometime before I can look at it.

fingolfin commented 11 years ago

I am glad somebody is working on this.

Regarding heads/branches/BRANCH vs. heads/BRANCH: I think it is important for mercurial branches to be distinguishable from bookmarks. But for that it is sufficient if the remote name uses /branches, as in remotes/origin/branches/my_branch

But it should be allowed to track such a branch with a local branch of any name. One thing this affects is how gitifyhg figures out during a push / export which mercurial branch to put each git commit on ...

alexsydell commented 11 years ago

Dusty: I am planning on it, but I expect a busy week at work so not sure when I'll be able to get to it. Let's coordinate here, but for now it's probably safe to assume I'm not working on it. I might fix bugs and push when I run into them, but a cleanup would take longer.

Max: I distinguish bookmarks by having them live in heads/bookmarks/[bookmark]. That said, I'm pretty sure bookmark support isn't complete/totally working after my changes because I was focused on getting branches right.

dusty-phillips commented 11 years ago

bookmark support wasn't complete/totally working before your changes. There is an xfail test or two in the test suite that illustrate the issues.

On naming: If I push a new git branch to mercurial, I don't want the default behavior to be "create a new named branch". It should be a conscious decision using the branches/ specifier. This is why normal git branches map to bookmark names. I don't think of the bookmarks as being pushed to the remote repository; hg supports this, but it's not the normal workcase. The purpose of bookmark support is therefore to map arbitrary git branches to what would otherwise be anonymous branches in mercurial.

Right now, gitifyhg simply strips unnamed anonymous branches. To prevent this, I intend to create bookmarks for each of those anonymous branches in the local hg clone so that git has something to track.

fingolfin commented 11 years ago

Hm, I thought about adding "bookmarks" (or rather, git refs) to track anonymous hg branches (= multiple heads of a hg branch), but I think there are some problems with that, as extra heads can come and go at will in the remote repo. In particular, how would you name and track these?

One approach would be the following: For a given hg branch, add a special ref for each head of that branch. Name these using the SHA-1 of the head commit (or perhaps the local revision integer?), so you might have names like "refs/hg/origin/heads/SHA1" or perhaps "refs/hg/origin/heads/BRANCHNAME/SHA1". When doing a pull, remove/add these to reflect the actual state in the remote repo. This is also why I would not put this under refs/heads, because these names would be constantly changing

Of course this changing of ref names all the time is not that nice. What would be alternatives? One would be to "invent" arbitrary names for these heads (head1, head2, ...). But then you still have to add/remove some of them when doing a "git pull"... you would have to track which heads get merged where or are created from scratch. Of course if a head just stays the same, you would want to keep its name. That then also means you could end up with just "head1" and "head3", after "head2" gets merged. Still, the names would be much more stable and hence "trackable", so I guess it could be OK to expose them under refs/heads -- as long as care is taken that no confusion with other things can happen. So perhaps refs/heads/branchheads/default/head1 ?

All doable, but great care would be need to avoid all kinds of nasty bugs, and confusion. So I am curious how you were planning to tackle this? Perhaps I am missing something basic, too?

PS: Is this issue is the best place to discuss these things? Perhaps it would be better to use a dedicated "github issue" for each, or even a mailing list or forum? Perhaps even the git mailing list, as there might be other people interested in this there with greater knowledge about git :-)

fingolfin commented 11 years ago

And here is another reason why I think hg branches and bookmarks should live in heads/branches/BRANCH and heads/bookmarks/BOOKMARK: If one of the two is moved to heads/BRANCH or heads/BOOKMARK, then clashes can occur, as it is perfectly valid to name a hg branch e.g. "bookmarks/bla". And conversely for branches. Yes, this is rare, but it can happen, and by Murphy's law, will happen ;).

dusty-phillips commented 11 years ago

Instead of naming the bookmark after the HEAD sha1 and making it constantly move, I was thinking to name it after the "foot"; that is, the sha1 of the first commit on the new anonymous branch. To quote felipec, "Anonymous heads are probably the most stupid idea ever;" and dealing with them is ugly, no matter what.

I don't know how to deal with the problem of branches disappearing with merges and the like; in git, the branch points at a commit that is on the merged tree. In hg, I guess the bookmark would still do so, and if you added a commit to either of them it would behave in the same way.

fingolfin commented 11 years ago

Hm, I am not quite sure I see how you want to define the "foot" in order to make effective use of it. Consider this scenario of a hg branch has two heads:

 A
/ \
B C

So the "foots" would be B and C, if I understand you correctly. Then we pull, and suddenly see this on the hg side:

 A
/ \
B C
\ /
 D

So what is the "foot" here? And then we pull again and now we see:

 A
/ \
B C
\ /
 D
/ \
E F

So we still have two heads... but which is which? The only way I see is to get rid of the old "head refs" with the "foots" B and C, and add two new ones for E and F.

Granted, this will probably not happen that often in "real" life, but it certainly will happen from time to time.

dusty-phillips commented 11 years ago

I'd see commit D as belonging on the original named branch, and B and C as "closed anonymous branches" when the merge is made. Then when the new branches occur, they would be new foots of new branches. Anonymous branches do not survive a merge in HG, either. if you see history like that you are usually seeing somebody merging back in from (usually) default because they are afraid default is diverging too much. This isn't a problem if you do proper git rebasing instead.

So an anonymous branch is only a linear set of commits on a named branch that has two heads. If they get merged, a head gets lost and everything ought to work normally.

alexsydell commented 11 years ago

On the topic of how to name heads for branches and bookmarks: I think bookmark considerations should be secondary to making branches work correctly. I do agree that using heads/BRANCH leaves us open to the possibility of someone naming their branch "bookmarks/foo" and confusing the import/export process, but the possibility is small and can be documented. However, heads/BRANCH allows branches to feel much more git-like - who ever ran something like 'git push -u origin new_feature:branches/new_feature'? I think the common case when creating a branch on the remote is 'git push -u origin new_feature', which creates the remote heads/new_feature", as opposed to the "heads/branches/new_feature" that you guys are arguing for. It goes without saying that none of this matters for local untracked branches.

As far as multiple heads in a branch, I'm debating how much effort is worth putting into this. The repos I use this for all have pretty strict single-head-per-branch policies and using 'hg push --force' is heavily discouraged. What few multiple heads end up in the repo are typically the result of a mistake and usually get merged together quickly or are forgotten from a long time ago. Is this something you guys have to deal with often?

So, my take is to get branches and tags right, and then think about the best way to deal with bookmarks and anonymous heads within that framework. Thoughts?

fingolfin commented 11 years ago

The argument "it can be documented" goes both ways. As I see it, this is a choice between "rare situation, but if it occurs, catastrophic" and "slightly less convenient but safe". I'd always prefer the latter choice.

Mercurial branches and git branches are two very different things. As such, I find it acceptable that if you want to create one of them from git, you have to watch out for some extra semantics in how refs are named... This can be documented, and if done wrong by the user (by accident / due to not having read the instructions) could easily be caught, and reacted to with an informative error message that tells the user how to do it right resp. where to read up on this.

On the other hand, if branches live under heads/BRANCHNAME, then things might be slightly (!) more "natural" for most people. But anybody who ever runs a repository with a branch named "bookmarks/FOO" will run into problems. Either we will have ignored this effect, since we consider it rare -- and in that case, that user is likely to be hit by weird behavior and even bad commits in their hg repository (involving bad mixups between branches and bookmarks), and/or gitifiyhg crashes. Alternatively, gitifyhg could simply refuse to work with such repositories (not nice, but IMHO preferable to randomly weird behavior). Or it could attempt to deal with that case after all, involving all kinds of special cases in the code that will complicate it greatly and will have the risk of introducing bugs by themselves.

Also, some projects avoid using mercurial branches for most parts, and use workflows that are based on bookmarks, or clones, instead. I think these should also be supported equally well if at all possible... Let's not fall into the trap of all those other remote-hg projects that announce "our project is the best and the final solution for everybody who wants to use hg from git", only to fail after 5 minutes in a basic test, when it turns out that the project author's did not even test feature X, because they happen to not use it... (like cloning via SSH, or cloning local repositories, or... ;-).

So, let's try to get branches, tags and bookmarks and anonymous heads right. Just because you don't use them doesn't mean that nobody does... at the very least, make sure to not "paint yourself" into a corner by following design decisions that will later on make it very hard to support those feature you consider low priority for now...

dusty-phillips commented 11 years ago

I think fingolfin has thought this through more than I have, so I don't have much to add. I can explain what I was thinking up until this point, though:

It has never been my priority to "hide" the mercurial implementation from the gitifyhg user, so I am personally not averse to forcing people to name a remote as branches/ if they want a named branch. I also do not want to have people accidentally creating a named branch on the upstream repository without actively thinking about it (by creating the branches/ prefix). This is a permanent operation, and the standard hg client makes it very clear that you are creating a new branch.

As for bookmarks, I was going with the hg-git and felipec idea that bookmarks are more like git branches than mercurial's branches are. Thus, if you push a head without the branches/ prefix, you want to create a bookmark. I would be ok with requiring a bookmarks/ prefix for bookmarks, but then what is supposed to happen if you push a branch that doesn't have one of those prefixes. Further, I can't think of any way to track an upstream branch without assigning it a bookmark.

I had originally made the assumption that bookmarks would be in the local repository only, and I could use them however I saw fit to match stuff up to git branches or commits. But as Max said, some projects use global bookmarks to name their branches, and this needs to be supported too. I don't think it's possible to have a set of local bookmarks AND a set of global ones in mercurial, so my original design may need some serious rethinking.

At any rate, yes alexsydell supporting named branches and tags are a higher priority for me and the projects I work on, but I do want the project to support more features in the future. Therefore the branches/ prefix has to stay. It does force users to think about what kind of branching mechanism is being used upstream. However, since they are interacting with an hg repository, they HAVE to think about this. gitifyhg can't automate the decision for them because it doesn't have access to all the assumptions about the underlying repository structure.

alexsydell commented 11 years ago

I completely understand your points, and they're probably right. I still have an urge to make the hg remote as transparent as possible (that is, not requiring people to think that they're adding a branch to mercurial as opposed to just adding a branch, and so on), so I might just stubbornly keep it that way in my fork to see how it works. I definitely don't know enough about hg or git to provide a solid argument for doing that, so I'll just treat it as an experiment. I'll definitely try to keep in sync with you guys though so we can work together on the main parts.

As far as "I also do not want to have people accidentally creating a named branch on the upstream repository without actively thinking about it", I don't think that's a problem. Since branches are local by default in git (if created with 'git branch' or 'git checkout -b' without any extra arguments), people already have to intentionally specify that they want a named branch created upstream. I can see how that might be a problem in a repo that mostly uses bookmarks instead of branches, but in that case you still already have to remember to create a bookmark instead of a branch in mercurial (does that make sense?).

As far as I can tell reading through the latest issues, it sounds like the current decision is to use heads/branches/BRANCH and heads/bookmarks/BOOKMARK. In that case, what happens if I try to push something directly under heads/? We should probably either have a reasonable default (the only reasonable thing I can think of is a branch, but obviously we're at disagreement over that), or error. Is there another option that I'm missing?

fingolfin commented 11 years ago

If you try to push something directly under heads, this should produce an error and be rejected.

So, to create a true remote hg branch, you must push to a (remote) branch following the naming convention, branches/NAME ; for a bookmark, bookmark/NAME, and anything else gives an error.

Of course you would be free to name your local (git) branches any way you like -- in git, the name of local and remote branches does not have to match, after all.

For this to work, however, we would have (AFAICT) look into the git config, to figure out the remote branch name associated a given local branch is tracking... this does feel rather like a hack, though.

So, it may indeed be worthwhile to not completely give up on the possibility using "NAME" directly instead of "branches/NAME", but if that is done, then we should not use "bookmarks/NAME", as that just seems to likely to cause a collision. I'd rather see something like "_bookmarks/NAME".

I think it would be best to discuss this on the git mailing list, too, to hear what people think about either approach

dusty-phillips commented 11 years ago

I have included some of alexsydell's commits into master now, including the one that fixes this issue. fingolfin, do you think it's closeable yet? I'm not certain the implementation is completely robust, but I think any further issues could be considered new bugs. If you want to continue discussing the branch naming discussion, I think issue #15 might be the place to do it.

Alexsydell: I suspect your repo's gonna get a bit messy if you try rebase onto master, sorry about that. I picked those commits that a) I had time to review, b) I felt were in a cherry-pickable state, c) I understood, and d) I liked.

alexsydell commented 11 years ago

@fingolfin Doesn't git give us the remote branch refs in fast-export? I think it does the mapping internally, so we should never be aware of what the local branches are called.

@buchuki No problem, I'll clean it up at some point. Thanks for getting those commits in, I sadly don't have much time to work on this.

jedbrown commented 11 years ago

The failing test case in 5840279e36cb328f37fb33ded509f24421db46ff is part of a very common workflow.

fingolfin commented 11 years ago

@alexsydell Unfortunately it seems that git does not do that -- it just gives us the local name of the branch when we do a git push. So if I clone a hg remote branch "hgbranch" under the local name "gitbranch", then pushing any changes under gitbranch is done by asking the remote helper to generate a fast-import stream updating the "refs/heads/gitbranch" (and not, say, "refs/hg/hgbranch" or anything else involving hgbranch). So we either must scan the git config; or switch away from using the import/export remote-helper API to using push/fetch. Because it turns out that the push remote-helper command is told both the local and the remote refs name. Perhaps we can ask the git team to extend the remote-helper protocol in such a way that "import" is also told both refs...

Or perhaps we should forget this and actually use the git branch name for deciding "hg branch" is burned into each commit we export. Hm...

In any case: This whole discussion, while very interesting, is now quite far from the bug I reported initially. Perhaps we can instead hold such discussions on a mailing list? I created https://groups.google.com/group/gitifyhg for this purpose, feel free to join :-)

dusty-phillips commented 11 years ago

Max, I've lost track, is there anything outstanding on this issue now?

fingolfin commented 11 years ago

Since we set HGPLAIN now, the fix for this works even in non-english locales. On a cursory look, it also looks to me as if the implemented solution is not prone to races as described before (where the remote repo gets modified after we pushed the local hg repo, but before we pushed that to the remote).

So I think we can close this, and handle any future issue related to this as new ones.