dusty-phillips / gitifyhg

Tools for using git as a client to mercurial repositories
GNU General Public License v3.0

follow remote renames #14

Closed jedbrown closed 11 years ago

jedbrown commented 11 years ago
git clone gitifyhg::somewhere/somerepo
cd somerepo
git remote rename origin othername

This is fixed easily enough by going into .git/hg and renaming the directory, but is there a feasible way to make it automatic?

fingolfin commented 11 years ago

That sounds tough... git basically only invokes the remote helper when doing a pull/fetch or push. So I guess gitifyhg would have to check for a changed remote name upon every run... and to do that right, it would have to track the "original" remote name somehow. This is of course doable as long as there is precisely one remote pointing to a hg repo, but is a lot harder when there are multiple... in that case, one could look at the git config to figure out the hg repo URL associated to the remote, and use that to perform the match. This would work as long as the hg repo URL is used in precisely one remote (which should usually be the case :-), and as long as the remote URL is not changed simultaneously (but if that happens, we have a problem anyway and should just clone again).

All in all, this sounds rather cumbersome and fragile, though :-/
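
The URL-matching idea can be sketched as follows. This is only an illustration, not gitifyhg code: the function names `hg_remotes` and `find_renamed_remote` are hypothetical, and the input is assumed to be the text printed by `git config --get-regexp '^remote\..*\.url$'`, one `key value` pair per line.

```python
GITIFYHG_PREFIX = "gitifyhg::"


def hg_remotes(config_text):
    """Map each hg URL to the list of git remotes that point at it.

    config_text is assumed to be the output of
    `git config --get-regexp '^remote\\..*\\.url$'`.
    """
    by_url = {}
    for line in config_text.splitlines():
        if not line.strip():
            continue
        key, _, url = line.partition(" ")
        # Keys look like "remote.<name>.url"; the name itself may contain dots.
        if not (key.startswith("remote.") and key.endswith(".url")):
            continue
        if not url.startswith(GITIFYHG_PREFIX):
            continue  # not a gitifyhg remote
        name = key[len("remote."):-len(".url")]
        by_url.setdefault(url[len(GITIFYHG_PREFIX):], []).append(name)
    return by_url


def find_renamed_remote(config_text, stored_url):
    """Return the one remote that now uses stored_url, or None if the
    URL is missing or shared by several remotes (the ambiguous case)."""
    names = hg_remotes(config_text).get(stored_url, [])
    return names[0] if len(names) == 1 else None
```

The lookup deliberately gives up when the same hg URL appears under more than one remote, which is the case described above as hard to handle.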

Another approach would be to scan the reflog, which does record the name change... but the reflog could have been cleared, and the messages in it are (AFAIK -- I could be completely wrong) not meant to be parsed. So this sounds fragile, too.

So let's take a step backward to view the overall picture: Our problem is that we use the name of the remote to locate the local clone of the hg repo we use for this remote. When the remote name changes, we can't find it anymore. So let's not use the remote name to find it. Use something else instead -- e.g. the remote URL. In .git/config we find url = gitifyhg::HGURL

Now if we renamed .git/hg/origin to .git/hg/HGURL, the problem is solved without any extra effort. But HGURL could contain characters that are unsafe in a directory name, so perhaps we should use its SHA-1 instead. Note that I left out the "gitifyhg::" prefix on purpose. While we are at it, perhaps .git/hg should be renamed to .git/gitifyhg to avoid clashes with Felipe's remote-hg?
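
A minimal sketch of deriving the clone directory from the URL rather than the remote name. The helper name `clone_dir` and the `.git/hg/<sha1>` layout are assumptions for illustration; the proposal above does not fix an exact layout.

```python
import hashlib
import os

PREFIX = "gitifyhg::"


def clone_dir(git_dir, remote_url):
    """Return a directory for the local hg clone, keyed by the hg URL.

    The "gitifyhg::" prefix is stripped first, so the key depends only
    on the hg repository URL and not on the remote's name.
    """
    hg_url = remote_url[len(PREFIX):] if remote_url.startswith(PREFIX) else remote_url
    # The hex digest contains only [0-9a-f], so it is safe as a path component.
    digest = hashlib.sha1(hg_url.encode("utf-8")).hexdigest()
    return os.path.join(git_dir, "hg", digest)
```

Because the remote name never enters the computation, `git remote rename` leaves the result unchanged.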

Some drawbacks to consider:

jedbrown commented 11 years ago

This is related to my observation that with multiple hg remotes, I'm fetching repository history multiple times. This happens when interacting with a project that uses separate dev and stable clones, or any time people submit pull requests.

What about having only one hg repo with everything from all remotes, translating to git-sha1 by hg-sha1 instead of sequence number? Then for each remote, just keep track of the hg-sha1 that each tip and tag refers to. Does gitifyhg need to use the revlog sequence numbers directly?

dusty-phillips commented 11 years ago

I'm going to file this as 'patches welcome' as I have never needed to use git remote rename in my life. That doesn't mean it's not important, but with the other problems affecting me, it's not going to be high priority on my own list.

#16 is about storing marks as sha1s; however, it maps to an integer mark that git provides to the remote, not to git sha1s. I don't actually know what that integer references; I'm going to have to figure it out, though, because it's the source of some problems. ;)

fingolfin commented 11 years ago

The question of tracking sha1s instead of rev numbers has been discussed elsewhere, and is desirable, but will require some major changes; and it can easily lead to severe speed regressions (using revlogs allows for some trivial optimizations, like checking how many new commits a recent pull brought in; with sha1s, this is harder to implement efficiently).
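
To illustrate the speed concern on a toy model (this is not Mercurial's API; the function names and the dictionary-based DAG are assumptions made for the example): with local revision numbers, the commits added by a pull are simply the new tail of the sequence, whereas with hashes alone one has to walk the ancestry from the new heads until reaching already-known commits.

```python
def new_commits_by_revnum(old_len, new_len):
    """With append-only local revision numbers, a pull adds revisions
    old_len .. new_len - 1, so counting them is a subtraction."""
    return new_len - old_len


def new_commits_by_hash(parents, heads, known):
    """With hashes only, walk back from the new heads and stop at
    commits that were already known before the pull.

    parents: dict mapping a commit hash to a list of parent hashes
    heads:   hashes of the heads after the pull
    known:   set of hashes present before the pull
    """
    seen = set()
    stack = [h for h in heads if h not in known]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(p for p in parents.get(node, ()) if p not in known and p not in seen)
    return len(seen)
```

The second function does work proportional to the number of new commits and needs the parent information at hand, while the first is constant time.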

Anyway, sharing the local hg repo for multiple remotes like that seems tricky to me for various reasons. Two questions regarding that:

jedbrown commented 11 years ago

Agreed about low-priority; it's easy to fix manually and it probably can't be fixed automatically without deeper changes.

  1. If two repos have no history in common, all commits will have distinct sha1s. Hg doesn't have "remotes" per se (just aliases for URLs; it doesn't remember what was there the last time you pulled and doesn't have a separate namespace for branches or bookmarks), but it can have two repositories without a common ancestor (http://mercurial.selenic.com/wiki/MergingUnrelatedRepositories). I don't know what "incoherent" means, but the capability is there.
  2. Hg does have an efficient way of figuring out what needs to be sent/received. I don't know the protocol, but it must exchange sha1s (since the sequence number is meaningless, even compared to the repository you originally cloned from).

fingolfin commented 11 years ago

Your answer to 2 is actually quite orthogonal to my question; it seems we are talking "past each other", as we say here :-). Of course hg can efficiently figure out what it needs to send / receive from a given remote, otherwise it would be rather unusual.

But the problem here is that we are cloning from two different "remotes" A and B (it doesn't matter if hg calls them remote or not, this is just a name). Now, if these two remotes have completely disjoint history, then hg figures this out quite effectively. So if I ask it "what is not yet sent to B", I would assume that it will show me all local commits I made to B -- but also everything from A (i.e. the complete history of A), as that is, after all, missing from B. Now, if I set up my hg repo directly, I assume this would not be a problem -- I'd just push only those branches that belong to A to A, and the branches belonging to B to B, and that's that.... except that of course "default" will exist in both... so I have (at least) two heads on default now, one for A and one for B... so I (or rather, gitifyhg) somehow need to keep track of which came from where, and also make sure not to accidentally push the wrong stuff to the wrong repo.

All in all, this sounds like a very difficult-to-maintain setup to me. I would argue that the benefit this brings for some use cases is vanishingly small compared to the major effort that would be required to implement this right... Esp. since the risk of a mistake is that a remote repository gets messed up, in the worst case by dumping a lot of history into it that comes from a totally unrelated repo.

fingolfin commented 11 years ago

That said, I still think the approach I outlined for referencing the local clone by the SHA1 of the remote repo URL instead of the remote name would at least solve the problem of remote renames. It does not solve the "enhancement request" for reducing duplication of history data when using multiple remotes that share a lot of history, but I think it also does not hinder it. The only drawbacks I see right now are that a) it becomes a tad less easy for the (power) user to see which local clone belongs to which remote, and b) if the user manually changes the URL of a remote, the link to the local clone is lost.

Regarding a), I'd say that if you care about such things, then you should be able to compute the required sha1. Regarding b), I actually consider this a boon preventing some serious "mess up" potential. If a power user really, really wants to do that, they have to change the URL stored in the .hg/hgrc of the local clone, too, anyway, and if they do that, they can change the SHA1 string used for the dir name of the local clone, too.

But perhaps I am missing something, so please let me know if you have concerns. Otherwise, I might have a stab at just implementing this -- the "hardest" part should be to make sure that existing gitifyhg clones are transitioned smoothly to the new setup. What makes it slightly hard is that it should be done in such a way that a ctrl-c or concurrent operations won't mess up the whole repo...
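
One way the transition could be made robust is a single directory rename, sketched below. The function name `migrate_clone` and the directory layout are hypothetical. The sketch relies on `os.rename` being atomic when source and destination are on the same POSIX filesystem, so an interrupted run leaves either the old directory or the new one, never a half-moved clone.

```python
import hashlib
import os


def migrate_clone(hg_root, remote_name, hg_url):
    """Move hg_root/<remote_name> to hg_root/<sha1 of hg_url>.

    Returns the new path, or None if there was nothing to migrate.
    """
    old = os.path.join(hg_root, remote_name)
    new = os.path.join(hg_root, hashlib.sha1(hg_url.encode("utf-8")).hexdigest())
    if os.path.isdir(new):
        return new  # already migrated, possibly by a concurrent run
    if not os.path.isdir(old):
        return None  # no clone stored under the old name
    try:
        os.rename(old, new)  # one step: fully done or not done at all
    except OSError:
        # Another process may have renamed it first; accept that outcome.
        if not os.path.isdir(new):
            raise
    return new
```

Calling it again after a successful run is harmless, since the first check returns the already-migrated path.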

fingolfin commented 11 years ago

BTW, if one really cares about power users who want to e.g. change the hg URL of a remote, one could add a command for doing that. I.e. add a gitifyhg executable (or make the existing executable recognize under which name it was called) that allows doing just that.
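
Recognizing the name the executable was called under could look roughly like this. It is a sketch only: the mode names and the `set-url` command mentioned in the output are hypothetical. The one firm fact it relies on is that git runs remote helpers under the name `git-remote-<transport>`, which for `gitifyhg::` URLs is `git-remote-gitifyhg`.

```python
import os
import sys


def pick_mode(argv0):
    """Choose behaviour from the name the program was invoked as."""
    name = os.path.basename(argv0)
    if name.startswith("git-remote-"):
        return "remote-helper"  # invoked by git during fetch/push
    return "admin"  # invoked directly by the user


def main(argv):
    mode = pick_mode(argv[0])
    if mode == "remote-helper":
        print("would speak the remote-helper protocol with git")
    else:
        print("would run a user command, e.g. a hypothetical 'set-url'")
    return mode


if __name__ == "__main__":
    main(sys.argv)
```

With this approach a single script can be installed once and symlinked under both names.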