cjb / GitTorrent

A decentralization of GitHub using BitTorrent and Bitcoin
MIT License

support branches #34

Closed splinterofchaos closed 9 years ago

splinterofchaos commented 9 years ago

From the commit message:

git-remote-gittorrent: Call git ls-remote without the 'HEAD' argument to get all references and build a list of them. Use a different swarm for each ref and don't quit until we have them all.

gittorrentd: Parse the branch name out of references and store the sha's in userProfile[reponame][branch] instead of userProfile[reponame].master.
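
As a rough illustration of the two steps in that commit message, here is a minimal sketch that parses `git ls-remote` output into a map of refs and stores the branch shas under the repository name. The helper names `parseLsRemote` and `branchesOnly` are hypothetical and are not taken from the GitTorrent source. Keeping only HEAD and refs/heads is an assumption made for this example.

```javascript
// Hypothetical helper: turn `git ls-remote` output into { refname: sha }.
function parseLsRemote (output) {
  var refs = {}
  output.split('\n').forEach(function (line) {
    // Each line is "<40-hex sha><whitespace><refname>".
    var match = /^([0-9a-f]{40})\s+(\S+)$/.exec(line.trim())
    if (match) {
      refs[match[2]] = match[1]
    }
  })
  return refs
}

// Hypothetical helper: keep HEAD and branches, keyed by short branch name.
function branchesOnly (refs) {
  var branches = {}
  Object.keys(refs).forEach(function (ref) {
    if (ref === 'HEAD') {
      branches.HEAD = refs[ref]
    } else if (ref.indexOf('refs/heads/') === 0) {
      branches[ref.slice('refs/heads/'.length)] = refs[ref]
    }
  })
  return branches
}

// Sample input in the format git ls-remote prints.
var sample = [
  '61579b5ee99d9a51ad94ec14a205d4f96cefa6b5\tHEAD',
  '61579b5ee99d9a51ad94ec14a205d4f96cefa6b5\trefs/heads/master',
  '4e440776743ac216d19a7fc53c83a0681fdbf45b\trefs/pull/16/head'
].join('\n')

// The object layout here is an assumption for illustration.
var userProfile = { repositories: {} }
userProfile.repositories.gittorrent = branchesOnly(parseLsRemote(sample))
console.log(JSON.stringify(userProfile))
```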

WIP because it sometimes hangs, get_infohash() gets called twice instead of once, might be a quirk or two, and it's not the cleanest code I've ever written. I have a lot to learn about JS, git internals, and DHT's. Still, this branch enables me to:

$ git clone gittorrent://github.com/cjb/gittorrent
$ cd gittorrent
$ git fetch origin
$ git checkout origin/pulls/33

and when I have $ ./GitTorrent/gittorrent running, I can check out origin/pull/27, which I host.

cjb commented 9 years ago

@splinterofchaos Thanks, looks great! Could I persuade you to adopt "JS Standard Style" for this diff? You can read more here, and fix up everything it complains about when you run it:

https://github.com/feross/standard

(I'm using the linter-js-standard plugin for the Atom.io editor to show these while I type.)

Also I wonder if instead of this:

            for (var i = 0; i < target.length - 1; i++) {
              path += target[i] + '/'
              if (!fs.existsSync(path)) {
                fs.mkdirSync(path)
              }
            }

you could do this:

            // slice(0, -1) skips the last segment, as the loop above does
            target.slice(0, -1).forEach(function (segment) {
              path += segment + '/'
              if (!fs.existsSync(path)) {
                fs.mkdirSync(path)
              }
            })

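
On a current Node.js the directory-creation step can probably be reduced to a single call. This is a minimal sketch, assuming `target` holds the segments of a file path whose last element is the file name. `ensureParentDir` is a hypothetical helper and not part of the diff under review, and the `recursive` option needs Node.js 10.12 or newer, so it was not available when this thread was written.

```javascript
var fs = require('fs')
var path = require('path')

// Hypothetical helper: make sure the directory that will hold `file` exists.
function ensureParentDir (file) {
  // { recursive: true } creates any missing ancestors and does not throw
  // if the directory already exists (Node.js 10.12 or newer).
  fs.mkdirSync(path.dirname(file), { recursive: true })
}

// Example call (not executed here):
// ensureParentDir(target.join('/'))
```
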
splinterofchaos commented 9 years ago

I rebased and added a few more commits. At first, this branch made git eagerly fetch every branch; now we fetch a branch only when git asks for it. In practice, git asks only for HEAD and the branches in refs/heads, and only if it doesn't already have them. With every ref included, the userProfile became too large to upload, so for now I've disabled refs that don't start with refs/heads.

One can use $ git ls-remote {name} to list all the references a remote has, but if the name is gittorrent://github.com/cjb/gittorrent, many of the sha's listed may not actually be hosted on the network, since we consult GitHub, not the DHT.

This is a recent session:

 tmp$ git clone gittorrent://github.com/cjb/gittorrent
Cloning into 'gittorrent'...

origin has:

Okay, we want to get HEAD: 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5

Okay, we want to get master: 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5

Adding swarm peer: 192.168.1.7:30000 for 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5

Adding swarm peer: 192.34.86.36:30000 for 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5

Downloading git pack with infohash: 5ba1e3c62379bd48b289b5251b9225323e68ed88

Receiving objects: 100% (177/177), 27.78 KiB | 0 bytes/s, done.
Resolving deltas: 100% (95/95), done.
git update-ref origin/HEAD 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
git update-ref origin/master 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
Checking connectivity... done.
 tmp$ cd gittorrent/
 gittorrent git:master$ git pack-refs
 gittorrent git:master$ cat .git/packed-refs 
# pack-refs with: peeled fully-peeled 
61579b5ee99d9a51ad94ec14a205d4f96cefa6b5 refs/remotes/origin/master
 gittorrent git:master$ git ls-remote origin

origin has:
61579b5ee99d9a51ad94ec14a205d4f96cefa6b5    HEAD
61579b5ee99d9a51ad94ec14a205d4f96cefa6b5    refs/heads/master
4e440776743ac216d19a7fc53c83a0681fdbf45b    refs/pull/16/head
90a17f1bb2dfa953898a9056d5778bd5ceaa08b4    refs/pull/19/head
7d324de111b7b5711e31ef73055f385434cbf513    refs/pull/26/head
ae731139b21cac80ead9dc85c63e7aa8fe2ce26b    refs/pull/26/merge
5141cf86c3202f8cdd7ccb63776c571a2a707f75    refs/pull/27/head
8af3f08f4e515df2d7cd2f3334474a6bcf583ad8    refs/pull/27/merge
aee9dd69178f6f76ecc00ba99961e31fe140d1f6    refs/pull/28/head
c5e6b9c9bd0367813c27fee5b664ed94a1adb6aa    refs/pull/28/merge
bfe8b17eec315d38070417bf45d5cbcbbb74dc18    refs/pull/29/head
ff97c6721aaa0daed208cf99013be177fb642bdb    refs/pull/30/head
7f4f50262b15569a6e60fd69daf74aefd74bd081    refs/pull/33/head
b8249848a0655491b8e21629d240f0f761c4db2d    refs/pull/33/merge
616feea3827c295a99653446b5d799d76a4cba3c    refs/pull/34/head
593acd662e28eecccc4ff2a83a26727952cb7f0e    refs/pull/34/merge
cce6953283ed2456d15ec63787259ce84a7063af    refs/pull/7/head
 gittorrent git:master$ 

To the best of my knowledge, $ git clone should produce the same tree for gittorrent repositories as it does for other remote types.

splinterofchaos commented 9 years ago

Also I wonder if instead of this: [manual for loop]

:+1:

cjb commented 9 years ago

@splinterofchaos That's awesome, thanks! Do you think it's ready to merge in now?

splinterofchaos commented 9 years ago

BTW: here's the output for gittorrentd:

 src$ ./GitTorrent/gittorrentd 
in repo GitTorrent/.git/git-daemon-export-ok
GitTorrent/.git/
{"repositories":{"GitTorrent":{"HEAD":"5c16fe1bbb35fb8fdbc5f2063a9fdc5603845079"}}}
Announcing 5c16fe1bbb35fb8fdbc5f2063a9fdc5603845079 for branching on repo GitTorrent/.git/
Announcing 5141cf86c3202f8cdd7ccb63776c571a2a707f75 for glob-cwd on repo GitTorrent/.git/
Announcing 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5 for master on repo GitTorrent/.git/
{"repositories":{"GitTorrent":{"HEAD":"5c16fe1bbb35fb8fdbc5f2063a9fdc5603845079","refs/heads/branching":"5c16fe1bbb35fb8fdbc5f2063a9fdc5603845079","refs/heads/glob-cwd":"5141cf86c3202f8cdd7ccb63776c571a2a707f75","refs/heads/master":"61579b5ee99d9a51ad94ec14a205d4f96cefa6b5"}}}
errors= []
hash= e743222bc6010e5080cba6706a282e9d756977c8
errors= []
hash= e743222bc6010e5080cba6706a282e9d756977c8
Received handshake for 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
calling git pack-objects
exited

and I removed the line "origin has:" from git clone. (redundant)

Do you think it's ready to merge in now?

I forgot I'd marked it WIP. I would like to see it merged. There are still issues, like sending userData twice as seen above, and maybe a few miscellaneous things to work on, but, being a small and newer project, I don't think that's a bad thing and I'd like to see how it acts in the wild.

Let me know if you want me to change anything first, I'd be happy to.

wolk935 commented 9 years ago

Reaydi ---------- Forwarded message ---------- From: "Scott Prager" notifications@github.com Date: 3 Jun 2015, 22:16 Subject: Re: [GitTorrent] [WIP] support branches (#34) To: "cjb/GitTorrent" GitTorrent@noreply.github.com Cc:

cjb commented 9 years ago

@splinterofchaos Looks good! Merged, and added you as a collaborator -- feel free to weigh in, have some ownership, etc! (I'll push a small change to make git-remote-gittorrent output a bit less verbose.)

cjb commented 9 years ago

@splinterofchaos This usually works, but I did get one failure:

 λ git clone gittorrent://81e24205d4bac8496d3e13282c90ead5045f09ea/gittorrent
Cloning into 'gittorrent'...

Mutable key 81e24205d4bac8496d3e13282c90ead5045f09ea returned:
repositories: 
  gittorrent: 
    HEAD:                  61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
    refs/heads/check-sha1: 7f4f50262b15569a6e60fd69daf74aefd74bd081
    refs/heads/master:     61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
  recursers: 
    HEAD:              5fbfea8de70ddc686dafdd24b690893f98eb9475
    refs/heads/master: 5fbfea8de70ddc686dafdd24b690893f98eb9475
Okay, we want to get HEAD: 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5

Okay, we want to get check-sha1: 7f4f50262b15569a6e60fd69daf74aefd74bd081

Okay, we want to get master: 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5

Adding swarm peer: 192.34.86.36:30000 for 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
Downloading git pack with infohash: 5ba1e3c62379bd48b289b5251b9225323e68ed88

Receiving objects: 100% (177/177), 27.78 KiB | 0 bytes/s, done.
Resolving deltas: 100% (95/95), done.
git update-ref origin/HEAD 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
git update-ref origin/master 61579b5ee99d9a51ad94ec14a205d4f96cefa6b5
Checking connectivity... fatal: bad object 7f4f50262b15569a6e60fd69daf74aefd74bd081
fatal: remote did not send all necessary objects

It looks like our lookup for the 7f4f5.. sha1 failed, and we should retry it (issue #5) before moving on and asking Git to check out.
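
A bounded retry of that kind might look like the following minimal sketch. `lookupWithRetry` is a hypothetical helper, and `lookup(sha, cb)` is a stand-in for whatever performs the DHT query, assumed to call `cb(err)` on failure and `cb(null, result)` on success. Neither is taken from the GitTorrent source or from issue #5.

```javascript
// Hypothetical helper: call `lookup(sha, cb)` up to `attempts` times and
// report the first success, or the last error once attempts run out.
function lookupWithRetry (lookup, sha, attempts, done) {
  lookup(sha, function (err, result) {
    if (!err) return done(null, result)
    if (attempts <= 1) return done(err) // out of attempts, give up
    lookupWithRetry(lookup, sha, attempts - 1, done)
  })
}
```

A real version would likely wait between attempts; the delay is left out here to keep the sketch short.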

Also, I guess this way means that for two branches that are near to each other, we download approx twice as much data as we need, because we get a full packfile from an empty repo for each branch?

I wonder if it would be worth getting HEAD first, and then asking for other branches and giving a "have (HEAD)". We can even parallelize it so we don't need to wait until we actually have the HEAD objects before we ask for a packfile from HEAD..somebranch, I think? Then as long as we wait to receive everything before we put it all together, we end up with every branch and don't have to download the same data a bunch of times. What do you think?
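
To make the idea concrete, here is a minimal sketch of the request planning: HEAD is requested from scratch, and every other distinct sha is requested with HEAD listed as a "have". `planFetches`, `fetchAll`, and the `{ want, have }` request shape are hypothetical names for this illustration, and `fetchPack` is an assumed function that returns a Promise for one packfile. None of them come from the GitTorrent source.

```javascript
// Hypothetical planner: one request per distinct sha, HEAD first.
// `refs` maps ref names to shas, e.g. { HEAD: '...', master: '...' }.
function planFetches (refs) {
  var head = refs.HEAD
  var plan = head ? [{ want: head, have: [] }] : []
  var seen = {}
  if (head) seen[head] = true
  Object.keys(refs).forEach(function (name) {
    var sha = refs[name]
    if (seen[sha]) return // same commit as HEAD or an earlier branch
    seen[sha] = true
    plan.push({ want: sha, have: head ? [head] : [] })
  })
  return plan
}

// Hypothetical driver: start every request at once and resolve only when
// all packfiles have arrived, so nothing is assembled from partial data.
function fetchAll (refs, fetchPack) {
  return Promise.all(planFetches(refs).map(fetchPack))
}
```

With the refs from the failing session above (HEAD and master on the same sha, check-sha1 on another), this plans two requests instead of three.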

splinterofchaos commented 9 years ago

It looks like our lookup for the 7f4f5.. sha1 failed, and we should retry it (issue #5) before moving on and asking Git to checkout.

The todo variable I introduced increments only when a peer actually has the hash, that is, once per "Adding swarm peer" line in the output. Perhaps it should increment when we look up a new hash instead. The only problem there is the possibility of waiting forever on a hash that GitHub lists but that isn't in the network. If we poll the DHT instead of GitHub for the references, that shouldn't be a problem.
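
One way to count lookups rather than peers, without risking an infinite wait, is sketched below. `createTracker` and its `received` and `gaveUp` methods are hypothetical and are not the `todo` counter from the branch. The assumption is that the caller reports each sha as either received or abandoned, for example after a timeout.

```javascript
// Hypothetical tracker: count every sha we look up, not only those with peers.
function createTracker (shas, onDone) {
  // A Set, so HEAD and master pointing at the same sha count once.
  var pending = new Set(shas)
  var missing = []

  function settle (sha, found) {
    if (!pending.delete(sha)) return // unknown or already settled
    if (!found) missing.push(sha)
    if (pending.size === 0) onDone(missing)
  }

  // Nothing to wait for: report completion straight away.
  if (pending.size === 0) onDone(missing)

  return {
    received: function (sha) { settle(sha, true) },
    gaveUp: function (sha) { settle(sha, false) }
  }
}
```

`onDone` receives the list of shas that were never found, so the caller can decide whether to retry them or fail before git runs its connectivity check.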

Also, I guess this way means that for two branches that are near to each other, we download approx twice as much data as we need, because we get a full packfile from an empty repo for each branch?

I wonder if it would be worth getting HEAD first, and then asking for other branches and giving a "have (HEAD)".

I've been thinking about this a little, but I need to learn more about how the DHT network and packfiles work.

So, if I have a repository with just master, then I make branch A with a few more commits, then branch to B, I could fetch just B and get the whole tree. If A, B, and master all point to the same sha, then we will already do the right thing.

If I have multiple branches, some of them might already be in master and master might be in some of them. Optimally, any time two sha's have a common ancestor, C, I want to fetch HEAD..C and C..A/B/etc. separately and eliminate branches already included.

I think that would be best, long term, but it might have a high implementation cost. Fetching HEAD and the other branches relative to HEAD would definitely be good right now and will probably be very close to optimal in most situations (slowest at cloning repositories with branches that have branches). Is the "have" and "want" stuff already implemented so that we're not sending the whole repository each time?
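
The "eliminate branches already included" step can be sketched as follows, assuming the commit graph is available as a plain map from each sha to its parent shas. `isAncestor`, `tipsToFetch`, and the `parents` map are hypothetical names for this example. Against a real repository, `git merge-base --is-ancestor` answers the same question.

```javascript
// Hypothetical commit graph: `parents` maps a sha to an array of parent shas.
// Returns true if `ancestor` is reachable from `descendant`.
function isAncestor (parents, ancestor, descendant) {
  var stack = [descendant]
  var visited = {}
  while (stack.length > 0) {
    var sha = stack.pop()
    if (sha === ancestor) return true
    if (visited[sha]) continue
    visited[sha] = true
    var next = parents[sha] || []
    for (var i = 0; i < next.length; i++) stack.push(next[i])
  }
  return false
}

// Keep only the tips that are not reachable from some other tip.
function tipsToFetch (parents, tips) {
  var unique = tips.filter(function (sha, i) { return tips.indexOf(sha) === i })
  return unique.filter(function (sha) {
    return !unique.some(function (other) {
      return other !== sha && isAncestor(parents, sha, other)
    })
  })
}
```

For the linear case described above (master, then A, then B), only B remains, which matches the observation that fetching B alone brings in the whole tree.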

Since git checks for connectivity after we exit, I wonder if we don't have to worry about pulling in objects in the wrong order.

cjb commented 9 years ago

Yeah, I don't think order matters when pulling.

I think the actual right way to do this is described in issue #10 -- piping git-fetch-pack to git-upload-pack over the ut_gittorrent transport. I think that would result in one packfile per repo with every ref inside.

splinterofchaos commented 9 years ago

:+1: neat