cjb / GitTorrent

A decentralization of GitHub using BitTorrent and Bitcoin
MIT License
4.75k stars 262 forks source link

[RFC] Use `git-upload-pack` for smaller pack files on branches. #50

Closed splinterofchaos closed 9 years ago

splinterofchaos commented 9 years ago

We did use git-upload-pack in gittorrentd for getting a listing of branches, but I switched that to git ls-remote because it's simpler and more to-the-point. Also, the git-upload-pack process never exited because we didn't communicate with it, but git ls-remote exits when it's printed all the refs. By calling publish_public_key() when the stream ends, it won't be called early (as noted in #34).

I changed the invocation of git pack-objects to git-upload-pack so that we could generate pack files assuming that the client has master (unless that's what we're fetching). I believe this is in line with https://github.com/cjb/GitTorrent/pull/34#issuecomment-108606988, but not quite the solution for #10 (overly complex; clients can't communicate haves).

I added a new module, ./git.js, and am not sure if it complies with idiomatic js, so let me know if anything should be changed. In particular, I found reading from a stream synchronously very challenging, considering the statefulness of git-upload-pack.

EDIT: I forgot to note, one known problem: sometimes git-upload-pack spawns and quits with nothing happening. It seems to only happen when more than one try to run concurrently, but informally trying to run multiple instances at once didn't seem problematic, so I'm not sure why this might happen.

cjb commented 9 years ago

@splinterofchaos Looks good to me too, thanks! :thumbsup: for merging whenever you're happy with it.

I don't think I follow your comment about #10 and being overly complex, could you elaborate? For #10, git-fetch-pack would provide haves itself, and we'd just let it talk to the remote git-upload-pack and redirect the stream to a file when git-upload-pack sends over a PACK line.

splinterofchaos commented 9 years ago

don't think I follow your comment about #10 and being overly complex, could you elaborate?

Sorry, I meant the opposite: having git-fetch-pack talk to git-upload-pack seems simpler; trying to communicate directly with git-upload-pack feels overly complex. I don't understand at this point whether the client (git-remote-gittorrent) can/will be able to communicate directly to the server (gittorrentd) using git-fetch-pack, but that would seem ideal. By "not quite the solution for #10", I meant that this seems desirable right now, but I don't know if it's a part of the long-term solution or not.

Also, I haven't done my research into git-fetch-pack yet.

One more thing I forgot to add to this PR: If the client has A and we pack B, rather than writing to B.pack, maybe we could write to A..B.pack. Before calling git.upload_pack(dir, B, A), we could check if A..B.pack exists.

splinterofchaos commented 9 years ago

One more thing I forgot to add to this PR: If the client has A and we pack B

It seems there might not be a point. fs.exists() always returns true (in the callback), even when the file's not there, and node.js.org seems to think this is a bad function.

In particular, checking if a file exists before opening it is an anti-pattern that leaves you vulnerable to race conditions: another process may remove the file between the calls to fs.exists() and fs.open().

Caching pack files seems desirable, but I'm not sure how to handle potential races right now and I don't think it's the highest priority.

I'll let this sit a day or so, then merge.

cjb commented 9 years ago

Also, I haven't done my research into git-fetch-pack yet.

Ah, okay! I found this invocation super useful for seeing how it works:

 λ export GIT_TRACE_PACKET=true
 λ export GIT_TRACE=true
 λ git clone git://github.com/cjb/gittorrent
Cloning into 'gittorrent'...
13:32:55.168266 pkt-line.c:46           packet:        clone> git-upload-pack /cjb/gittorrent\0host=github.com\0
13:32:55.204527 pkt-line.c:46           packet:        clone< 5b990d68094cd1390cb8e729d36a7974d9a0aa84 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2:2.4.0~peff-bitmap-reuse-delta-1010-g0d923ac
13:32:55.204544 pkt-line.c:46           packet:        clone< 5b990d68094cd1390cb8e729d36a7974d9a0aa84 refs/heads/master
13:32:55.204549 pkt-line.c:46           packet:        clone< 4e440776743ac216d19a7fc53c83a0681fdbf45b refs/pull/16/head
13:32:55.204553 pkt-line.c:46           packet:        clone< 90a17f1bb2dfa953898a9056d5778bd5ceaa08b4 refs/pull/19/head
13:32:55.204557 pkt-line.c:46           packet:        clone< 7d324de111b7b5711e31ef73055f385434cbf513 refs/pull/26/head
13:32:55.204561 pkt-line.c:46           packet:        clone< ae731139b21cac80ead9dc85c63e7aa8fe2ce26b refs/pull/26/merge
13:32:55.204565 pkt-line.c:46           packet:        clone< 5141cf86c3202f8cdd7ccb63776c571a2a707f75 refs/pull/27/head
13:32:55.204568 pkt-line.c:46           packet:        clone< 8af3f08f4e515df2d7cd2f3334474a6bcf583ad8 refs/pull/27/merge
13:32:55.204572 pkt-line.c:46           packet:        clone< aee9dd69178f6f76ecc00ba99961e31fe140d1f6 refs/pull/28/head
13:32:55.204576 pkt-line.c:46           packet:        clone< af022cb99c4c90548e065f2883504b7193737e7e refs/pull/28/merge
13:32:55.204580 pkt-line.c:46           packet:        clone< bfe8b17eec315d38070417bf45d5cbcbbb74dc18 refs/pull/29/head
13:32:55.204584 pkt-line.c:46           packet:        clone< ff97c6721aaa0daed208cf99013be177fb642bdb refs/pull/30/head
13:32:55.204588 pkt-line.c:46           packet:        clone< 7f4f50262b15569a6e60fd69daf74aefd74bd081 refs/pull/33/head
13:32:55.204591 pkt-line.c:46           packet:        clone< b8249848a0655491b8e21629d240f0f761c4db2d refs/pull/33/merge
13:32:55.204595 pkt-line.c:46           packet:        clone< f928f13861d0eca8c204a0de1565d41989b5493d refs/pull/34/head
13:32:55.204599 pkt-line.c:46           packet:        clone< b9260e07932e19009ca4e9b35543a25c75a6759a refs/pull/42/head
13:32:55.204603 pkt-line.c:46           packet:        clone< 9754788d0837721cd2899314195b47103fddcac1 refs/pull/44/head
13:32:55.204606 pkt-line.c:46           packet:        clone< 25bcaeaa0c66cdb9219517dc479444f4d7938c83 refs/pull/50/head
13:32:55.204610 pkt-line.c:46           packet:        clone< ef192b335310bd16712e7371e6d8cff7564bb9ca refs/pull/50/merge
13:32:55.204614 pkt-line.c:46           packet:        clone< cce6953283ed2456d15ec63787259ce84a7063af refs/pull/7/head
13:32:55.204617 pkt-line.c:46           packet:        clone< 0000
13:32:55.205022 pkt-line.c:46           packet:        clone> want 5b990d68094cd1390cb8e729d36a7974d9a0aa84 multi_ack_detailed side-band-64k thin-pack ofs-delta agent=git/2.1.4
13:32:55.205032 pkt-line.c:46           packet:        clone> want 5b990d68094cd1390cb8e729d36a7974d9a0aa84
13:32:55.205035 pkt-line.c:46           packet:        clone> 0000
13:32:55.205047 pkt-line.c:46           packet:        clone> done
13:32:55.277158 pkt-line.c:46           packet:        clone< NAK
13:32:55.277212 run-command.c:341       trace: run_command: 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 28580 on octavius' '--check-self-contained-and-connected'
13:32:55.277337 exec_cmd.c:134          trace: exec: 'git' 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 28580 on octavius' '--check-self-contained-and-connected'
13:32:55.277901 git.c:349               trace: built-in: git 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 28580 on octavius' '--check-self-contained-and-connected'
13:32:55.280234 pkt-line.c:46           packet:        clone< \2Counting objects: 209, done.
remote: Counting objects: 209, done.
remote: Total 209 (delta 0), reused 0 (delta 0), pack-reused 208
Receiving objects: 100% (209/209), 35.24 KiB | 0 bytes/s, done.
Resolving deltas: 100% (114/114), done.
Checking connectivity... 13:32:55.300126 run-command.c:341       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all'
13:32:55.300235 exec_cmd.c:134          trace: exec: 'git' 'rev-list' '--objects' '--stdin' '--not' '--all'
13:32:55.300749 git.c:349               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all'
done.
 λ
splinterofchaos commented 9 years ago

@cjb Thanks for showing me that. I'll look more at using git-fetch-pack when I have time, but merge for now.

cjb commented 9 years ago

@splinterofchaos Thanks!