cjb / GitTorrent

A decentralization of GitHub using BitTorrent and Bitcoin
MIT License

userProfile can't grow very large #52

Open splinterofchaos opened 9 years ago

splinterofchaos commented 9 years ago

A 950-byte budget for publishing all of one's repositories and branch names with 40-byte shas puts too many constraints on what we can store. A repo named with just one letter requires 74 bytes ({'repositories':{'X':{'HEAD':'sha'}}}), then each branch requires at least 58 bytes (including the comma: ,'refs/heads/X':'sha'), which means we can store at most 15 branches (74 + 58x < 950 => x <= 15). This should be sufficient for most people's local repositories if they consistently prune branches that have been merged or become moot, but we can't host pull requests this way, which means we're still tied to GitHub. This repository only has 3 open pulls right now, but larger projects can have hundreds.
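The byte budget above can be checked with a short calculation. This is only a sketch: it assumes the profile is serialized as compact JSON (no whitespace), uses a placeholder 40-character sha, and the helper name `profile_size` is made up for illustration.

```python
import json

LIMIT = 950        # byte budget quoted in the comment above
SHA = "0" * 40     # placeholder 40-character hex sha

def profile_size(branches):
    """Bytes used by a compact-JSON userProfile holding one repo named 'X'."""
    refs = {"HEAD": SHA}
    for name in branches:
        refs["refs/heads/" + name] = SHA
    profile = {"repositories": {"X": refs}}
    return len(json.dumps(profile, separators=(",", ":")).encode("utf-8"))

base = profile_size([])                    # repo with only HEAD
per_branch = profile_size(["Y"]) - base    # cost of one one-letter branch
max_branches = (LIMIT - base) // per_branch

print(base, per_branch, max_branches)
```

Under these assumptions the sketch reports 74 bytes for the bare repo, 58 bytes per one-letter branch, and room for 15 such branches. Realistic branch names are longer, so the practical limit is lower.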

Just some brainstormed ideas:

We can compress userProfile: Instead of having a top-level repositories field, userProfile could itself be the list of repositories. We don't need to store HEAD's sha explicitly, just which branch it points to, unless it's detached. Or we could always report refs/heads/master as HEAD, even when it's not. Every eight hex characters of the sha could be stored in a 32-bit integer, requiring 20 bytes total, but only if we don't use json to encode it.
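To illustrate the 20-byte figure, here is a minimal sketch of packing a sha into five 32-bit integers. The sample sha is just the SHA-1 of empty input, and the choice of big-endian byte order is an assumption, not something decided in this thread.

```python
import hashlib
import json
import struct

sha_hex = hashlib.sha1(b"").hexdigest()   # any 40-character hex sha will do

# Every eight hex characters become one 32-bit unsigned integer (five in total).
words = [int(sha_hex[i:i + 8], 16) for i in range(0, 40, 8)]
packed = struct.pack(">5I", *words)       # 20 raw bytes, big-endian

# Writing the same integers as JSON text gives the saving back.
as_json = json.dumps(words, separators=(",", ":"))

print(len(sha_hex), len(packed), len(as_json))
```

The packed form is the same 20 bytes that `bytes.fromhex(sha_hex)` produces, so the integer step is optional. The saving only survives if the container format can carry raw bytes.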

Not use json: it's very convenient and simple, but not compact, especially not for raw integers. I've worked with msgpack before, and it does have a js implementation. I'm not familiar with the js ecosystem, but I'm sure other equally efficient serialization libraries exist. I do think it's important, though, that we use something implemented in multiple languages (ref: #12).

A linked list of mutable keys? We could get around the size limit by allocating another key when we run out of space and linking the records through a next or previous field.
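A rough sketch of the linked-list idea follows. It assumes a plain dict standing in for the DHT, made-up key names of the form `profile-N`, and a hypothetical record layout with `refs` and `next` fields. None of this is GitTorrent's actual format.

```python
import json

LIMIT = 950  # per-record byte budget from the discussion

def encode(record):
    return json.dumps(record, separators=(",", ":")).encode("utf-8")

def split_profile(refs, key_prefix="profile"):
    """Spread refs over records of at most LIMIT bytes, chained by 'next'."""
    chunks, current = [], {}
    for name, sha in refs.items():
        trial = {**current, name: sha}
        # Measure with the longest 'next' pointer we could possibly need.
        probe = {"refs": trial, "next": f"{key_prefix}-{len(refs)}"}
        if current and len(encode(probe)) > LIMIT:
            chunks.append(current)      # record is full, start a new one
            current = {name: sha}
        else:
            current = trial
    chunks.append(current)

    store = {}
    for i, chunk in enumerate(chunks):
        nxt = f"{key_prefix}-{i + 1}" if i + 1 < len(chunks) else None
        store[f"{key_prefix}-{i}"] = encode({"refs": chunk, "next": nxt})
    return store

def read_profile(store, key="profile-0"):
    """Follow 'next' pointers from the first record and merge all refs."""
    refs = {}
    while key is not None:
        record = json.loads(store[key])
        refs.update(record["refs"])
        key = record["next"]
    return refs

# Usage: 50 branches do not fit in one record, so they spill into several.
refs = {f"refs/heads/branch-{i}": "0" * 40 for i in range(50)}
store = split_profile(refs)
print(len(store), max(len(v) for v in store.values()))
```

The trade-off is one extra lookup per record, so a reader of a large profile pays several round trips instead of one.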

Let users manually decide which branches to share via git-export-ok. We could put regex patterns in this file for what to include, or what to exclude. The default could look something like this:

include: refs/heads/*
exclude: refs/remotes/*

(exclude: could be used to leave out refs that include: would otherwise match. include: could also just be *.)
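One way the include/exclude file could be applied is sketched below. It treats the patterns as shell-style globs, since the `*` in the example reads more like a glob than a regex. That choice, along with the function names `parse_rules` and `exported_refs`, is an assumption for illustration only.

```python
from fnmatch import fnmatchcase

def parse_rules(text):
    """Parse 'include: <pattern>' and 'exclude: <pattern>' lines."""
    rules = {"include": [], "exclude": []}
    for line in text.splitlines():
        kind, _, pattern = line.partition(":")
        kind, pattern = kind.strip(), pattern.strip()
        if kind in rules and pattern:
            rules[kind].append(pattern)
    return rules

def exported_refs(refs, rules):
    """Keep refs that match some include pattern and no exclude pattern."""
    def matches(ref, patterns):
        return any(fnmatchcase(ref, p) for p in patterns)
    return [r for r in refs
            if matches(r, rules["include"]) and not matches(r, rules["exclude"])]

DEFAULT = """\
include: refs/heads/*
exclude: refs/remotes/*
"""

all_refs = [
    "refs/heads/master",
    "refs/remotes/origin/master",
    "refs/tags/v1",
    "refs/heads/feature/x",
]
shared = exported_refs(all_refs, parse_rules(DEFAULT))
print(shared)
```

With the default rules only the two `refs/heads/` entries are shared. Note that `fnmatchcase` lets `*` match across `/`, so `refs/heads/*` also covers nested names such as `refs/heads/feature/x`.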

cjb commented 9 years ago

@splinterofchaos We could actually also just increase the size limit. The mainline bittorrent DHT only guarantees 1000 bytes per key (clients could store more), but we're not using the mainline bittorrent DHT itself, just its protocol.

It would be our first breaking change against mainline DHT, though, so maybe it's worth trying to avoid.

splinterofchaos commented 9 years ago

> we're not using the mainline bittorrent DHT itself, just its protocol.

That makes sense, but might the libraries implementing the DHT and protocol decide to reject large messages at any time? If that ever happened, though, I suppose we could use a forked 'n patched version.

Still, even if we do stop enforcing the size limit, a compressed userProfile would mean transferring the same amount of information using less bandwidth. Though, with network speeds what they are, maybe that's not an issue.

cjb commented 9 years ago

It's the bittorrent-dht module. It currently enforces the 1000-byte limit; we'd just ask the maintainers to add a constructor option to relax that limit (perhaps by 10-20x in our case).