splinterofchaos opened this issue 9 years ago.
@splinterofchaos We could also just increase the size limit. The mainline BitTorrent DHT only guarantees 1000 bytes per key (clients could store more), but we're not using the mainline BitTorrent DHT itself, just its protocol.
It would be our first breaking change relative to the mainline DHT, though, so maybe it's worth trying to avoid.
> we're not using the mainline BitTorrent DHT itself, just its protocol.
That makes sense, but couldn't the libraries implementing the DHT protocol start rejecting large messages at any time? If that ever happened, I suppose we could use a forked and patched version.
Still, even if we do stop limiting the size, a compressed `userProfile` would carry the same information using less bandwidth. Then again, with network speeds what they are, maybe that's not an issue.
It's the `bittorrent-dht` module. It currently enforces the 1000-byte limit; we'd just ask the maintainers to add a constructor option to relax that limit (perhaps by 10-20x in our case).
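To make the proposal concrete, here is a minimal sketch of what a configurable per-value ceiling could look like. The option name `maxValueBytes` and the `makeValueChecker` helper are hypothetical, invented for illustration; they are not part of the `bittorrent-dht` API, and the real module may measure values differently.

```javascript
// Hypothetical sketch: a per-value size check whose ceiling is configurable.
// `maxValueBytes` is an invented option name for illustration only; it is not
// an existing bittorrent-dht option.
const DEFAULT_MAX_VALUE_BYTES = 1000; // the limit discussed in this thread

function makeValueChecker({ maxValueBytes = DEFAULT_MAX_VALUE_BYTES } = {}) {
  // Returns a function that accepts a Buffer or string and reports its size
  // in bytes, throwing if it exceeds the configured ceiling.
  return function checkValue(v) {
    const bytes = Buffer.isBuffer(v) ? v.length : Buffer.byteLength(String(v), 'utf8');
    if (bytes > maxValueBytes) {
      throw new Error(`value is ${bytes} bytes; limit is ${maxValueBytes}`);
    }
    return bytes;
  };
}

const strict = makeValueChecker();                          // today's 1000-byte behavior
const relaxed = makeValueChecker({ maxValueBytes: 15000 }); // ~15x, within the 10-20x range

const profile = Buffer.alloc(4000); // stand-in for a large serialized userProfile
console.log(relaxed(profile)); // 4000
try {
  strict(profile);
} catch (err) {
  console.log(err.message); // value is 4000 bytes; limit is 1000
}
```

Defaulting to the current limit keeps existing callers compatible, so only nodes that opt in would accept larger values.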
Having only 950 bytes to publish all of one's repositories and branch names, with 40-byte shas, puts too many constraints on what we can store. A repo named with just one letter requires 74 bytes (`{"repositories":{"X":{"HEAD":"sha"}}}` with a 40-character sha in place of `sha`), and each branch requires at least 58 more (including the comma: `,"refs/heads/X":"sha"`), which means we can store at most 15 branches besides `HEAD` (74 + 58x < 950, so x < 15.1). This should be sufficient for most people's local repositories if they consistently prune branches that have been merged or become moot, but we can't host pull requests this way, which means we're still tied to GitHub. This repository only has 3 open pull requests right now, but larger projects can have hundreds.

Just some brainstormed ideas:
We can compress `userProfile`: instead of having the top-level item be a field, `repositories`, just make `userProfile` a list of repositories. We don't need to store `HEAD`'s sha explicitly, just which branch it points to, unless it's detached. Or we could always report `refs/heads/master` as `HEAD`, even when it isn't. Every eight hex characters of a sha could be stored in a 32-bit integer, so a whole sha would take 20 bytes (5 × 4), though only if we don't use JSON to encode it.

Don't use JSON: it's very convenient and simple, but not compact, especially not for raw integers. I've worked with msgpack before, and it does have a JS implementation. I'm not familiar with the JS ecosystem, but I'm sure other equally efficient serialization libraries exist. I do think it's important, though, that we use something implemented in multiple languages (ref: #12).
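The size figures in this thread can be checked mechanically. This JavaScript sketch serializes the JSON layout from the 950-byte estimate using single-letter names, then packs a sha into five 32-bit integers as suggested above. The example sha and the helper names `jsonProfileBytes`, `maxBranches`, `packSha`, and `unpackSha` are illustrative assumptions, not project code.

```javascript
// Size accounting for the userProfile layouts discussed above.
// Assumptions: 40-character hex shas, single-letter repo/branch names,
// and the 950-byte budget quoted in the thread.
const SHA = '0123456789abcdef0123456789abcdef01234567'; // example 40-char hex sha

// JSON layout: {"repositories":{"X":{"HEAD":sha,"refs/heads/A":sha,...}}}
function jsonProfileBytes(branchCount) {
  const refs = { HEAD: SHA };
  for (let i = 0; i < branchCount; i++) {
    refs['refs/heads/' + String.fromCharCode(65 + i)] = SHA; // 'A', 'B', ... (up to 26)
  }
  return Buffer.byteLength(JSON.stringify({ repositories: { X: refs } }), 'utf8');
}

// Largest branch count (besides HEAD) that stays strictly under the budget.
function maxBranches(budget) {
  let n = 0;
  while (jsonProfileBytes(n + 1) < budget) n++;
  return n;
}

// Pack a 40-hex-digit sha into five big-endian 32-bit integers (20 bytes).
function packSha(hex) {
  if (!/^[0-9a-f]{40}$/i.test(hex)) throw new Error('expected 40 hex digits');
  const buf = Buffer.alloc(20);
  for (let i = 0; i < 5; i++) {
    buf.writeUInt32BE(parseInt(hex.slice(i * 8, i * 8 + 8), 16), i * 4);
  }
  return buf;
}

// Inverse of packSha: back to a lowercase 40-character hex string.
function unpackSha(buf) {
  let hex = '';
  for (let i = 0; i < 5; i++) hex += buf.readUInt32BE(i * 4).toString(16).padStart(8, '0');
  return hex;
}

console.log(jsonProfileBytes(0)); // 74
console.log(jsonProfileBytes(1) - jsonProfileBytes(0)); // 58
console.log(maxBranches(950)); // 15
console.log(packSha(SHA).length, unpackSha(packSha(SHA)) === SHA); // 20 true
```

Run as written, the inequality 74 + 58x < 950 allows 15 branches besides `HEAD`, and each packed sha occupies 20 bytes instead of the 42 a quoted hex string takes in JSON.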
A linked list of mutable keys? We could subvert all these limitations by allocating another key when we run out of space and giving each value a `next` or `previous` field.

Let users manually decide which branches to share via `git-export-ok`. We could put regex patterns in this file for what to include, or what to exclude. The default could look something like this:

(`exclude:` could be used to skip anything that `include:` would match. `include:` could also just be `*`.)
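The linked-list idea can be sketched as follows, under several assumptions: a `Map` stands in for the DHT, chunks are addressed by hypothetical keys of the form `owner/n` (in the real DHT a mutable item is addressed by a public key, optionally with a salt), and each stored value is a JSON object `{ data, next }`. Only `next` pointers are used here; a `previous` field would work symmetrically. `putLinked` and `getLinked` are illustrative names.

```javascript
// Hypothetical sketch of the "linked list of mutable keys" idea: a payload that
// does not fit in one value is split across several, each naming the next key.
// A Map stands in for the DHT; the `owner/n` key scheme is an assumption.
const VALUE_LIMIT = 1000; // bytes allowed per stored value

function chunkKey(owner, index) {
  return `${owner}/${index}`;
}

// Store `payload` as a chain of values starting at chunkKey(owner, 0).
function putLinked(store, owner, payload) {
  // Base64 needs no JSON escaping, so each slice's encoded size is predictable.
  const data = Buffer.from(payload, 'utf8').toString('base64');
  // Reserve room for the JSON wrapper, assuming fewer than 100000 chunks.
  const overhead = Buffer.byteLength(JSON.stringify({ data: '', next: chunkKey(owner, 99999) }));
  const capacity = VALUE_LIMIT - overhead;
  if (capacity <= 0) throw new Error('owner name too long for the value limit');

  const slices = [];
  for (let i = 0; i < data.length; i += capacity) slices.push(data.slice(i, i + capacity));
  if (slices.length === 0) slices.push(''); // always store at least one chunk

  slices.forEach((slice, i) => {
    const next = i + 1 < slices.length ? chunkKey(owner, i + 1) : null;
    store.set(chunkKey(owner, i), JSON.stringify({ data: slice, next }));
  });
  return slices.length;
}

// Follow the `next` pointers from the first chunk and reassemble the payload.
function getLinked(store, owner) {
  let data = '';
  let key = chunkKey(owner, 0);
  while (key !== null) {
    const value = store.get(key);
    if (value === undefined) throw new Error(`missing chunk ${key}`);
    const chunk = JSON.parse(value);
    data += chunk.data;
    key = chunk.next;
  }
  return Buffer.from(data, 'base64').toString('utf8');
}

// Usage: a profile with 40 branches is far larger than one value allows.
const refs = {};
for (let i = 0; i < 40; i++) refs[`refs/heads/branch-${i}`] = 'a'.repeat(40);
const profile = JSON.stringify({ repositories: { X: refs } });

const store = new Map();
const count = putLinked(store, 'alice', profile);
console.log(count, getLinked(store, 'alice') === profile);
```

The trade-off is latency: a reader must fetch the chunks one after another, since each key is only known after the previous value arrives.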
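For the `git-export-ok` idea, a minimal sketch of include/exclude filtering might look like this. The file format (one `include: <regex>` or `exclude: <regex>` rule per line, `#` comments) is an assumption, as are the helper names `parseExportRules` and `isExported`. Since a bare `*` is not a valid regular expression, the sketch uses `.*` for "share everything".

```javascript
// Hypothetical git-export-ok format: one rule per line, "include: <regex>"
// or "exclude: <regex>". Blank lines and lines starting with '#' are ignored.
function parseExportRules(text) {
  const rules = { include: [], exclude: [] };
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) continue;
    const match = /^(include|exclude):\s*(.+)$/.exec(line);
    if (!match) throw new Error(`Unrecognized rule: ${line}`);
    rules[match[1]].push(new RegExp(match[2]));
  }
  return rules;
}

// A branch is exported if some include rule matches and no exclude rule does.
function isExported(branch, rules) {
  const included = rules.include.some((re) => re.test(branch));
  const excluded = rules.exclude.some((re) => re.test(branch));
  return included && !excluded;
}

const rules = parseExportRules([
  'include: .*',    // regex equivalent of "share everything"
  'exclude: ^wip/', // but keep work-in-progress branches private
].join('\n'));

const branches = ['master', 'feature/dht', 'wip/experiment'];
console.log(branches.filter((b) => isExported(b, rules)));
// [ 'master', 'feature/dht' ]
```

Letting exclusions win over inclusions means a broad `include:` can be narrowed without rewriting it.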