dominictarr / npmd

MIT License

fully peer2peer #47

Open dominictarr opened 11 years ago

dominictarr commented 11 years ago

hey @floby you mentioned this on twitter the other day.

I want to do this too, but I also want npmd to work with npm, and to have a smooth(ish) upgrade path from centralized registry to p2p anarchy.

what are your thoughts about this?

Floby commented 11 years ago

Yes, I've given it some thought, but many questions remain. Some of them I can list here

While some of these are very wide questions, some others have very straightforward answers. I would like to know if you have any ideas already for some of them, and also if you have identified other questions that I might have overlooked.

My general opinion from these questions, with a goal of a first iteration, is:

What do you think? I've read a lot about the subject but never had the chance to actually implement something like this. You have every right to tell me I'm full of shit =)

dominictarr commented 11 years ago

hmm, one thing missing from your list is dependencies. that is essential :)

so, there are a few things that make this kind of thing much easier.

I think some designs use DHT as a hammer. The DHT architecture is slow, because you have to do several requests to find a value, and you can't control the distribution of the packages. you can't put things that go together together.

Also, freeloading is possible. some peers could push too much stuff into the network, and there is no way to tell spam from ham, because you can only distinguish values by their hash.

That is not to say that DHT isn't a valuable tool, but just that it should be used carefully. bittorrent uses DHT well: just in the tracker layer, to find peers, and then exchanges chunks (replicates) with peers that have (an interest in) that file.

To take this idea to npm, it could mean a tracker layer (which could be a DHT) to find someone with the modules you want, and then you'd replicate those modules with them directly...
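A minimal sketch of that tracker idea, assuming an in-memory map instead of a real DHT. The `Tracker` class, `pick_peer` helper, and the peer addresses are all hypothetical names for illustration, not anything npmd provides:

```python
from collections import defaultdict


class Tracker:
    """Hypothetical tracker: maps a module name to the peers that hold it."""

    def __init__(self):
        self._peers = defaultdict(set)

    def announce(self, peer, modules):
        # A peer tells the tracker which modules it can serve.
        for name in modules:
            self._peers[name].add(peer)

    def lookup(self, name):
        # Return the known peers for a module (empty list if none).
        return sorted(self._peers.get(name, ()))


def pick_peer(tracker, wanted):
    """Choose the peer that holds the most of the wanted modules,
    so related modules can be replicated over one connection."""
    counts = defaultdict(int)
    for name in wanted:
        for peer in tracker.lookup(name):
            counts[peer] += 1
    if not counts:
        return None
    # Ties are broken alphabetically because max() keeps the first maximum.
    return max(sorted(counts), key=lambda p: counts[p])


tracker = Tracker()
tracker.announce("10.0.0.2:7000", ["through", "split"])
tracker.announce("10.0.0.3:7000", ["through"])
best = pick_peer(tracker, ["through", "split"])
```

The tracker only answers "who has this?"; the actual module transfer would then happen directly with the chosen peer.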

Also, you could replicate related data, now that you have a relatively fast connection.

I don't think you'd need to chunk modules. most modules are very small.


It's also important that this whole thing continues to work as a package manager the whole time. It would be cool to make a completely p2p system, but ambition has to be moderated by practicality.

I want to build something that people use, so it's important to make something that is 1) useful as is, but 2) has an upgrade path to total p2p :) this means the npm registry will have a special role for a while, at least.

The most important thing the registry does is allocate names (user names and module names)

if you could just push a new module@version into a p2p network there would be no way to tell that you are really the maintainer of module (or that there isn't another module). if a module name was just a hash that would work, but that isn't npm. that system may be hard to use, anyway.

So, you could have a naming authority, that assigns names. you'd just request to it "I want module" and it would sign a request allocating that module name to you. you could then distribute that certificate to anyone to prove that you are the official maintainer of module.
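To make the certificate idea concrete, here is a rough sketch. It uses a toy textbook RSA key built from two Mersenne primes purely so the example runs with the standard library; that key is insecure and is an assumption of this sketch, as are the `allocate` and `verify` names and the claim format. A real system would use a proper signature scheme from a vetted crypto library:

```python
import hashlib
import json

# Toy RSA key for the naming authority. INSECURE, illustration only.
P = 2**127 - 1                      # Mersenne prime
Q = 2**89 - 1                       # Mersenne prime
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def digest(claim):
    # Hash a canonical JSON encoding of the claim, reduced into the key's range.
    payload = json.dumps(claim, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def allocate(name, maintainer_key):
    """Authority side: sign 'this name belongs to this public key'."""
    claim = {"name": name, "maintainer": maintainer_key}
    return {"claim": claim, "sig": pow(digest(claim), D, N)}


def verify(cert):
    """Anyone holding the authority's public key (N, E) can check a cert."""
    return pow(cert["sig"], E, N) == digest(cert["claim"])


cert = allocate("through", "dominic-public-key")
forged = {"claim": {"name": "through", "maintainer": "mallory-public-key"},
          "sig": cert["sig"]}
```

The point is that the certificate can be passed around the p2p network and checked offline; only the allocation step needs the authority.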

Currently npm only has a small number of modules. 5e4. The appstore has 1e6. npmd currently downloads all the package.json for all the modules. this is only a few hundred megs. With a million modules, this would be gigabytes. all the registry needs to keep is a map of module names to public keys of the authors that have been assigned that name. This list would be pretty small, so running the name registry would be light work, and it would be easy to replicate all that data to every user.

Then, you could have a tracker layer (maybe a DHT) that would map modules to IP addresses of peers that use that module. I think it would be best to mostly serve modules that you also use. Then the incentive is for the system to work as well as possible, because it helps people who use it the way you do. It may also be a good idea to devote a small amount of resources to serving random modules (chosen by a hash of your public key, because your IP address will change); that way it would be easy to make sure that every module is available p2p. So if there were 1000 available users, and they each hosted 100 MB, that would give you 100 GB of modules.
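One way the "random modules by a hash of your public key" part might work, as a sketch: each peer hosts the modules whose name hashes are closest to the hash of its own key, until a storage budget is used up. XOR distance and SHA-256 are assumptions borrowed from common DHT designs, and `assigned_modules` is a hypothetical helper:

```python
import hashlib


def h(value):
    # Map a string to a large integer via SHA-256.
    return int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")


def assigned_modules(public_key, sizes, budget_bytes):
    """Pick the modules closest (by XOR distance) to the hash of
    public_key, skipping any that would exceed the storage budget."""
    me = h(public_key)
    chosen, used = [], 0
    for name in sorted(sizes, key=lambda n: h(n) ^ me):
        if used + sizes[name] > budget_bytes:
            continue  # does not fit; a smaller module further away still might
        chosen.append(name)
        used += sizes[name]
    return chosen


# Module sizes in bytes (made-up numbers for the example).
sizes = {"through": 40, "split": 40, "levelup": 40}
mine = assigned_modules("dominic-public-key", sizes, budget_bytes=100)

# The capacity estimate from the paragraph above: 1000 peers x 100 MB each.
total_gb = 1000 * 100 / 1000
```

Because the assignment depends only on the key and the module names, it stays stable when a peer's IP address changes.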

npm is currently 150 GB on disk http://isaacs.iriscouch.com/registry/ but maybe that would come down with a compaction (if npm was a little different, it could avoid compactions being a problem, and npmd is designed not to need them). anyway, there are way more than 1000 people using npm.


So, on the other hand, it would be interesting to be able to replicate locally. then, if two users are on the same wifi network, they should be able to install from each other's cache. This would be very useful actually, and would be easy for a small company to set up and get very fast installs. but it would be super cool to make this work ad hoc, so that npmd would just detect other peers on the local network, and query their cache when resolving.

You could just connect to them, and then exchange a bloom filter/list of the modules they have. currently, I only have 455 modules in my cache, and 801 module/version pairs. which is a small enough amount of data to share with local peers, only 15k.
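A small sketch of the Bloom filter option, assuming SHA-256 based hashing and `name@version` strings as keys (both are choices made for this example, not something npmd defines):

```python
import hashlib


class BloomFilter:
    """Compact, probabilistic set: no false negatives, some false positives."""

    def __init__(self, size_bits=8192, hashes=4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive `hashes` bit positions by salting the item with an index.
        for i in range(self.hashes):
            d = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


cache = BloomFilter()
cache.add("through@2.3.4")
cache.add("split@0.2.10")
```

With these parameters the filter is 1 KiB on the wire. For roughly 800 entries the standard estimate gives a false positive rate of about 1%, so a peer would still need to confirm a hit before relying on it.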

This would be cool, and useful for hacking in cafes with bad wifi. and would also form the basis of much of what the p2p npm would need.


there is quite a bit of stuff that needs to happen to make this secure, and I'm moving towards that, but there are also some things that can happen with p2p before then.

you could replicate from the registry, but then pull down modules p2p. if the peer module lookup was decoupled, then much of that would be the same as local p2p.
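Roughly, the decoupling could look like this: version metadata comes from the registry replica, while the tarball lookup is a pluggable list of sources tried in order. `make_resolver` and the source functions are hypothetical, and the dict-backed sources stand in for real network calls:

```python
def make_resolver(metadata, sources):
    """metadata: module name -> version, as replicated from the registry.
    sources: ordered (label, fetch) pairs; fetch(name, version) returns
    the tarball bytes, or None if that source does not have it."""

    def resolve(name):
        version = metadata[name]
        for label, fetch in sources:
            tarball = fetch(name, version)
            if tarball is not None:
                return label, version, tarball
        raise LookupError(f"{name}@{version} not available from any source")

    return resolve


# Stand-ins for a local peer's cache and the central registry.
peer_cache = {("through", "2.3.4"): b"tarball-from-peer"}
registry = {("through", "2.3.4"): b"tarball-from-registry",
            ("split", "0.2.10"): b"tarball-from-registry"}

resolve = make_resolver(
    metadata={"through": "2.3.4", "split": "0.2.10"},
    sources=[
        ("local-peer", lambda n, v: peer_cache.get((n, v))),
        ("registry", lambda n, v: registry.get((n, v))),
    ],
)
```

Swapping the local peer source for a tracker-backed one would not change the resolver, which is the sense in which local p2p and wide-area p2p share most of the code.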

/cc @substack since I know he's quite interested in this too.