Okay, so I've been thinking about a plan to refactor lots of npmd stuff.
This also takes into account realizations I've had since starting npmd,
and also the recent problems with the npm registry itself.
The most important change is to how the cache works: it will no longer be /{module_name}/{version}/package.tgz, but instead
a content addressable store, like git uses: /{shasum(tarball).substring(0,2)}/{shasum(tarball).substring(2)}
Resolve will work the same as it does now (from the outside),
but install will look for the tarball by its hash and not by its name.
npmd will have two modes, server and client.
the server replicates the data from the registry, the client handles the install.
To install, the client requests a dependency tree resolution from the server,
(which is very fast for the server to do, since it has a local database)
then the client looks in its local cache, sees which of the tarballs it already has, and requests the rest from the server.
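The client side of that flow can be sketched roughly like this, assuming a hypothetical `server` object with resolve() and fetch() methods (the real npmd API will differ); `cache` is a Map from shasum to tarball:

```javascript
// Install a package: resolve the tree on the server, then fetch
// only the tarballs missing from the local content-addressed cache.
function install (pkg, cache, server) {
  // the server resolves the full dependency tree quickly,
  // since it has a local database
  var tree = server.resolve(pkg)
  tree.forEach(function (dep) {
    // already have this content? skip the network entirely.
    if (!cache.has(dep.shasum))
      cache.set(dep.shasum, server.fetch(dep.shasum))
  })
  return tree
}
```

Note that the client never asks for a name+version over the wire at this stage, only hashes, which is what makes any replica that has the bytes a valid source.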
Implications of this:
The server and the client may be on the same machine, but need not be.
Running your own partial replica will be trivial: you just start a server, and it will cache the files you download through it. It will also be easy to have a rule that (say) caches the modules required by your favorite authors, etc.
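Such a rule could be as simple as a predicate over the registry document; here is one possible shape (the `shouldReplicate` function and the `maintainers` field layout are assumptions for illustration, matching couchdb-style registry docs):

```javascript
// Decide whether a replica should proactively cache a module,
// based on a list of favorite authors.
function shouldReplicate (doc, favoriteAuthors) {
  var maintainers = (doc.maintainers || []).map(function (m) {
    return m.name
  })
  return maintainers.some(function (name) {
    return favoriteAuthors.indexOf(name) !== -1
  })
}
```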
Once I get signatures into npm (which I am also working on) then this will be very secure. With signatures it will even be possible to transfer data between replicas without requesting directly from the registry, or even to use modules that have not been published globally to the registry yet.
(the registry would simply serve as a naming authority, and it should also be possible to have multiple registries!)
You could also put tarballs on a CDN, or make a Dynamo-style db using @mhart's https://github.com/mhart/dynalite or something. This could be much simpler than regular Dynamo, since there are no conflicts to resolve when you have a content addressable store.