dominictarr / npmd

syntax for private/multiple registries #48

Open dominictarr opened 10 years ago

dominictarr commented 10 years ago

People have talked about multiple npm registries for ages, but there is not any good support for this.

I tried to tackle this problem with a (too) clever proxy-based technique: https://github.com/substack/shadow-npm

(that doesn't work anymore, because you cannot get the npm users anymore)

The problem is that the npm client needs to be aware that it is communicating with a different registry, because it has to send different auth for that registry.

At a minimum, this requires making npm(d) aware of auth for multiple registries. It will be pretty easy to add that to npmd... hmm, npm-registry-client is a separate module now too, so maybe it would be easy to add there too?

so, what does it look like in the package.json?

"dependencies": {
  "my-module": "npm:myregistry.com#~1"
}

I think this could be enough. It needs to be npm: because it's a protocol. npm(d) won't just request that module; it will make a bunch of requests to various couchdb things. So what it needs to know is "use this registry", and you can set up auth, etc. for that registry in a config file somewhere.

dominictarr commented 10 years ago

/cc @jden

junosuarez commented 10 years ago

I would propose that the value of the dependency be a URI identifying the actual module by name and semver range, eg

"dependencies": {
  "my-module": "npm:myregistry.com/my-module#~1"
}

This is analogous to how npm currently handles git urls and has the benefit of keeping all of the information needed to identify the dependency in one string. Sticking with URIs also has the advantage of working with existing parsers, like node's built-in url module.
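As a rough illustration of the "existing parsers" point, here is a minimal sketch that splits such a value with the `URL` class built into Node. `parseRegistrySpec` is a hypothetical helper name, not something npm or npmd provides, and the sketch assumes the version range lives in the fragment as proposed above.

```javascript
// Sketch only: parseRegistrySpec is a hypothetical helper, not an npm/npmd API.
// It accepts both "npm:host/name#range" and "npm://host/name#range".
function parseRegistrySpec(value) {
  const u = new URL(value);
  if (u.protocol !== "npm:") {
    throw new Error("not an npm: dependency: " + value);
  }
  let registry = u.host;
  let name = u.pathname.replace(/^\//, "");
  if (!registry) {
    // Without "//" there is no authority, so the host is the first path segment.
    const slash = name.indexOf("/");
    registry = slash === -1 ? name : name.slice(0, slash);
    name = slash === -1 ? "" : name.slice(slash + 1);
  }
  // The fragment carries the semver range and is never sent to a server.
  return { registry, name, range: decodeURIComponent(u.hash.slice(1)) };
}

console.log(parseRegistrySpec("npm:myregistry.com/my-module#~1"));
```

The module name comes back empty for a value like `npm:myregistry.com#~1`, in which case a caller would presumably fall back to the key in `dependencies`.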

Auth for multiple registries should be based on hostname. Current plain npm dependencies should be treated as using npmjs.org as the hostname, so stored credentials can then be retrieved by hostname.
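A minimal sketch of hostname-keyed credential lookup might look like the following. The store, the token values, and the helper names are all hypothetical, and the default host is assumed to be `registry.npmjs.org`.

```javascript
// Sketch only: an in-memory credential store keyed by registry hostname.
// All names and tokens here are made up for illustration.
const DEFAULT_REGISTRY = "registry.npmjs.org"; // assumed default host
const credentials = new Map([
  ["registry.npmjs.org", { token: "public-token" }],
  ["myregistry.com", { token: "private-token" }],
]);

function registryHostFor(depValue) {
  // A plain range such as "~1.2.3" names no registry, so use the default.
  if (!depValue.startsWith("npm:")) return DEFAULT_REGISTRY;
  const rest = depValue.slice(4).replace(/^\/\//, "");
  // The host ends at the first "/" or "#".
  return rest.split(/[\/#]/)[0];
}

function credentialsFor(depValue) {
  return credentials.get(registryHostFor(depValue)) || null;
}

console.log(credentialsFor("npm:myregistry.com/my-module#~1"));
```

A real client would read these entries from a config file rather than hard-coding them.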

See also how the git client manages credentials internally: https://www.kernel.org/pub/software/scm/git/docs/v1.7.9/technical/api-credentials.html

/cc @raynos (I know you were interested in npm using URLs in our previous conversation)

Raynos commented 10 years ago

i like the idea of a module being an uri. npmjs://myregistry.com/my-module/0.2.3.

@gozala talked to me a while ago about the idea that a module should be an uri.

Obviously this suffers from the problem that the internet is mutable, so we need an immutable protocol / understanding. A version meaning only one thing is very important.

dominictarr commented 10 years ago

I like this, but the module name is redundant, what if it's different to the module name in the key?

If you want to express a module at a particular version as a single string, there is already a way to do this:

module-name@version

This already works for all resource types, module-name@https://whatever.com/module-name.tgz already works.
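For illustration, splitting that `name@spec` form is a one-liner. `splitNameAndSpec` is a hypothetical helper, and the sketch assumes the name itself never contains `@`.

```javascript
// Sketch only: split "module-name@spec" at the first "@".
// Assumes the module name contains no "@" of its own.
function splitNameAndSpec(arg) {
  const at = arg.indexOf("@");
  if (at === -1) return { name: arg, spec: "" };
  return { name: arg.slice(0, at), spec: arg.slice(at + 1) };
}

console.log(splitNameAndSpec("module-name@https://whatever.com/module-name.tgz"));
```

Because the split happens at the first `@`, whatever follows (a version, a range, or a full URL) is passed through untouched.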

@jden totally agree i want it to work by just assuming that all current modules are in registry.npmjs.org

@Raynos agree. This is the benefit of a central registry: it's easier to have a protocol that enables you to version modules in a disciplined way. If everything is just a url, then that is like script tags: arguably the most flexible possible method, but too flexible and with not enough guarantees.

dominictarr commented 10 years ago

I guess you could make the module name in the url optional npm:myregistry.com#~1.0.0

are the slashes really necessary?

defunctzombie commented 10 years ago

I don't think private registries should be something that is exposed in package.json in the general case. I personally like the apt-get approach of stacking, and maybe some .npmrc file could contain a list of registries in the order to check them.

Beyond that, I find the following syntax much more intuitive

schema://path@version
"dependencies": {
  "foo": "github://user/repo@tag",
  "bar": "github://user/repo@commit",
  "baz": "1.2.3",
  "fiz": "fiz-whatever@3.4.5",
  "foz": "file://local/file.js",
  "cat": "npm://registry@3.4.5"
}

When you don't specify a schema it will assume an npm registry and use the list of configured registries (so the last line for cat could just be 3.4.5), but think of it like doing apt-get install -t unstable foobar to specifically identify that we want foobar from the "unstable" registry.

I am a fan of the "name":"where" style, which also allows you to install it under whatever name you want (like the fiz-whatever example). But that might be too extreme or lead to other confusion. I generally consult package.json when I want to know what the dependency is anyway.
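The stacking idea could be sketched roughly as below. The registry list, hostnames, and package data are invented for the example, and only exact versions are matched to keep it short.

```javascript
// Sketch only: an ordered ("stacked") list of registries, checked first to last.
// Hosts and package contents are made up for illustration.
const registries = [
  { host: "npm.internal.example", packages: { foobar: ["1.0.0"] } },
  { host: "registry.npmjs.org", packages: { foobar: ["0.9.0"], baz: ["1.2.3"] } },
];

// Returns the first registry that has name@version, or null.
// Passing onlyHost pins the lookup to one registry, like `apt-get -t`.
function findRegistry(name, version, onlyHost) {
  for (const reg of registries) {
    if (onlyHost && reg.host !== onlyHost) continue;
    const versions = reg.packages[name] || [];
    if (versions.includes(version)) return reg.host;
  }
  return null;
}

console.log(findRegistry("baz", "1.2.3"));
console.log(findRegistry("foobar", "0.9.0", "registry.npmjs.org"));
```

The order of the list is the whole policy here: an earlier registry shadows a later one whenever both hold the requested version.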

mbrevoort commented 10 years ago

The URI semantic is more understandable. I like the rationale that npm is a protocol and should be represented as such.

The resources will be mutable with range/wildcard version specification. I can't see a way around that (e.g. "~1.0.1" or something more involved like ">= 0.7.3 < 1")

One point against treating it as a URI is spaces and some characters in the version conditionals (like above) need escaping.
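To make the escaping concern concrete, here is a small sketch using the standard `encodeURIComponent` and `URL`. The registry host and module name are placeholders taken from earlier examples in this thread.

```javascript
// Sketch only: what a range with spaces and comparison operators looks like
// once it is placed in a URI fragment.
const range = ">= 0.7.3 < 1";
const encoded = encodeURIComponent(range);
const uri = "npm://myregistry.com/my-module#" + encoded;

// Reading it back: take the fragment, drop the "#", and decode.
const decoded = decodeURIComponent(new URL(uri).hash.slice(1));

console.log(uri);
console.log(decoded === range);

// A simple tilde range needs no escaping at all.
console.log(encodeURIComponent("~1.0.1"));
```

So simple ranges stay readable, while compound ranges survive the round trip but become hard to read in package.json.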

However, I'd be happy with anything in this ballpark at this point! :)

mbrevoort commented 10 years ago

I like the apt-get style approach but I prefer predictability of which registry I'm resolving which deps from.

defunctzombie commented 10 years ago

-50000 x 50000 for anything that is not an EXACT pin of a version. Immediate failure when you do not install what you expected to install. You cannot predict the future of your dependencies so version ranges are not the answer. Semver is a hint and not some absolute that says just cause only the patch or minor changed that there won't be bugs or feature changes.

defunctzombie commented 10 years ago

Let me put it this way: without some sort of cryptographic hash of the content you expect, it would be like doing a git clone of your project and getting a random version of the code for one of the folders.

If the reaction is that this makes deeply nested dependency updates a pain then maybe the entire approach of modules or tooling needs to be made better.
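A content check of the kind described could be sketched with Node's built-in `crypto` module. SHA-1 is used here only as an assumption for the example, and `verifyTarball` is a hypothetical helper.

```javascript
// Sketch only: compare a downloaded tarball against a hash recorded earlier.
const crypto = require("crypto");

function shasum(buffer) {
  return crypto.createHash("sha1").update(buffer).digest("hex");
}

// Throws if the bytes are not the ones that were pinned.
function verifyTarball(buffer, expectedShasum) {
  const actual = shasum(buffer);
  if (actual !== expectedShasum) {
    throw new Error("shasum mismatch: expected " + expectedShasum + ", got " + actual);
  }
  return true;
}

const tarball = Buffer.from("pretend these are tarball bytes");
const pinned = shasum(tarball); // would normally be stored next to the version
console.log(verifyTarball(tarball, pinned));
```

With a recorded hash, an install either produces the expected bytes or fails immediately, regardless of how the version was resolved.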

mbrevoort commented 10 years ago

@defunctzombie but when you specify a dependency with a range or wildcard like ~1.0.1 you are saying up front that you don't expect the same package every time. It's not random; it's a feature. If you want the same version, specify it explicitly (1.0.1). I realize loosely defined dependencies can change underneath, but that's what npm shrinkwrap is intended to address. I do think there should be a native way to ensure bit-wise equality though.
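The difference between a tilde range and an exact pin can be shown with a deliberately tiny matcher. This is not the real semver implementation: it only understands `~x.y.z` and exact `x.y.z`, and ignores prerelease and build tags.

```javascript
// Sketch only: a toy matcher for "~x.y.z" and exact "x.y.z".
function parseVersion(v) {
  return v.split(".").map(Number);
}

function satisfies(version, spec) {
  if (spec.startsWith("~")) {
    const [maj, min, pat] = parseVersion(version);
    const [sMaj, sMin, sPat] = parseVersion(spec.slice(1));
    // ~1.0.1 means >=1.0.1 and <1.1.0
    return maj === sMaj && min === sMin && pat >= sPat;
  }
  // Anything else is treated as an exact pin.
  return version === spec;
}

console.log(satisfies("1.0.5", "~1.0.1"));
console.log(satisfies("1.0.5", "1.0.1"));
```

The range accepts later patch releases by design, while the exact pin accepts only the one version named.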

I assume the goal here is to identify an approach that is compatible with the current npm registry implementation or at the very least non breaking.

junosuarez commented 10 years ago

I don't think @dominictarr was proposing to change the way dependency version ranges are resolved to specific dependency versions. This issue is relating to how to best specify a dependency that is originating from an npm registry other than the public `registry.npmjs.

The way I see it, there are two issues:

IMHO it would make sense to create separate issues for these two requirements. Since this thread has mostly discussed the second, I will continue with that.

The default npm method for specifying a dependency is "name": "~>versionRange". This assumes the protocol (npm) and the registry (registry.npmjs.org). The actual location of the bits, and how to resolve the version range, are not specified in this declaration.

Git urls and local filesystem paths to tarballs look like "name":"/path/to/repo/or/tarball" This assumes a specific version and delivery mechanism (i.e., "get the bits located at this address"). It does not allow for version range dependency resolution. (And as stated above in this post, changing the way that works is beyond the scope of this issue).

Explicit registry modules need a way to encode the module name, the version range, and the registry.

It does not need to encode a specific url as an address to download a specific tarball. The proposal for URIs would be used just as identifiers and as a means of encoding this dependency <name, version, registry> triple in a way that most programmers are familiar with.

@dominictarr the slashes in URIs are not required, but they are 1) familiar and 2) well specified in RFC 3986 and well supported in parsers like node's url module.

chrisdickinson commented 10 years ago

Out of curiosity, from whence comes npmjs:// / npm://? If I'm not mistaken npm's protocol is just HTTP -- shouldn't we faithfully expose that (vs. promoting the host portion of the url to the protocol?)

max-mapper commented 10 years ago

http://wzrd.in/ (browserify-cdn) implements some of these ideas -- might be useful as a way to test these ideas out immediately, patches welcome!

terinjokes commented 10 years ago

Why does defining an npm package as a dependency need to include registry information? I like the approach of a hierarchy of registries, and it works pretty well in the package management world of Linux.

Think of this scenario: request has a small bug in the current version 2.27.0. I make a fix and send a pull request, but publish it to my private npm server as request@2.27.0+cloudflare1. Why shouldn't this version be picked up automatically whenever request is requested?

I don't want to have to edit the dependency tree of my entire app to now say npm://npmjs.intranet@2.27.0 then edit it again when 2.27.1 comes out with my PR merged.

(No offense to mikeal here, of course)

Gozala commented 10 years ago

> i like the idea of a module being an uri. npmjs://myregistry.com/my-module/0.2.3.
>
> @gozala talked to me a while ago about the idea that a module should be an uri.
>
> Obviously this suffers from the problem that the internet is mutable, so we need an immutable protocol / understanding. A version meaning only one thing is very important.

To be clear, for a while I just wanted to have require("raw.github.com/Gozala/method/v2.0.0/core"), no package.json or any of that jazz. npm could be just a registry that enforces versioning, and a package (or rather module) manager would just look at modules and install dependencies.

I don't think this is very related to what's discussed here though.

defunctzombie commented 10 years ago

Doesn't Go do something like this? (In reply to Irakli Gozalishvili's comment of Dec 3, 2013, 4:20 PM.)

terinjokes commented 10 years ago

@defunctzombie it does, but it also has no registry and I don't think it even attempts semver resolution.

Gozala commented 10 years ago

@defunctzombie @terinjokes Yes, it's very similar, but the biggest flaw is that they are git urls and go get only installs master: http://talks.golang.org/2012/splash.article#TOC_9.

I suspect that in the future, support for tags will be added to fix that issue, which a lot of people have been asking for.

dominictarr commented 10 years ago

@jden okay, I am persuaded by the URI argument.

@Gozala that is out of scope. @defunctzombie git urls currently look like this: git://github.com/user/project.git#commit-ish (from the npm docs). I believe # was chosen here because the fragment is not part of the request: it is extra information pertaining to the url that is evaluated on the client and not sent with the request, which is pretty much how git urls are handled.
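A quick sketch of that split, using Node's built-in `URL` class: the part before `#` is what a client would actually fetch, and the fragment stays local. `splitGitUrl` is a hypothetical helper name.

```javascript
// Sketch only: separate the fetchable remote from the client-side commit-ish.
function splitGitUrl(value) {
  const u = new URL(value);
  return {
    remote: u.protocol + "//" + u.host + u.pathname, // what goes over the wire
    commitish: u.hash.slice(1),                      // evaluated on the client
  };
}

console.log(splitGitUrl("git://github.com/user/project.git#commit-ish"));
```

An `npm:` value with the range in the fragment would follow the same pattern: the registry is contacted, and the range is resolved locally against what it returns.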

changing how npm currently works (breaking change) or making a frivolous change even if it's better to npm is completely out of the question.

@Gozala wow, does go install dependencies recursively? really surprised it only installs master!

Okay: explicitly named private registries vs. a configured chain of registries.

@terinjokes this has the considerable advantage that it is not necessary to change anything in the format of the package.json. However, I see some disadvantages coming with that too. For example, it's no longer possible to know what will be installed given only a package.json. Also, suppose someone writes a bah module in a private registry, but then someone else writes a bah module in the public registry. Now there is no way to use the one from the public registry, and it's also not obvious which registry you are getting any particular repo from.

hmm, what would it allow you to do better? hmm, you could move the registry or rename it, and you'd only have to update the config, but you could also configure an alias for that repo too and use an explicit registry in the uri.

@terinjokes so, with explicit registries you could still publish your own fork to your own registry. the only difference is that it would look like:

"request": "npm://registry.cloudflair.com#~2.27.0"

@chrisdickinson I'll argue that npm isn't just http, it's a layer on top of http. It has an http url, but it doesn't just make requests to that one url, but possibly to multiple. Also, an http url already has a meaning with npm: if a url returns a tarball, then npm will install that tarball.

terinjokes commented 10 years ago

@dominictarr I'm not against namespaces, and I think they would be a good addition, but forcing the registry to be the namespace isn't that great. request@npm://registry.cloudflare.com#~2.27.0 is a different package than request@~2.27.0 and one would not ever be used to satisfy the other, and thus won't be deduped when they are in the same tree.

If I fix a bug in foo I want my entire codebase to use the fixed code while waiting for the PR to be merged and a new version to be packaged, not just the small sliver that's my application-specific logic.

dominictarr commented 10 years ago

@terinjokes the target usecase of multiple registries isn't really to install your own forks. that is already satisfied well enough with giturls, instead, it's to make it easy for organizations to use npm to install their own private code that isn't open source.

I think we are after different things here. do you maintain many forked modules? can you tell me more about your practices here?

terinjokes commented 10 years ago

We fork public modules to fix bugs a lot, and have multiple people working on codebases at once. It's easiest to ensure everyone's on the same page if things are published to our internal registry, alongside our private code.

Currently, out of the 52 modules on our private npm, we have 11 forks of public code bases. Most, if not all, have upstream PRs—whether or not the maintainer is responsive is another thing.

dominictarr commented 10 years ago

what are you using for your private npm currently? is it an entire mirror of registry.npmjs.org?

terinjokes commented 10 years ago

I'm using terinjokes/docker-npmjs which is using kappa to delegate to registry.npmjs.org if it doesn't exist locally.

junosuarez commented 10 years ago

@terinjokes @dominictarr I'd like to mention that the proposals in this thread don't really introduce naming, per se, but rather address assigning an "origin of record", so to speak. Module resolution would still be handled by the module's name, eg the string in package.json#name, the key part of the item in the dependencies object, and the folder name in node_modules/. If you update the source of a dependency from the public npm registry to your own private registry containing your patched fork, as long as the version (eg, with a label) still satisfies the version range in all of the other modules in your dependency tree that depend on the patched module, it is still deduped.

Example:

 jden:test  $ tree
.
├── node_modules
│   ├── a
│   │   ├── node_modules
│   │   │   └── colors
│   │   │       ├── MIT-LICENSE.txt
│   │   │       ├── ReadMe.md
│   │   │       ├── colors.js
│   │   │       ├── example.html
│   │   │       ├── example.js
│   │   │       ├── package.json
│   │   │       ├── test.js
│   │   │       └── themes
│   │   │           ├── winston-dark.js
│   │   │           └── winston-light.js
│   │   └── package.json
│   └── colors
│       ├── MIT-LICENSE.txt
│       ├── ReadMe.md
│       ├── colors.js
│       ├── example.html
│       ├── example.js
│       ├── package.json
│       ├── test.js
│       └── themes
│           ├── winston-dark.js
│           └── winston-light.js
└── package.json

7 directories, 20 files

 jden:test  $ npm dedupe
colors@0.6.2 node_modules/colors

 jden:test  $ tree
.
├── node_modules
│   ├── a
│   │   ├── node_modules
│   │   └── package.json
│   └── colors
│       ├── MIT-LICENSE.txt
│       ├── ReadMe.md
│       ├── colors.js
│       ├── example.html
│       ├── example.js
│       ├── package.json
│       ├── test.js
│       └── themes
│           ├── winston-dark.js
│           └── winston-light.js
└── package.json

5 directories, 11 files

 jden:test  $ cat ./package.json | grep colors
    "colors": "git://github.com/jden/colors.js.git"

 jden:test  $ cat ./node_modules/a/package.json | grep colors
    "colors": "~0.6.2"

terinjokes commented 10 years ago

@jden What about this scenario, quoted from above?

> For example, it's now not possible to know what will be installed, given a package.json. Also suppose someone writes a bah module in a private registry, but then someone else writes a bah module in the public registry. Now there is no way to use the one from the public registry, and it's also not obvious which registry you are getting any particular repo from.

I understood the npm://npmjs.internal/#~1.2.3 syntax to rectify this by namespacing the public bah and the private bah separately.


If I am indeed wrong, it certainly removes most of my objections. Only one remains.

If I fork a module to fix a bug, and this module is only used via public modules I haven't forked, I have no choice but to make my application dependent on the fork until such time as the PR is merged. One of the nice things about npm is that I don't have to micromanage the dependencies of my dependencies, and this certainly breaks that separation.

junosuarez commented 10 years ago

@terinjokes

> I understood the npm://npmjs.internal/#~1.2.3 syntax to rectify this by namespacing the public bah and the private bah separately.

What I demonstrated above is the result of the current implementation of npm dedupe. The resolution at runtime only cares about a module's location on disk, ie, in node_modules. It would be possible to write an npm dedupe-strict for example that would require modules to originate from the same registry to be eligible for deduping. The only thing that remains then would be that two distinct modules with the same name but from different registries could not be used at the same level in a dependency tree. In practice, it's probably a good idea to have "unique enough" module names to lower the likelihood of this happening.
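The difference between the current behaviour and a stricter variant can be sketched as an eligibility check. This is a simplification: it compares exact versions rather than ranges, and the `registry` field and `strict` option are hypothetical.

```javascript
// Sketch only: decide whether two installed copies may be merged.
// Default mode ignores where a copy came from; strict mode also
// requires the same origin registry.
function canDedupe(a, b, { strict = false } = {}) {
  if (a.name !== b.name || a.version !== b.version) return false;
  return strict ? a.registry === b.registry : true;
}

const publicColors = { name: "colors", version: "0.6.2", registry: "registry.npmjs.org" };
const forkedColors = { name: "colors", version: "0.6.2", registry: "npm.internal.example" };

console.log(canDedupe(publicColors, forkedColors));
console.log(canDedupe(publicColors, forkedColors, { strict: true }));
```

In the default mode a patched fork with a matching version can stand in for the public module anywhere in the tree; strict mode keeps the two apart.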

> One of the nice things about npm is that I don't have to micromanage the dependencies of my dependencies, and this certainly breaks that separation.

How so? Simply update the dependency on the module that requires your fork, and everything else that depends on that module will be just happy.

defunctzombie commented 10 years ago

If you fix something for a dep you have to fork and update the dep. This is how local module dependencies work. Sometimes it is a pain and other times it isn't. (In reply to Jason Denizac's comment of Dec 5, 2013, 2:24 PM.)

terinjokes commented 10 years ago

So you're saying that if I fix a package (bah) used by something popular (popA), I would have to fork popA to update the dependency array, then fork every module that uses popA to update the dependency array to point at my fork, ad nauseam.

I don't see how this is a good solution and certainly doesn't sound like a productive use of my time.

Right now I publish the fixed version to my private repository. It shadows the package from the public repository, and I can move on with my job. When it's updated on the public repository, I unpublish from my private one (though I'd actually prefer not to have to do this last step, it should shadow only that version).

Again, in the general case, I'm not against the multiple repository format suggested here. I just think there should be another way for me to say "I have patched versions at 'x'. If anyone requests a package satisfied at 'x' use it."

defunctzombie commented 10 years ago

If you want to use a shadow package with same version sure. But you are not using versioning then since maybe tomorrow you need to change something else.

This is what it means to have local deps versus global deps. (In reply to Terin Stock's comment of Dec 5, 2013, 2:46 PM.)

terinjokes commented 10 years ago

Indeed, but I don't have many options that seamlessly work for an entire team of developers, and it was previously asked in this thread that I explain how we currently do things.

dominictarr commented 10 years ago

@terinjokes thank you for explaining. So, my understanding is that you have not mirrored the entire registry, and your private npm only contains your forks and private modules?

The thing about this method that makes me the most uncomfortable is that it's very implicit. It's just like how $PATH works, and I consider node's system of installing locally, and not really using $PATH, to be one of node's very best innovations. However, clearly, this already works, and we are not suggesting breaking that in any way.

Oh, by the way, it's worth noting that in npmd, the resolve step and the install step are decoupled. npmd resolve module figures out what the entire dep tree will be, but its output is just json, in the same format as npm shrinkwrap (except it also contains the shasums of the tarballs, if possible). It would be easy to add a custom dedupe step, or something that replaced a dep, by just dropping a filter into the pipeline.

npmd-resolve whatever | my-custom-dedupe | npmd-install

you could also say, replace modules at any depth

npmd-resolve whatever | npmd-swapdep request@2.27 request@npm:cloudflare.com#2.27.0 | npmd-install

(not to say that this is the way you SHOULD do this, just that you COULD do this)
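The core of such a swap filter could look something like the sketch below. The tree shape (nested `dependencies` objects with `version` and `from` fields) is an assumption modelled on the shrinkwrap format, and `swapDep` is a hypothetical function, not the actual npmd-swapdep.

```javascript
// Sketch only: replace every occurrence of name@version, at any depth,
// in a shrinkwrap-like tree. Mutates the tree in place and returns it.
function swapDep(tree, name, version, replacement) {
  const deps = tree.dependencies || {};
  for (const [depName, dep] of Object.entries(deps)) {
    if (depName === name && dep.version === version) {
      deps[depName] = { ...dep, ...replacement };
    }
    // Recurse so nested copies are swapped as well.
    swapDep(deps[depName], name, version, replacement);
  }
  return tree;
}

const tree = {
  name: "whatever",
  version: "1.0.0",
  dependencies: {
    a: { version: "0.1.0", dependencies: { request: { version: "2.27.0" } } },
    request: { version: "2.27.0" },
  },
};

swapDep(tree, "request", "2.27.0", { from: "request@npm:cloudflare.com#2.27.0" });
console.log(JSON.stringify(tree, null, 2));
```

Wrapped in a small script that reads JSON from stdin and writes it to stdout, a function like this would slot between the resolve and install steps.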

most importantly, you could also store that file,

npmd-resolve whatever | tee deps.json | npmd-install

And then anyone can check exactly what deps were in use when whatever was installed, and it's still totally explicit. Of course, this is not to say that @terinjokes's current method wouldn't still work; I'm just pointing out that npmd is very flexible, and it would be easy to experiment with a variety of strategies for module resolution.

I haven't gone out of my way to make npmd work like this, it's just that it was easier to make it work this way.

junosuarez commented 9 years ago

This issue is still showing up on my personal "open issues mentioning you" list, limiting its utility. Could you please consider closing it?