Exclude non-essential files from published package (especially test/**)

broofa commented 2 years ago

Excluding the test directory from the npm published files will shave ~75% off the size of the package.

'Probably want to also exclude appveyor.yml and example.

$ mkdir -p /tmp/foo
$ cd /tmp/foo
$ npm install resolve
$ du -sh node_modules/resolve

4.0K    node_modules/resolve/LICENSE
4.0K    node_modules/resolve/SECURITY.md
4.0K    node_modules/resolve/appveyor.yml
4.0K    node_modules/resolve/async.js
4.0K    node_modules/resolve/bin
8.0K    node_modules/resolve/example
4.0K    node_modules/resolve/index.js
 52K    node_modules/resolve/lib
4.0K    node_modules/resolve/package.json
 12K    node_modules/resolve/readme.markdown
4.0K    node_modules/resolve/sync.js
304K    node_modules/resolve/test

ljharb commented 2 years ago

They are in fact quite essential; they’re included by design. npm install foo && npm explore foo && npm install && npm test should always work.

Duplicate of #258. Duplicate of #235. Duplicate of #207. Duplicate of #171. Duplicate of #180. See not just #82, but https://github.com/browserify/resolve/pull/149#pullrequestreview-96750692, https://github.com/browserify/resolve/pull/132#pullrequestreview-52259150, https://github.com/browserify/resolve/pull/128#pullrequestreview-47539920, https://github.com/browserify/resolve/pull/59#issuecomment-63429177.

I’d be happy to exclude appveyor.yml if those mere bytes (not 4K, that’s just the block size limits of your hard drive) are worth excluding. Excluding examples would be a breaking change, so we could do it in v2, i suppose - except that examples are ran as part of tests, so they’d need to remain.

broofa commented 2 years ago

npm install foo && npm explore foo && npm install && npm test should always work

This doesn't seem like a valid reason for shipping test files in production. I've never once had a case where a user expected this to work, nor where I felt it would have been helpful.

Bottom line: Test scripts just aren't something anyone expects to run as part of a production environment. In part because production environments often go through additional toolchains that will break this sort of thing.

ljharb commented 2 years ago

Sure - but the other bottom line is that package size doesn't matter in practice - what matters is the memory or bundle size of your program, and none of these files affect that. There's certainly tools that try to imply that package size matters, but this is just FUD.

jason-ha commented 2 years ago

@broofa Very much agree. Jordan did not reference #235 where we had a longer discussion that was eventually not replied to. It seems that Jordan is very convinced despite reoccurring requests. I am back here checking on things because security scanning tools continue to crawl though this dependency and find issues. With such demand out there it would be helpful to provide community with the option to not get the overhead (which is not just about size). There could be two packages. One with just what is needed and another with whatever extras.

ljharb commented 2 years ago

@jason-ha thanks for that; i'll update the list to include that one also.

ljharb commented 2 years ago

@jason-ha if npm offered a way to have the "default" installation be minimal, and a "full" installation be something that could be opted into, i would absolutely do that immediately - but npm doesn't offer it. Having two packages would not solve anything because then I wouldn't have the "full" one installed when I suddenly find myself without internet and needing to test dependencies.

Please feel free to file bugs on any security scanning tools that incorrectly flag these files.

jason-ha commented 2 years ago

I will direct you to the #235 thread about the tooling. Reading through some related issues it is not just tooling my team uses that is impacted.

It seems that two packages would very much solve the issue. In your use case of having everything, you would never reference the production only package. The other folks who never want to deal with the resolve tests as part of their production environment would never reference the "full" package.

ljharb commented 2 years ago

The problem is that I always want the full package, transitively.

jason-ha commented 2 years ago

Either I am not understanding you or you are misunderstanding the proposal. Proposing a new package; let's call it resolve-prod-only. It only contains the essential files. You would not use it. You keep on using resolve.

ljharb commented 2 years ago

@jason-ha I'm saying that even when resolve is 10 levels deep, I still want its test installed. The only way to achieve that is to either have the status quo, or, have every dependent use the full package.

The interest you highlight means that folks would switch to resolve-prod-only, which would defeat the point.

jason-ha commented 2 years ago

Please help me understand why you want the tests installed in all situations. I can understand there are packages you work with and would get to stick with resolve (full) dependency. Why in anyone's usage do you get to insist that tests and whatever are installed [apart from being the package owner of course :) ] ?

It might be helpful to walk through the npm "default"/"full" scenario since you are saying that would work for you. What is a case now where resolve package is N (>1) dependency levels deep, what is install process now, and how would install process change? Is the "default" or "full" option something that exists per dependency or a top-level option such that at top every package transitively involved would get the "full" install?

I am also wondering if there may be a solution where tests are packaged apart from resolve. When you get to the local npm install step or your npm install foo && npm explore foo && npm install && npm test expectation that is when the tests are really installed. From the previous conversation on 235 it is the first three steps that you do to prepare to be offline and npm test is executed later.

ljharb commented 2 years ago

First of all, this isn't just for resolve; it's for hundreds of other packages I maintain as well - resolve is just one of them.

The problem is that my local npm cache is only guaranteed to contain what's in my projects' transitive dependency graphs. Thus, resolve appears, but resolve-full doesn't. npm install --prefer-offline works fine with a primed npm cache.

The problem is that I can't predict when I'll need resolve-full, and when I do, I might not have internet available. So, the only way I can ensure it's always present - so that it's always present when I need it, internet or no - is to have it always be present.

npm could solve this potentially by allowing a package to have multiple "distributions", and then the minimal one would be the default, and I'd set an npm config value to always install the "full" one. However, short of that, this is the only practical approach.

jason-ha commented 2 years ago

Definitely understand that is it not just resolve and the issue goes to all packages built in this way. Packaging of things beyond the minimal is not a best practice for secure products. But security conversation should probably happen in issue 235 where there is more content to date.

I am not suggesting resolve-full; I want to but expect that is dead on arrival. I am suggesting resolve-prod-only. You keep using resolve; no changes for you. Please say how leaving resolve as is and providing an alternate for all those finding the minimal package advantageous is a problem.

I was looking for "official" guidance on this question and NPMJS does recommend chucking things that are not required like test data. It also indicates that a smoke test is fine, but (a) that is not a whole test suite and (b) doesn't really say it is for use once installed. And elsewhere recommendation for testing a module is to create a test external to the package and see also here.

It is not clear to me if you are completely opposed to the other solution I was wondering about where the tests are brought in via the install in package scope step. As far as I can tell that can meet your caching and offline requirements.

ljharb commented 2 years ago

My use case only works if everyone chooses the full package. That’s the problem.

jason-ha commented 2 years ago

Please explain everyone. This project has 12.6M repos / 10k packages that depend on it. I don't see how you need every single instance of those uses to have full package.

Can you help explore alternatives to the current pattern that resolve this issue for consumers and are acceptable for your situation? I want to encourage your "will try anything once" part of your profile. I see two possibilities right now (beyond the rejected two package "distributions") that don't require package manager features:

the previously listed pattern where tests are brought in via install
leverage your git cache - details below

To the extent that I understand your use case, you can before going offline ensure your local git repositories are synced with remote. When you explore into package you will know the version and may grab all of the additional files from your local repository at the same version. Now you can test. You will even have the additional option to validate that what came installed matches your repository.

browserify / resolve

Exclude non-essential files from published package (especially test/**) #277