bazel-contrib / rules_nodejs

NodeJS toolchain for Bazel.
https://bazelbuild.github.io/rules_nodejs/
Apache License 2.0

Support fine grained dependencies on node_modules #18

Closed: kamalmarhubi closed this issue 6 years ago

kamalmarhubi commented 7 years ago

We have a monorepo with a vendored node_modules tree. If we evaluate bazel for this codebase, our ideal use would be to allow depending on individual modules in library targets. This would let our build take advantage of bazel's sandboxing and knowledge of dependencies to vastly reduce the work done, while maintaining reproducibility.

These rules appear to discourage fine-grained dependencies on modules, which will result in lots of wasted builds whenever a dependency is added, updated, or removed. Any suggestions on how to handle our use case / info on planned changes that might make it easier?

alexeagle commented 7 years ago

Just to clarify, you have a tree

repo
 - our_app
 - vendor
   - node_modules
     - <stuff in here is binary distributions laid out following the npm conventions, same as npm install would do>

Or is your vendored directory actually sources that need to be built?

"vastly reduce the work done"

Do you have evidence that a smaller node_modules/* input to the build or to individual actions makes it any faster? I could imagine that the input size is a problem for remote actions, but I haven't seen this problem locally.

As a workaround, the rules here accept a node_modules attribute that you can point at any filegroup-like label, which lets you reduce the inputs the nodejs_binary uses.
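
Roughly, in a root BUILD file (the filegroup name, glob patterns, and entry point below are just placeholders, not something these rules prescribe):

        load("@build_bazel_rules_nodejs//:defs.bzl", "nodejs_binary")

        # Sketch: hand-pick a subset of the vendored node_modules tree.
        filegroup(
          name = "small_node_modules",
          srcs = glob([
              "node_modules/express/**",
              "node_modules/lodash/**",
          ]),
        )

        nodejs_binary(
          name = "our_app_bin",
          entry_point = "our_app/main.js",  # placeholder entry point
          # Point the rule at the curated filegroup instead of all of node_modules.
          node_modules = ":small_node_modules",
        )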

kamalmarhubi commented 7 years ago

@alexeagle we actually have

repo
├── node_modules
│   └── <stuff in here is binary distributions laid out following the npm conventions, same as npm install would do>
└── ourstuff

where we basically commit the results of npm install to our repo. (This makes the few native modules we use special snowflakes, but that's something I would hope to fix via bazel.)

The concern I originally had with the monolithic node_modules dependency is that each executable / script in our repo depends on only a subset of the hundreds of modules in node_modules. With a single node_modules target, any update to any module will require a rebuild / retest of the whole repo, instead of only those targets that depend on the changed or added module.

I don't have any measurements, as there are no build rules for coffeescript, which is what a substantial portion of the repo is written in. However, your question made me take a closer look at our commit history. It looks like node_modules changes less often than I'd originally thought, so this is probably not a huge issue during development. From an artifact size perspective, however, if we package nodejs_binary rules for distribution, they'll end up being hundreds of MB even if they use only a single module. There may still be some value in allowing dependencies on individual modules.

Curious to hear your thoughts!

alexeagle commented 7 years ago

For nodejs_binary deployments on the server we don't care that much; a few hundred MB is the standard size for Java programs in a large monorepo with a lot of code sharing and a deep tech stack. For a presumed npm_bundle rule (#10), none of node_modules would be included, only the package.json file.

Note that you can have multiple node_modules_xx targets, each a filegroup with a manually curated set of glob patterns, to satisfy your use case - but I think the manual curation won't be practical over time or at scale. Maybe that can be a workaround if we do discover some reason that the whole of node_modules can't be an input.
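
For illustration only (the target names and glob patterns here are invented):

        # Sketch: two hand-curated slices of the vendored node_modules tree.
        filegroup(
          name = "node_modules_server",
          srcs = glob([
              "node_modules/express/**",
              "node_modules/body-parser/**",
          ]),
        )

        filegroup(
          name = "node_modules_tooling",
          srcs = glob([
              "node_modules/coffeescript/**",
          ]),
        )

Each nodejs_binary could then point its node_modules attribute at whichever slice it needs.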

There's a scheme in https://github.com/dropbox/rules_node to generate the bazel workspace from the package.json, which you might want to check out.

pauldraper commented 6 years ago

For an idea of "how big are we talking", take https://angular.io/guide/quickstart. Following the instructions, I get a node_modules directory of 335 MB. Given that's just the recommended "getting started" Angular app, I think it's not unlikely for a combined node_modules/ for a monolithic repo to be the better part of a gigabyte.

That causes several problems with rules_nodejs:

  1. Install times are lengthy if you use anything in node_modules.
  2. It disobeys the law of incrementality, which is that the time to rebuild after a change is proportional to the size of the change (rather than the size of the code). (@kamalmarhubi https://github.com/bazelbuild/rules_nodejs/issues/18#issuecomment-330007055)
  3. It requires large uploads for distributed execution. (@alexeagle https://github.com/bazelbuild/rules_nodejs/issues/18#issuecomment-329598470)
  4. It doesn't facilitate well-structured code bases. For example, I cannot easily ask "What uses this dependency?"
  5. It results in large deployment targets. (@alexeagle https://github.com/bazelbuild/rules_nodejs/issues/18#issuecomment-330123825) On the one hand, this may not be an issue since "disk space is cheap". On the other hand, as servers become more numerous and more ephemeral, many containerized deployments fight to optimize out every MB possible. "A hundred MB here or there isn't important" is not as true as we might like it to be.

(1) can only be fixed by making Bazel-native installs, in the style of https://github.com/johnynek/bazel-deps (Maven) or https://github.com/dropbox/rules_node (npm).

(2), (3), (4), and (5) can be fixed in less drastic ways. npm/yarn can still resolve and install the dependencies, compile native bindings, etc. But Bazel, via the generated lock file, could understand how those installed dependencies are linked. Rules could depend on specific dependencies, and rules_nodejs would understand which files those are. This is not a very complicated or error-prone algorithm, and it would help enormously with those issues. (Or you could opt out of the fine-grained dependency management, and the basic workflow stays the same.)

Thoughts @alexeagle ?

gregmagolan commented 6 years ago

@kamalmarhubi @pauldraper Alex and I have a PR up for fine-grained npm dependencies via yarn_install/npm_install, based on ideas from https://github.com/pubref/rules_node.

This PR defines targets for each installed node module and will allow for the following:

        yarn_install(
          name = "npm",
          package_json = "//:package.json",
          yarn_lock = "//:yarn.lock",
        )

        nodejs_binary(
          name = "my_binary",
          ...
          data = [
              "@npm//:foo",
              "@npm//:bar",
              ...
          ],
        )

gregmagolan commented 6 years ago

@pauldraper We've seen performance issues with large node_modules filegroups as well: https://github.com/bazelbuild/bazel/issues/5153

pauldraper commented 6 years ago

@gregmagolan :+1: that's exactly what I was thinking.

Looks like the proposal is at https://docs.google.com/document/d/1AfjHMLVyE_vYwlHSK7k7yW_IIGppSxsQtPm9PTr1xEo/preview

alexeagle commented 6 years ago

This is in now; note that a breaking change is still coming in https://github.com/bazelbuild/rules_nodejs/pull/337