Closed · kamalmarhubi closed this issue 6 years ago
Just to clarify, you have a tree
repo
- our_app
- vendor
  - node_modules
    - <stuff in here is binary distributions laid out following the npm conventions, same as npm install would do>
Or is your vendored directory actually sources that need to be built?
> vastly reduce the work done

do you have evidence that a smaller node_modules/* input to the build or to individual actions makes it any faster? I could imagine that for remote actions the input size is a problem, but I haven't seen this problem locally.
As a workaround, the rules here accept a node_modules attribute which you could assign to a label that is filegroup-like, which would let you reduce which inputs the nodejs_binary uses.
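A minimal sketch of that workaround, with illustrative target names and glob patterns (the `node_modules` attribute on `nodejs_binary` is the one mentioned above; everything else here is a hypothetical example, not a prescribed layout):

```python
# BUILD.bazel -- sketch only; package names and entry point are made up.

# A filegroup-like label covering just the modules this binary needs.
filegroup(
    name = "server_modules",
    srcs = glob([
        "node_modules/express/**",
        "node_modules/lodash/**",
    ]),
)

nodejs_binary(
    name = "server",
    entry_point = "server.js",
    # Point the rule at the curated filegroup instead of all of node_modules.
    node_modules = ":server_modules",
)
```

With this, only changes under the globbed directories invalidate `:server`; an update to an unrelated module in node_modules does not.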
@alexeagle we actually have
repo
├── node_modules
│ └── <stuff in here is binary distributions laid out following the npm conventions, same as npm install would do>
└── ourstuff
where we basically commit the results of npm install to our repo. (This makes the few native modules we use special snowflakes, but that's something I would hope to fix via Bazel.)
The concern I originally had with the monolithic node_modules dependency is that each executable / script in our repo depends on only a subset of the hundreds of modules in node_modules. By having a single node_modules target, any update to any module will require a rebuild / retest of the whole repo, instead of only those targets dependent on the changed or added module.
I don't have any measurements, as there are no build rules for CoffeeScript, which is what a substantial portion of the repo is written in. However, your question made me take a closer look at our commit history. It looks like node_modules changes less often than I'd originally thought, so this is probably not a huge issue during development. From an artifact size perspective, however, if we package nodejs_binary rules for distribution, they'll end up being hundreds of MB even if they use only a single module. There may still be some value in allowing dependencies on individual modules.
Curious to hear your thoughts!
For nodejs_binary deployments on the server we don't care that much; a few hundred MB is the standard size for Java programs in a large monorepo with a lot of code sharing and a deep tech stack.
For a presumed npm_bundle rule (#10), none of the node_modules would be included, only the package.json file.
Note that you can have multiple node_modules_xx targets, each a filegroup with a manually curated set of glob patterns, to satisfy your use case - but I think the manual curation won't be practical over time or at scale. Maybe that can be a workaround if we do discover some reason that the whole of node_modules can't be an input.
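The multiple-filegroup idea could look roughly like this (slice names and glob patterns are illustrative; the curation burden mentioned above is exactly keeping these lists accurate as dependencies change):

```python
# BUILD.bazel at the repo root -- one curated filegroup per slice of node_modules.

filegroup(
    name = "node_modules_web",
    srcs = glob([
        "node_modules/react/**",
        "node_modules/react-dom/**",
    ]),
)

filegroup(
    name = "node_modules_tools",
    srcs = glob([
        "node_modules/typescript/**",
        "node_modules/tslib/**",
    ]),
)
```

Targets then depend only on the slice they use, so an update to a tooling module doesn't invalidate web targets - at the cost of hand-maintaining the globs.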
There's a scheme in https://github.com/dropbox/rules_node to generate the bazel workspace from the package.json, which you might want to check out.
For an idea of "how big are we talking", take https://angular.io/guide/quickstart . Following the instructions, I get a node_modules directory of 335 MB. Given that's just the recommended "getting started" Angular app, I think it's not unlikely for a combined node_modules/ in a monolithic repo to be the better part of a gigabyte.
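To get the same "how big are we talking" number for your own tree, you can total the file sizes under node_modules with a few lines of plain Python (no Bazel involved; the path is an assumption about where your vendored tree lives):

```python
import os

def tree_size_bytes(root):
    """Total size of all regular files under root (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total

if __name__ == "__main__":
    size = tree_size_bytes("node_modules")
    print(f"node_modules: {size / 1024 / 1024:.1f} MB")
```

(`du -sh node_modules` gives a similar answer, counting disk blocks rather than file bytes.)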
That causes several problems with rules_nodejs:
(1) can only be fixed by making Bazel-native installs, in the style of https://github.com/johnynek/bazel-deps (Maven) or https://github.com/dropbox/rules_node (npm).
(2), (3), (4), and (5) can be fixed in less drastic ways. npm/yarn can still resolve and install the dependencies, compile native bindings, etc. But Bazel, via the generated lock file, could understand how those installed dependencies are linked. Rules could depend on specific dependencies, and rules_nodejs would understand which files those are. This is not a very complicated or error-prone algorithm, and it would help enormously with those issues. (Or you could opt out of the fine-grained dependency management, and the basic workflow stays the same.)
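The "Bazel understands the lock file" idea boils down to a transitive-closure walk over the dependency graph. A toy sketch, where the lock data is a hand-written stand-in for a parsed yarn.lock (not its real format):

```python
# Toy stand-in for a parsed lock file: package -> its direct dependencies.
# A real yarn.lock / package-lock.json parser would produce this mapping.
LOCK = {
    "express": ["body-parser", "cookie"],
    "body-parser": ["bytes"],
    "cookie": [],
    "bytes": [],
    "lodash": [],
}

def transitive_deps(roots, lock):
    """All packages reachable from roots, i.e. the installed modules a rule
    depending on those packages actually needs as inputs."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(lock.get(pkg, []))
    return seen

print(sorted(transitive_deps(["express"], LOCK)))
# -> ['body-parser', 'bytes', 'cookie', 'express']
```

A target depending only on "lodash" would get just that one module as an input, so an update to "express" wouldn't invalidate it - which is the whole point of the fine-grained approach.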
Thoughts @alexeagle ?
@kamalmarhubi @pauldraper Alex and I have a PR up for fine-grained npm dependencies via yarn_install / npm_install, based on ideas from https://github.com/pubref/rules_node.
The PR defines a target for each installed node module and will allow for the following:
yarn_install(
    name = "npm",
    package_json = "//:package.json",
    yarn_lock = "//:yarn.lock",
)

nodejs_binary(
    name = "my_binary",
    ...
    data = [
        "@npm//:foo",
        "@npm//:bar",
        ...
    ],
)
@pauldraper We've seen performance issues with large node_modules filegroups as well: https://github.com/bazelbuild/bazel/issues/5153
@gregmagolan :+1: that's exactly what I was thinking.
Looks like the proposal is at https://docs.google.com/document/d/1AfjHMLVyE_vYwlHSK7k7yW_IIGppSxsQtPm9PTr1xEo/preview
This is in now; note that a breaking change is still coming in https://github.com/bazelbuild/rules_nodejs/pull/337
We have a monorepo with a vendored
node_modules
tree. If we evaluate bazel for this codebase, our ideal use would be to allow depending on individual modules in library targets. This would let our build take advantage of bazel's sandboxing and knowledge of dependencies to vastly reduce the work done, while maintaining reproducibility.These rules appear to discourage fine-grained dependencies on modules, which will result in lots of wasted builds whenever a dependency is added, updated, or removed. Any suggestions on how to handle our use case / info on planned changes that might make it easier?