Support yarn/npm7 workspaces

bugzpodder commented 6 years ago

Would yarn workspaces work with this rule? I was playing around with 0.10.0 recently but didn't get very far in trying to setup yarn_install. Would a future version support this?

alexeagle commented 6 years ago

@clydin from our team has been looking at yarn workspaces.

As I understand it, we have the same features as yarn workspaces without actually using it. You can have multiple directories with a package.json file, run a command to install all their dependencies, and publish individual directories to npm. But to have one package depend on another, you don't need package.json for this, as the deps of the consuming target can point directly to the output of the providing target.

tristanz commented 5 years ago

@alexeagle I think you'll still run into problem supporting both yarn workspaces (e.g. for interactive development with livereload) and Bazel. To support a local yarn install, you need the subpackages in your package.json, but this will break the yarn_install in WORKSPACE since those private subpackages won't be published to npm.

Do you have advice on how to support both a native yarn workflow and Bazel workflow from a monorepo with multiple packages?

Toxicable commented 5 years ago

@tristanz I think the idea here is that you don't isntall your sub packages with a yarn install you just reference their outputs directly

trin0491 commented 4 years ago

If bazel supports yarn workspaces, could that be used to solve the following issue:

Assume a monorepo has two 'top-level' projects that are separately deploy able, e.g. 'app1' and 'app2' and they both depend upon a library 'lib1'. Lib1 is typescript library with a package.json that has a dependency on a third party library, e.g. lodash. Each of the applications also have package.json files with lodash but they also add another dependency, e.g. angular. The two apps must have a consistent version for lodash but they may want to have different versions for angular at times, e.g. so they can upgrade independently. Bazel supports supports multiple yarn_install/npm_install calls to create the workspaces but ts_library from app1 would prevent a dependency on lib1 because it will throw the 'All npm dependencies need to come from a single workspace" error and I don't think a single npm_install/yarn_install' workspace can contain two versions of angular so we can't use one package.json?

Is there some other solution to this or would we need the yarn workspace capability? Thanks.

DylanVann commented 4 years ago

There's probably a lot of people using Lerna / Yarn workspaces currently that would be interested in Bazel but need to keep workspaces working while using Bazel for some things.

This is currently blocked.

DylanVann commented 4 years ago

I think you'll still run into problem supporting both yarn workspaces (e.g. for interactive development with livereload) and Bazel. To support a local yarn install, you need the subpackages in your package.json, but this will break the yarn_install in WORKSPACE since those private subpackages won't be published to npm.

Assuming Bazel is delegating to Yarn how it says it is Yarn handles doing a symlink for packages within the monorepo instead of trying to download them.

migueloller commented 4 years ago

It looks like currently Bazel doesn't blow up when using Yarn workspaces, but the generated @npm repository will only have the top-level node_modules packages included. If there's a nested node_modules folder in a yarn workspace package, it will be ignored. For example, given the directory structure below, if app and lib both depend on a different version of some external library, Yarn will properly have the two versions, one at the top-level node_modules and one inside app or lib. The runfiles set up with the @npm repository, though, will always point to the top-level node_modules, regardless if the runfiles are declared in the packages/app Bazel package or packages/lib Bazel package. If each Bazel package got runfiles set up the same way Yarn does and let Node module resolution take care of the rest, then it should work as expected.

project
├── BUILD
├── WORKSPACE
├── package.json
├── packages
│   ├── app
│   │   ├── BUILD
│   │   ├── index.ts
│   │   └── package.json
│   └── lib
│       ├── BUILD
│       ├── index.ts
│       └── package.json
└── yarn.lock

migueloller commented 4 years ago

Could the Node linker be reworked so that instead of having a single node_modules in the runfiles, it has one per package? I guess this is what it would mean to "add Yarn workspaces support". It would be nice if rules_nodejs could do this by itself and use of Yarn workspaces wasn't required. Delegating to the package manager, in this case Yarn, seems more Bazel "native" though, since Bazel stays away from having multiple versions of dependencies.

@alexeagle, if there's a need for contributors I'll gladly help with getting Yarn workspaces support. My team is trying to move from Lerna + Yarn workspaces to Bazel but it's imperative that if two different apps in our repo have two different versions of some transitive dependency, it resolves properly. There's two reasons for this. First, because it would break the application if the wrong version is used of course. And second, because while keeping the same version of a shared library the same across a monorepo is definitely ideal, specially if the library is in the monorepo itself, it's sometimes not possible to make a change across the entire repo.

I more or less understand the source so I might be able to experiment with it if I'm pointed in the right direction.

migueloller commented 4 years ago

I looked more into this. It seems the current rules are pretty close.

# WORKSPACE

# ... prior setup snipped

yarn_install(
  name = "npm",
  package_json = "//:package.json",
  yarn_lock = "//:yarn.lock",
)

When using the repository rule above, one can depend on dependencies as follows...

# packages/lib/BUILD.bazel

ts_library(
    name = "lib",
    srcs = ["index.ts"],
    module_name = "lib",
    deps = [
        "@npm//lodash",
    ],
)

# packages/app/BUILD.bazel

nodejs_binary(
  name = "bin",
  deps = [
    "//packages/lib",
    "@npm//lodash",
  ],
  entry_point = "index.ts",
)

If the Yarn workspace "app" depends on lodash@4 but "lib" dependes on lodash@3, both packages will get v4. That being said, the runfiles at bin.sh.runfiles do contain both versions, buried under the @npm repository. All that's left is either for bin.runfiles/packages/lib/node_modules to be symlinked to bin.runfiles/npm/node_modules/packages/lib/node_modules or for the Node.js linker's require patch to do the proper mapping depending on whether the import is coming from "app" or "lib".

Before coming to this conclusion, I tried some things.

I tried making following change to BUILD.bazel but no luck.

 # packages/app/BUILD.bazel

 nodejs_binary(
   name = "bin",
   deps = [
+    "@npm//lib",
     "@npm//lodash",
   ],
   entry_point = "index.ts",
 )

Also, I played with the generated BUILD file's visibility to expose the @npm//lib:lib_contents target since that one includes the @npm//lib:lib__nested_node_modules target and then doing something like this:

# packages/lib/BUILD.bazel

 ts_library(
     name = "lib",
     srcs = ["index.ts"],
     module_name = "lib",
     deps = [
-        "@npm//lodash",
+        "@npm//lib:lib__contents",
     ],
 )

This will properly add the desired contents to @npm but they still won't be available in bin.sh.runfiles.

One more thing, you can always get to the right lodash version from "lib" by doing something like this:

console.log(require.resolve('../../..')) // resolves to "bin.sh.runfiles"

require('../../../npm/node_modules/lib/node_modules/lodash')

mistic commented 4 years ago

@alexeagle @clydin do you have any news on that topic? I have a pretty similar use case to the one mentioned by @migueloller and considering what he described in his findings it looks like what we currently have is only some step ways from what is needed. It would be really nice to finally have support for this 😃

alexeagle commented 4 years ago

Sorry, no update. We support multiple nested package.json but they each need to be self-contained right now. We don't even have any design thoughts yet for how we would do yarn workspaces, and it might need changes in Bazel to support properly (like with the managed_directories feature)

On Mon, Jun 15, 2020 at 1:47 PM Tiago Costa notifications@github.com wrote:

@alexeagle https://github.com/alexeagle do you have any news on that topic? I have a pretty similar use case to the one mentioned by @migueloller https://github.com/migueloller and considering what he described in his findings it looks like what we currently have is only some step ways from what is needed. It would be really nice to finally have support for this 😃

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_nodejs/issues/266#issuecomment-644380660, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAALSI2XJVP7YCHSR2LH3Z3RW2CGJANCNFSM4FNOURJQ .

SGudbrandsson commented 4 years ago

Sorry, no update. We support multiple nested package.json but they each need to be self-contained right now.

@alexeagle do you have a link to an example? I'm looking to solve a similar issue and I'd love to see how smarter people solved it

alexeagle commented 4 years ago

Sure, this repo is an example. Multiple calls to yarn_install in the WORKSPACE file.

migueloller commented 4 years ago

I've spent a very long time attempting to implement a workspace rule for Yarn workspaces. My conclusion is that because with Bazel we want to be explicit about dependencies (i.e., fine-grained deps), and a yarn install with workspaces can result in multiple versions of the same package (i.e., require('lodash') could be a different version depending on the file that calls that), there have to be potentially multiple labels for each dependency. For example, if there are 2 Yarn workspaces foo and bar:

# //foo/BUILD

js_library(
  name = "foo",
  srcs = ["index.js", "package.json"],
  entry_point = "index.js",
  deps = ["@npm//foo/lodash"],
)

# //bar/BUILD

js_library(
  name = "bar",
  srcs = ["index.js", "package.json"],
  entry_point = "index.js",
  deps = ["@npm//bar/lodash"],
)

It is necessary to namespace lodash because foo and bar might depend on different versions of it. Each label can then reference the correct files, something that would not be possible otherwise (i.e., @npm//lodash isn't specific enough if there's 2 versions of lodash).

Now here's problem, though. When using js_binary the Node.js linker that patches require would have to determine which lodash to provide and this is where I get stuck. For example, Go Modules enforces the import specifier to include the major version to solve this diamond dependency issue and Rust's name mangler includes crate versions to do the same. Node.js gives you nothing, just "lodash". And unless the require patch script can somehow detect which file is calling require and then provide the appropriate version, I'm not sure how this issue can be solved. It is 100% solvable, though, because Yarn PnP does it. Perhaps the answer is in the RFC or white paper.

It might be worth mentioning that this isn't an issue with Node.js because it relies in the filesystem for its module resolution. That wouldn't be idiomatic in Bazel, though, so it doesn't seem like a viable alternative.

EDIT: Module._resolveFilename provides the parent module and that can be used to determine which version to provide. So if the module map used by require_patch.js keeps information about which paths require which versions, it's possible to solve the diamond dependency issue and make multiple version work!

@alexeagle, did you have any ideas yet for the design of a potential implementation?

alexeagle commented 4 years ago

Hi @migueloller I was actually thinking about this last night and wrote a bit in #1977 about it.

I don't think we want to get into the business of understanding multiple versions nor the node APIs needed to resolve them. Rather, we want to be 100% mechanical about:

let the user run the package manager under Bazel
with a light touch, control the package manager environment so the stuff it installs goes into an external repository
provide BUILD files for that repository so you can transport a subset of the files into action's execroot / test's runfiles
make sure that by the time the node program runs, the node_modules directories on disk look just like they would have if there was no Bazel

So this means our responsibility should just be about locations on disk where the packages go, first making them different from node idiom (external repo rather than in the source tree) and then making them the same again (linker).

My theory is that if the linker should not always link a single node_modules tree into the $pwd/node_modules. Instead we should remember what subdirectory of your project had the package.json file, and link the node_modules there. Then we are free to have multiple npm/yarn installed external repositories in the deps[] because each will get linked to a different place. And then when node runs, it does the same thing under Bazel as it would outside of Bazel.

Does that make sense? It feels doable but it's a pretty tricky area of the code.

migueloller commented 4 years ago

@alexeagle, yeah that makes sense. If I'm understanding correctly: option 1 would be to patch Node.js require so that nodejs_binary can find the right dependencies at runtime. Isn't there some of that going on right now? Option 2 would be to instead put the node_modules folders in the correct places in the filesystem — likely under runfiles — to achieve the same result.

I agree with you that option 2 is probably the best way to go about this. I took another look at the linker, and if I'm understanding it correctly, does it do the filesystem changes when the binary is executed? Or said a different way, when running a nodejs_binary, before Node.js starts, there's filesystem I/O to either move or symlink node_modules into the right places?

Also, regarding the ability to reference npm/yarn dependencies from multiple external repositories, this would in a way solve the same problem that Yarn workspaces solves and is something that I personally prefer even over Yarn workspaces. It makes sense that if support for multiple package.json is added to the linker, then the node_modules folder could be put in the right place, thus allowing references to different external repositories created by npm_install/yarn_install to be deterministic.

Would you say that these are two separate features, though? One is adding support for Yarn workspaces and the other is adding support for referencing different external repositories for the npm_install and yarn_install rules. Although probably the implementations will overlap.

alexeagle commented 4 years ago

You understand the linker right. It adapts from Bazel semantics of separate sources/outputs/external repo back to node semantics of those all being together.

Yarn workspaces is a special case of multiple repositories and should be done second. The hard part is calling yarn install and then getting the layout in the right places for Bazel.

We should discuss in the next team meeting, ping me on slack if you want to attend

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any activity for 90 days. It will be closed if no further activity occurs in two weeks. Collaborators can add a "cleanup" or "need: discussion" label to keep it open indefinitely. Thanks for your contributions to rules_nodejs!

alexeagle commented 3 years ago

Now that Workspaces in npm has shipped it's more clear that this is a standard and we should understand it better, probably implement support via a new *_install repository rule.

ajanuar commented 3 years ago

Any update for this? 🙏

alexeagle commented 3 years ago

hey @ajanuar - the first blocker was for our linker to understand the multiple node_modules directories created when installing a workspace, this is nearly done in https://github.com/bazelbuild/rules_nodejs/pull/2389 after that lands and ships, then we should be able to tackle this one once we get some more time.

jinfwhuang commented 3 years ago

Now that https://github.com/bazelbuild/rules_nodejs/pull/2389 has merged.

What phase of the work is this issue in?

Is this item actively being worked on?

alexeagle commented 3 years ago

I'm working on a related prototype for how Bazel could fetch only the npm tarballs needed for a build. We're not directly working on supporting workspaces but I think it will be soon. We may need it to support next.js moving to Bazel

On Mon, Feb 15, 2021 at 12:49 PM Jin Huang notifications@github.com wrote:

Now that #2389 https://github.com/bazelbuild/rules_nodejs/pull/2389 has merged.

What phase of the work is this issue in?

Is this item actively being worked on?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_nodejs/issues/266#issuecomment-779441403, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAALSI3GSPACSSDI4WSBICTS7GCFLANCNFSM4FNOURJQ .

martaver commented 3 years ago

I find it odd that bazel, which is a tool for builds in monorepos, doesn't play nice with other monorepo management tools like yarn workspaces or nx. What's the ideal 'bazel' way to handle deps across many apps, then? One big package.json file in the root of the repo? package.json per app/lib? no package.json?

I'd love to hear some strong and reasoned opinions about this...

martaver commented 3 years ago

Is there any level of compatibility with yarn workspaces?

I'm playing around with trying to build in a monorepo managed by yarn workspaces, and it seems like the workspaces dirs being symlinked into node_modules cause problems for bazel:

E.g the following structure:

/WORKSPACE
/BUILD
/package.json (defines workspaces as /apps/*)
/node_modules (contains symlink to /apps/www)
/yarn.lock
/apps
   /www. (symlinked to /node_modules)
      /BUILD
     /package.json

Causes the following error:

ERROR: Traceback (most recent call last):
        File "/private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/@cleric/www/BUILD", line 868, column 10, in <toplevel>
                filegroup(
Error in filegroup: filegroup rule 'www' in package '@cleric/www' conflicts with existing _js_library rule, defined at /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/@cleric/www/BUILD:571:11
ERROR: /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/BUILD.bazel:7883:11: Target '@npm//@cleric/www:www__contents' contains an error and its package is in error and referenced by '@npm//:node_modules'
ERROR: /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/BUILD.bazel:7883:11: Target '@npm//@cleric/www:www__files' contains an error and its package is in error and referenced by '@npm//:node_modules'
ERROR: /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/BUILD.bazel:7883:11: Target '@npm//@cleric/www:www__nested_node_modules' contains an error and its package is in error and referenced by '@npm//:node_modules'
ERROR: Analysis of target '//apps/www:build' failed; build aborted: Analysis failed

repro here: https://github.com/cleric-sh/experiments-bazel

Seems like almost all monorepo management tools use some kind of symlinks to resolve dependencies between packages... if these symlinks don't work, then what's the 'bazel' way to manage a monorepo and its deps?

adam-thomas-privitar commented 3 years ago

@martaver Ive also attempted this and hit a dead end.

There seems to only be 2, not very good, options:

1) Have 1 lockfile per package and define each node_modules within every package as its own managed directory. Bazel would then manage the deps between your own packages. Fundamentally you then dont get hoisting that npm/yarn workspaces give you with one canonical lockfile so your version management would be very difficult with all those lockfiles constantly changing and having to be manually kept in sync. You'd also experience problems with anything that requires 1 copy (singletons) since node will instantiate multiple instances of the same module if they are in multiple locations on the filesystem. That probably means youd need to maintain a bunch of aliases somehow in your bundler. Youd probably have to forgo any quality of life dev tooling that's mono-repo compatible like vite etc and only dev with production builds, not hot reloading etc. Suspect IDE would hate this as well. Its not really a dev experience you could realistically expect a developer to work with.

2) Break your logical dependency tree and have every dep in the top level workspace. This will make your packages not consumable from the outside if you publish them anywhere for consumption. It also only works if your whole tree has 0 cases of "same package different versions". But since so many NPM packages are transitive (with a typically very-large dep tree), this isnt something you could thats easy to control long term and it may be difficult to enforce. It also feels logically wrong to define all dependencies at the top. At least with this solution, you would be able to benefit from a npm-workspace-like dev environment as youd still use npm/yarn workspaces -- you've basically just coerced it to install node_modules a certain way.

3) Variation of 2). If you have strict_visibility = False in your yarn_install rule you might be able to keep your dependencies in the relevant package.json and rely on npm/yarn hoisting to ensure all the deps end up in top level node_modules. Still need the "no multiple copies of any dep" rule in place. Im not sure bazel will correctly rebuild when a deep package.json changes though if you do this.

4) Ok, I guess theres a third option. Dont have multiple packages and build your entire app in one big monolith module. Not a good idea for obvious reasons.

Yarn/npm workspaces were designed to solve these exact problems and these are industry standard tools that are in most enterprise projects. It therefore works widely with other tooling in the npm ecosystem. I eagerly await compatibility with bazel.

jinfwhuang commented 3 years ago

@martaver You could define one or multiple package.json in repo. They could depend on one another. But the easiest might be just one single package.json. The main downside of that is that a lot of the npm dev toolings, e.e.g IDE, might break.

Here is an example: https://github.com/jinfwhuang/playmono-public/tree/master/nodejs

adam-thomas-privitar commented 3 years ago

@jinfwhuang I think the biggest problem with this is you lose all the good positives of a real workspace like one canonical yarn.lock.

How do you sync versions of things used in more than one package?

What is the dev experience?

What about nodes module resolution when working in dev where youd have each module symlinked -- if package a and b requires package x; and a depends on b; and we use node to execute a; then there 2 copies of x in nodes memory and so singletons get broken. This is killer for things like React projects which rely on there being one copy of context providers etc. Essentially, it rules it out as a possibility.

jinfwhuang commented 3 years ago

That is not ideal. The most attractive solution is to implement support for yarn/npm workspace.

That would preserve much of the bazel and npm/yarn tooling interoperability, which is kind of required if we want a smooth developer experience.

Aghassi commented 3 years ago

I'd like to add a few points to this topic as it seems there are differentiating view points on the solution necessary.

Yarn/NPM Workspace Functionality

Single Module Guarantee : These workspace setups are not perfect and have many pitfalls even though they mask a lot from the developer. For example, you are not guaranteed one version of React , the lockfile can nest different versions if it determines it necessary. On a sufficiently large project, I have hit this exact problem. So this does not solve for that situation because the bundler or build tool can crawl the node_modules tree and find a version of React that is not the one you specified. It is unlikely, but it can happen. In this case, you can still use resolutions field and achieve that all the way down the module tree, but then you are fighting what nested modules specify and are hoping that you don't break them. Fun fact, this is so much a problem that Create React App has a webpack plugin fighting the module resolution system so it works (see https://github.com/facebook/create-react-app/blob/master/packages/react-scripts/config/webpack.config.js#L354-L362)
Linking Modules: Workspaces do link modules that are local, so you are always building off HEAD so to speak. Under Bazel, you get the same guaranteed if you specify the first party package you need in your build file with the multi linker rules @gregmagolan has been working on, so this ability is preserved. However, the benefit you get under Bazel that you don't get under Yarn/NPM Workpsaces is the dependency graph guarantee (meaning rebuilds if A depends on B and B changes, then both B and A rebuild).
Syncing Versions of Shared Tools: Same problem as single module guarantee, so this isn't something that you had before. If you hoisted all dev deps to the root of the workspace, you still have to monitor that nested packages don't end up bringing in a tool. The tool I have used to solve this for us is SyncPack. You can use it with husky to enforce it on a precommit hook. Alternatively, you can have a package under bazel that acts like a CLI and can be invoked to run stuff from a set of shared tools instead of having your packages maintain the build tools themself. Again, this is your preference.
Single Yarn.lock: While it may seem like a benefit, it means under Bazel your entire world changes if the main lockfile in a workspace changes as the lockfile can be used to key off of for installation and setup of the repo. Multiple lockfiles is not only more accurate to your project setup, but faster long term in regards to installs on CI because touching just package A of set A, B, C would only do a yarn install on A (assuming B and C are standalone from A). This is not true in the traditional workspace sense the NPM ecosystem as it stands today since usually workspaces are used to create tools and frameworks. However, in a company circumstance, this is very much the case and is a benefit not a negative.

This being said, I think we still fall short with things like IDE integration and local node_module expectations. That being said, I haven't seen either of those things scale well once you work at a large company because custom tooling and build systems never seem to take those things into consideration. I agree it's not great and should be improved, but arguably also the plugin and IDE ecosystem should be more flexible with their assumptions instead of just taking everything to be a small personal project.

Long term, I expect that most things should move towards a PNPM/Yarn V2 style install to help with speed, so things like a single workspace package won't be terrible and will help more with the single version everywhere problem.

adamscybot commented 3 years ago

@Aghassi Thank you for your well reasoned points. I think I agree that ultimately having a better linker that is not filesystem based like existing ui/node tooling brings benefits. I have no doubt about that.

I think the real problem lies in that Bazel is only a small part of the UI community, and so existing tooling that is very "filesystem based" simply doesn't work because there's so many calls into the FS. Therefore getting a UI project working with Bazel is painful. Its conceptually valid; but practically difficult. Its frustrating, because I totally accept the Bazel approach is a more robust one.

I think its probably fair that Bazel can not be expected to keep up with all the tooling in the UI ecosystem, which is why I think we agree (?) that leaning on Yarn 2 PNP (and perhaps workspaces as a result) might be a solution which increases the adoptability of Bazel.

Enabling Yarn 2 PNP is a big effort in a project as well, for similar reasons to adopting Bazel. But it has a benefit in that it has the firepower in that it offers solutions for common tools, which are reasonably well maintained. The compatibility list is now fairly impressive.

Ultimately, rallying around popular tooling that exists in the UI space seems a realistic approach.

Theres no easy solution -- the community is far into another solution which is flawed. But I think supporting existing package manager constructs will make many lives easier.

martaver commented 3 years ago

I think front-end apps can be considered as a special case, since they’re the top of the dependency hierarchy, and they’re bundled and tree-shaken.

The build artifacts don’t have external dependencies, and will never impose transitive dependencies to another operational module.

There are many scenarios in the world of NPM where I need to legitimately use both modules A and B, which both depend on different, incompatible versions of X. There isn’t always a compatible resolution either.

In some cases, incompatible X might be a deal breaker like React, but usually it’s some tiny utility lib like uuid. In these cases, if it means two versions of X get bundled, so be it… all I care about is that the bundle works at runtime as intended.

Personally, I’m willing to sacrifice on Single Module Guarantee for the sake of developer experience and productivity.

martaver commented 3 years ago

I just re-read and realised how out-of-tune my last post was... for context, when we talk about broken tooling and developer experience, it's mostly a problem with front-end tooling. For nodejs apps and libs, the bazel way works fine for me.

roninjin10 commented 3 years ago

I've been thinking about the idea of generating bazel BUILD files programmatically such that a developer can run a JavaScript program that analyzes a yarn workspace package and automatically generates a new BUILD.bazel file for that yarn workspace package after figuring out what specific node modules that workspace will ultimately resolve to using require.resolve and require.resolve.paths. The developer would just need to run this tool (possibly in a watch mode) whenever adding or using a new node_module in a yarn workspace.

molszanski commented 2 years ago

Anyone has a working bazel monorepo setup? Where "workspace" packages depend on each other? Like A -> B; B->C; B ->D. Any tips or ideas?

Maybe we could utilize this approach: https://github.com/mweststrate/relative-deps.

The idea would be to dodge yarn/npm workspaces, rely on packages to be self sufficient, add some bazel-dependency into package.json. And then run yarn install with bazel, after that it copy bazel-dependencies into node_models. Kinda repeat what yarn does, only with Starlark. This way, we can avoid the "root global" node_modules folder dependency that is outside of BUILD.bazel and WORKSPACE files.

molszanski commented 2 years ago

Hm.. This will be really rubbing against the grain with the javascript ecosystem. All the prettier files, tsconfig.extends and alike expect that they can access the parent directory

Aghassi commented 2 years ago

@molszanski our use case is internal, but what @gregmagolan and I have done so far is have a preinstall script that runs Bazel, then copies the Bazel output artifact to a hidden folder which is listed in the yarn workspace. This allows you to "handoff" the workspaces management to yarn, but let Bazel manage all the rebuilding of dependencies. It's not perfect, but it's helping us get to one workspace from many in the future.

adam-thomas-privitar commented 2 years ago

@Aghassi I'm having trouble visualising your workaround, but it definitely sounds interesting. If you were able to publish some kind of example repo I'd be eternally grateful :D.

Aghassi commented 2 years ago

@adam-thomas-privitar I'll discuss with @gregmagolan on monday and see what we can put out since technically it's company code at the moment 😛

tcarrio commented 2 years ago

I have started working on a change to support this by updating the require patch, but I see that the linker is the suggested way to manage modules now. I would like to support both options, so I'm working on this now. I would like to clarify a couple things here around the design prior to moving to far along with the implementation.

How would you see module resolution working with multiple node_modules roots? Bazel is all about hermetic builds and reproducibility is inherent with this, so there needs to be some standardization around how duplicates will be handled (allowed, prioritization, etc.).

Example

For an example. Let's say we want to define a test target to use Stencil. You could have the following in your WORKSPACE file:

# setup stencil spec test dependencies
yarn_install(
    name = "npm_stencil",
    exports_directories_only = True,
    frozen_lockfile = True,
    package_json = "//tools//stencil:dependencies/package.json",
    package_path = "sub_node_modules/stencil",
    yarn_lock = "//tools/stencil:dependencies/yarn.lock",
)

yarn_install(
    name = "npm",
    data = ["//:.npmrc"],
    exports_directories_only = True,
    frozen_lockfile = True,
    package_json = "//:package.json",
    yarn_lock = "//:yarn.lock",
)

And the following in your BUILD file to define a test using the Stencil CLI:

nodejs_test(
    name = "test",
    args = ["test", "--spec"],
    chdir = native.package_name(),
    data = [
        "@npm//rxjs",
        "@npm_stencil//@stencil/core",
        "@npm_stencil//jest",
        "@npm_stencil//jest-cli",
        "@npm_stencil//@types/jest",
        "@npm_stencil//yargs",
        ":src",
        ":stencil.config.ts",
    ],
    entry_point = stencil_cli_path,
)

Here, we will need to pull in certain libraries to test with the Stencil CLI. However, the rest of my project may be standardized on a different version of jest than Stencil requires. In tools/stencil, we have defined a package manifest for Stencil, including the core libraries for building and any required testing libraries, and installed required libraries for my web component from the root package.json as well as the stencil libraries from tools/stencil/dependencies/package.json.

Design

In this example scenario, I would expect either the linker or require_patch methodology to allow me to use these modules from separate roots.

No Duplicates

So long as there aren't duplicates, the solution is fairly straightforward.

For the module linker: Link each of the targets modules into a single root regardless of separate workspaces (which is currently checked for a failed).

For the require patch: The require_patch would simply check both roots for a module, resolving the first one found, and throwing an error in a similar fashion if the module can't be found in any workspace.

Duplicates

With duplicates, the design must make certain decisions which will have impacts to the reproducibility of a build. Supposing that we have defined a dependency, e.g. rxjs, from both @npm and @npm_stencil, how these modules will resolve is another matter. Another scenario might be dependencies of dependencies- where the child dependencies make use of varying versions of the package. Any version change can break compatibility outside of semantic versioning, and there's little control to be had over these child dependencies.

If duplicates are allowed, in order to be reproducible the operation to link these must be a bijective function, and some manner of defining order of precedence will be required. This could be inferred by order of declaration, e.g. the last dependency always wins (⍺, β -> β).

This is somewhat implicit with those using rules that may provide certain dependencies on their own, but reproducible nonetheless, based on the order of modules as given in the data for the target.

Another option is to simply disallow duplicates, which may be a more destructive or limiting approach in the case of child dependencies.

Feedback

I'm hoping to get feedback on what logic would be preferred for this. Let me know what you think of the handling of duplicates.

molszanski commented 2 years ago

@Aghassi and @gregmagolan, thank you for the all the tips. Will start exploring the possibilities.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had any activity for 6 months. It will be closed if no further activity occurs in 30 days. Collaborators can add a "cleanup" or "need: discussion" label to keep it open indefinitely. Thanks for your contributions to rules_nodejs!

tony-scio commented 2 years ago

Still humbly hoping for workspaces support and that this doesn't get auto closed after 4 years.

Aghassi commented 2 years ago

So my previous comment about copying back to the source tree still stands as how we do things. I'm still working with @gregmagolan @alexeagle and @joeljeske about next steps, but there is a strong possibility my team will not remain on yarn long term as we pursue things like pnpm. I'm not sure how that changes the equation, but the concept of workspaces doesn't really apply under the bazel graph it's mostly a source tree concept while you are in an in-between state as far as I know. For now, our preinstall script method has served us well, it's just cumbersome to use unfortunately.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity for 6 months. It will be closed if no further activity occurs in 30 days. Collaborators can add a "cleanup" or "need: discussion" label to keep it open indefinitely. Thanks for your contributions to rules_nodejs!

github-actions[bot] commented 1 year ago

This issue was automatically closed because it went 30 days without any activity since it was labeled "Can Close?"

alexeagle commented 1 year ago

this is available in https://github.com/aspect-build/rules_js

adam-thomas-privitar commented 1 year ago

Yep! And just as one data point, I switched over and it resolved everything mentioned in this thread perfectly. I would encourage everyone to switch.

bazel-contrib / rules_nodejs