aspect-build / rules_js

High-performance Bazel rules for running Node.js tools and building JavaScript projects
https://docs.aspect.build/rules/aspect_rules_js
Apache License 2.0
299 stars 102 forks source link

[Bug]: pkg is a directory; dependency checking of directories is unsound #1408

Open UebelAndre opened 9 months ago

UebelAndre commented 9 months ago

What happened?

I have a dependencies on prettier@2.8.8 and after updating to the latest 1.34.1 and Bazel 7.0.0 I get the following error

WARNING: /home/user/code/BUILD.bazel:42:22 //:.aspects_rules_js/node_modules/prettier@2.2.8/pkg is a director; dependency checking of directories is unsound

Version

Development (host) and target OS/architectures:

Output of bazel --version: 7.0.0

How to reproduce

I'm not able to provide too much detail on my workspace but suspect anything which depends at least on `prettier==2.8.8` would repro this problem.

Any other information?

No response

IOrlandoni commented 9 months ago

You can also get the same warnings, on latest (1.34.1) and bazel 7.0.0, by using npm_link_all_packages. That'll give you a warning per package.

UebelAndre commented 9 months ago

Yup, I see that. Every package warns about this.

alexeagle commented 8 months ago

I believe Bazel 7 made this existing warning more noisy for some reason. It would be really useful if someone could bisect where this was introduced in Bazel, maybe that commit message will give us more clues about why it does this.

Adding this to .bazelrc makes the warning go away:

# Allow the Bazel server to check directory sources for changes. Ensures that the Bazel server
# notices when a directory changes, if you have a directory listed in the srcs of some target.
# Recommended when using
# [copy_directory](https://github.com/aspect-build/bazel-lib/blob/main/docs/copy_directory.md) and
# [rules_js](https://github.com/aspect-build/rules_js) since npm package are source directories
# inputs to copy_directory actions.
# Docs: https://bazel.build/reference/command-line-reference#flag--host_jvm_args
startup --host_jvm_args=-DBAZEL_TRACK_SOURCE_DIRECTORIES=1

(from https://docs.aspect.build/guides/bazelrc#correctness-options )

UebelAndre commented 8 months ago

Why not use globs instead? I’d rather not be relying on unsound behavior

opicaud commented 8 months ago

Hey, There is an issue opened on this topic: https://github.com/bazelbuild/bazel/issues/18646 I've tried with build --incompatible_disallow_unsound_directory_outputs=false but no success The issue i have with this on my side, is that the cache is not hit anymore :-(

FYI i am using rules_js only to provide a macro to release component in my monorepo through semantic-release I confirm that this behaviour was not present with bazel 6.4.0

alexeagle commented 7 months ago

@UebelAndre We don't want globs since that produces thousands of input files. We just want one input directory. It's Bazel's fault :) @opicaud I think that's a different issue about directory outputs. This one is about directory inputs.

UebelAndre commented 7 months ago

@UebelAndre We don't want globs since that produces thousands of input files. We just want one input directory. It's Bazel's fault :)

I just don't want this warning and don't want to be required to use some experimental behavior to make them go away :(

alexeagle commented 7 months ago

It would be a massive performance regression to hand Bazel an input file for every file in the node_modules directory, so I don't think we'd consider turning that on for everyone. If you have a specific proposal of how to change it, I guess someone can make a PR on a fork of rules_js to bypass the warning.

jesses-canva commented 7 months ago

@alexeagle Have you considered downloading the archive with http_file/rctx.download and then doing the untar in a build action and consuming the untarred directory in downstream actions? So it stays as one input to downstream actions but isn't a source directory.

SanjayVas commented 7 months ago

The relevant commit to Bazel appears to be https://github.com/bazelbuild/bazel/commit/2fe450fd7f8888e619e38aa0b359073e801f1120, referencing https://github.com/bazelbuild/bazel/issues/18579

This does indeed appear to be also related to outputting directories.

Rules outputting directories (which is not sane, but unfortunately happens)

This implies that the copy_directory rule is itself not sane, as it is a rule that outputs a directory. The general pattern is to use archives (e.g. tar or jar) instead of directories, and extract as-needed downstream.

jesses-canva commented 7 months ago

@alexeagle Alternatively to avoid doing an untar in a build action, you could glob(["**/*"]) up all the inputs and use copy_to_directory to turn them into a single directory and expose that as the directory in node_modules.

jesses-canva commented 7 months ago

Rules outputting directories (which is not sane, but unfortunately happens)

@SanjayVas I believe this is referring to rules which do ctx.actions.declare_file during analysis but then create a directory during execution, such as the genrule in the example on that issue. See the later comments.

A directory properly declared with ctx.actions.declare_directory (as copy_to_directory does here) is well supported.

Silic0nS0ldier commented 6 months ago

@alexeagle Alternatively to avoid doing an untar in a build action, you could glob(["**/*"]) up all the inputs and use copy_to_directory to turn them into a single directory and expose that as the directory in node_modules.

If a performance hit is observed with this change, odds are they can be reclaimed by making changes to Bazel.

Analysis time should perform well (hashing of TreeArtifact instances should be cheaper vs. a plain list of files, it effectively caches a part of the computation).

"Preparation" tasks before execution such as runfiles creation might take a hit if Bazel is symlinking TreeArtifact files instead of the directory. If such a problem exists, it'll likely need an --incompatible_ change in Bazel.

Silic0nS0ldier commented 6 months ago

"Preparation" tasks before execution such as runfiles creation might take a hit if Bazel is symlinking TreeArtifact files instead of the directory. If such a problem exists, it'll likely need an --incompatible_ change in Bazel.

I tested this with an implementation of the suggested change. Bazel symlinks the TreeArtifact directory, so runtime performance seems to check out. Analysis should but will require benchmarking to confirm.

I'll get a PR for this.

alexeagle commented 6 months ago

@jesses-canva that's a clever idea - but currently we need access to some files inside the package earlier (e.g. repository rule needs to read package.json) so it's not enough to delay extraction until an action runs. Perhaps we can extract just some of the files in a repository rule and ALSO extract them again in an action later.

Thanks for tracking down that change @SanjayVas - I'll reach out to @tjgq about our options here to keep the same performance but not trigger the warning.

Silic0nS0ldier commented 6 months ago

Perhaps we can extract just some of the files in a repository rule and ALSO extract them again in an action later.

Patches are applied before package.json is read (enables codegen for bin entries), so extracting just package.json wouldn't really work.

However it is possible to be more frugal. pnpm-lock.yaml will tell us if a package has a bin field.

# ...
packages:
  # ...
  /@aspect-test/a@5.0.2:
    resolution: {integrity: sha512-bURS+F0+tS2XPxUPbrqsTZxIre1U5ZglwzDqcOCrU7MbxuRrkO24hesgTMGJldCglwL/tiEGRlvdMndlPgRdNw==}
    hasBin: true
    # ^^^^^^^^^^
    dependencies:
      '@aspect-test/b': 5.0.2
      '@aspect-test/c': 2.0.2
      '@aspect-test/d': 2.0.0(@aspect-test/c@2.0.2)
    dev: true

This means for a typical scenario (out of all dependencies, few have hasBin: true).

On paper performance should be close with potential for savings seen in certain scenarios (since extraction can potentially be skipped).

Stats for perspective (pulled from Rules JS pnpm-lock.yaml).


If desiring to removal all extraction from repository generation, it may be doable with an API change (breaking change).

The generated lambdas and other symbols do not appear to need much information about the package. It may be possible to generalise them such that the user is responsible for specifying which binary to use (information which must already be known by users).

# From Rules JS, //e2e/bzlmod:BUILD.bazel
load("@npm//:jasmine/package_json.bzl", jasmine_bin = "bin")

jasmine_bin.test(
    name = "jasmine_test",
    bin = "jasmine",
    # ^^^^^^^^^^^^^^
    args = ["*.spec.js"],
    data = ["test.spec.js"],
    # jasmine doesn't know to run without runfiles
    target_compatible_with = not_windows,
)

With the user provided bin attribute (normally present in the macro name, e.g. jasmine_bin.jasmine_test) it can be used to construct the same values currently eagerly computed.

gregmagolan commented 5 months ago

This has been "mostly" mitigated in the latest rules_js release now that for most npm package the input to the graph is the .tgz file and it is extracted into the package store.

See the PR summary of https://github.com/aspect-build/rules_js/pull/1538 and this comment on #1412 for more info.

Since there are still cases where a source directory input is required for packages with lifecycle hooks and/or patches this issue should stay open. The lifecycle and patches cases will be fixed in the future.

rickystewart commented 1 month ago

Is there any ETA on the lifecycle hooks/patches part of this? In CockroachDB, the newer version of rules_js quashes a lot of these warnings, but there are still quite a lot, mostly in the various esbuild- packages.