bazelbuild / rules_scala

Scala rules for Bazel
Apache License 2.0
360 stars 273 forks source link

Scalafmt fails with Build without the bytes #1525

Open gergelyfabian opened 10 months ago

gergelyfabian commented 10 months ago

When running a target like this:

# Important to clean, as the files can stay from a previous bazel build with a lower Bazel version.
bazel clean
bazel run //utils/smth:library.format-test

I receive:

utils/smth/src/main/scala/com/dummy/smth/Smth.scala
diff: /home/gege/.cache/bazel/_bazel_user/7f83a0b1f0940f88aec1f4530f927169/execroot/repo/bazel-out/k8-fastbuild/bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/Smth.scala.fmt.output: No such file or directory

It seems the action responsible for creating this file executes (I have tested disabling writing the output there, but then obviously it fails due to not generating an output), but at a later step the action running scalafmt from the manifest cannot see the scalafmt outputs.

One difference I can see in the bazel-bin folder between Bazel 6.4.0 and Bazel 7.0rc2 (after running bazel run //utils/smth:library.format-test):

# 6.4.0:
ls bazel-bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/
Smth.scala.fmt.output
# 7.0rc2:
ls bazel-bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/
[empty]
gergelyfabian commented 10 months ago

Is this a Bazel bug or a rules_scala one?

gergelyfabian commented 10 months ago

Added reproduction in https://github.com/gergelyfabian/bazel-scala-example/ (master branch):

# Change .bazelversion to 7.0.0rc2 and then run:
bazel run //example-maven:example-maven.format-test
gergelyfabian commented 10 months ago

Checking whether this is a bazel issue:

export BAZELISK_CLEAN=true
bazel --bisect=6.4.0..release-7.0.0rc2 run //example-maven:example-maven.format-test
gergelyfabian commented 10 months ago

Checking whether this is a bazel issue:

export BAZELISK_CLEAN=true
bazel --bisect=6.4.0..release-7.0.0rc2 run //example-maven:example-maven.format-test

Trick is that I had to turn off using a java toolchain and remotejdk_17 first, as remotejdk_17 does not work for some commits as bisecting.

gergelyfabian commented 10 months ago
--- Bisect Result

first bad commit is https://github.com/bazelbuild/bazel/commit/9c96529fca4a135c162e35ce2882834b879438fb

That is:

Build without the bytes by default

by changing the default value of `--remote_download_outputs` to `toplevel`.
gergelyfabian commented 10 months ago

Thank you, bazelisk, indeed this seems to be the answer. This works:

bazel run --remote_download_all //example-maven:example-maven.format-test
gergelyfabian commented 10 months ago

Forming the question again, scalafmt internal outputs not working with "build without the bytes" is a bug of Bazel or rules_scala?

gergelyfabian commented 10 months ago

Starting from Bazel 7, the default download mode will be changed from --remote_download_all to --remote_download_toplevel. That is, Bazel will no longer download intermediate outputs of remote actions into the local output base. In return, your remote builds will be faster.

"Using a remote cache or even a disk cache will make actions "remote" from this point of view." (thanks @fmeum)

In my case, disk cache is enabled.

gergelyfabian commented 10 months ago

The information I received on the Bazel slack was that if rules_scala emitted correct information (DefaultInfo(runfiles = ...)) for the outputs it generates (the .fmt.output files), then those should be visible also when using Build without the bytes. Unfortunately I cannot find yet whether the scalafmt phase is generating proper DefaultInfo.

liucijus commented 10 months ago

The information I received on the Bazel slack was that if rules_scala emitted correct information (DefaultInfo(runfiles = ...)) for the outputs it generates (the .fmt.output files), then those should be visible also when using Build without the bytes. Unfortunately I cannot find yet whether the scalafmt phase is generating proper DefaultInfo.

@gergelyfabian could you check if anything might be missing here: https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_scalafmt.bzl#L18?

gergelyfabian commented 10 months ago

The current state where I'm at is this. phase_scalafmt seems to be adding the proper information (on the line you quoted):

        # Return a depset containing all the relevant files, so a wrapping `sh_test` can successfully access them.
        return struct(runfiles = depset([manifest] + files + srcs))

Then, we go to the phase_default_info, and runfiles properly includes the .fmt.output files there (added a print before that):

https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_default_info.bzl#L46

But, when api.bzl runs (https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/api.bzl#L85) it does not get the DefaultInfo that phase_default_info was supposed to return to it. I don't get why.

gergelyfabian commented 10 months ago

Debug output like this:

DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:38:10: Returning DefaultInfo with runfiles: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>])
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/api.bzl:85:22: Detected external providers: {"DefaultInfo": struct(data_runfiles = None, default_runfiles = <unknown object com.google.devtools.build.lib.analysis.Runfiles>, files = depset([<generated file example-maven/example-maven.jar>]), files_to_run = None)}

Debugging code is here: https://github.com/gergelyfabian/rules_scala/tree/bazel-7.0-scalafmt-bug

gergelyfabian commented 10 months ago

Btw. the following command also fails for Bazel 6.4.0: bazel run --remote_download_toplevel //example-maven:example-maven.format-test

gergelyfabian commented 10 months ago

Maybe the necessary information is lost here:

    print("DefaultInfo with runfiles: %s" % depset(direct = direct, transitive = runfiles))
    print("DefaultInfo runfiles after wrapping: %s" % ctx.runfiles(transitive_files = depset(direct = direct, transitive = runfiles), collect_data = True))

Added these debug lines just before the return statement in phase_default_info.bzl.

Prints:

DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:38:10: DefaultInfo with runfiles: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>])
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:39:10: DefaultInfo runfiles after wrapping: <unknown object com.google.devtools.build.lib.analysis.Runfiles>
gergelyfabian commented 10 months ago

I'm wondering whether collect_data does what we expect it to do, especially if I read the comment:

https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_default_info.bzl#L43-L46

gergelyfabian commented 10 months ago

I've added some more debugging and I think api.bzl is returning proper DefaultInfo for the rules.

Having https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/api.bzl#L91 (that sets up the final rule output):

I have added a debug line just before that:

print("DefaultInfo before the end: %s" % acculmulated_external_providers["DefaultInfo"].default_runfiles.files)

This prints for me:

DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/api.bzl:93:10: DefaultInfo before the end: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>], order = "postorder")

So, this includes the files that I think should be there, specifically: Maven.scala.fmt.output

So I think the rule definition returns a DefaultInfo that contains what is necessary. Even though it's under default_runfiles.files. When setting up ctx.runfiles (that ends up in DefaultInfo) I tried adding these files into ctx.runfiles(files = ...) , but still it did not help, in either way those files end up in .default_runfiles.files

Is the DefaultInfo constructed by rules_scala faulty, or it's a Bazel bug?

Updated debugging code in https://github.com/gergelyfabian/rules_scala/tree/bazel-7.0-scalafmt-bug.

gergelyfabian commented 10 months ago

Opened Bazel bug: https://github.com/bazelbuild/bazel/issues/20099

gergelyfabian commented 10 months ago

Quoting a response from the bazel issue:

There are a couple of odd things going on here, and I'm not sure that any of them are Bazel's fault.

It appears that you're invoking bazel run on a generated file, not an executable rule:

$ bazel query --output=label_kind //example-maven:example-maven.format-test
generated file //example-maven:example-maven.format-test

I don't know why bazel run allows a plain file as an argument, but it seems reasonable to me that running a file won't give access to the runfiles; files have no associated runfiles, rules do.

That brings us to another problem, which is that scala_library isn't an executable rule, so it's not possible to bazel run it either:

$ bazel query --output=label_kind //example-maven:example-maven
scala_library rule //example-maven:example-maven
$ bazel run //example-maven:example-maven
ERROR: Cannot run target //example-maven:example-maven: Not executable

So my suggestion would be to (1) mark scala_library as an executable rule, (2) return the formatter in the DefaultInfo.executable field alongside the runfiles, (3) make sure the formatter looks for the runfiles in the right location (perhaps by using a runfiles library for your implementation language), and finally (4) bazel run the scala_library directly.

As to why you're only running into this now: as it currently stands, your bazel run command is effectively non-hermetic, but bazel run(unlike build actions) doesn't provide a safeguard against non-hermeticity; if Bazel downloads all build outputs, your program will find them anyway, but Bazel has no idea that the dependency is there, and won't realize that they must be downloaded when building without the bytes.

Originally posted by @tjgq in https://github.com/bazelbuild/bazel/issues/20099#issuecomment-1802513312

alexeagle commented 4 months ago

Hey @gergelyfabian I'm curious if you've tried this with https://github.com/aspect-build/rules_lint We support scalafmt and shouldn't have any of these bugs that Tiago pointed out there.

gergelyfabian commented 4 months ago

I experienced some issues when trying to apply the latest version of rules_lint, and also it seems it's ~3 times slower on our monorepo than running rules_scala's scalafmt (with parallelized targets), so I think we'll be staying with that.