bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

ctx.actions.symlink + building without the bytes misbehaves on Windows #21747

Open hauserx opened 6 months ago

hauserx commented 6 months ago

Description of the bug:

The issue seems to be that builder_reset\builder.exe, which rules_go creates through ctx.actions.symlink (link), is a regular file after a normal build without a cache, but a JUNCTION when it is taken from the remote cache.

Junctions that point to files do not seem to work well on Windows.

Below is a reproduction on a hello-world example using rules_go: https://github.com/hauserx/rules-go-startup

After regular build:

> dir C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset
2024-03-20  15:57         5,992,448 builder.exe

After taking from remote cache:

> dir C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset
2024-03-20  15:48    <JUNCTION>     builder.exe [C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder.exe]

>bazel clean
Starting local Bazel server and connecting to it...
INFO: Starting clean.

>bazel build :hello --remote_cache=<remote cache>
Target //:hello up-to-date:
  bazel-bin/hello_/hello.exe
INFO: Build completed successfully, 11 total actions

>bazel clean
INFO: Starting clean.

>bazel build :hello --remote_cache=<remote cache>
INFO: 11 processes: 4 remote cache hit, 7 internal.
INFO: Build completed successfully, 11 total actions

>bazel build :hello
INFO: Analyzed target //:hello (0 packages loaded, 0 targets configured).
ERROR: C:/b/external/io_bazel_rules_go/BUILD.bazel:42:7: GoStdlib external/io_bazel_rules_go/stdlib_/pkg failed: (Exit -1): builder.exe failed: error executing GoStdlib command (from target @@io_bazel_rules_go//:stdlib) bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset\builder.exe stdlib -sdk external/go_sdk -installsuffix windows_amd64 -out ... (remaining 5 arguments skipped)
Action failed to execute: java.io.IOException: ERROR: src/main/native/windows/process.cc(202): CreateProcessW("C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset\builder.exe" stdlib -sdk external/go_sdk -installsuffix windows_amd64 -out bazel-out/x64_windows-fastbuild/bin/external/io_bazel_rules_go/stdlib_ -package std -gcflags ""): Access is denied.
 (error: 5)
Target //:hello failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 26.594s, Critical Path: 25.80s
INFO: 3 processes: 2 internal, 1 local.
ERROR: Build did NOT complete successfully

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

Windows

What is the output of bazel info release?

release 7.1.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

The bug below describes some non-deterministic behavior with the same symptoms: https://github.com/bazelbuild/bazel/issues/19018. In contrast, this bug is fully reproducible and seems to be caused by building without the bytes.

Any other information, logs, or outputs that you want to share?

No response

hauserx commented 6 months ago

There is some code where Bazel creates a junction if it cannot find the target file:

https://github.com/bazelbuild/bazel/blob/8f18d362c852377740cb032b02c42d78b9a44ad0/src/main/java/com/google/devtools/build/lib/windows/WindowsFileSystem.java#L89

laszlocsomor commented 2 months ago

Thanks for the detailed bug report!

I suspect the fix is simply:

-      // Still Create a dangling junction if the target doesn't exist.
-      if (!target.toFile().exists() || target.toFile().isDirectory()) {
-        WindowsFileOperations.createJunction(link.toString(), target.toString());
+      if (createSymbolicLinks) {
+        WindowsFileOperations.createSymlink(link.toString(), target.toString());
       } else {
-        if (createSymbolicLinks) {
-          WindowsFileOperations.createSymlink(link.toString(), target.toString());
+        // Still Create a dangling junction if the target doesn't exist.
+        if (!target.toFile().exists() || target.toFile().isDirectory()) {
+          WindowsFileOperations.createJunction(link.toString(), target.toString());
         } else {
           Files.copy(target, link);
         }

...but I had no time yet to try it out and implement a test.

I believe this would fix the createSymbolicLinks=true case. There's no hope for the other case: junctions are inherently different from symlinks (junctions can only point to directories, and they are always absolute). Creating a dangling junction instead of a dangling symlink was just enough to fix https://github.com/bazelbuild/bazel/issues/2474 (in https://github.com/bazelbuild/bazel/commit/6c07525462062d491816a66a94c3db3b2dbc8f98).
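
To make the effect of the reordering easier to see, here is a minimal sketch that models only the branch order of the diff above. The class `LinkDecision`, its `Kind` enum, and both method names are hypothetical and are not part of Bazel; the real code calls `WindowsFileOperations` and `Files.copy` instead of returning a value.

```java
/** Hypothetical model of the branch order in the diff above; not Bazel code. */
final class LinkDecision {
  enum Kind { JUNCTION, SYMLINK, COPY }

  /** Branch order before the diff: the junction check comes first. */
  static Kind current(boolean targetExists, boolean targetIsDirectory,
      boolean createSymbolicLinks) {
    if (!targetExists || targetIsDirectory) {
      return Kind.JUNCTION;
    }
    return createSymbolicLinks ? Kind.SYMLINK : Kind.COPY;
  }

  /** Branch order after the diff: the symlink flag is checked first. */
  static Kind proposed(boolean targetExists, boolean targetIsDirectory,
      boolean createSymbolicLinks) {
    if (createSymbolicLinks) {
      return Kind.SYMLINK;
    }
    if (!targetExists || targetIsDirectory) {
      return Kind.JUNCTION;
    }
    return Kind.COPY;
  }
}
```

With a missing target and createSymbolicLinks=true, `current` yields a junction while `proposed` yields a symlink. With createSymbolicLinks=false both orders still yield a junction for a missing target, which is the case the diff does not address.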

tjgq commented 2 months ago

The issue here is that, when building without the bytes, we create the symlink before the file it points to exists (it will be downloaded later if the symlink is ever consumed as an action input, or is itself materialized as an output).

The createSymbolicLink implementation for Windows creates a junction if the target path doesn't exist, which will fail later because junctions can only point at directories, not files. As Laszlo points out above, we could make it work for the --windows_enable_symlinks case, but that's not a complete solution since not everyone is able or willing to set this flag (it requires administrator privileges).

Possible solutions, in rough order of complexity:

  1. Always fall back to making a copy on Windows if the symlink target doesn't exist.
  2. Disable "building without the bytes" for symlink artifacts (i.e., the target of the symlink is always downloaded, even with --remote_download_minimal). This must happen transitively since symlinks can point to other symlinks.
  3. Plant a dangling junction initially, but replace it with a symlink or copy when the target is later downloaded (might be too complex).
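
The "transitively" requirement in option 2 can be illustrated with a small sketch. It assumes a hypothetical in-memory map from each symlink path to its target; `SymlinkClosure` and `targetsToDownload` are invented names, and Bazel's actual artifact and metadata handling is not modeled here.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper illustrating option 2; not Bazel code. */
final class SymlinkClosure {
  /**
   * Follows the chain of links starting at {@code start} and returns every
   * path reached, in order. Stops at a path that is not itself a link, or
   * when a path repeats (a cycle).
   */
  static Set<String> targetsToDownload(String start, Map<String, String> links) {
    Set<String> visited = new LinkedHashSet<>();
    String current = links.get(start);
    // visited.add returns false on a repeat, which ends the loop on cycles.
    while (current != null && visited.add(current)) {
      current = links.get(current);
    }
    return visited;
  }
}
```

For a chain where `a` links to `b` and `b` links to `c`, the result for `a` contains both `b` and `c`, so the final regular file is included even though `a` does not point at it directly.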