bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Windows Server Data Deduplication Incorrectly Interpreted as Symlinks #19437

Open bdunkin opened 1 year ago

bdunkin commented 1 year ago

Description of the bug:

Problem

When running Bazel on a Windows server with Data Deduplication turned on (see https://learn.microsoft.com/en-us/windows-server/storage/data-deduplication/overview), https://github.com/bazelbuild/bazel/blob/478c7e06924a4f277d2252610b8f675fc6faa6f7/src/main/native/windows/file.cc#L95 will interpret deduplicated files as symlinks. When the file is then read, https://github.com/bazelbuild/bazel/blob/478c7e06924a4f277d2252610b8f675fc6faa6f7/src/main/native/windows/file.cc#L533 will return "Unknown link type".

IsSymlinkOrJunction only checks whether the file is a reparse point, not whether that reparse point is actually a symlink or junction. As a result, every reparse tag listed here except IO_REPARSE_TAG_SYMLINK and IO_REPARSE_TAG_MOUNT_POINT (which ReadSymlinkOrJunction handles explicitly) is handled incorrectly.

This issue was raised in https://github.com/bazelbuild/bazel/issues/11237 for IO_REPARSE_TAG_DFS reparse points. There, it was suggested that IsSymlinkOrJunction check the reparse tag in addition to the reparse point attribute, but that change was never committed. I believe it would address this problem.
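To make the suggested check concrete, here is a minimal, portable sketch of classifying a file by both its attributes and its reparse tag. It is not Bazel's code and not a committed fix; the function names (ClassifyReparsePoint, IsSymlinkOrJunctionAttributeOnly) and the LinkKind enum are hypothetical. The constant values match the Windows SDK definitions but are redefined here so the sketch compiles on any platform. On Windows, the attributes and tag could be obtained together by calling GetFileInformationByHandleEx with FileAttributeTagInfo, which fills a FILE_ATTRIBUTE_TAG_INFO (FileAttributes, ReparseTag); that wiring is omitted.

```cpp
#include <cstdint>

// Values as defined in the Windows SDK, redefined so this compiles anywhere.
constexpr std::uint32_t kFileAttributeReparsePoint = 0x00000400;  // FILE_ATTRIBUTE_REPARSE_POINT
constexpr std::uint32_t kReparseTagMountPoint = 0xA0000003;       // IO_REPARSE_TAG_MOUNT_POINT (junction)
constexpr std::uint32_t kReparseTagSymlink = 0xA000000C;          // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kReparseTagDedup = 0x80000013;            // IO_REPARSE_TAG_DEDUP
constexpr std::uint32_t kReparseTagDfs = 0x8000000A;              // IO_REPARSE_TAG_DFS

// Hypothetical result type for this sketch.
enum class LinkKind { kNotALink, kSymlink, kJunction };

// The behavior described in the issue: any reparse point counts as a link,
// regardless of its tag.
constexpr bool IsSymlinkOrJunctionAttributeOnly(std::uint32_t attributes) {
  return (attributes & kFileAttributeReparsePoint) != 0;
}

// Hypothetical tag-aware check: only symlink and mount-point (junction) tags
// are treated as links.
constexpr LinkKind ClassifyReparsePoint(std::uint32_t attributes,
                                        std::uint32_t reparse_tag) {
  if ((attributes & kFileAttributeReparsePoint) == 0) {
    return LinkKind::kNotALink;  // Not a reparse point at all.
  }
  switch (reparse_tag) {
    case kReparseTagSymlink:
      return LinkKind::kSymlink;
    case kReparseTagMountPoint:
      return LinkKind::kJunction;
    default:
      // Assumption: dedup, DFS, and other reparse points are treated as
      // ordinary files or directories rather than as links.
      return LinkKind::kNotALink;
  }
}
```

With the attribute-only check, a deduplicated file (attribute 0x400, tag IO_REPARSE_TAG_DEDUP) is reported as a link and later fails with "unknown link type"; with the tag-aware check, it is classified as not a link and would be read as a regular file.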

Context

My company wants to use Bazel to build our product; however, we are intermittently getting errors like:

[0 / 5] [Prepa] BazelWorkspaceStatusAction stable-status.txt
ERROR: {obfuscated path}/BUILD:16:10: {obfuscated target}: error reading file '{obfuscated path}': Cannot read link (name={obfuscated path}): unknown link type
ERROR: {obfuscated path}/BUILD:16:10: 1 input file(s) are in error
Target {obfuscated target} failed to build

After investigating, all files that are reported as "unknown link type" have been deduplicated by Windows after the deduplication delay.

This is preventing us from adopting Bazel, as we can't accept intermittent errors like this. We also can't simply turn off data deduplication, because it saves us a significant amount of disk space.

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

On a Windows Server machine with deduplication turned on for a volume, build anything with Bazel after the files have been deduplicated (there is a configurable delay for deduplication).

Which operating system are you running Bazel on?

Windows Server

What is the output of bazel info release?

release 6.1.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

This does not appear to be a regression. It just never worked.

Have you found anything relevant by searching the web?

https://github.com/bazelbuild/bazel/issues/11237 https://github.com/bazelbuild/bazel/issues/7907

Any other information, logs, or outputs that you want to share?

No response

fmeum commented 1 year ago

@bdunkin Would you be interested in submitting a PR to fix the reparse point handling? Your analysis is very convincing.

bdunkin commented 1 year ago

My company does not have a CLA that would allow me to submit a PR at this time. I don't know if or when that will change, so it might be faster if someone who already has a CLA submits a fix.