Open thesayyn opened 3 months ago
cc @tjgq
Note that the files that symlinks point to are not in the inputs of the action that creates the symlink, and i don't think it matters much.
I think that might be the problem here. Building without the bytes uses a virtual filesystem to compute the metadata for action outputs, and that virtual filesystem only has access to the action's declared inputs, for both hermeticity and efficiency reasons. (If, on the other hand, you can repro this even when the symlink can be resolved given only access to the action's inputs, I'd consider it a bug.)
(To further clarify: if I understand the scenario correctly, your build is already non-hermetic; without the input dependency, nothing guarantees that the output pointed to by the symlink is produced first, or even that it's produced at all. Building without the bytes only makes the problem easier to detect.)
Sorry, i was not arguing if this is bug or not, i am simply trying to understand it and see if it's me who's doing something really horrible or it's a Bazel issue.
I think that might be the problem here.
I agree, that's indeed the culprit, and i understand why one might consider it not a bug. It's just odd given the action succeeds with other spawn strategies, even with RBE + --remote_download_toplevel
, With --remote_download_toplevel[all]
it's not using a VFS i assume?
your build is already non-hermetic; without the input dependency, nothing guarantees that the output pointed to by the symlink is produced first
It is produced because the files which symlinks point to are in the DefaultInfo
of the target, and the action doesn't care if the target file symlink points to is produced first because it eventually is. I would think that what being done here is not a lot different than using declare_symlink + symlink(target_path = ...)
.
Also since we are talking about hermeticity, do you think it's more appropriate to put these unused inputs into unused_inputs_list
attribute? (Okay i don't think it's quite what i thought it was)
Am i doing something really weird here?, because i don't fully understand what's fundamentally wrong here.
FWIW, we have already tested this feature with a few large codebases and it seems to work.
It's just odd given the action succeeds with other spawn strategies
I agree that it's unfortunate that the behavior depends on the contents of the output tree. We could maybe do a better job of flagging these symlinks as non-hermetic (assuming it can be done with a reasonable amount of overhead).
With --remote_download_toplevel[all] it's not using a VFS i assume?
It's an overlay filesystem that falls back to the actual filesystem; in a purely "without the bytes" build, there's nothing to fall back to. (It should probably only fall back for files that have been determined to be in the input set; this would make the behavior consistent, but is tricky to implement in an efficient manner.)
the action doesn't care if the target file symlink points to is produced first because it eventually is
This is simply not how Bazel works; the outputs of an action must be available when the action finishes, not later. Among other reasons why, Bazel needs to hash the tree artifact contents to be able to determine whether and how to run followup actions that depend on it.
I would think that what being done here is not a lot different than using declare_symlink + symlink(target_path = ...).
It's different because declare_symlink
doesn't care about file contents; declare_file
and declare_directory
do (and in order to determine the contents, they must follow symlinks).
Also since we are talking about hermeticity, do you think it's more appropriate to put these unused inputs into unused_inputs_list attribute? (Okay i don't think it's quite what i thought it was)
It won't help;unused_inputs_list
is unrelated to the issue you're having.
Am i doing something really weird here?, because i don't fully understand what's fundamentally wrong here.
I worry that you might be trying too hard to optimize things by creating symlinks (and running into non-intuitive problems as a result). Would it perhaps be acceptable to copy files around instead, and rely on the various caching mechanisms built into Bazel to keep your builds fast?
Thanks @tjgq
copy files around instead
Note that they are often huge. PyTorch with cuda is multiple GB by itself and some clients need to build that into multiple OCI images.
But I think it's simply unavoidable given your analysis here
Description of the bug:
With rules_oci 2.x we implemented a feature where we create symlinks inside of a tree-artifact that points out of the tree-artifact (https://github.com/bazelbuild/bazel/issues/20891) to files created by other actions and this works fine with
RBE
andLocal
execution.Note that the files that symlinks point to are not in the inputs of the action that creates the symlink, and i don't think it matters much.
However, when the same action run on RBE with
--remote_download_minimal
flag, it fails with the message below.Here's the Bazel slack channel threads which might be helpful: https://aspect-build.slack.com/archives/C04281DTLH0/p1716397067812139
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
1- clone https://github.com/bazel-contrib/rules_oci/tree/2.x 2- run bazel test //examples/... --remote_download_minimal on RBE
Which operating system are you running Bazel on?
Darwin
What is the output of
bazel info release
?7.1.1
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.No response
What's the output of
git remote get-url origin; git rev-parse HEAD
?No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
I don't think is a regression.
Have you found anything relevant by searching the web?
Probably related https://github.com/bazelbuild/bazel/issues/15454
Any other information, logs, or outputs that you want to share?