Open dgp1130 opened 3 years ago
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.
I'm definitely still interested in seeing this fixed. To me, it seems like a bug in aquery, since it's printing an objectively wrong command.
It's a little tricky to fix this; aquery by design only runs the analysis phase, but the contents of tree artifacts are only known after the execution phase (more to the point: we'd have to actually run the action that produces the tree artifact, which in turn would require us to run the actions it depends on). The --subcommands flag might be more helpful in this case.
I wonder if we should more clearly mark the directories as a lie in the aquery output.
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.
Limiting bazel aquery to the analysis phase seems reasonable. Can I suggest a slight alternative of generating an executable command which dynamically expands the directory? Something like:
$ bazel aquery //:expansion
action 'Action expansion.txt'
# ...
Command Line: (exec /bin/bash \
-c \
'
echo $@ > bazel-out/k8-fastbuild/bin/expansion.txt
' \
'' \
$(bazel-out/tools/expand-tree bazel-out/k8-fastbuild/bin/dir))
Here I added $(bazel-out/tools/expand-tree ${dir}), which would list all the files in the directory and print their full execpaths, so this command expands to:
$ bazel aquery //:expansion
action 'Action expansion.txt'
# ...
Command Line: (exec /bin/bash \
-c \
'
echo $@ > bazel-out/k8-fastbuild/bin/expansion.txt
' \
'' \
bazel-out/k8-fastbuild/bin/dir/entry1.js bazel-out/k8-fastbuild/bin/dir/entry2.js)
This would allow us to stay constrained to the analysis phase, but also copy-pasting the command line will run the command identically to the build and align with --subcommands, even if it needs some indirection to get there.
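For illustration, the proposed expand-tree helper could be as small as the Python sketch below. Everything in it is hypothetical: no such tool ships with Bazel, the name expand_tree is made up for this example, and sorting lexicographically is only an assumption about the order in which Bazel expands a tree artifact.

```python
import os


def expand_tree(root):
    """Return the path of every file under root, sorted lexicographically.

    Hypothetical helper. The sort order is an assumption and is not
    confirmed to match the order Bazel uses at execution time.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # dirpath already starts with root, so the result keeps the prefix.
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Joining the result with spaces would give the argument list that the $(...) substitution is meant to produce, provided the directory has already been built.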
The challenges I can think of are:
1. Making sure expand-tree lists the files in the correct order. I assume the order already needs to be stable so this is probably doable.
2. Making sure expand-tree is available and built. It's not normally a dependency of this target, so even if //tools/expand-tree somehow exists in the build graph, it won't be built. I'm not sure if @bazel_tools can help here or exactly how that fits in. Maybe we can ship this binary in there somehow?
3. Supporting cmd / PowerShell / whatever, which might be tricky to make fully xplat compatible. My example already has exec /bin/bash, so I feel like Bazel must already be doing that here to some extent?

> Can I suggest a slight alternative of generating an executable command which dynamically expands the directory?
If you mean a command you can run as a substitute for an actual build, I don't think this will work. In the general case there's an entire action (sub-)graph that needs to run to get to the point where the action that produces the directory can itself run, i.e., expand-tree is basically equivalent to bazel build.
If you mean a command you can run after the build, it's literally just a script to recursively list the directory contents; I don't see much value in Bazel providing it.
I was mainly looking to debug the command being executed. I agree building the directory is probably out of scope and equivalent to bazel build-ing it. My goal here is:
bazel aquery doesn't mislead users. I argue the current behavior is actively lying to users about its arguments, which is worse than just not supporting tree artifacts.
You're right that my proposed expand-tree doesn't do much useful, though I don't think it strictly needs to. Maybe Bazel just prints a find command. As long as the output is the same as what Bazel will provide during execution (file order might be important), that's fine.
Even if this isn't worth an extra expand-tree tool and we can't trivially do the same thing with find, we could at least stop misleading users by changing the syntax to be clear it isn't directly executable. Something like:
$ bazel aquery //:expansion
action 'Action expansion.txt'
# ...
Command Line: (exec /bin/bash \
-c \
'
echo $@ > bazel-out/k8-fastbuild/bin/expansion.txt
' \
'' \
BAZEL_EXPANDED_TREE_OF_bazel-out/k8-fastbuild/bin/dir)
Unfortunately it's tricky to communicate that this isn't executable code and what the equivalent should be, which is why I think an expand-tree or find subcommand would be preferred.
On second thought, I think I've misrepresented the complexity of expand-tree above. In the general case, it's not simply a matter of listing the directory contents, because you can write relatively arbitrary Starlark code to transform them. For example:
def _mapper(f):
    if f.path.endswith(".txt"):
        # Omit from command line
        return None
    else:
        # Convert to uppercase and prefix with flag name
        return "--foo={}".format(f.path.upper())
...
d = ctx.actions.declare_directory("dir")
...
args = ctx.actions.args()
args.add_all([d], map_each = _mapper)
I'm not claiming that this particular example is realistic, just making a point that expand-tree would need to contain a Starlark interpreter to be generally applicable.
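To make the point concrete, here is a plain-Python imitation of how a map_each callback reshapes the argument list. It is a sketch, not Bazel's implementation: files are modeled as path strings rather than File objects, expand_args and mapper are names invented for this example, and callbacks that return a list of strings are not handled.

```python
def expand_args(files, map_each=None):
    """Build an argument list, optionally passing each path through map_each."""
    args = []
    for path in files:
        mapped = path if map_each is None else map_each(path)
        if mapped is None:
            continue  # the callback asked to omit this file
        args.append(mapped)
    return args


def mapper(path):
    # Same logic as the _mapper example above, applied to plain strings.
    if path.endswith(".txt"):
        return None
    return "--foo={}".format(path.upper())


listing = ["dir/a.txt", "dir/b.js"]
print(expand_args(listing))          # ['dir/a.txt', 'dir/b.js']
print(expand_args(listing, mapper))  # ['--foo=DIR/B.JS']
```

The two printed lists differ, which is why a tool that only lists the directory cannot reproduce the real command line once map_each is involved.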
I think it makes sense to communicate to users that the command line is unreliable, either by mangling it into something that clearly won't work (such as a BAZEL_EXPANDED_TREE_OF_ prefix), or by adding an additional is_command_line_reliable field to the aquery output. I'm less sure that we should implement expand-tree; it feels like an awful lot of work for something that requires you to actually run the build to be useful (at which point you might as well use -s).
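As a rough sketch of those two options side by side, the Python below mangles tree-artifact arguments with the prefix and also records a reliability flag. Neither exists in aquery today; RenderedCommand and render are hypothetical names, and the caller is assumed to already know which arguments are tree artifacts.

```python
from dataclasses import dataclass

PREFIX = "BAZEL_EXPANDED_TREE_OF_"


@dataclass
class RenderedCommand:
    args: list
    is_command_line_reliable: bool


def render(args, tree_artifacts):
    """Mangle tree-artifact arguments and report whether any were mangled."""
    rendered = [PREFIX + a if a in tree_artifacts else a for a in args]
    # The command line is reliable only if nothing had to be mangled.
    return RenderedCommand(rendered, rendered == list(args))


cmd = render(
    ["/bin/bash", "-c", "echo $@ > out.txt", "", "bazel-out/k8-fastbuild/bin/dir"],
    {"bazel-out/k8-fastbuild/bin/dir"},
)
print(cmd.args[-1])
print(cmd.is_command_line_reliable)
```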
I don't remember trying -s, if that prints the correct output, then I agree it might be better to point users at that. I can see what you mean that since this is dependent on Starlark the mapping information is lost and needs to reanalyze the build. I suspect a Starlark interpreter in expand-tree wouldn't be enough? You'd need to bazel build from scratch to get the right inputs.
Edit: Actually map_each doesn't support closures, right? So maybe you could get away with just parsing the file. Is there any way that could depend on content outside the file? Maybe calling a load(...) statement loaded with --package_path? An extreme example for sure, but I suspect using an interpreter directly might come with its own limitations.
The one alternative is to implement something simple like find for cases where map_each is not used on the assumption that this is most cases and at least provides partial support. My intuition is that most tree artifact-consuming actions don't use that particular feature, but I don't have any data to back that up. If so, we could at least give partial support for the simple case.
is_command_line_reliable feels too easy to overlook IMHO. I recommend having something which is invalid Bash in the display command to make clear that it's not a real executable statement and cause an error for users who attempt to run it.
> I don't remember trying -s, if that prints the correct output, then I agree it might be better to point users at that.
Yes, -s will print a fully working command line, with the caveat that it will use the execroot as the working directory, which is non-hermetic; the actual action might run inside a sandbox, which -s doesn't recreate (and neither does aquery).
> I can see what you mean that since this is dependent on Starlark the mapping information is lost and needs to reanalyze the build. I suspect a Starlark interpreter in expand-tree wouldn't be enough? You'd need to bazel build from scratch to get the right inputs.
You always need to run bazel build to establish the precondition that the action inputs exist and are up to date. The question is whether you then run the command line indicated by -s or the one provided by aquery. The only thing expand-tree could reasonably do (since running a build is unreasonable) is run the Starlark expansion logic on the files that are present in the directory to produce a command line equivalent to -s.
I meant "to get the right inputs [for the map_each]". What happens when:
load("//path/to/pkg:defs.bzl", "map_file")

def _mapper(file):
    return map_file(file)

def _some_rule_impl(ctx):
    dir = ctx.actions.declare_directory(ctx.attr.name)
    args = ctx.actions.args()
    # add_all expects a sequence or depset, so wrap the directory in a list.
    args.add_all([dir], map_each = _mapper)
    # ...
Then if you build with something like --package_path which contains path/to/pkg:defs.bzl:
$ bazel build --package_path /some/other/path //foo
So to be correct, I would think expand-tree would also need that --package_path arg, which makes me think that even evaluating the Starlark isn't that far off from a full analysis from bazel build. Not sure if other options might affect it too.
Regardless, I agree implementing expand-tree to be fully correct for the map_each case is likely infeasible. I think the options here are:
1. Change the bazel aquery command line to be clear that the result is not a truly equivalent command line execution.
2. Change the bazel aquery command line to print a correct version with find or expand-tree for "simple" cases (no map_each).
3. Support bazel aquery with expand-tree for map_each - INFEASIBLE.

I feel like 1. is probably easy enough to be worth doing, it's mainly a question of what the right syntax is to best communicate this limitation. 2. is potentially worth it if we can confirm that most args.add_all invocations on TreeArtifact objects do not use map_each, not sure if we have any good data or intuition on that point (I suspect google3 metrics might be skewed since TreeArtifact was relatively recently added)?
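Options 1 and 2 could be combined per argument, roughly as in the Python sketch below. This is only an illustration: render_tree_arg is a hypothetical name, the uses_map_each flag is assumed to be available at analysis time, piping find through sort is an assumption about ordering, and the unquoted substitution would break on file names that contain whitespace.

```python
import shlex


def render_tree_arg(directory, uses_map_each):
    """Pick a display form for one tree-artifact argument."""
    if uses_map_each:
        # Option 1: a marker that is clearly not a real path.
        return "BAZEL_EXPANDED_TREE_OF_" + directory
    # Option 2: a shell substitution that lists the files after a build.
    return "$(find {} -type f | sort)".format(shlex.quote(directory))


print(render_tree_arg("bazel-out/k8-fastbuild/bin/dir", uses_map_each=False))
print(render_tree_arg("bazel-out/k8-fastbuild/bin/dir", uses_map_each=True))
```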
Description of the problem / feature request:
bazel aquery should expand directories.
Feature requests: what underlying problem are you trying to solve with this feature?
I was trying to leverage the expand directories feature of ctx.actions.args().add_all(), but was encountering bugs. I used bazel aquery to inspect the arguments and was confused when the directory was not expanded. I spent a solid hour debugging this before I eventually discovered that bazel aquery was lying to me. The directory was being expanded, this just wasn't being shown in bazel aquery.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
We define two rules, simple_dir(), which generates a directory of input files, and expand_directories(), which uses the directory expansion feature in its command line execution.
Now we use them:
Now let's query the action:
Note that only one command line argument is listed: bazel-out/k8-fastbuild/bin/dir.
But what if we execute the build:
Both files were passed as individual arguments! bazel aquery should have shown both of these files in its command line, or at least indicated that the directory is expanded: <expanded directory bazel-out/k8-fastbuild/bin/dir>.
Note that if I add exit 1 to the end of the expand_directories() command and run with --verbose_failures, I do get a more honest logged command:
What operating system are you running Bazel on?
Ubuntu 20.04 via WSL2 on Windows 10.
What's the output of bazel info release?
Have you found anything relevant by searching the web?
Google searches for "bazel actions add_all does not expand directories" didn't come up with much. Searches for "bazel expand directories aquery" also did not yield much.