bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.21k stars 4.06k forks source link

ctx.expand_location doesn't handle multiple files with spaces in filenames #10309

Open alexeagle opened 4 years ago

alexeagle commented 4 years ago

In rules_nodejs we have a rule which wants to behave like genrule, so it uses ctx.expand_locations.

However that function always returns a string. If you call it with $(locations :foo) you get back a string with a space-separated list of paths.

https://docs.bazel.build/versions/master/skylark/lib/ctx.html#expand_location does document this behavior.

However it makes it impossible to pass the result as argv to a program - you either get a single argument with all the files if you pass along the string result, or you have to split on spaces to get multiple argv, so that you can no longer correctly handle filenames with spaces.

genrule() does the right thing so it should be possible to do the right thing with a rule as well.

As a rough repro consider

BUILD

load(":my_rule.bzl", "my_rule")

filegroup(
    name = "things",
    srcs = ["thing1", "thing2"],
)

genrule(
    name = "works",
    srcs = ["things"],
    outs = ["yay"],
    cmd = "cat $(locations things)  > $@",
)

my_rule(
    name = "broken",
    srcs = ["things"],
    outs = ["boo"],
    arg = "$(locations things)",
)

where my_rule.bzl is sth like

def _impl(ctx):
    xpand = ctx.expand_location(ctx.attr.arg, ctx.attr.srcs)
    print("expand_location", type(xpand), xpand)
    ctx.actions.run_shell(
        inputs = ctx.files.srcs,
        outputs = ctx.outputs.outs,
        command = "cat",
        arguments = [
            xpand,
            ">",
            ctx.outputs.outs[0].path
        ]
    )

my_rule = rule(_impl, attrs = {
    "srcs": attr.label_list(),
    "outs": attr.output_list(),
    "arg": attr.string(),
})

You can see the effect:

$ bazel build :works -s
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; cat ./thing1 ./thing2  > bazel-out/k8-fastbuild/bin/yay')
Target //:works up-to-date:
  bazel-bin/yay
$ bazel build :broken -s
DEBUG: /usr/local/google/home/alexeagle/Projects/repro/my_rule.bzl:3:5: expand_location string ./thing1 ./thing2
  /bin/bash -c cat '' './thing1 ./thing2' '>' bazel-out/k8-fastbuild/bin/boo)
ERROR: /usr/local/google/home/alexeagle/Projects/repro/BUILD:15:1: output 'boo' was not created

because the arg is './thing1 ./thing2' - you can workaround by splitting on space inside the rule but that's incorrect when files can have spaces.

meteorcloudy commented 4 years ago

I think the issue is that you want to pass two arguments via one element in arguments attribute. Maybe you can avoid this by not using arguments? For exmaple: command = "cat %s > %s" % (xpand, ctx.outputs.outs[0].path),

alexeagle commented 4 years ago

That doesn't really solve the problem though, because if one of the files had a space in the name, we'd still be missing the quotes around it in the interpolated command string, and it would get interpreted as multiple args by the program.

meteorcloudy commented 4 years ago

I think this should work, because ctx.expand_location("//foo:bar") is equivalent to $(location //foo:bar) in genrule, which should correctly quote files with space. I did a experiment with an extra "thing 3" file, the result is here:

pcloudy@pcloudy:~/workspace/my-tests/debug-10309$ bazel build :broken -s
DEBUG: /usr/local/google/home/pcloudy/workspace/my-tests/debug-10309/my_rule.bzl:3:5: expand_location string './thing 3' ./thing1 ./thing2
INFO: Analyzed target //:broken (4 packages loaded, 10 targets configured).
INFO: Found 1 target...
SUBCOMMAND: # //:broken [action 'Action boo', configuration: 3b7735e5f50c87f46f758d8a57e508cac18f76f2c5be2e7251c0415de6f7ed61]
(cd /usr/local/google/home/pcloudy/.cache/bazel/_bazel_pcloudy/25275420eaf9c168494cf9c65df4df72/execroot/__main__ && \
  exec env - \
  /bin/bash -c 'cat '\''./thing 3'\'' ./thing1 ./thing2 > bazel-out/k8-fastbuild/bin/boo')
Target //:broken up-to-date:
  bazel-bin/boo
INFO: Elapsed time: 0.262s, Critical Path: 0.03s
INFO: 1 process: 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
pcloudy@pcloudy:~/workspace/my-tests/debug-10309$ cat bazel-bin/boo
3
1
2
alexeagle commented 4 years ago

Is it equivalent to $(locations //foo:bar) where that rule has multiple outputs?

meteorcloudy commented 4 years ago

Yes

meteorcloudy commented 4 years ago

Oh, sorry. It should be ctx.expand_location("$(locations //foo:bar)") is equivalent to $(locations //foo:bar) in genrule. Basically, if you take a command in genrule, and use this function to expand, it should work.

kylecordes commented 4 years ago

I ran across this expand_location situation also. If I'm following things right, it returns a space separated; and if those resulting filenames have spaces, they will be quoted. So if I want the actual list of filenames to further process (rather than append directly to a command line), I need to find a parser for such a format, to get the individual filenames back.

This seems like a fragile path - is there such a parser already in the toolkit somewhere in Bazel that I should use? I arrived here while searching for an example of using expand_location, one I spotted was:

https://github.com/bazelbuild/rules_nodejs/blob/1860a6a6d6ee9cc8b7297edda119d8babbf6db4a/internal/common/expand_into_runfiles.bzl#L38

... which does the trivial-but-insufficient split(" "). cc @alexeagle

sgowroji commented 1 year ago

Hi there! We're doing a clean up of old issues and will be closing this one. Please reopen if you’d like to discuss anything further. We’ll respond as soon as we have the bandwidth/resources to do so.

alexeagle commented 1 year ago

I think this is still a bug in Bazel, though I don't have time to recreate the scenario to reproduce it.

fmeum commented 1 year ago

The C++ builtins already contain an almost complete Starlark implementation of ctx.expand_locations at https://github.com/bazelbuild/bazel/blob/ca30372e210a638cfce8334b6dc3396c83424baa/src/main/starlark/builtins_bzl/common/cc/cc_helper.bzl#L789

With only a few changes, it could be made to return an Args object instead of strings, which would solve this problem.

CC @gregestren @aranguyen This is also the solution I have in mind for handling $(execpath ...) with path mapping in Starlark-defined actions.

github-actions[bot] commented 1 week ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.