facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0
3.56k stars 219 forks source link

Best way to define a pure Runnable from a command line? #665

Open cbarrete opened 4 months ago

cbarrete commented 4 months ago

Is there a recommended way of creating a target from something like a cmd_args or a string, that only has a RunInfo, using the standard prelude?

I've tried using genrule for this like so:

genrule(
    name = "run_os",
    executable = True,
    out = "dummy",
    cmd = "touch dummy && $(exe :qemu) -cdrom $(location :os_image) -monitor stdio",
)

But this has a couple of major downsides:

I've considered using a sh_binary instead, but I like the conciseness of defining everything in a single rule invocation, and I have other concerns mentioned in https://github.com/facebook/buck2/issues/664.

For now I'm going to write something like a runnable rule that will do what I want, but I was wondering if there was a better way that I missed.

cbarrete commented 4 months ago

For anyone interested in a workaround, the following works, although it doesn't guarantee hermeticity in any way:

runnable = rule(
    impl = lambda ctx: [DefaultInfo(), RunInfo(args = cmd_args(ctx.attrs.cmd))],
    attrs = {
        "cmd": attrs.list(attrs.arg()),
    },
)

Usage:

runnable(
    name = "run_os",
    cmd = ["$(exe :qemu)", "-cdrom", "$(location :os_image)", "-monitor", "stdio"],
)
mohamed-abdelnour commented 2 weeks ago

Hello, all! I'm leaving this here in case anyone is interested: I recently used Buck2 for formatting and linting cxxrs. The general idea is:

  1. each subcell (owned by a build file) defines targets that group its files by type—for example, a rust group: <cxxrs/packages/buck/cxxrs/tag.bzl>;

  2. at the root, targets are defined that perform particular actions on such groups—for example, executing rustfmt on all rust groups: <cxxrs/packages/buck/cxxrs/root.bzl>, <cxxrs/BUCK>; and

  3. the cxxrs_dev.buck_incremental_command rule then executes ::cxxrs_dev::action::buck::IncrementalCommand to perform an action incrementally as documented here.

For context, here are the targets that end up being defined at the root. ```bash buck2 targets --config=cxxrs.allow_slow=true : 2>/dev/null ``` ```text root//:action.clang-format root//:action.clang-tidy root//:action.deadnix root//:action.nixfmt root//:action.prettier.json root//:action.prettier.yaml root//:action.ruff root//:action.rustfmt root//:action.shellcheck root//:action.shfmt root//:action.taplo root//:cxxrs_tag.by_type.cxx root//:cxxrs_tag.by_type.json root//:cxxrs_tag.by_type.nix root//:cxxrs_tag.by_type.rust root//:cxxrs_tag.by_type.shell root//:cxxrs_tag.by_type.starlark root//:cxxrs_tag.by_type.toml root//:cxxrs_tag.by_type.yaml ```

It's a very basic implementation, but it can be useful as a starting point for anyone interested. I know implementing the cxxrs_dev binary in Rust is an odd choice, but it can be easily implemented in Python like other scripts used in the prelude.

Feel free to let me know if you have any questions!


A note on hermeticity (#664): if you're using Nix, you can write derivations for runnables you want to run hermetically and have Buck2 invoke them by referring to their absolute paths in the Nix store. That is, isolating a runnable and "bundling" it with its dependencies (including resources and the like) is a strength of Nix that can be useful here. Interactions with caching may be a little tricky (i.e., having Buck2 determine when to rebuild these or update the absolute paths it uses to refer to them), but other than that I'd expect an implementation to be reasonably straightforward.

cbarrete commented 2 weeks ago

That's an interesting approach. So far I've settled on doing linting/formatting via BXL scripts. The ergonomics aren't ideal due to limitations in Buck2's core, which I've opened issues for, but it works pretty well as it leverage parallelism and caching from Buck2.

How do you scale your approach though? Rules (specifically filegroup in this case) don't have visibility over sources that are owned by more specific BUCK files, even with globbing. Also some files you might want to format/lint may not be sources (e.g. BUCK files). Finally, it looks like your actions affect the whole repo, which would quickly get out of hand for a large enough repo (of course this is fine for many use cases).


R.e. Buck2 + Nix indeed, it's very easy to get started:

# Could come from somewhere else.
NIXPKGS_FLAKEREF = "github:nixos/nixpkgs/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

def _nix_runnable_impl(ctx: AnalysisContext):
    nix_store_link = _nix_package_impl(ctx)
    binary_names = ctx.attrs.binary_names or [ctx.attrs.name]

    if len(binary_names) == 1:
        return [DefaultInfo(), RunInfo(args = cmd_args(nix_store_link.default_outputs[0], format = "{}/bin/" + binary_names[0]))]
    else:
        sub_targets = {}
        for binary_name in binary_names:
            sub_targets[binary_name] = [DefaultInfo(), RunInfo(args = cmd_args(nix_store_link.default_outputs[0], format = "{}/bin/" + binary_name))]
        return [DefaultInfo(sub_targets = sub_targets)]

nix_runnable = rule(
    impl = _nix_runnable_impl,
    attrs = {
        "package_name": attrs.option(attrs.string(), default = None),
        "binary_names": attrs.option(attrs.list(attrs.string()), default = None),
        "flakeref": attrs.string(default = NIXPKGS_FLAKEREF),
    },
)

But this doesn't play nicely with remote execution. My usages of RBE and Nix are mutually exclusive right now, so I haven't explored this yet.

Note also that writing some nix2buck tool that takes a Nix file as input and generates Buck targets for various languages (like Reindeer does for Rust and we do at my dayjob for Conda) should not be too much work (although I don't know if there is a compelling use case for that given the that pinning Nixpkgs already means that any set of nix_package/nix_runnable targets would be consistent out of the box.

mohamed-abdelnour commented 2 weeks ago

I haven't explored using BXL so far; thanks for the pointer.

How do you scale your approach though? Rules […], even with globbing.

Yes, each build file tags its own sources (at least each build file whose sources we want to act on). Notice how most BUCK files in the project call [cxxrs_tag.known] ([example][0]). I'm doing this even for BUCK files that are generated by Reindeer: [][1] [][2].

Also some files you might want to format/lint may not be sources (e.g. BUCK files).

I'm not sure I follow; would you please clarify? For what it's worth, I am formatting BUCK files, and it's working as intended.

Finally, it looks like your actions affect the whole repo, […].

Yes, I'm doing that for convenience. It would be simpler to have each BUCK file define it's own actions; we'd do away with having to tag files to begin with (the entire point of doing so is to have files exposed at the root). With that approach, each BUCK file would define an action.<FOO> that performs <FOO> on its sources only, as opposed to having a single action.<FOO> at the root. Whether this split is better or worse is, I suppose, in the eye of the beholder.

I haven't made it clear in my previous comment, but also note that the approach I'm using, albeit simplistic, performs actions incrementally. The initial run performs actions on all (matching) files, but each run after that performs actions on modified (i.e., having different hashes) files only. We'd get that for free if we define a target for each file, but that would quickly get out of hand, of course.

Demonstration ```bash #!/usr/bin/env bash set -o errexit -o nounset -o pipefail _redact_cwd() { sed "s:$PWD::" } _inspect_rustfmt_action() { buck2 build \ --build-report=- \ --config=build_report.unstable_include_other_outputs=true \ --config=cxxrs.log_level=debug \ :action.rustfmt | jq --raw-output0 '.results[].other_outputs.DEFAULT[]' | xargs -0 jq .fields.args } 2>/dev/null _invalidate_2_rust_files() { fd --extension=rs --max-results=2 --print0 | xargs -0 -I{} "$SHELL" -c 'printf // >>{}' } buck2 clean 2>/dev/null { _inspect_rustfmt_action _invalidate_2_rust_files _inspect_rustfmt_action | _redact_cwd } | jq --slurp ``` ```json [ [ "packages/_/ad_hoc/lto_0/src/lib.rs", "packages/rust/cxxrs/lto_1_rs/src/lib.rs", "packages/rust/cxxrs/nix_rs/build.rs", "packages/rust/cxxrs/nix_rs/src/error.rs", "packages/rust/cxxrs/nix_rs/src/lib.rs", "packages/rust/cxxrs/trycmd/src/lib.rs", "packages/rust/cxxrs/trycmd/src/main.rs", "packages/rust/_0/reindeer/src/lib.rs", "packages/rust/_0/reindeer/extern/valuable/src/lib.rs", "packages/rust/cxxrs/dev/src/action.rs", "packages/rust/cxxrs/dev/src/action/buck.rs", "packages/rust/cxxrs/dev/src/error.rs", "packages/rust/cxxrs/dev/src/ext.rs", "packages/rust/cxxrs/dev/src/lib.rs", "packages/rust/cxxrs/dev/src/main.rs", "packages/rust/cxxrs/dev/src/wrapper.rs", "packages/rust/cxxrs/lto_2_rs/build.rs", "packages/rust/cxxrs/lto_2_rs/src/lib.rs" ], [ "/packages/_/ad_hoc/lto_0/src/lib.rs", "/packages/rust/cxxrs/trycmd/src/main.rs" ] ] ```
So, I'd expect this approach to work decently well even for "medium"[^1]-sized projects, especially with a better implementation—there are a bunch of easy improvements to make, but I opted for simplicity. If possible, it would also be nice to cache the state—even the state of a clean repository only, so we don't have to perform that full run initially. --- I agree with you on the Buck2 + Nix front. Unlike Reindeer, that `nix2buck` tool wouldn't have access to structured (meta)data given that flakes can have arbitrary outputs, but we can easily work around this by requiring certain outputs. That is, "to use this tool, your flake _must_ output a value of the form […]"; we could even add a [Flake Parts] module or something to improve that experience. Like you, I don't have an immediate use for any complex integration between Buck2 and Nix, so I also haven't explored this any further; however, you can imagine this being very useful: if we think bigger than formatters and linters, having Buck2 use compiler toolchains (and the like) that are built with Nix would be great for reproducibility. If you use Buck2 from inside of a Nix `devShell` you get some benefits locally, but it's still far from ideal (as you mentioned because of interaction with remote execution, for example). [^1]: Loosely: what most would consider medium-to-large but not _massive_. [0]: https://github.com/mohamed-abdelnour/x/blob/cxxrs/0/packages/rust/cxxrs/dev/BUCK#L4 [1]: https://github.com/mohamed-abdelnour/x/blob/cxxrs/0/packages/rust/_0/reindeer/reindeer.toml#L10 [2]: https://github.com/mohamed-abdelnour/x/blob/cxxrs/0/packages/buck/cxxrs/reindeer.bzl#L4 [Flake Parts]: https://github.com/hercules-ci/flake-parts [`cxxrs_tag.known`]: https://github.com/mohamed-abdelnour/x/blob/cxxrs/0/packages/buck/cxxrs/tag.bzl#L97