facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0
3.58k stars 223 forks source link

How do target configuration constraints work? #184

Open zjturner opened 1 year ago

zjturner commented 1 year ago

I found this code from grepping the repository:

        kwargs["link_style"] = select({
            "DEFAULT": link_style,
            "ovr_config//os:macos": "static",
        })

How does parsing of this "ovr_config//os:macos" string work? That also looks a lot like one of those target patterns, but that clearly is not a target. And it's not the same thing as @prelude//toolchains:cxx.bzl either, because it doesn't have an @ at the beginning, and doesn't refer to a file.

I grepped for ovr_config and found this:

[repository_aliases]
ovr_config = prelude

I guess it's coming from here. But the exact syntax isn't documented. And what is os? What other selectors are there that I could use besides os? It looks like this pattern might be a "target configuration constraint", which I figured out by looking at this piece of code from somewhere else:

def _tp2_constraint(project, version):
    """
    Return the target configuration constraint to use for the given project
    version
    """

    return "ovr_config//third-party/{}/constraints:{}".format(project, version)

but if you type "constraint" into the search box of the documentation page you get nothing.

And how do these map to values from .buckconfig? Concrete example: Let's say I have this in my .buckconfig:

[foo]
bar = baz

How can I do the following two things:

  1. Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
  2. Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?
stepancheg commented 1 year ago

How does parsing of this "ovr_config//os:macos" string work? That also looks a lot like one of those target patterns, but that clearly is not a target

It is a target.

Try buck2 uquery ovr_config//os:macos.

what is os?

Directory in ovr_config cell.

What other selectors are there that I could use besides os

Simplified, config_setting rule defines select key.

But yes, documentation could be better.

zjturner commented 1 year ago

Thank you! Would you be able to also provide answers to the two questions at the end of the OP?

stepancheg commented 1 year ago

Would you be able to also provide answers to the two questions at the end of the OP?

Can you please give more concrete examples of what flags do you want to switch?

zjturner commented 1 year ago

Hmm, is that actually important? I chose a hypothetical non-existant flag --compiler-flag-1 on purpose because I'm assuming that the answer should be the same for any arbitrary flag. If the setting in buck config is foo, pass flag 1. If it's bar, pass flag 2. Does it actually m atter what flag 1 and flag 2 are?

stepancheg commented 1 year ago

Hmm, is that actually important?

Yes, it is. Depending on your use case I can suggest one solution or another, or describe our plans to address the issue.

Basically we don't support "flags". Most of the current issue can be addressed by adding a custom configuration with platform rule. But these are not composable. There's workaround with buckconfigs and config_setting depending on that buckconfig, which is partly deprecated, but we don't have a replacement yet.

cjhopman commented 1 year ago

Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?

There's a lot that goes into this. A lot of it is described here: https://buck2.build/docs/rule_authors/configurations/

Here's one way to define a platform and a select such that the select() will resolve as you ask when in the configuration defined by that platform:

constraint_setting(
  name = "foo-bar"
)

constraint_value(
  name = "foo-bar-baz",
  constraint_setting = ":foo-bar",
)

constraint_value(
  name = "foo-bar-buzz",
  constraint_setting = ":foo-bar",
)

foo_config = read_config("foo", "bar", None)

platform(
  name = "my-platform",
  constraint_values = [":foo-bar-baz"] if foo_config == "baz" else ([":foo-bar-buzz"] if foo_config = "buzz") else []
)

my_rule(
  name = "use",
  flags = select(
     ":foo-bar-baz": ["--compiler-flag-1"],
     ":foo-bar-buzz": ["--compiler-flag-2"],
   )
)

Now, that example doesn't cause the :use rule to be built in the :my-platform configuration, there's a handful of different ways to do that. But if it is built in that configuration, that select() would be resolved in the way you asked.

dtolnay commented 1 year ago

How can I do the following two things:

  1. Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
  2. Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?

It's hard to be sure whether you are asking this in order to understand better how platforms/configurations/selects work, vs. to accomplish these two specific pieces of functionality.

Mostly likely it's the former, in which case cjhopman's answer has good guidance.

But just in case it's the latter, select and constraint may not necessarily be the thing you're after.

  1. foo_bar = read_config("foo", "bar", None)
    
    my_rule(
        name = "...",
        flags = [
            "--compiler-flag-1" if foo_bar == "baz"
            else "--compiler-flag-2" if foo_bar == "buzz"
            else fail("...")
        ],
    )
  2. buck2 build --config foo.bar='...'

    foo_bar = read_config("foo", "bar", None)
    
    my_rule(
        name = "...",
        flags = foo_bar.split(" ") if foo_bar else [],
    )
zjturner commented 1 year ago

Yes, it is. Depending on your use case I can suggest one solution or another, or describe our plans to address the issue.

Basically we don't support "flags". Most of the current issue can be addressed by adding a custom configuration with platform rule. But these are not composable. There's workaround with buckconfigs and config_setting depending on that buckconfig, which is partly deprecated, but we don't have a replacement yet.

This is quite a fundamental divergence from what I'm accustomed to, and it makes me question whether buck2 is even compatible with our builds at all.

For some context, we currently use CMake. In CMake you can pass -DKEY=VALUE on the command line, and you can pass that over and over again with different keys and different values. And you can have any combination of these. So our entire build has been constructed with this assumption in mind, and it sounds like you're saying this simply isn't possible.

I probably can't list every possible way in which we rely on this, but here's a few:

  1. We allow the user to specify which santizer(s) they want to build with. -DUSE_SANITIZER=[address|undefined|thread].
  2. We allow the user to specify which build configuration they want. -DBUILD_TYPE=[debug|optimized|production]
  3. We allow the user to specify which product they're building (a single codebase can build several different products). -DPRODUCT=[client|server|secret_project].
  4. We allow the user to specify which kinds of hardening measures they want baked into the application. -DHARDENING=[none|minimal|production].

Number 3 is especially problematic, because part of the semantics of that one involve passing a define into every compilation unit, including shared code. This means that a build of two different products passes different flags to the same source files so the compiled code can't be shared between products, even though much of the source code is the same.

There are at least 20 more of these flags that we can pass to the build, and any combination of flags can be mixed and matched. The semantics of these flags are extremely diverse. They can mean anything from:

  1. Add a flag to the linker for a single target
  2. Add a define to every compilation unit
  3. Perform some additional packaging / installation steps at the end of the build
  4. Conditionally compile (or don't compile) some source files based on the value of the flag.
  5. Omit some build targets entirely, so that the targets don't even exist.

And more.

And of course, the semantics can also be compound, so that a single build flag can do a combination of the above semantics, including various things that aren't listed.

thoughtpolice commented 1 year ago

Hey Zach, good to see you around these parts; been a while! I'll try to give you a good answer to help get you moving along since I saw some other questions of yours...

Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?

Well, the statement itself is pretty simple, in theory, though I haven't tested this with the prelude:

cxx_binary(
  cflags = select({
    "cell//path/to:target": [ "-O2" ],
    "cell//path/to/other:target": [ "-O1" ],
  }) + default_cflags,
)

In the above example, cell//path/to:target is a configuration setting. But we might have an X/Y problem here. This probably isn't a good example so we should start over from the beginning, I think.

For example if the following BUCK file exists at the path foo/bar/baz/BUCK:

cxx_binary(name = "test_cxx", ...)
rust_binary(name = "test_rs", ...)

We'd say that cxx_binary and rust_binary are rules. I prefer to think of a rule as something even simpler: it's just a function with arguments! And every time you call that rule, with some arguments, you create an instance of the rule. Every instance of a rule has a name — the name field of a rule instance gives it an externally visible name that can be mentioned in a target pattern. Again: the name of the target that you refer to in buck build is always listed in the name field of a rule. It's a special field that you cannot get rid of. Rule = function. Creating instance of rule = application of function.

Then there are two targets above: foo/bar/baz:test_cxx and foo/bar/baz:test_rs. So you could run buck build on those paths. Remember how I said target patterns are patterns? You can build them both using the ... pattern: buck build frob//path/to:... would build every target listed in the BUCK file in cell frob//, under path path/to. So it would build both of these binaries.

And there are lots of other target patterns you can use, try a bunch:

And some other various, more non-obvious ones, like : for example (all targets in the package in $CWD)


Before you can talk about configurations, you need to talk about providers. This is a very, very, very important part of the design of Buck, and a lot more things made sense once I realized this.

I mentioned make before as a point of contrast. I did so for a reason, because it creates big contrast. To sum up:

It is very, very important to understand this. It is key to Buck's modularity. You can never simply look at a target pattern foo//bar:baz and understand what baz is without reading the BUCK file or running uquery or something. It isn't a file output! The string ovr_config//os:macos you brought up is absolutely a target, but it is not a file. It is a data structure. Big difference.

Providers — data structures — are the only way a rule can provide information to downstream rules that depend on it. In Make, a rule can only create a single file, which can be consumed by downstream rules, in contrast.

Providers are very important to understanding how the action graph is created after analysis, and are key to Buck's modularity — they're why you can have a python_binary depend on an ocaml_library which depends on a cxx_library and it all works out magically. But this is about as good of a summary as you need. I strongly suggest reading this: https://buck2.build/docs/rule_authors/writing_rules/ it's sparse but will give you an idea of what a rule looks like inside, and you can see how rules consume and create providers.


So now we know that ovr_config//os:macos is actually a target, and it can return data structures. What does this mean in practice for things like select()? In Buck, you have constraint settings, constraint values, and config settings.

My understanding of platforms and their ins-and-outs is poor still. This is also getting too long. But the upshot is that you can use select() on constraint or configuration settings, like in the example from @cjhopman or my original example above.

As a side note: actually, I think select() as a function only works on targets that return ConfigurationInfo objects. So why does it work on constraint values, AKA ConstraintValueInfo objects, like in @cjhopman's example? Read the source code to constraint_value_impl to find out why ;)

Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build ?

There is --config flag you can pass to buck2 build that will do this, e.g. buck2 build -c foo.bar=buzz and I believe that overrides it. I can't tell you if it needs to restart the buck2d daemon or anything, some config values require that.

See also https://github.com/facebook/buck2/issues/142 and the "configuration-at syntax" RFC linked within. This would allow you to conveniently set a specific configuration platform using syntax on the command line, which is basically what you want. In this case you would simply set a configuration by building a target like buck build //foo/bar:baz@config//platforms:linux-x86_64-O3. Then the rule implementing //foo/bar:baz would be able to select() the right things. But I don't think any of the features are there yet for that RFC. The example from @cjhopman is about as good as it gets.


This is way way way too long. But personally, in this case I'd probably structure my solution differently until some of those features get added — I'd just avoid configurations and build every version of the thing you want, all with different target names. This takes longer to build the whole package, but is more flexible right now. It also helps you ensure every version of a build is working; e.g. a change doesn't break ASAN builds or something. That's actually a big problem with the approach to using $(CFLAGS) like Make does — you can easily test something with make CFLAGS=-fsanitize=address and it works, check it in, and another build breaks (imagine you didn't #ifdef some sanitizer logic appropriately.) Or vice versa; the normal build breaks ASAN. In this case, all your builds would be integrated every time.

So instead, have foobar target be the default, and also foobar-O3 and foobar-O3-LTO and also foobar-fuzz-asan and so on and so forth... You can generate all these rules with code, by the way. Something like this in defs.bzl:

def cxx_multi_binaries(name, **kwargs):
    default_way = {
        "cflags": [],
        "ldflags": [],
    }
    other_ways = {
        "O3": {
            "cflags": [ "-O3" ],
            "ldflags": [ "-O3" ],
        },
        "LTO": {
            "cflags": [ "-O3", "-flto" ],
            "ldflags": [ "-fuse-ld=lld", "-flto" ],
        },
    }

    # generate a target named 'foobar-XYZ' for every 'way' above
    for (way_name, way_opts) in other_ways .items():
        cxx_binary(
            name = "{}-{}".format(name, way_name)
            cflags = way_opts.cflags,
            ld_flags = way_opts.ldflags,
            *kwargs,
        )

    # and also a default binary
    cxx_binary(name = name, cflags = default_way["cflags"], ldflags = default_way["ldflags", **kwargs)

Then, in the BUCK file:

load(":defs.bzl", "cxx_multi_binaries")

cxx_multi_binaries(
    name = "test",
    srcs = glob(["*.cxx"])
)

And then you'll have targets named :test, :test-O3, :test-LTO, etc. You'd just specify whatever target you wanted if you wanted a specific one. This is also a key advantage of Buck: you can just write Starlark functions to generate rules in whatever magical way you want! (Functions that purely call and instantiate rules are often called "macros", confusingly...) And these functions abstract everything away. The caller of cxx_multi_binaries does not have to have any knowledge of how flags are propagated or interpreted, for instance. An engineer can completely change its implementation without impacting callers, as long as some basic functional rules about APIs are respected (i.e. don't rename stuff like crazy.)

This is a long post but hopefully it'll help. Restructuring a build for Buck is fundamentally a lot of work. But it gives you a lot more power than CMake once you lean into it. Hope you're doing well, too!

zjturner commented 1 year ago

Hey Zach, good to see you around these parts; been a while! I'll try to give you a good answer to help get you moving along since I saw some other questions of yours...

Haha, noticed you too in other places. Good to see you :)

Anyway, this is a very long answer, so thank you! Will take me some time to read / digest. That said, from a cursory skim, it sounds like this is exactly the kind of advice and guidance that should find a way into the documentation (or examples) in some form. Perhaps I'll take it as an exercise to the reader to produce such an example and make a PR updating the examples folder with new examples.

thoughtpolice commented 1 year ago

I'll also note something about the efficiency of my proposed solution. It's true that by representing each build as an individual target, rather than modifying CFLAGS, you have added more targets to build and test. If it takes 30 minutes to build a project, then adding the O3 and LTO way like my example will, in the worst case, increase build time by 200% relative to base. At best, it will increase build time by 0%, assuming perfect parallelism and each build is run independently. Except that:

I call the above solution the "cartesian product" approach because rather than modifying the set of build parameters in place, you instead effectively represent the set of parameters to the build as a tuple (A, B, C, ...), calculate the cartesian product of that tuple A * B * C * ..., and then run the build once for-each one of these. Excluding various invalid scenarios. I've been using this approach for a while and I find it is often much better because in practice there aren't actually a billion parameters, normally there's about 5 or so. And in practice you integrate all of these "build variants" anyway on most commits. And even if you don't want that — just don't build it! Just select the specific target you want; build foobar-O3 and skip foobar-asan. Buck even has cquery and BXL which allows you to programmatically calculate these targets based on various conditions! This isn't appropriate for everything, but I've found it quite handy.

zjturner commented 1 year ago

I like the cartesian product idea, we tried to do this with CMake at one point but gave up because CMake did not make this easy. My biggest concern is that now engineers who don't care about build systems and who just want to click a button and have things work suddenly have to start remembering this cartesian product.

Do you think it makes sense to say that developer only options used for testing and experimentation are better off being done via read_config (see @dtolnay's comment above), and configurations that are tested on CI could be done via a cartesian product approach?

thoughtpolice commented 1 year ago

Yeah, one issue is that people have to pick out what version of the build they want, in the cartesian approach. So if you want ASAN+UBSAN+LTO+Fuzzing, you might have to look for a target-with_really_long_target_name or whatever. It does become annoying for people who just want to test a single build and aren't necessarily using every variant.

I missed his comment originally but, the approach from @dtolnay is instead closer to what people would expect to do with make, so it has less "sticker shock." It's about as simple as it can be, I think? Since you also need to pick up ldflags and maybe some other settings, you could always put them into a .bzl file and then load() those values from the file, too. That also has an advantage that you don't have to keep repeating buck build -c ... every time:

dev_options = {
  "cflags": [ ... ],
  "ldflags": [ ... ],
  "defines": [ ... ],
}
load(":devopts.bzl", "dev_options")

cxx_binary(
  cflags = dev_options.cflags
  # and similar for ldflags, etc
)

In this example dev_options.bzl would just override whatever is defined by default, or augment it, etc. An advantage over putting this here instead of passing buck build -c cc.flags=$SOMETHING ..., or putting it in .buckconfig, is that this file isn't global! There are other tradeoffs with this variant — for example you could accidentally commit changes to devopts.bzl when it should always look like dev_options = {} in the repository; only a working copy being hacked on should be different. You could auto-reject commits that make changes to that file, for instance. Or use fail() inside the file somewhere, or something, if another thing isn't true e.g. if read_config("ci_system","in_ci", false) and dev_options != {}: fail("devopts.bzl can't be used in CI") or whatever.

Ultimately an exact 1-to-1 replica of the make-based interface created by CMake is just a pre-existing choice to adhere too, but it does have the huge advantage of being familiar. The lesson here though is your Starlark program is a program, so you have a bit of flexibility with how you arrange files and integrate them.