Open zjturner opened 1 year ago
How does parsing of this "ovr_config//os:macos" string work? That also looks a lot like one of those target patterns, but that clearly is not a target
It is a target.
Try buck2 uquery ovr_config//os:macos.
what is os?
Directory in the ovr_config cell.
What other selectors are there that I could use besides os
Simplified: the config_setting rule defines the select key.
But yes, documentation could be better.
Thank you! Would you be able to also provide answers to the two questions at the end of the OP?
Would you be able to also provide answers to the two questions at the end of the OP?
Can you please give more concrete examples of what flags do you want to switch?
Hmm, is that actually important? I chose a hypothetical non-existent flag --compiler-flag-1 on purpose because I'm assuming that the answer should be the same for any arbitrary flag. If the setting in buck config is foo, pass flag 1. If it's bar, pass flag 2. Does it actually matter what flag 1 and flag 2 are?
Hmm, is that actually important?
Yes, it is. Depending on your use case I can suggest one solution or another, or describe our plans to address the issue.
Basically we don't support "flags". Most of the current issue can be addressed by adding a custom configuration with a platform rule. But these are not composable. There's a workaround with buckconfigs and a config_setting depending on that buckconfig, which is partly deprecated, but we don't have a replacement yet.
Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
There's a lot that goes into this. A lot of it is described here: https://buck2.build/docs/rule_authors/configurations/
Here's one way to define a platform and a select such that the select() will resolve as you ask when in the configuration defined by that platform:
constraint_setting(
    name = "foo-bar",
)

constraint_value(
    name = "foo-bar-baz",
    constraint_setting = ":foo-bar",
)

constraint_value(
    name = "foo-bar-buzz",
    constraint_setting = ":foo-bar",
)

# Read foo.bar from the buckconfig; None when it is unset.
foo_config = read_config("foo", "bar", None)

platform(
    name = "my-platform",
    constraint_values = [":foo-bar-baz"] if foo_config == "baz" else ([":foo-bar-buzz"] if foo_config == "buzz" else []),
)

my_rule(
    name = "use",
    # select() takes a dict mapping a condition to a value.
    flags = select({
        ":foo-bar-baz": ["--compiler-flag-1"],
        ":foo-bar-buzz": ["--compiler-flag-2"],
    }),
)
Now, that example doesn't cause the :use rule to be built in the :my-platform configuration; there's a handful of different ways to do that. But if it is built in that configuration, that select() would be resolved in the way you asked.
How can I do the following two things:
- Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
- Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?
It's hard to be sure whether you are asking this in order to understand better how platforms/configurations/selects work, vs. to accomplish these two specific pieces of functionality.
Most likely it's the former, in which case cjhopman's answer has good guidance.
But just in case it's the latter, select and constraint may not necessarily be the thing you're after.
foo_bar = read_config("foo", "bar", None)
my_rule(
    name = "...",
    flags = [
        "--compiler-flag-1" if foo_bar == "baz"
        else "--compiler-flag-2" if foo_bar == "buzz"
        else fail("...")
    ],
)
buck2 build --config foo.bar='...'
foo_bar = read_config("foo", "bar", None)
my_rule(
    name = "...",
    flags = foo_bar.split(" ") if foo_bar else [],
)
Yes, it is. Depending on your use case I can suggest one solution or another, or describe our plans to address the issue.
Basically we don't support "flags". Most of the current issue can be addressed by adding a custom configuration with platform rule. But these are not composable. There's workaround with buckconfigs and config_setting depending on that buckconfig, which is partly deprecated, but we don't have a replacement yet.
This is quite a fundamental divergence from what I'm accustomed to, and it makes me question whether buck2 is even compatible with our builds at all.
For some context, we currently use CMake. In CMake you can pass -DKEY=VALUE on the command line, and you can pass that over and over again with different keys and different values. And you can have any combination of these. So our entire build has been constructed with this assumption in mind, and it sounds like you're saying this simply isn't possible.
I probably can't list every possible way in which we rely on this, but here's a few:
1. -DUSE_SANITIZER=[address|undefined|thread]
2. -DBUILD_TYPE=[debug|optimized|production]
3. -DPRODUCT=[client|server|secret_project]
4. -DHARDENING=[none|minimal|production]
Number 3 is especially problematic, because part of the semantics of that one involve passing a define into every compilation unit, including shared code. This means that a build of two different products passes different flags to the same source files, so the compiled code can't be shared between products, even though much of the source code is the same.
There are at least 20 more of these flags that we can pass to the build, and any combination of flags can be mixed and matched. The semantics of these flags are extremely diverse. They can mean anything from:
And more.
And of course, the semantics can also be compound, so that a single build flag can do a combination of the above semantics, including various things that aren't listed.
Hey Zach, good to see you around these parts; been a while! I'll try to give you a good answer to help get you moving along since I saw some other questions of yours...
Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
Well, the statement itself is pretty simple, in theory, though I haven't tested this with the prelude:
cxx_binary(
    cflags = select({
        "cell//path/to:target": [ "-O2" ],
        "cell//path/to/other:target": [ "-O1" ],
    }) + default_cflags,
)
In the above example, cell//path/to:target is a configuration setting. But we might have an X/Y problem here. This probably isn't a good example so we should start over from the beginning, I think.
- In make, you say make foo.exe to build the foo.exe binary. You call foo.exe a target. In Make, a target is, generally speaking, a file. You might even say something like make foo/bar/baz.exe to build that file in a non-recursive Make build system. (Nit: sometimes they aren't files, though; it might be a .PHONY rule, which isn't a file but just a list of commands to run every time you ask.)
- In Buck, you say buck2 build cell//foo/bar:baz. Now, Buck does build targets, but targets are not necessarily files. They are just names of things.
- cell//path/to:target is a target pattern. It's a pattern because you can actually match multiple targets at once (more on that in a second).
- cell// is just an arbitrary name. It points to a directory in the repository. This is listed in the .buckconfig file, in the [repositories] section. You can basically create as many cells as you want, but there are two mandatory cells you have to specify, at minimum: one to refer to the root (often root = .) and one named prelude//.
- The path/to/ part refers to a path within the cell. Just expand the name of the cell to get the path. If cell = foo/bar in the root .buckconfig, then cell//baz/ translates to foo/bar/baz/ when expanded, relative to the root of the repository. Very simple.
- That directory contains a BUCK file to describe targets.
- The :target part is, well, the target. But broadly speaking, the target is the name field of something listed in a BUCK file. Note it has a colon in front. When writing buck build path/to:target, the colon separates the two, but you also use the colon elsewhere (more in a second).
For example, if the following BUCK file exists at the path foo/bar/baz/BUCK:
cxx_binary(name = "test_cxx", ...)
rust_binary(name = "test_rs", ...)
We'd say that cxx_binary and rust_binary are rules. I prefer to think of a rule as something even simpler: it's just a function with arguments! And every time you call that rule, with some arguments, you create an instance of the rule. Every instance of a rule has a name: the name field of a rule instance gives it an externally visible name that can be mentioned in a target pattern. Again: the name of the target that you refer to in buck build is always listed in the name field of a rule. It's a special field that you cannot get rid of. Rule = function. Creating instance of rule = application of function.
Then there are two targets above: foo/bar/baz:test_cxx and foo/bar/baz:test_rs. So you could run buck build on those paths. Remember how I said target patterns are patterns? You can build them both using the ... pattern: buck build frob//path/to:... would build every target listed in the BUCK file in cell frob//, under path path/to. So it would build both of these binaries.
And there are lots of other target patterns you can use, try a bunch:
- cell//path/to/a/package:name builds a target
- cell//path/to/a:... builds every target in the package recursively
- cell//path/to/a: builds every target in that package
- cell//... builds every package and every target in the cell
- //path/to/a/package:name: implicit cell, resolves to root//
- //... builds everything in the default cell
- ... builds everything in the default cell, even shorter
And some other various, more non-obvious ones, like : for example (all targets in the package in $CWD)
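As a rough illustration of the cell and path expansion described above, here is a minimal Python sketch that splits a fully specified target label into its cell, package path, and name, and computes where the declaring BUCK file would live. The CELLS table and both function names are hypothetical stand-ins for what Buck reads from .buckconfig; this is not Buck's actual parser, and it handles only single-target labels, not the wildcard patterns.

```python
from pathlib import PurePosixPath

# Hypothetical cell table, standing in for the cell section of a root .buckconfig.
CELLS = {"root": ".", "prelude": "prelude", "cell": "foo/bar"}


def parse_target(label, default_cell="root"):
    """Split 'cell//path/to:name' into (cell, package path, target name)."""
    cell, sep, rest = label.partition("//")
    if not sep:
        raise ValueError("expected '//' in target label: " + label)
    package, colon, name = rest.partition(":")
    if not colon or not name:
        raise ValueError("expected ':name' in target label: " + label)
    # An empty cell (as in '//a/b:c') falls back to the default cell.
    return (cell or default_cell, package, name)


def buck_file_for(label):
    """Return the repository-relative BUCK file that would declare the target."""
    cell, package, _name = parse_target(label)
    return str(PurePosixPath(CELLS[cell]) / package / "BUCK")
```

With cell mapped to foo/bar as in the example above, buck_file_for("cell//baz:test_cxx") points at foo/bar/baz/BUCK.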
Before you can talk about configurations, you need to talk about providers. This is a very, very, very important part of the design of Buck, and a lot more things made sense once I realized this.
I mentioned make before as a point of contrast. I did so for a reason, because it creates big contrast. To sum up:
- In make, a target (ignoring .PHONY targets!!!) always refers to a file.
- In buck, a target doesn't refer to a file, but, in reality, it is a data structure returned from a rule.
- cxx_library may have two things it outputs: the compiled library (in whatever form you choose), and a big ball of generated HTML doxygen documentation.
- Dependencies like A < B (rule instance B depends on A) are actually, concretely, a piece of code in B needing a value from a provider returned by A. If cxx_library returns both a DoxygenHtmlProvider object and a CxxLinkableInfoProvider object, then something that depends on cxx_library can consume these objects any way they wish. They are just normal values in a programming language (called Starlark).
- If you depend on cxx_library and only consume the CxxLinkableInfoProvider object, Buck will never build the doxygen docs. After all, you didn't need the DoxygenHtmlProvider, so why compute it?
It is very, very important to understand this. It is key to Buck's modularity. You can never simply look at a target pattern foo//bar:baz and understand what baz is without reading the BUCK file or running uquery or something. It isn't a file output! The string ovr_config//os:macos you brought up is absolutely a target, but it is not a file. It is a data structure. Big difference.
Providers — data structures — are the only way a rule can provide information to downstream rules that depend on it. In Make, a rule can only create a single file, which can be consumed by downstream rules, in contrast.
Providers are very important to understanding how the action graph is created after analysis, and are key to Buck's modularity: they're why you can have a python_binary depend on an ocaml_library which depends on a cxx_library and it all works out magically. But this is about as good of a summary as you need. I strongly suggest reading this: https://buck2.build/docs/rule_authors/writing_rules/ It's sparse but will give you an idea of what a rule looks like inside, and you can see how rules consume and create providers.
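To make the "providers are just values" idea concrete, here is a toy Python model, not Buck code. Every name in it (Artifact, toy_cxx_library, toy_binary, the provider keys) is made up for illustration. It shows a rule returning two providers, a downstream rule consuming only one of them, and the unused one never being built.

```python
from dataclasses import dataclass
from typing import Callable

built = []  # records which artifacts were actually "built"


@dataclass(frozen=True)
class Artifact:
    name: str
    build: Callable[[], str]  # deferred work; runs only when materialized

    def materialize(self):
        built.append(self.name)
        return self.build()


def toy_cxx_library(name):
    """A toy 'rule': returns providers (plain values) keyed by provider name."""
    return {
        "LinkableInfo": Artifact(name + ".a", lambda: "archive for " + name),
        "DoxygenHtml": Artifact(name + "-docs", lambda: "html for " + name),
    }


def toy_binary(name, dep_providers):
    """A downstream toy rule that consumes only the linkable provider."""
    lib = dep_providers["LinkableInfo"].materialize()
    return name + " linked against " + lib


result = toy_binary("app", toy_cxx_library("foo"))
```

After this runs, built contains only "foo.a": the docs artifact was described but never requested, which mirrors the doxygen example above.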
So now we know that ovr_config//os:macos is actually a target, and it can return data structures. What does this mean in practice for things like select()? In Buck, you have constraint settings, constraint values, and config settings.
- A constraint setting is something like cpu, which can take on several possible constraint values, such as x86_64 and aarch64. You can also have a constraint like os which has win, lin, and mac.
- Both are declared with rules, constraint_setting and constraint_value, so they return providers.
- Constraint settings return a ConstraintSettingInfo provider object, while constraint values use ConstraintValueInfo provider objects.
- A config setting is a set of constraint values, such as cpu == x86_64. That example seems redundant, since it's one config for one constraint (you still need both, even in that case). But it naturally extends to more constraints; the x86_64-linux configuration has constraint cpu == x86_64 and os == linux both at the same time, for instance.
- A config setting is declared with the config_setting rule, and it yields a ConfigurationInfo provider object.
My understanding of platforms and their ins-and-outs is poor still. This is also getting too long. But the upshot is that you can use select() on constraint or configuration settings, like in the example from @cjhopman or my original example above.
As a side note: actually, I think select() as a function only works on targets that return ConfigurationInfo objects. So why does it work on constraint values, AKA ConstraintValueInfo objects, like in @cjhopman's example? Read the source code to constraint_value_impl to find out why ;)
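For intuition only, here is a simplified Python model of how a select() could be resolved against the constraint values of the active configuration. resolve_select and the "DEFAULT" key are assumptions of this sketch. Real Buck resolution is richer (for example, config settings can require several constraint values at once), so treat this as a mental model rather than the actual algorithm.

```python
def resolve_select(branches, active_constraints):
    """Pick the value of the branch whose key is in the active configuration.

    branches: dict mapping a constraint-value label (or "DEFAULT") to a value.
    active_constraints: set of constraint-value labels in the configuration.
    """
    matches = [key for key in branches if key in active_constraints]
    if len(matches) > 1:
        # Simplification: this sketch rejects ambiguity outright.
        raise ValueError("ambiguous select: " + ", ".join(matches))
    if matches:
        return branches[matches[0]]
    if "DEFAULT" in branches:
        return branches["DEFAULT"]
    raise ValueError("no select branch matches the configuration")


flags = {
    ":foo-bar-baz": ["--compiler-flag-1"],
    ":foo-bar-buzz": ["--compiler-flag-2"],
}
chosen = resolve_select(flags, {":foo-bar-buzz"})
```

In this model, a configuration carrying neither constraint value raises an error, which corresponds to the "build failure if foo.bar is anything else" behavior asked about in the original question.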
Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?
There is a --config flag you can pass to buck2 build that will do this, e.g. buck2 build -c foo.bar=buzz, and I believe that overrides it. I can't tell you if it needs to restart the buck2d daemon or anything; some config values require that.
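The layering described here can be pictured with a small Python model. It assumes, as the reply above believes but does not confirm, that a command-line section.key=value override wins over the value read from .buckconfig. apply_overrides and flags_for are hypothetical helpers for this sketch, not Buck APIs.

```python
def apply_overrides(file_config, overrides):
    """Layer 'section.key=value' overrides on top of values read from a file."""
    merged = {section: dict(values) for section, values in file_config.items()}
    for item in overrides:
        dotted, sep, value = item.partition("=")
        section, dot, key = dotted.partition(".")
        if not sep or not dot or not section or not key:
            raise ValueError("expected section.key=value, got: " + item)
        merged.setdefault(section, {})[key] = value
    return merged


def flags_for(config):
    """Map foo.bar to a flag, failing for any other value."""
    value = config.get("foo", {}).get("bar")
    if value == "baz":
        return ["--compiler-flag-1"]
    if value == "buzz":
        return ["--compiler-flag-2"]
    raise ValueError("unsupported foo.bar value: " + repr(value))


from_file = {"foo": {"bar": "baz"}}
effective = apply_overrides(from_file, ["foo.bar=buzz"])
```

Here flags_for(from_file) yields the first flag while flags_for(effective) yields the second, and the file-derived dictionary is left unmodified.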
See also https://github.com/facebook/buck2/issues/142 and the "configuration-at syntax" RFC linked within. This would allow you to conveniently set a specific configuration platform using syntax on the command line, which is basically what you want. In this case you would simply set a configuration by building a target like buck build //foo/bar:baz@config//platforms:linux-x86_64-O3. Then the rule implementing //foo/bar:baz would be able to select() the right things. But I don't think any of the features are there yet for that RFC. The example from @cjhopman is about as good as it gets.
This is way way way too long. But personally, in this case I'd probably structure my solution differently until some of those features get added: I'd just avoid configurations and build every version of the thing you want, all with different target names. This takes longer to build the whole package, but is more flexible right now. It also helps you ensure every version of a build is working; e.g. a change doesn't break ASAN builds or something. That's actually a big problem with the approach to using $(CFLAGS) like Make does: you can easily test something with make CFLAGS=-fsanitize=address and it works, check it in, and another build breaks (imagine you didn't #ifdef some sanitizer logic appropriately). Or vice versa; the normal build breaks ASAN. In this case, all your builds would be integrated every time.
So instead, have the foobar target be the default, and also foobar-O3 and foobar-O3-LTO and also foobar-fuzz-asan and so on and so forth... You can generate all these rules with code, by the way. Something like this in defs.bzl:
def cxx_multi_binaries(name, **kwargs):
    default_way = {
        "cflags": [],
        "ldflags": [],
    }
    other_ways = {
        "O3": {
            "cflags": ["-O3"],
            "ldflags": ["-O3"],
        },
        "LTO": {
            "cflags": ["-O3", "-flto"],
            "ldflags": ["-fuse-ld=lld", "-flto"],
        },
    }

    # generate a target named 'foobar-XYZ' for every 'way' above
    for (way_name, way_opts) in other_ways.items():
        cxx_binary(
            name = "{}-{}".format(name, way_name),
            # way_opts is a dict, so index it by key
            cflags = way_opts["cflags"],
            ldflags = way_opts["ldflags"],
            **kwargs
        )

    # and also a default binary
    cxx_binary(
        name = name,
        cflags = default_way["cflags"],
        ldflags = default_way["ldflags"],
        **kwargs
    )
Then, in the BUCK file:
load(":defs.bzl", "cxx_multi_binaries")
cxx_multi_binaries(
    name = "test",
    srcs = glob(["*.cxx"]),
)
And then you'll have targets named :test, :test-O3, :test-LTO, etc. You'd just specify whatever target you wanted if you wanted a specific one. This is also a key advantage of Buck: you can just write Starlark functions to generate rules in whatever magical way you want! (Functions that purely call and instantiate rules are often called "macros", confusingly...) And these functions abstract everything away. The caller of cxx_multi_binaries does not have to have any knowledge of how flags are propagated or interpreted, for instance. An engineer can completely change its implementation without impacting callers, as long as some basic functional rules about APIs are respected (i.e. don't rename stuff like crazy).
This is a long post but hopefully it'll help. Restructuring a build for Buck is fundamentally a lot of work. But it gives you a lot more power than CMake once you lean into it. Hope you're doing well, too!
Hey Zach, good to see you around these parts; been a while! I'll try to give you a good answer to help get you moving along since I saw some other questions of yours...
Haha, noticed you too in other places. Good to see you :)
Anyway, this is a very long answer, so thank you! Will take me some time to read / digest. That said, from a cursory skim, it sounds like this is exactly the kind of advice and guidance that should find a way into the documentation (or examples) in some form. Perhaps I'll take it as an exercise to the reader to produce such an example and make a PR updating the examples folder with new examples.
I'll also note something about the efficiency of my proposed solution. It's true that by representing each build as an individual target, rather than modifying CFLAGS, you have added more targets to build and test. If it takes 30 minutes to build a project, then adding the O3 and LTO way like my example will, in the worst case, increase build time by 200% relative to base. At best, it will increase build time by 0%, assuming perfect parallelism and each build is run independently. Except that:
- In practice you probably already run make CFLAGS=-O3 and make CFLAGS=-fsanitize=address on every commit.
- Buck can build foobar and e.g. foobar-O3 at once. Let's say you call buck build :... to build all the targets. If your foobar build has a part where parallelism decreases due to linking steps, buck can immediately begin building objects in the foobar-O3 target during that time. With make CFLAGS=-O3, it's all or nothing. You have to wait until the build completes fully and then run make CFLAGS=<something-else> afterwards.
- Buck's caching is like ccache on steroids and needs no extra integration.
- If you build with -O3 and -fsanitize=address, those both exist in buck-out/ under unique names. The input cflags are calculated as being an input to the object file. So you can build the -O3 version, the ASAN version, then the -O3 version again, and the final call is instant. In contrast you need to explicitly use separate build directories with cmake or ccache to achieve this.
I call the above solution the "cartesian product" approach because rather than modifying the set of build parameters in place, you instead effectively represent the set of parameters to the build as a tuple (A, B, C, ...), calculate the cartesian product of that tuple A * B * C * ..., and then run the build once for each one of these, excluding various invalid scenarios. I've been using this approach for a while and I find it is often much better because in practice there aren't actually a billion parameters, normally there's about 5 or so. And in practice you integrate all of these "build variants" anyway on most commits. And even if you don't want that, just don't build it! Just select the specific target you want; build foobar-O3 and skip foobar-asan. Buck even has cquery and BXL, which allow you to programmatically calculate these targets based on various conditions! This isn't appropriate for everything, but I've found it quite handy.
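The cartesian product idea can be sketched in a few lines of Python. The axes, the exclusion rule in is_valid, and the naming scheme below are all illustrative assumptions, not something taken from a real build; in a Buck repository the same loop would live in a Starlark macro that declares one target per surviving combination.

```python
from itertools import product

# Hypothetical build axes; the empty string stands for "axis left at its default".
AXES = {
    "opt": ["", "O3"],
    "lto": ["", "LTO"],
    "sanitizer": ["", "asan"],
}


def is_valid(combo):
    # Illustrative exclusion: assume LTO is only wanted together with O3.
    return not (combo["lto"] and not combo["opt"])


def variant_names(base):
    """Return one target name per valid combination of axis values."""
    names = []
    for values in product(*AXES.values()):
        combo = dict(zip(AXES.keys(), values))
        if not is_valid(combo):
            continue
        suffix = "-".join(v for v in values if v)
        names.append(base + "-" + suffix if suffix else base)
    return names
```

With these three two-valued axes there are eight raw combinations, and the exclusion rule drops the two that enable LTO without O3, leaving six target names for foobar.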
I like the cartesian product idea, we tried to do this with CMake at one point but gave up because CMake did not make this easy. My biggest concern is that now engineers who don't care about build systems and who just want to click a button and have things work suddenly have to start remembering this cartesian product.
Do you think it makes sense to say that developer-only options used for testing and experimentation are better off being done via read_config (see @dtolnay's comment above), and configurations that are tested on CI could be done via a cartesian product approach?
Yeah, one issue is that people have to pick out what version of the build they want, in the cartesian approach. So if you want ASAN+UBSAN+LTO+Fuzzing, you might have to look for a target-with_really_long_target_name or whatever. It does become annoying for people who just want to test a single build and aren't necessarily using every variant.
I missed his comment originally, but the approach from @dtolnay is instead closer to what people would expect to do with make, so it has less "sticker shock." It's about as simple as it can be, I think? Since you also need to pick up ldflags and maybe some other settings, you could always put them into a .bzl file and then load() those values from the file, too. That also has an advantage that you don't have to keep repeating buck build -c ... every time:
# devopts.bzl
dev_options = {
    "cflags": [ ... ],
    "ldflags": [ ... ],
    "defines": [ ... ],
}
# BUCK
load(":devopts.bzl", "dev_options")
cxx_binary(
    # dev_options is a dict, so index it by key
    cflags = dev_options["cflags"],
    # and similar for ldflags, etc
)
In this example devopts.bzl would just override whatever is defined by default, or augment it, etc. An advantage of putting this here instead of passing buck build -c cc.flags=$SOMETHING ..., or putting it in .buckconfig, is that this file isn't global! There are other tradeoffs with this variant. For example, you could accidentally commit changes to devopts.bzl when it should always look like dev_options = {} in the repository; only a working copy being hacked on should be different. You could auto-reject commits that make changes to that file, for instance. Or use fail() inside the file somewhere if another thing isn't true, e.g. if read_config("ci_system", "in_ci", False) and dev_options != {}: fail("devopts.bzl can't be used in CI") or whatever.
Ultimately an exact 1-to-1 replica of the make-based interface created by CMake is just a pre-existing choice to adhere to, but it does have the huge advantage of being familiar. The lesson here though is that your Starlark program is a program, so you have a bit of flexibility with how you arrange files and integrate them.
I found this code from grepping the repository:
How does parsing of this "ovr_config//os:macos" string work? That also looks a lot like one of those target patterns, but that clearly is not a target. And it's not the same thing as @prelude//toolchains:cxx.bzl either, because it doesn't have an @ at the beginning, and doesn't refer to a file.
I grepped for ovr_config and found this:
I guess it's coming from here. But the exact syntax isn't documented. And what is os? What other selectors are there that I could use besides os? It looks like this pattern might be a "target configuration constraint", which I figured out by looking at this piece of code from somewhere else:
but if you type "constraint" into the search box of the documentation page you get nothing.
And how do these map to values from .buckconfig? Concrete example: Let's say I have this in my .buckconfig:
How can I do the following two things:
- Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
- Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?