facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0

Idea: "target set" language for specifying target patterns #749

Open thoughtpolice opened 3 weeks ago

thoughtpolice commented 3 weeks ago

While talking on Discord about a general idea of "How do I skip some tests that are slow", I came up with an idea I wanted to post here on the bug tracker:

Here is a crazy idea that maybe isn't feasible: in Jujutsu and Sapling (and Mercurial) there's a concept of "revision sets", a language for selecting commits in the commit graph. So jj log -r 'mine() ~ description("wip:")' means "find all my commits, but exclude ones with wip: in the description."

What if we had "target sets", a similar language for selecting target names from a more general target pattern? For example, buck2 build '//folder/... ~ label(exact:"slow")' would be an explicit version of "skip all slow tests but run everything else." I actually wonder if this syntax might even be "compile-able" to a buck2 query expression in some way? It seems like it would solve a lot of the same issues.
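To make the "compile-able" idea concrete, here is a minimal sketch of how the `~` example might be lowered to a query string. Everything about the target set side (the `Pattern`, `Label`, and `Minus` node names, and the `to_query` function) is hypothetical. The sketch assumes that `label(exact:"slow")` maps onto `attrfilter` over the `labels` attribute and that `~` maps onto the query `-` operator; the emitted string has not been checked against a real buck2.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pattern:
    """A plain target pattern such as //folder/..."""
    text: str


@dataclass(frozen=True)
class Label:
    """Hypothetical label(exact:"...") predicate from the proposal."""
    value: str


@dataclass(frozen=True)
class Minus:
    """Hypothetical `left ~ right` (set difference)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Pattern, Label, Minus]


def to_query(node: Expr, universe: str = "") -> str:
    """Lower a target set expression to a query string.

    `universe` is the set a predicate gets applied to, because query
    filter functions take the filtered set as an argument.
    """
    if isinstance(node, Pattern):
        return f"'{node.text}'"
    if isinstance(node, Label):
        if not universe:
            raise ValueError("label() needs a set to filter")
        # Assumption: labels live in the `labels` attribute.
        return f"attrfilter(labels, '{node.value}', {universe})"
    if isinstance(node, Minus):
        left = to_query(node.left, universe)
        # The right-hand predicate is evaluated against the left-hand set.
        right = to_query(node.right, left)
        return f"({left}) - ({right})"
    raise TypeError(f"unknown node: {node!r}")


example = Minus(Pattern("//folder/..."), Label("slow"))
query = to_query(example)
print(query)
```

Note that the left-hand set has to be repeated inside the filter call, which is the main ergonomic cost the proposed syntax would hide.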

For example, one thing I like to do is test clean build times for Rust compiles, cargo vs buck. But cargo keeps downloads cached even after a clean, while buck doesn't. So the first thing I do is some monstrosity like buck2 build $(buck2 uquery "kind('http_archive', deps('//...'))" | grep third-party//) which is basically saying "Download all http_archive files" first and foremost, then you can run the normal build. But it's kind of awkward to write out and probably doesn't work on Windows.

But instead, I could do: buck2 build 'rule(exact:"http_archive") & cell("third-party") & depOf("//...")' and it would work everywhere.
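As a rough illustration of how that `&` chain could fold into one query expression, the sketch below treats `depOf` as the term that produces a set and wraps the other terms around it as filters. The `compile_conjunction` function and the `(name, argument)` tuple encoding are hypothetical. Mapping `cell(...)` onto `filter` with a regex over the target label, and anchoring the `kind` regex for `exact:`, are both assumptions rather than confirmed buck2 behavior. Names are assumed to be plain identifiers, so no regex escaping is done.

```python
from typing import List, Tuple


def compile_conjunction(terms: List[Tuple[str, str]]) -> str:
    """Fold hypothetical `a & b & c` terms into one query string.

    Each term is a (name, argument) pair. Exactly one term must be
    `depOf`, which supplies the set the other terms narrow down.
    """
    generators = [arg for name, arg in terms if name == "depOf"]
    if len(generators) != 1:
        raise ValueError("expected exactly one depOf() term")

    # Start from the set-producing term.
    query = f"deps('{generators[0]}')"

    # Wrap each remaining predicate around the current set.
    for name, arg in terms:
        if name == "depOf":
            continue
        if name == "rule":
            # Assumption: exact match expressed as an anchored regex.
            query = f"kind('^{arg}$', {query})"
        elif name == "cell":
            # Assumption: labels print as `cell//package:name`.
            query = f"filter('^{arg}//', {query})"
        else:
            raise ValueError(f"unsupported term: {name}")
    return query


terms = [("rule", "http_archive"), ("cell", "third-party"), ("depOf", "//...")]
compiled = compile_conjunction(terms)
print(compiled)
```

The result has the same shape as the `kind(..., deps(...))` query in the shell version above, with the `grep` step moved into the query itself.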

I actually use commands like this all the time, and many forms of automation would become obsolete if I could select target names more programmatically from a simple language with expressions and operators.

Before getting into the weeds here, would something like this be sensible? I guess it's kind of tricky because there's a question of whether things like select() or configurations should be considered. But it feels like a lot of basic examples like the above should be "sort of" easy to translate to a query expression, and they would provide a lot of open-ended value. Slow tests and particular rules are just two examples.

cormacrelf commented 2 weeks ago

The uquery context does have operators. They're not as good as Jujutsu's, but they go alright. I forget the symbols a lot but + is union and I think ^ is intersection (?). Most of the functions like attrfilter() are designed to take the thing they're filtering as an argument rather than the JJ-revset style "just give me some sets and let me intersect or subtract them".

I can see why it might have been designed this way -- Buck target sets have to be quite lazy because you don't want to have to materialize large swaths of a megarepo from VFS just to evaluate rule("http_archive") & //some/package:. To make a really nice target set language that didn't explode on basic queries you'd need to invest some time in a solid query planner to propagate the limiting factor around and avoid evaluating any large sets in any intermediate results. The way it is now, you're forced to clarify the limiting factor at multiple points while writing a query. At the end of the day you can use the existing fancy set operations as much as you like if every rdeps() etc is constrained to an appropriately sized universe. If you're not Meta, then //... is already an appropriately sized universe, and you can just pass that everywhere and rely entirely on set operations.
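A toy sketch of the "limiting factor" idea, under heavy assumptions: operands are either a bounded, already-known set of targets or a predicate that would be too expensive to enumerate. The planner starts from the smallest bounded set and only tests predicates against it. All names here (`Concrete`, `Predicate`, `intersect`, the `RULES` table) are hypothetical and have nothing to do with how buck2 is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Union


@dataclass(frozen=True)
class Concrete:
    """A bounded set that is cheap to enumerate, e.g. one package."""
    targets: FrozenSet[str]


@dataclass(frozen=True)
class Predicate:
    """A filter such as rule("http_archive") that is never enumerated."""
    name: str
    test: Callable[[str], bool]


Operand = Union[Concrete, Predicate]


def intersect(operands: List[Operand]) -> FrozenSet[str]:
    """Intersect operands without materializing any unbounded set."""
    concrete = [op for op in operands if isinstance(op, Concrete)]
    if not concrete:
        raise ValueError("no bounded operand: would have to scan everything")

    # The smallest bounded set is the limiting factor.
    smallest = min(concrete, key=lambda op: len(op.targets))
    result = set(smallest.targets)

    for op in operands:
        if op is smallest:
            continue
        if isinstance(op, Concrete):
            result &= op.targets
        else:
            # Predicates only ever see the already-narrowed candidates.
            result = {t for t in result if op.test(t)}
    return frozenset(result)


# Toy data standing in for a loaded package.
RULES = {
    "//some/package:a": "http_archive",
    "//some/package:b": "rust_library",
}
package = Concrete(frozenset(RULES))
is_http_archive = Predicate("rule", lambda t: RULES[t] == "http_archive")

selected = intersect([is_http_archive, package])
print(sorted(selected))
```

Operand order does not matter to the caller here, which is the property a real planner would need in order to make `rule("http_archive") & //some/package:` safe to write.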

Simpler question: should you be able to buck2 build a target expression? I don't see why not, especially if you can buck2 test one. I think it would be fun. Currently I have BXL to build the "diag.txt" subtarget of lots of rust targets and a wrapper script to dump all the output to stdout. It might be nice if this could be done with a single build command, with a bit of target expression magic to select subtargets, and dump all the output diag.txt files at once with --out -. If it doesn't cover that use case, it would still be useful.

cbarrete commented 1 week ago

Here are a few of our use cases where this would come in handy:

In both cases, it is possible to write cqueries to get what they want, but the ergonomics are rather poor for day to day use.