where can I ask questions / get help?

BrianHicks commented 1 year ago

Hey! I have a bunch of beginner-level questions. What's an appropriate place to ask them? I don't want to clutter this repo with a bunch of small questions whose answer is probably "that works like Buck 1" (which I haven't used 😅)

ndmitchell commented 1 year ago

Here is a good place to ask. We'll close the question once you are done, and hopefully improve the buck2 docs, so there's no clutter, and it will drive improvements in our documentation at https://buck2.build/. Feel free to just use this thread as your question thread for this batch of questions.

BrianHicks commented 1 year ago

OK, so here's my overall goal: I'm using Shake for building a project (github.com/NoRedInk/noredink-ui) but I'd like to try Buck for it instead. That project has both an Elm library (at /) and a demonstration site (at /component-catalog and https://noredink-ui.netlify.app/) which uses the library. If this conversion goes well, we may use Buck for our monorepo with more languages etc in the future (also using Shake now! While I've got you here: thanks for making that! 🎉)

This test project uses Elm, which I guess means I need to write my own Elm rules/toolchain/etc. That's fine; I know I'm jumping in the deep end.

So far, I'm just building library docs. That's a good smoke test, since it loads all the source files but only requires the elm binary to run. I've got that working with a BUCK and elm.bzl file.

BUCK

```starlark
load(":elm.bzl", "elm_docs")

elm_docs(
    name = "docs",
)
```

elm.bzl

```starlark
def _elm_docs_impl(ctx: "context"):
    docs = ctx.actions.declare_output(ctx.attrs.out)
    ctx.actions.run(
        ["elm", "make", "--docs", docs.as_output()],
        category = "elm_docs",
    )
    return [DefaultInfo(default_output = docs)]

elm_docs = rule(
    impl = _elm_docs_impl,
    attrs = {
        "out": attrs.string(default = "docs.json"),
    },
)
```

I know that's pretty naive in some important ways, and I'd like to fix them. Here are the ones I see, and some questions:

If I change a source file, the docs are not regenerated. The set of files this command cares about would look like glob(["src/**/*.elm", "elm.json"]), but I'm not sure how to tell Buck/my rule that it should care about them. (This is the thing I suspect is just a beginner "same as Buck 1" question.)
The command is definitely not being isolated, since it's loading all the files whenever it runs. How do I turn that on? Maybe just specifying source dependencies will do it?
This code is assuming elm is on the path—safe for now, since it's all loaded via Nix, but not safe in CI or remote builders. Looking around, I guess I need to write a toolchain. Is that correct? What is the simplest toolchain I could copy? (I looked at Go because it's a similar single-file compiler but there are a lot more options!)

ndmitchell commented 1 year ago

The rule should not look at any files that it isn't given by the user. To do that, you'd add an attribute "srcs": attrs.list(attrs.source()), and then in the BUCK file pass glob(["src/**/*.elm"]). If you have two types of source file (e.g. I suspect elm.json is either config or a project file?) you'd probably want a separate attribute for that.
We only have isolated commands when running through remote execution (https://buck2.build/docs/remote_execution/). In many cases, just calling run will require things like elm.json as arguments. If there is an elm make mode where you can pass --project or similar, then ["--project", ctx.attrs.project] will add the dependency. It's better to run the tools in "explicit" mode if available. If not, you can do cmd_args(["elm", "make", "--docs", docs.as_output()]).hidden(ctx.attrs.project, ctx.attrs.srcs) and it will know they are required for the command.
Yes, a toolchain is the way to go. No great docs on this unfortunately - they're all a bit complex, and you probably need something fairly simple. At its simplest, a toolchain is a rule (https://github.com/facebook/buck2/blob/main/prelude/toolchains/ocaml.bzl) that produces a provider (https://github.com/facebook/buck2/blob/f9755e80863a1b9ca4c1358575a7a317a4707919/prelude/ocaml/ocaml_toolchain_types.bzl#L12-L43), and then you add an automatic _elm_toolchain attribute to elm_docs - OCaml isn't a bad reference and is fairly simple. I'd probably go a bit further before investing in a toolchain.

BrianHicks commented 1 year ago

Ah, great. Thank you! hidden is gonna get a lot of play here—the Elm compiler does a lot of things implicitly with packages.

About isolation: is that planned-for at all? One of the reasons I'm excited by Buck (or other similar tools) is to get better enforcement around dependencies. I suppose if we eventually had a remote build server we'd get it anyway though!

Is there a specific point at which you'd recommend looking at a toolchain, or is it more like "you have bigger fish to fry right now"?

BrianHicks commented 1 year ago

a couple more miscellaneous questions:

Every elm invocation requires there to be an elm.json in the working directory. As you point out, it'd be nice to have it specified as --project or similar, but that's not an option (even as an environment variable.) I can probably have every invocation set it explicitly by requiring a root attr or similar, but it'd be nice to have it just be the default since you can't ever change it. Is there a way to construct a source or set a relative file (e.g. "elm.json") as a default? When I try it, Buck says it can only be set to a relative file in the BUCK file. Fair, but a bit repetitive in this case?

What is the normal pattern for reading a config file for dependency information and creating calls to (say) elm_library based on that? Is that something I'll need to use an external tool for or can Buck 2 handle it internally? (Edit: oh! Is this dependency files? Maybe I just need to make a tool that can write those!)

ndmitchell commented 1 year ago

For isolation, you can set up buildbarn today and have a local remote execution server which gives you isolation. See #105 for a potential idea of basically baking something like buildbarn into Buck2, so we do that for you behind the scenes if you want isolation. The alternative would be to define your own elm.py that takes everything explicitly, creates a temp dir, changes directory and copies/symlinks everything you explicitly passed, then does the build. A bit manual perhaps, but fairly simple to do.
For toolchain, you want that when you move to remote execution without base images (e.g. if you have elm in the base image, you are probably fine) or when you want to productionise and share it with other Elm projects (eventually we'd love to have these rules in the Buck2 prelude, which would require a toolchain). For when, I guess I don't think it will help you evaluate Buck2 for your use case (no hard bits there), but it will help you productionise. So when you have decided that Buck2 is the one true build system for you, I'd start thinking about a toolchain.
To avoid repeating elm.json, you can define a wrapper def elm_docs_helper(**kwargs): elm_docs(root = "elm.json", **kwargs). You can even define that in elm.bzl as elm_docs, and then it's like there is an implicit attribute; it just happens to get filled in at the user's location.
For dependencies, do you mean which Elm libraries it depends on? Usually you'd vendor those in and list them explicitly. Although I'm not sure that's quite what you mean. Perhaps an example?
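
The elm.py idea above (take everything explicitly, create a temp dir, copy what was passed, change directory, then build) could look roughly like the following. This is a minimal sketch under stated assumptions, not a tested Buck2 integration: the `--input` and `--output` flags and the function names are made up for illustration, all paths are assumed to be relative to the directory the wrapper is started from, and it copies files rather than symlinking them.

```python
"""Hypothetical elm.py-style wrapper (all names and flags here are assumptions)."""
import argparse
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_isolated(inputs, outputs, command):
    """Run command in a scratch directory containing only the listed inputs."""
    with tempfile.TemporaryDirectory() as scratch:
        scratch = Path(scratch)
        # Copy each explicitly passed input, keeping its relative path.
        for src in map(Path, inputs):
            if src.is_absolute():
                raise ValueError(f"expected a relative path, got {src}")
            (scratch / src).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, scratch / src)
        # Make sure the directories the outputs will be written to exist.
        for out in map(Path, outputs):
            (scratch / out).parent.mkdir(parents=True, exist_ok=True)
        # Relative lookups from the working directory only find what was staged.
        result = subprocess.run(command, cwd=scratch)
        if result.returncode == 0:
            # Copy the declared outputs back to where the caller expects them.
            for out in map(Path, outputs):
                out.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(scratch / out, out)
        return result.returncode


def main(argv):
    """Parse `--input F ... --output F ... -- command...` and run the command."""
    split = argv.index("--")
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", action="append", default=[])
    parser.add_argument("--output", action="append", default=[])
    args = parser.parse_args(argv[:split])
    return run_isolated(args.input, args.output, argv[split + 1:])
```

To use it as a script, call `sys.exit(main(sys.argv[1:]))` under a `__main__` guard. Because every invocation gets its own working directory, anything the tool writes next to elm.json lands in the scratch directory instead of a shared one.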

BrianHicks commented 1 year ago

All very helpful, thanks again! The wrapper makes a lot of sense; I'm gonna have to do that right away!

Speaking of changing working directory, by the way, is there a nice way to do that in ctx.actions.run or cmd_args, or does it require invoking bash or something? I see relative_to, but that only seems to change the output directories?

I've got a couple of examples for dependencies:

You can get packages from packages.elm-lang.org. When I've done Elm packaging with Nix, there's a way to analyze the elm.json (example) to get the package dependencies. I'd like to do the same (or similar) in Buck so that we can get better caching.
Elm packages and projects also specify source roots. I would hypothetically like to tell Buck to depend on all the Elm files it can glob out of those prefixes so that there's only one source of truth for build information.

ndmitchell commented 1 year ago

The only way to change working directory is with a shell script. For most tools (which take explicit arguments) that gives you a much higher chance of getting it right than changing directory and then finding things relatively. That said, if you do an explicit elm wrapper, I'd expect that to take everything explicitly, and then change to a temporary directory.

For packages, you can either put them on the Buck2 graph or leave them off as untracked dependencies (with the assumption they are on the base image). The two options:

If you put them on the graph, you usually need to vendor them in - have something that copies/pastes all their code into your project. E.g. we use the Reindeer tool to do that for Rust dependencies. A bit more work, but works seamlessly on remote execution.
If you leave them off the graph, they are found in the base image, so you'd need to make sure they were provided by Nix or a Docker image on remote execution. Easier to get started.

For source roots, perhaps a glob in the macro layer?
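
One way to keep elm.json as the single source of truth for source roots and packages is a small external script that reads it and emits the globs and dependency list for the macro layer to consume. The sketch below is an assumption-laden illustration, not something from this thread: the function names are hypothetical, and it relies on the elm.json layout where applications list `source-directories` and nest dependencies under `direct`/`indirect`, while packages always use `src` and list dependencies flat.

```python
import json


def source_globs(elm_json):
    """Return glob patterns covering the Elm sources named by an elm.json dict."""
    if elm_json.get("type") == "application":
        roots = elm_json.get("source-directories", [])
    else:
        # Packages have no source-directories field; their code lives in src/.
        roots = ["src"]
    return [f"{root.rstrip('/')}/**/*.elm" for root in roots]


def direct_dependencies(elm_json):
    """Return {package name: version or constraint} for direct dependencies."""
    deps = elm_json.get("dependencies", {})
    if elm_json.get("type") == "application":
        # Applications split dependencies into "direct" and "indirect".
        deps = deps.get("direct", {})
    return dict(sorted(deps.items()))


def load_elm_json(path):
    """Read and parse an elm.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

The output could be written to a generated .bzl file or pasted into the BUCK file. Source directories that point outside the package (for example `../shared`) would need separate handling, since a glob cannot reach above its own BUCK file.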

BrianHicks commented 1 year ago

about testing:

is there a way to tell Buck that if it has already run some tests for a specific set of inputs, it need not run them again?
is there a way to mark contended resources? In Elm, there's a compilation artifacts directory (elm-stuff.) It's not completely safe to have multiple instances of the compiler running at once on a single one (there is some file locking stuff but there's a crucial gap that means files can get truncated sometimes.)
is there some equivalent to Shake's batch in Buck? For example, I can run the formatting tool in its check mode in Shake, and it will be invoked so many times, only on the files that have changed since it was last invoked. I'd like something similar in Buck, if possible! Failing that, I'd like to split up the elm-format invocations, one per file, but I don't see an obvious way to do that. Is it possible?
(I'm thinking probably this is "no" but I'll ask anyway.) Is there a way for a command to fix up a source file in Buck? Formatters are my main use case, but elm-review (a linter) also has automatic fixes that I feel confident enough in to always apply.

In case it's helpful, here's my elm-format rule:

elm-format in elm.bzl

```starlark
def _elm_format(ctx: "context"):
    command = cmd_args(["elm-format", "--validate"])
    command.add(ctx.attrs.srcs)

    if not ctx.attrs.srcs:
        fail("I need at least one Elm file to format, but got none.")

    return [
        ExternalRunnerTestInfo(
            type = "elm_format",
            command = [command],
        ),
        DefaultInfo(),
    ]

elm_format = rule(
    impl = _elm_format,
    attrs = {
        "srcs": attrs.list(attrs.source()),
    },
)
```

krallin commented 1 year ago

> is there a way to tell Buck that if it has already run some tests for a specific set of inputs, it need not run them again?

You can't do that, I'm afraid.

> is there a way to mark contended resources? In Elm, there's a compilation artifacts directory (elm-stuff.) It's not completely safe to have multiple instances of the compiler running at once on a single one (there is some file locking stuff but there's a crucial gap that means files can get truncated sometimes.)

Not currently (though that's actually being added, at least for tests). We would recommend designing your rules to avoid needing this kind of mutual exclusion however.

> (I'm thinking probably this is "no" but I'll ask anyway.) Is there a way for a command to fix up a source file in Buck? Formatters are my main use case, but elm-review (a linter) also has automatic fixes that I feel confident enough in to always apply.

This is indeed generally something we would advise against. The way you can integrate with such linters is to have them emit structured data representing the changes that need to happen (either in a subtarget or in a provider that you then access via a BXL script), and then an external tool can call Buck to obtain this data and then apply the changes once Buck is done working.

> is there some equivalent to Shake's batch in Buck? For example, I can run the formatting tool in its check mode in Shake, and it will be invoked so many times, only on the files that have changed since it was last invoked.

If you follow the suggestion above of having the formatter's "suggested changes" expressed as the output of running an action, then that will indeed only re-run if the inputs have changed.

> I'd like something similar in Buck, if possible! Failing that, I'd like to split up the elm-format invocations, one per file, but I don't see an obvious way to do that. Is it possible?

Assuming you follow the advice above, this might look like (not tested):

def _elm_library(ctx: "context"):
    if not ctx.attrs.srcs:
        fail("I need at least one Elm file, but got none.")

    suggested_changes = []
    for src in ctx.attrs.srcs:
        suggestion = ctx.actions.declare_output("__suggested_changes__", src.short_path)

        # Obviously making up flags here, you can write a wrapper script as needed.
        ctx.actions.run(
            ["elm-format", "--suggest-changes", src, "--out", suggestion.as_output()],
            # Buck2 category names are snake_case (lowercase letters, digits, underscores).
            category = "elm_format",
            identifier = src.short_path,
        )
        suggested_changes.append(suggestion)

    # Return once, after the loop, so every source file gets its own action.
    return [
        # Would be good to expose a default output of course too
        DefaultInfo(sub_targets = {"format": [DefaultInfo(default_outputs = suggested_changes)]}),
    ]

elm_library = rule(
    impl = _elm_library,
    attrs = {
        "srcs": attrs.list(attrs.source()),
    },
)

I'd strongly recommend not trying to model this formatting as a test; it's pretty far away from the use case for tests.
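
The external tool that applies the suggested changes once Buck is done could be sketched as below. Everything about the data format is an assumption made for illustration: each suggestion file is taken to be JSON of the form `{"path": "<source path>", "formatted": "<new contents>"}`, which a wrapper script around the formatter would have to produce, and the function name is hypothetical. The `dry_run` mode only reports which files would change, which is one way a CI job could check formatting without modelling it as a test.

```python
import json
from pathlib import Path


def apply_suggestions(suggestion_files, repo_root, dry_run=False):
    """Apply suggestion files to the source tree; return the paths that differ."""
    changed = []
    for suggestion_file in suggestion_files:
        # Assumed format: {"path": "src/Main.elm", "formatted": "<new contents>"}
        suggestion = json.loads(Path(suggestion_file).read_text())
        target = Path(repo_root) / suggestion["path"]
        if target.read_text() == suggestion["formatted"]:
            continue  # Already formatted, nothing to do.
        changed.append(suggestion["path"])
        if not dry_run:
            target.write_text(suggestion["formatted"])
    return changed
```

A CI job would call it with `dry_run=True` and fail when the returned list is non-empty; a local "fix" command would call it without `dry_run` after building the `format` sub-targets.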

BrianHicks commented 1 year ago

Ooh! Such an interesting idea. OK, I can certainly move away from formatters-as-tests. Thanks for that advice. I guess in that case, you'd have some CI job running to make sure there are no formatting changes to be made instead of using buck2 test?

Also I guess when I'm asking "how can I avoid re-running tests", what I'm actually asking is this: is there a way to run tests only for things whose dependencies have changed? The docs I'm seeing imply that may be Tpx's job, though?

krallin commented 1 year ago

> Also I guess when I'm asking "how can I avoid re-running tests", what I'm actually asking is this: is there a way to run tests only for things whose dependencies have changed? The docs I'm seeing imply that may be Tpx's job, though?

Not really actually. At Meta that's something that's done by CI detecting what you've changed and identifying relevant tests (this is built on top of buck2 targets, though I'm oversimplifying here), but if you just do buck2 test X having not changed anything locally, the tests will still run.

BrianHicks commented 1 year ago

Ok, neat! Thank you both again, this has been very helpful. I'm almost done moving everything in this small repo to Buck. I have a few remaining questions, though:

How exactly can I get a list of targets associated with a file? I don't see how to do it in the target language. The workflow I'd like to enable would be something like…
1. see what files changed in git
2. find the targets for those files
3. build/test only those (or build everything but test only those)
Is there a way to build all matching sub-targets? I defined a format sub-target as @krallin suggested, and tried buck2 build '//...[format]'. Didn't work! I'm not sure how I can do that in a BXL script either—maybe I have to do a uquery for kinds that I know have a format sub-target?
is there a prescribed way to coordinate networked processes in buck other than "do whatever you want in a bash script"? I'm thinking a database or server for e2e tests, for instance.

krallin commented 1 year ago

> How exactly can I get a list of targets associated with a file? I don't see how to do it in the target language. The workflow I'd like to enable would be something like…

This should work:

buck2 uquery 'owner(%s)' $FILE
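
The three-step workflow from the question (changed files in git, owning targets, build only those) could be glued together with a small script around that query. This is a hedged sketch rather than a recommended setup: the function names are hypothetical, it assumes the git root and the Buck2 project root are the same directory, it runs one `owner` query per file exactly as in the command above, and it assumes the query prints one target per line. The `run` parameter exists only so the command construction can be exercised without git or buck2 installed.

```python
import subprocess


def changed_files(base="HEAD", run=subprocess.run):
    """List files that differ from `base`, skipping deleted ones."""
    result = run(
        ["git", "diff", "--name-only", "--diff-filter=d", base],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def owning_targets(files, run=subprocess.run):
    """Ask buck2 which targets own each file; return them sorted and deduplicated."""
    targets = set()
    for path in files:
        result = run(
            ["buck2", "uquery", "owner(%s)", path],
            capture_output=True, text=True, check=True,
        )
        # Assumption: one target label per line on stdout.
        targets.update(line.strip() for line in result.stdout.splitlines() if line.strip())
    return sorted(targets)


def build_changed(base="HEAD", run=subprocess.run):
    """Build only the targets that own files changed since `base`."""
    targets = owning_targets(changed_files(base, run), run)
    if targets:
        run(["buck2", "build", *targets], check=True)
    return targets
```

Swapping `build` for `test` in the last command gives the "build everything but test only those" variant. Note that `owner` only finds targets that list the file directly; targets that depend on those would need a further reverse-dependency query.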

> Is there a way to build all matching sub-targets?

Unfortunately not, bxl is likely what you want here, yes. In BXL you don't really need to filter by the kind, you can just do analysis then use analysis[DefaultInfo].sub_targets.get("format") and see whether that's None or not.

> is there a prescribed way to coordinate networked processes in buck other than "do whatever you want in a bash script"? I'm thinking a database or server for e2e tests, for instance.

Not currently

BrianHicks commented 1 year ago

when you say "do analysis"—what kind of analysis? Do you mean to just get all the targets and then look inside them to see if they have a "format" target?

krallin commented 1 year ago

> when you say "do analysis"—what kind of analysis? Do you mean to just get all the targets and then look inside them to see if they have a "format" target?

Pass a list of targets to your BXL script, then do:

all_analysis = ctx.analysis(targets)
formats = filter(None, [analysis.providers()[DefaultInfo].sub_targets.get("format") for analysis in all_analysis.values()])

In fact, you can also just have the bxl script do the uquery at which point you can just give it the file path

BrianHicks commented 1 year ago

amazing, thank you. I also just managed to trigger exactly the builds/tests I need with that uquery. Such a big improvement!

Y'all are really making my day over here. ❤️

krallin commented 1 year ago

I'll close this for now since it seems like we've answered the questions you had so far but feel free to just create a new issue if you've got new questions.

facebook / buck2

where can I ask questions / get help? #115