facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0
3.33k stars 194 forks source link

Creating symbolic links to toolchains #605

Open AtomicOperation opened 2 months ago

AtomicOperation commented 2 months ago

I'm trying to set up a rule that will create a symbolic link inside the buck-out folder to a toolchain that is stored elsewhere. I have a package management system that will ensure that the contents inside the symlinked folder won't change. The rule has an attribute which is the toolchain version, and when the rule/script is run with a different version, it will adjust the symlink to the right target path.

 def _install_toolchain_impl(ctx = "context"):
    base_folder = ctx.actions.declare_output("out", dir = True)
    toolchain_folder = base_folder.project(ctx.attrs.toolchain_name)

    command = cmd_args(["install_toolchain.py", 
        "--toolchain", ctx.attrs.toolchain_name,
        "--version",  ctx.attrs.version,
        "--destination", toolchain_folder.as_output(),
        ] 
    )

    sub_targets = {}
    for sub_target in ctx.attrs.sub_targets:
        output = package_folder.project(sub_target)
        command.hidden(output.as_output())
        sub_targets[sub_target] = [DefaultInfo(default_outputs = [output])]

The result of the script is to have out/toolchain_name be a symlink to the target directory.

However, buck is strict about what it considers to be a directory.

If I use declare_output with dir = True, I get something like Required outputs are missing: Action didn’t produce output of the right type. Expected buck-out/v2/gen/root/folder/toolchain_name to be Directoryreal File.

If I use base_folder.project() (like in the example above) then I get an error like Find would traverse a leaf at path: buck-out/v2/gen/root/folder/out/include.

It makes sense that buck is trying to stop me from declaring an artifact that's nested below a regular file. Is there any way to convince it to allow this? I'm suspecting that the best thing to do would be to find the place where the "Action didn't produce output of the right type" error is returned and change it to allow symlinks to folders. Any other thoughts?

JakobDegen commented 2 months ago

If you intend for the out artifact to be a directory, then you definitely need to declare it with dir = True, and buck is right to make you do that.

I suspect the reason that things go wrong if you do that is because you're trying to use a projected artifact as an output. It's not really clear what the behavior of that is supposed to be to me, I wouldn't be surprised if no one ever really thought about that case. If you switch to something like "--destination", cmd_args(base_folder.as_output(), ctx.attrs.toolchain_name, delimiter = "/"), it'll work better.

I would have to dig into this a bit more to confirm though. I also realize that won't work on windows as written too, which is a bit meh. May think about this more

zjturner commented 2 months ago

Isn’t http_archive using a projected artifact as an output?

https://github.com/facebook/buck2/blob/39d275483a79df668ca6eb73b534ea83c27e856f/prelude/http_archive/http_archive.bzl#L215

AtomicOperation commented 2 months ago

I've tried changing the python script to implicitly create the directory (so the --destination arg is passing in just the base folder) and that has the same problem. It seems to be related to the output = package_folder.project(sub_target) call when an intermediate folder on-disk is a symlink. I'd have to go back and verify, but I'm pretty sure the same rule works if the python script creates actual folders without symlinks.

JakobDegen commented 2 months ago

@zjturner output of an action, not output of analysis.

@AtomicOperation can you confirm, is the python script trying to make the output dir a symlink or the toolchain_name in the output dir?

AtomicOperation commented 2 months ago

I'm trying to make an output dir (base_folder in the snippet above) and a symlink to a folder underneath that (toolchain_folder in the snippet).

AtomicOperation commented 1 month ago

I suspect the reason that things go wrong if you do that is because you're trying to use a projected artifact as an output. It's not really clear what the behavior of that is supposed to be to me, I wouldn't be surprised if no one ever really thought about that case

@JakobDegen can you explain this a little more? Isn't the entire point of projecting an artifact to create an output relative to some other output?

JakobDegen commented 1 month ago

can you explain this a little more? Isn't the entire point of projecting an artifact to create an output relative to some other output?

Right, but the typical expectation is that you first declare the artifact, use .as_output() to make it the output of some action, and only then project it - in other words, projection is meant for an already bound artifact.

So concretely, I would expect that the version of the code that works looks something like this:

def _install_toolchain_impl(ctx = "context"):
    base_folder = ctx.actions.declare_output("out", dir = True)

    command = cmd_args(["install_toolchain.py", 
        "--toolchain", ctx.attrs.toolchain_name,
        "--version",  ctx.attrs.version,
        "--destination", base_folder.as_output(),
        ] 
    )

    sub_targets = {}
    for sub_target in ctx.attrs.sub_targets:
        output = base_folder.project(ctx.attrs.toolchain_name).project(sub_target)
        command.hidden(output.as_output())
        sub_targets[sub_target] = [DefaultInfo(default_outputs = [output])]

Note that I'm passing the entire base_folder.as_output() to the action, and then only calling .project() afterwards. Obviously this would need adjustments to the python script

AtomicOperation commented 1 month ago

Thanks, but unfortunately doing it the way you suggest with the .project() call happening after .as_output() results in the same Find would traverse a leaf at path... error.

zjturner commented 1 month ago

I looked into this some and I can reproduce this. Here's a minimal reproducer:

# root//bucklink:BUCK
load(":rule.bzl", "link_dir")

link_dir(
    name = "makelink",
    link_name = "hello",
    link_target = "D:\\src\\buck2",
    script = "makelink.py",
    sub_targets = [
        "buck2",
        "prelude"
    ]
)
# root//bucklink:makelink.py

import os
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--link_name", type=str)
parser.add_argument("--link_target", type=str)
parser.add_argument("--destination", type=str)

args = parser.parse_args()

os.makedirs(args.destination, exist_ok=True)

symlink_path = os.path.join(args.destination, args.link_name)
link_location = args.destination
link_target = args.link_target
print(f"Creating symlink from {symlink_path} to {link_target}")
os.symlink(link_target, symlink_path, target_is_directory=True)
# root//bucklink:rule.bzl

load("@prelude//python:toolchain.bzl", "PythonToolchainInfo")

def _link_dir_impl(ctx:AnalysisContext) -> list[Provider]:
    base_folder = ctx.actions.declare_output("out", dir = True)

    python = ctx.attrs.python[PythonToolchainInfo]
    args = python.interpreter.args.copy()
    args.add(ctx.attrs.script)
    args.add("--link_name", ctx.attrs.link_name)
    args.add("--link_target", ctx.attrs.link_target)
    args.add("--destination", base_folder.as_output())

    ctx.actions.run(
        args,
        category = "genrule",
    )

    sub_targets = {}
    for sub_target in ctx.attrs.sub_targets:
        output = base_folder.project(ctx.attrs.link_name).project(sub_target)
        sub_targets[sub_target] = [DefaultInfo(default_outputs = [output])]

    return [DefaultInfo(default_outputs=[base_folder], sub_targets=sub_targets)]

link_dir = rule(
    impl = _link_dir_impl,
    attrs = {
        "python": attrs.default_only(attrs.toolchain_dep(providers = [PythonToolchainInfo], default = "toolchains//:python")),
        "script": attrs.source(),
        "link_name": attrs.string(),
        "link_target": attrs.string(),
        "sub_targets": attrs.list(attrs.string()),
    }
)

Just replace the link_target in bucklink/BUCK with a real path to your buck2 source folder.

If you run buck2 build //bucklink:makelink it works. The link is created on disk. No errors are generated. But if you instead run buck2 build //bucklink:makelink[prelude] you get an error like this:

$ buck2 build //bucklink:makelink[prelude]
File changed: root//bucklink/rule.bzl
Build ID: e9cb1937-0626-466a-a1e3-db73effdfb13
Jobs completed: 6. Time elapsed: 0.1s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 0, local: 1)
BUILD FAILED
The path `hello/prelude` cannot be projected in the artifact ``(root//bucklink:makelink (foo//platforms:native#ae24adfe62876596))/out`, action: (target: `root//bucklink:makelink (foo//platforms:native#ae24adfe62876596)`, id: `0`)`

Caused by:
    Find would traverse a leaf at path: `buck-out/v2/gen/root/ae24adfe62876596/bucklink/__makelink__/out/prelude`
JakobDegen commented 1 month ago

Back from travel.

I asked about this, and it would indeed be possible to support projecting through symlinks, at least external symlinks. However, it's a bit of a refactor because you end up with artifacts that aren't really at the place they claim to be.