bazelbuild / rules_rust

Rust rules for Bazel
https://bazelbuild.github.io/rules_rust/
Apache License 2.0

Using crate_universe forces serial fetching of host and target toolchains #2754

Open criemen opened 1 month ago

criemen commented 1 month ago

When using crate_universe, the module extension unfortunately forces the host toolchain to be fetched first; the target toolchain is fetched only after the module extension has finished.

That's partly due to the lazy downloading of target toolchains (which is great!) and partly because the crate_universe module extension can't express multiple dependencies to preload at the same time (as far as I'm aware, at least).

The result is that we pay the (quite high) cost of unzipping the toolchain twice, back to back, because the two fetches can't run in parallel. This would be less of a problem if the toolchain were zstd-compressed, but that doesn't look like it's happening anytime soon.
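To make the cost concrete, here is a rough back-of-the-envelope model of the two schedules. It is only a sketch: the function names and the durations are hypothetical, and it assumes the target toolchain fetch does not depend on the module extension's output.

```python
def serial_fetch_time(host_fetch, extension_run, target_fetch):
    # Current behaviour as described above: host toolchain first, then the
    # crate_universe module extension, and only then the target toolchain.
    return host_fetch + extension_run + target_fetch

def overlapped_fetch_time(host_fetch, extension_run, target_fetch):
    # Assumed alternative: the target toolchain is fetched while the host
    # toolchain is fetched and the extension runs, so the longer path wins.
    return max(host_fetch + extension_run, target_fetch)

# Hypothetical durations in seconds, for illustration only.
host, ext, target = 40, 20, 40
print(serial_fetch_time(host, ext, target))
print(overlapped_fetch_time(host, ext, target))
```

With these made-up numbers the overlapped schedule hides the whole target fetch behind the host fetch and the extension run.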

What could we do? I don't know enough about what makes the host toolchain so special (with regard to the module extension attributes returned), so this is a bit speculative. Couldn't we, during target toolchain setup, detect whether a host toolchain is configured with the same tools, edition, and version (that ought to be fairly standard), and then reuse the host toolchain instead of generating a separate tools repository? This might require moving the rust_host_tools module extension out of extensions.bzl, which, per the documentation, is recommended anyway.

Another option would be to offer some way to force-fetch the Rust toolchain. That's quite tricky, as the fetch needs to happen in parallel with crate_universe.

For now, the best I could come up with is the following repository rule, but it is super hacky and relies on a bunch of Bazel and rules_rust implementation details.

load("@rules_rust//rust:repositories.bzl", "DEFAULT_TOOLCHAIN_TRIPLES")
load("@rules_rust//rust/platform:triple.bzl", "get_host_triple")

def _assert_file_exists(ctx, path):
    resolved_path = ctx.path(path)
    if not resolved_path.exists:
        fail("Could not find file %s" % resolved_path)

def _preload_rust_impl(repository_ctx):
    host_triple = get_host_triple(repository_ctx)

    # Collect the host toolchain plus any explicitly requested target toolchains.
    toolchains = []
    for triple, name in DEFAULT_TOOLCHAIN_TRIPLES.items():
        if triple == host_triple.str or triple in repository_ctx.attr.extra_target_triples:
            toolchains.append((name, triple))

    # Resolving a file in each tools repository forces Bazel to fetch it.
    # The canonical repository name is a bzlmod and rules_rust implementation detail.
    for (name, triple) in toolchains:
        repo_name = "@@rules_rust~~rust~{}__{}__{}_tools".format(name, triple, "stable")
        _assert_file_exists(repository_ctx, Label("%s//:WORKSPACE.bazel" % repo_name))

    # Emit a stub repository whose only purpose is to be loaded from BUILD files.
    repository_ctx.file("WORKSPACE.bazel", "")
    repository_ctx.file("BUILD.bazel", "")
    repository_ctx.file("preload_rust.bzl", "def preload_rust():\n    pass\n")

preload_rust = repository_rule(
    implementation = _preload_rust_impl,
    attrs = {
        "extra_target_triples": attr.string_list(),
    },
    local = True,
)

This also then requires

load("@preload_rust//:preload_rust.bzl", "preload_rust")

in the BUILD file(s) where you build your Rust code, to trigger the toolchain preloading.
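To see which tools repositories the rule above would touch without running Bazel, the selection and naming logic can be pulled out into a plain function. This is a sketch only: `select_tools_repos` and `EXAMPLE_TRIPLES` are hypothetical names, the dictionary is a stand-in for `DEFAULT_TOOLCHAIN_TRIPLES` rather than its real contents, and the repository name pattern is copied from the rule above, so it carries the same implementation-detail caveat.

```python
def select_tools_repos(default_triples, host_triple, extra_target_triples, channel = "stable"):
    """Return the canonical tools repository names the preload rule would resolve."""
    repos = []
    for triple, name in default_triples.items():
        # Same filter as the repository rule: the host triple plus any
        # explicitly requested extra target triples.
        if triple == host_triple or triple in extra_target_triples:
            # Naming pattern copied from the rule above; it is an
            # implementation detail and may change between versions.
            repos.append("@@rules_rust~~rust~{}__{}__{}_tools".format(name, triple, channel))
    return repos

# Hypothetical stand-in for DEFAULT_TOOLCHAIN_TRIPLES (triple -> repository name prefix).
EXAMPLE_TRIPLES = {
    "x86_64-unknown-linux-gnu": "rust_linux_x86_64",
    "aarch64-apple-darwin": "rust_darwin_aarch64",
    "x86_64-pc-windows-msvc": "rust_windows_x86_64",
}

repos = select_tools_repos(EXAMPLE_TRIPLES, "x86_64-unknown-linux-gnu", ["aarch64-apple-darwin"])
for repo in repos:
    print(repo)
```

Here the host triple and one extra target triple are selected, and the third entry is skipped because it is neither the host nor listed in `extra_target_triples`.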

criemen commented 1 month ago

I guess the other question would be why we can't use the usual toolchain selection mechanism for the host toolchain, too, which naturally would eliminate the duplication.

illicitonion commented 1 month ago

> I guess the other question would be why we can't use the usual toolchain selection mechanism for the host toolchain, too, which naturally would eliminate the duplication.

We use the toolchain in repository rules. There's a bootstrapping problem here: Bazel completes the entire repository-fetching phase before it allows access to toolchains, because repository rules often generate toolchains.