bazelbuild / rules_rust

Rust rules for Bazel
https://bazelbuild.github.io/rules_rust/
Apache License 2.0
650 stars 409 forks source link

Allow path dependencies without having everything in one Cargo workspace #1773

Open shelbyd opened 1 year ago

shelbyd commented 1 year ago

My team has a monorepo for our Rust + Go + JS project. We're trying to get all of our rust code compiling in Bazel. We have path dependencies for our own code inside the monorepo.

AFAICT this function https://github.com/bazelbuild/rules_rust/blob/52e02c25b6fb6f035eb453fec6162243e2d48daa/crate_universe/src/splicing/splicer.rs#L566 effectively requires us to have all of our Rust code in a single Cargo workspace if we're going to use path dependencies. This isn't a huge problem except for dependency resolution.

Cargo can't find compatible versions of dependencies, and I'm not aware of a way to resolve that. It's possible that specifying versions in the WORKSPACE file could work, or vendoring the dependencies. But I'd prefer not to do either of those options.

If we remove the root [workspace], normal Cargo builds everything fine (path dependencies included). So it seems that the requirement of a Cargo workspace when using path dependencies is unnecessary and would be the most effective way to resolve our issue.

Does this sound reasonable? Is there a better way to resolve our situation?

hobofan commented 1 year ago

I think this is a duplicate (or at least has large overlap) with https://github.com/bazelbuild/rules_rust/issues/1525.

shelbyd commented 1 year ago

This is a large issue for my team. We have a few binaries in our repo that we cannot bring into Bazel because of this.

Looking at this more, the issue is that we have:

  1. bin0 indirectly requires some_crate-v0 and our_lib
  2. bin1 indirectly requires some_crate-v1 and our_lib
  3. our_lib does not require some_crate (but may in the future)

There should be no target that actually ends up with both some_crate-v0 and some_crate-v1 in the build.

I tried to specify packages in our root crates_universe by aliasing the different versions of some_crate. But { "some_crate-v0": crate.spec(package = "some_crate", version = "0") } did not generate any rules related to some_crate, so I don't know what to do there. I could probably get this working if that package spec worked.

The only remaining solution I see is to vendor our crates and have a single version of some_crate vendored. They're almost surely compatible, or at least minimal changes required. But I'd prefer not to include all of the crates we use. Is there a way to vendor only certain crates that we need to modify but keep them in the same crate_universe?

As I said, this is a serious issue for my team, so I'd at least like some idea of which direction to go in, or potential alternatives. Thanks.

hobofan commented 1 year ago

So I figured out a workaround (that I've used in variant with cargo-raze in the past which has similar limitations): symlinking the crates, so that they are present in multiple places in the directory tree.

I've built an example repository here: https://github.com/hobofan/rules_rust_multiple_workspaces - I'll see that I can describe it in more detail in the coming days after trying out that approach more in depth.

There are some downsides with that approach (but it seems to be enough for our team at least to make the switch), so first-class support would still be great.

Some limitations (I'm sure there are some I've yet to encounter):

shelbyd commented 1 year ago

I was able to get my package compiling with [patch]ing dependencies with conflicts and updating them to remove the conflicts in another repo. See https://github.com/bazelbuild/rules_rust/pull/1645#issuecomment-1399339918 for some more info.

I'm not sure what the best way to resolve the core problem is. It's reasonable to require [patch]ing conflicting dependencies. But I shouldn't have to have them in a separate repo, so [patch]ing just the conflict in a locally vendored directory is reasonable.

irajtaghlidi commented 4 months ago

I encountered the same core issue in my monorepo, which includes multiple Cargo workspaces with path dependencies between different workspace crates. The symlink that @hobofan provided worked in my test project, but for the actual monorepo that I have, it would be very difficult to manage. I just wanted to ask about your recent findings. Have you found any better solutions for this problem?

UebelAndre commented 4 months ago

This is something we think would be good but haven't had time to really think about and implement. My primary concern with the feature is that folks start using path dependencies as a way to inject dependencies external to the current Bazel workspace. So Ideally relative paths leading beyond BUILD_WORKSPACE_DIRECTORY and absolute paths would be prohibited or some very loud and obvious warning raised informing users they're in dangerous waters.

If someone were to open a PR showing these things were covered in addition to the working feature then I'd be happy to review 😄

irajtaghlidi commented 4 months ago

Since using external dependencies is handled with crates universe and also path dependences (the ones which are outside of Cargo workspace but still inside the Bazel workspace) works fine if we don't use crates_repository and directly include them in BUILD files. What if we add a new flag option to ask the crate universe to accept and just add path dependencies outside of the Cargo workspace as is to all_crate_deps() output array? I just want to start working on a PoC to see how it acts. at least in my setup.

hobofan commented 4 months ago

So we've used a variation of the symlink based approach I described in an earlier comment for more than half a year at the company I worked at until it went bankrupt. I still have access and rights to use the code, so I'll try to distill the "sugar" we built around that approach before I forget how it worked 😅. The rules_rust version used in the repo is 0.17.0, so I hope this isn't too outdated today.

At the end we had used the approach with 10 different crate indices (I've truncated it to two in the examples below). It worked quite decently for reducing the blast radius of dependency updates between different parts of the monorepo and resolving some walls that we hit with dependency resolution in building for different targets (Linux, macOS and WASM), but also had the downside that some compile-heavy deep dependencies (in our case the patched version of RocksDB that came with oxigraph at the time) was compiled ~4 times which could make up the bulk of our total build time. (The impact of that can possibly be mitigated in some way by building that crate more manually and providing that to crates_universe.)

Rough Cargo.toml and symlink structure

|
-- some_package
|   |- Cargo.toml
|   |- multi_index_targets.bzl
|   -  BUILD.bazel
| 
-- other_package
    |- Cargo.toml
    - BUILD.bazel
    - local_vendor
        |- some_package  <- symlink to /some_package

some_package and other_package are in two separate workspaces. some_package is symlinked into the workspace of other_package.

In the Cargo.toml of other_package:

[workspace.dependencies]
some_package = { path = "./local_vendor/some_package" }

Here, some_package appears as a normal local dependency, so on the consuming end there isn't anything special that has to be done with crates_universe. IIRC this worked a lot better than having it as one of the [members] of the workspace, as I did in the example repo I posted initially.

/bazel_tools/rust_workspace_indices.bzl

Used to switch between different aliases and all_crate_deps implementations based on index key.

load("@oxolotl_index//:defs.bzl", oxolotl_aliases = "aliases", oxolotl_all_crate_deps = "all_crate_deps")
load("@query_server_index//:defs.bzl", query_server_aliases = "aliases", query_server_all_crate_deps = "all_crate_deps")

def resolve_rust_workspace_index(index):
    alias_macro = None
    all_crate_deps_macro = None
    if index == "@query_server_index":
        alias_macro = query_server_aliases
        all_crate_deps_macro = query_server_all_crate_deps
    if index == "@oxolotl_index":
        alias_macro = oxolotl_aliases
        all_crate_deps_macro = oxolotl_all_crate_deps

    if alias_macro == None:
        fail("""
        ============
        ============
        It looks like the index `{index}` is not known. Please register it in `/bazel_tools/rust_workspace_indices.bzl`.
        ============
        ============""".format(index = index))

    return (alias_macro, all_crate_deps_macro)

some_package/multi_index_targets.bzl

For each target that should be built for multiple indices we created a thin rust_library macro that utilized resolve_rust_workspace_index that looked something like this example. The macros also take other targets that were specifically built for this index as an argument (here dep_oxolotl_proto_types).

load("@rules_rust//rust:defs.bzl", "rust_library")
load("//bazel_tools:rust_workspace_indices.bzl", "resolve_rust_workspace_index")

VISIBILITY = [
    "//oxolotl:__subpackages__",
    "//query_server:__subpackages__",
    "//msg_server:__subpackages__",
]

def oxolotl_library(name, package_name, index, dep_oxolotl_proto_types, **kwargs):
    (alias_macro, all_crate_deps_macro) = resolve_rust_workspace_index(index)

    rust_library(
        name = name,
        srcs = native.glob(
            [
                "src/**/*.rs",
            ],
        ),
        aliases = alias_macro(
            package_name = package_name,
        ),
        crate_name = "oxolotl",
        edition = "2021",
        proc_macro_deps = all_crate_deps_macro(
            package_name = package_name,
            proc_macro = True,
        ),
        visibility = VISIBILITY,
        deps = all_crate_deps_macro(
            package_name = package_name,
            normal = True,
        ) + [
            dep_oxolotl_proto_types,
        ],
        **kwargs
    )

some_package/BUILD.bazel

The macro was then used in the BUILD.bazel file of the package like so (as seen in package_name this BUILD file needs to know about it's consumers so some coupling is introduced). As seen in the rust_test, in this file the package_name has to be explicitly provided in it's non-symlinked version, because otherwise the BUILD file has problems during evaluation (I don't remember the exact reason).

oxolotl_library(
    name = "oxolotl",
    package_name = "oxolotl/oxolotl",
    dep_oxolotl_proto_types = "//oxolotl/oxolotl_proto_types:oxolotl_proto_types",
    index = "@oxolotl_index",
)

oxolotl_library(
    name = "oxolotl_for_query_server",
    package_name = "query_server/local_vendor/oxolotl",
    dep_oxolotl_proto_types = "//query_server/local_vendor/oxolotl_proto_types:oxolotl_proto_types_for_query_server",
    index = "@query_server_index",
)

rust_test(
    name = "oxolotl_lib_test",
    crate = ":oxolotl",
    env = {
        "INSTA_WORKSPACE_ROOT": "$(rootpath :oxolotl)",
    },
    deps = oxolotl_all_crate_deps(
        package_name = "oxolotl/oxolotl",
        normal_dev = True,
    ) + [":oxolotl"],
)

Hope this gives some insight and that I didn't forget an integral part about it 😬