bazelbuild / rules_rust

Rust rules for Bazel
https://bazelbuild.github.io/rules_rust/
Apache License 2.0
651 stars 412 forks source link

Splicing a workspace is slow #1873

Open typesanitizer opened 1 year ago

typesanitizer commented 1 year ago

Sample repo: https://github.com/typesanitizer/slow-repin-lib

CARGO_BAZEL_REPIN=1 bazel sync --only=crate_index takes over 2-3 minutes to complete on my M1 Mac, for 169 dependencies. In contrast, a clean cargo build, including fetching dependencies, takes about 1m15s.

This example repo is somewhat simplified from our monorepo https://github.com/sourcegraph/sourcegraph, where it takes 150%-250% more time, with ~40% more deps in total.

I understand that some of the overhead may be due to sandboxing on macOS, but I suspect that's probably not the full explanation (it looks like --sandbox_strategy=local doesn't work for bazel sync, so I couldn't tell how to test without sandboxing).

I don't know if there is a standard way of printing the executed commands (--subcommands doesn't work with bazel sync), so I hackily added a print invocation before the cargo-bazel invocation in a local checkout of rules_rust and got something like:

CARGO=/private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/rust_darwin_aarch64__aarch64-apple-darwin__stable_tools/bin/cargo RUSTC=/private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/rust_darwin_aarch64__aarch64-apple-darwin__stable_tools/bin/rustc RUST_BACKTRACE=full CARGO_HOME=/private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/crate_index/.cargo_home /private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/crate_index/cargo-bazel splice --output-dir /private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/crate_index/splicing-output --splicing-manifest /private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/crate_index/splicing_manifest.json --config /private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/crate_index/cargo-bazel.json --cargo /private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/rust_darwin_aarch64__aarch64-apple-darwin__stable_tools/bin/cargo --rustc /private/var/tmp/_bazel_varun/9733e4cbb184d6ba5c16ee75a5b939da/external/rust_darwin_aarch64__aarch64-apple-darwin__stable_tools/bin/rustc --cargo-lockfile /Users/varun/Code/issues/slow-repin/slow-repin-lib/Cargo.lock

On re-running that command with cargo flamegraph (I'm not sure if I need to be aware of extra caches?), I got this flamegraph: (GitHub doesn't provide interactivity, but you should be able to see more information if you download it from https://user-images.githubusercontent.com/7187503/224541869-3dc2f87e-bbd3-41c7-bcc2-c6c112d5a90a.svg and open it in the browser)

flamegraph

Is there a way to speed up the splicing step on re-pinning?

hobofan commented 1 year ago

Not sure if it's related, but have you tried running the sync command with CARGO_BAZEL_ISOLATED=false? Usually a big chunk of the repinning time for us was cloning the crates.io index from scratch, which will happen for the default value of CARGO_BAZEL_ISOLATED (I'm not sure if that shows up as part of the command your flamgraph is based on).

One alternative to that would also be the new sparse index feature (https://github.com/bazelbuild/rules_rust/pull/1857).

typesanitizer commented 1 year ago

Oh wow, CARGO_BAZEL_ISOLATED=0 CARGO_BAZEL_REPIN=1 bazel sync --only=crate_index cuts the time to 40s for the sample repo from 3min.

Usually a big chunk of the repinning time for us was cloning the crates.io index from scratch

Is the splicing step also supposed to be recloning the index? I know there is an index cloning step at the start. I'd run a Bazel profile earlier, and about 20-30% of the time was in the initial index fetching (which is what I understood the sparse index feature to be speeding up), and the rest was opaque in the Bazel profile because it was running the cargo-bazel binary.

That said, I looked at the linked PR, and it seems like the splicing step also needs to be made sparse index aware.

I've already turned on the sparse index via .cargo/config.toml in the sample repo; I'll see if I can test against the PR and get a good speedup.

illicitonion commented 1 year ago

Thanks for the detailed bug report!

My general expectation is:

I'm hoping that after #1857 and #1874 land splicing should pretty reliably take single-digit seconds.