Madoshakalaka opened 2 years ago
Can we make this an option?
The first problem still remains: you cannot update the cache, so it will still be ineffective. But a better solution for cleaning the cache (based on access times, or whatever) would certainly be nice. I'm not quite up to date on what the current way of using binary assets in actions looks like; I would need to look that up.
How about using the src code as part of the keys too? With the vanilla actions/cache it would be something like:
```yaml
- uses: actions/cache@v2
  with:
    path: |
      # ...
    key: cargo-${{ runner.os }}-${{ hashFiles('my-crate-directory/src/**') }}-${{ hashFiles('Cargo.lock') }}
    restore-keys: |
      # ...
```
To make this work for workspaces, rust-cache can allow the users to specify the member crates to be hashed.
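For a rough intuition of what `hashFiles` contributes per crate, here is a hedged shell sketch (the helper name `crate_hash` and the 16-character truncation are my own illustrative choices, not anything rust-cache or actions/cache actually does):

```shell
# Sketch: derive a stable cache-key component from a crate's source tree,
# similar in spirit to hashFiles('my-crate-directory/src/**') above.
# Hashing relative paths keeps the result independent of where the tree lives.
crate_hash() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum | sha256sum ) | cut -c1-16
}

# Demo against a throwaway directory standing in for a member crate:
src="$(mktemp -d)"
printf 'fn main() {}\n' > "$src/main.rs"
echo "key: cargo-$(uname -s)-$(crate_hash "$src")"
```

Sorting the file list makes the hash independent of filesystem traversal order, so the same sources always yield the same key component.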
I use a workspace with many (~20) crates in a larger project. I'd certainly get benefit from having some of the lower level dependencies within our own crates cached. We use rust-cache to cache all our external dependencies, but there's still 2-3 minutes worth of build at the end to build our 20 crates.
I could imagine a solution to issue (1) by including, say, the timestamp of the week in the cache key, such that it's cleared every week (or day). I could also envision a solution where the `src` is part of the hash key, but it would be important for the restore-key to also be willing to try older ones, since `src` changes a lot.
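A rolling time component like that is easy to produce in a workflow step; a sketch (the `cargo-<week>` key layout is just illustrative, not a rust-cache convention):

```shell
# Sketch: a cache-key component that rolls over every ISO week, so caches
# are rebuilt from scratch (and thereby trimmed) at most a week apart.
week_stamp="$(date -u +%G-W%V)"   # e.g. 2024-W05
echo "key: cargo-${week_stamp}"
```

In a GitHub Actions step you could write this value to `$GITHUB_OUTPUT` and interpolate it into `key:` alongside the `hashFiles(...)` parts.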
It would be nice to be able to opt in to caching one's own workspace code, for projects willing to make a different tradeoff than the default.
Something like cargo sweep would be a cherry on top, but for us, saving 2-3 minutes in the common case would be huge.
I think in this case the right answer is probably to have separate cache blobs for external vs. internal dependencies. External dependencies only change when (certain) changes are made to `Cargo.toml` or `Cargo.lock`, but internal dependencies change due to changes to source files (which, keep in mind, can absolutely include files other than `.rs` files, because e.g. `include_str!` is a thing).
However there's a separate problem, which is that `cargo` will probably ignore the cache, because its idea of a "hash" of the source files includes the file modification timestamp, and that timestamp will be different every time you check out the code. This is a problem in general, but particularly for such internal dependencies. See https://github.com/rust-lang/cargo/issues/6529.
For reference (presently only available in nightly): https://blog.rust-lang.org/2023/12/11/cargo-cache-cleaning.html
> We use rust-cache to cache all our external dependencies, but there's still 2-3 minutes worth of build at the end to build our 20 crates.

+1

> Something like cargo sweep would be a cherry on top, but for us, saving 2-3 minutes in the common case would be huge.

+1
> The first problem still remains, as you cannot update the cache, so it will still be ineffective.

By always pushing a new cache and specifying a different (shorter) restore key, the latest cache with a matching prefix will be fetched.
In our (desired) model, workflow runs on `master` (after PRs get merged, or on a nightly schedule) would be the "source" of the cache, both for external dependencies and for our own code (multiple large crates in the repo), with incremental compilation enabled, and the PR runs would be "consumers" of the cache, for speedy merge-ability checks.
> but there's still 2-3 minutes worth of build at the end to build our 20 crates.

Does that extra time not reproduce locally? (It may be less on a local system that is more powerful than a runner.)
Recently I was building Vector locally and exploring caching. Running `cargo build` again after the first build (without changing anything) still took quite a while. This was with 680+ crates compiled.
- (`opt-level=0` / `codegen-units = 1`, or set `lto = "off"`).
- `incremental = true` brought that down to 6 seconds.
- I still had 50s (with the implicit default `lto = false`, aka crate-local thin LTO?); 44s was due to the implicit LTO.
When building the full project (900+ crates, instead of my reduced feature set build):
- `lto = "off"` (with `incremental = true`, otherwise 3m14s). So over 6 minutes total with implicit LTO and no incremental builds enabled.
- `lto = "false"`: 7m30s.

You can also use `linker-plugin-lto` with the Clang linker and Mold (the actual linker, via a plugin) to also cache LTO (you need to ensure your clang and rust toolchain versions are compatible):
```toml
# .cargo/config.toml
# The `build` section applies for all targets:
[build]
rustflags = [
    # Set a linker like clang (affects link-arg compatibility):
    "-C", "linker=clang",
    # This one is required to enable the LTO caching option:
    "-C", "linker-plugin-lto",
    # This is important for linker-plugin-lto to work (note: `-flto=thin` / `-flto=full` have no effect here?):
    "-C", "link-arg=-flto",
    # By default this value is 0, which is 1 thread per physical core.
    # Use `all` to get 1 thread per logical core to use the full CPU like rust normally would:
    "-C", "link-arg=-flto-jobs=all",
    "-C", "link-arg=-Wl,--thinlto-cache-dir=lto-cache",
    # Actual linker to use:
    "-C", "link-arg=-fuse-ld=mold",
]
```
Some of the `link-arg` options vary based on the configured linker (`-C link-arg=-fuse-ld=...`); for mold, the `link-arg` options match those documented for `lld`. Worth keeping in mind if you use something else instead of mold (`lld` / `ld`), or don't use `clang` (which AFAIK then uses `gcc`, which also has different options).
This adds a little overhead, but with the cache available, you get faster rebuilds.
- `incremental = true` (+7s with `-C linker-plugin-lto`).
- `incremental = true` (+20s with `-C linker-plugin-lto`).

Over 70% reduction!
Give it a try, perhaps it'll make a difference for you? :)
> However there's a separate problem, which is that `cargo` will probably ignore the cache, because its idea of a "hash" of the source files includes the file modification timestamp, and that timestamp will be different every time you check out the code.
This concern has some workarounds?:

- Set the `mtime` timestamp yourself in CI to control that?
- `git-restore-mtime` sets the `mtime` for each file to its last associated commit date. Although from your linked issue, that's apparently problematic for CI/caching :(

Earthly has a feature to optionally preserve timestamps when copying from host to image, noting it's important for the `incremental = true` cache to work correctly, and the linked concern for `git-restore-mtime` may not apply to GitHub Actions, so it may be a valid workaround.
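The idea behind `git-restore-mtime` can be sketched in a few lines of shell (illustration only: per-file `git log` calls are slow on big repos, and the real tool batches them; the demo repo below is my own scaffolding):

```shell
# Sketch: set each tracked file's mtime to its last commit date, so fresh
# checkouts get stable timestamps for cargo's fingerprint comparisons.
set -eu

restore_mtimes() {
    git ls-files | while IFS= read -r f; do
        touch -d "$(git log -1 --format=%cI -- "$f")" "$f"
    done
}

# Demo in a throwaway repository with a commit dated in 2020:
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo 'fn main() {}' > main.rs
git add main.rs
GIT_AUTHOR_DATE='2020-01-01T00:00:00+00:00' \
GIT_COMMITTER_DATE='2020-01-01T00:00:00+00:00' git commit -qm init
touch main.rs            # simulate a fresh checkout: mtime becomes "now"
restore_mtimes
date -u -r main.rs +%Y   # back to 2020
```

Run right after `actions/checkout`, this would make timestamps deterministic across workflow runs, which is the property cargo's fingerprinting needs.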
> Unfortunately, cargo-sweep doesn't publish any binary builds nor github actions.

`cargo binstall` can work?:
```sh
# Grab and extract a static release of cargo-binstall:
curl -sSfL "https://github.com/cargo-bins/cargo-binstall/releases/latest/download/cargo-binstall-$(uname -m)-unknown-linux-musl.tgz" | tar -xz
# Install cargo-binstall into a proper `cargo binstall` command if useful afterwards, and install cargo-sweep:
# These will be installed to the default $CARGO_HOME (eg: /usr/local/cargo/bin/)
# If that location is part of the PATH env, the installed binaries will still be usable if $CARGO_HOME changes.
./cargo-binstall --quiet --no-confirm cargo-binstall cargo-sweep
rm ./cargo-binstall
```
> `cargo binstall` can work?:

You could use taiki-e/install-action, which automatically falls back to cargo-binstall for crates it doesn't natively support. It would install cargo-binstall to `$CARGO_HOME/bin` so that it can be reused by taiki-e/install-action.
I had no idea that an LTO cache exists; it would be great if zig-cc supported it, since we use it. I wonder if rustc/cargo would expose such an option.
cargo-sweep is a tool to be used around any build actions to timestamp and remove unvisited cache files. It does so by checking the atime (access time) of the files in the target directory. With this method, incremental builds can be left on, and builds of one's own code can be cached for greater speed. Unfortunately, cargo-sweep doesn't publish any binary builds nor GitHub actions.
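The stamp-and-sweep mechanism itself is small enough to sketch in shell (illustration only, not cargo-sweep itself; I use mtime here because atime is often unreliable under `relatime` mounts):

```shell
# Sketch: record a stamp before the "build", then delete target files the
# build did not rewrite (i.e. mtime older than the stamp).
target="$(mktemp -d)"
touch "$target/stale.o" "$target/fresh.o"
sleep 1
touch "$target/sweep.stamp"   # marker taken just before the build
sleep 1
touch "$target/fresh.o"       # the build rewrites only this artifact
# Sweep everything not newer than the stamp (keeping the stamp itself):
find "$target" -type f ! -newer "$target/sweep.stamp" ! -name sweep.stamp -delete
ls "$target"                  # only fresh.o and sweep.stamp remain
```

Applied to `target/`, this keeps the artifacts the latest build actually produced or reused while trimming everything left over from older builds.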
#37 mentions the benefits of not caching one's own code: