Open · svartalf opened 4 years ago
In rust-analyzer, we remove artifacts for the workspace-local files before the cache is stored. While this isn't required to reduce cache churn like it does with auto-keyed caching (e.g. Travis), since a matching key will not store the cache, it does still reduce cache sizes.

If actions-rs ever provides a "best practice" caching wrapper, it should probably allow scrubbing these files, so only dependencies' artifacts are cached. Specifically, that means scrubbing files directly in `./target/{debug,release}/`, as well as folders/files directly in `./target/{debug,release}/{.fingerprint,deps,incremental}/` that contain the names of workspace-local libs/bins.
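A minimal sketch of that scrubbing, assuming the workspace crate names are known up front. The `my_crate` name and the demo directory layout below are illustrative, set up only so the script has something to act on; a real version would derive the names from `cargo metadata`:

```shell
set -e
# Demo layout standing in for a real Cargo target directory (assumption).
TARGET_DIR="demo_target"
CRATES="my_crate"   # hypothetical space-separated workspace lib/bin names

mkdir -p "$TARGET_DIR/debug/deps" \
         "$TARGET_DIR/debug/.fingerprint/my_crate-abc123" \
         "$TARGET_DIR/debug/incremental"
touch "$TARGET_DIR/debug/my_crate" \
      "$TARGET_DIR/debug/deps/libmy_crate-abc123.rlib" \
      "$TARGET_DIR/debug/deps/libserde-def456.rlib"   # dependency artifact: keep

# Scrub workspace-local artifacts, leaving only dependency artifacts.
for profile in debug release; do
  dir="$TARGET_DIR/$profile"
  [ -d "$dir" ] || continue
  for crate in $CRATES; do
    # Files directly in target/{debug,release}/
    rm -f "$dir/$crate"
    # Folders/files directly in .fingerprint/, deps/, incremental/
    # whose names contain the workspace crate name.
    for sub in .fingerprint deps incremental; do
      [ -d "$dir/$sub" ] || continue
      find "$dir/$sub" -mindepth 1 -maxdepth 1 -name "*$crate*" -exec rm -rf {} +
    done
  done
done
```

Note that cargo replaces hyphens in crate names with underscores in artifact file names, so a real implementation has to normalize the names first.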
@CAD97 noted, thank you! I do want to create some kind of `actions-rs/cache`, which would cache all these things automatically, but as this issue points out, there is no programmatic access to the cache routines right now.
Lucky timing: a few hours before I came across this issue, someone posted a possible workaround to the upstream issue: https://github.com/actions/cache/issues/55#issuecomment-612575988.

Also worth noting that, in addition to the `target/` directory examples, I currently have these cache keys set up in my jobs:
```yaml
- name: cache cargo index
  uses: actions/cache@v1
  with:
    path: ~/.cargo/git
    key: cargo-index-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-index-
- name: cache cargo registry
  uses: actions/cache@v1
  with:
    path: ~/.cargo/registry
    key: cargo-registry-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-registry-
```
There is an initial version of `@actions/cache` available now: https://github.com/actions/toolkit/tree/master/packages/cache
One problem with using a hash of `Cargo.lock` in the cache key is that, for library crates, checking `Cargo.lock` into version control is discouraged. So there may not be enough data in the repository checkout for the cache action to score a hit with dependencies that exactly match the build, before `cargo fetch` or `cargo update` generates `Cargo.lock` locally. But these commands also fetch the dependencies, so they would benefit from an already-restored, hopefully close snapshot of `~/.cargo/registry`.
I have developed a workflow solution that maintains a `Cargo.lock` checked in and automatically updated in a special git branch. In every workflow run that uses cargo to build, this `Cargo.lock` is grafted on top of the commit that triggered the build, and then its hash is used as the last part of the cache key for restoring `~/.cargo/registry`. The hope is that a lockfile generated off the latest master tree often suits the build of a somewhat divergent revision with no new dependency crates to fetch. This is probably too much complexity for a single GitHub action to hide, though.
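The grafting step could be sketched roughly like this. The `ci/cargo-lock` branch name is hypothetical, and the throwaway `demo` repository below merely stands in for a real project checkout so the sketch is self-contained:

```shell
set -e
# Demo setup: a throwaway repo standing in for the project (assumption:
# a CI-maintained branch "ci/cargo-lock" holds the baseline Cargo.lock).
git init -q demo
git -C demo -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m init
default_branch=$(git -C demo rev-parse --abbrev-ref HEAD)
git -C demo checkout -qb ci/cargo-lock
printf '# baseline lockfile\n' > demo/Cargo.lock
git -C demo add Cargo.lock
git -C demo -c user.email=ci@example.com -c user.name=ci \
    commit -qm 'update lockfile'
git -C demo checkout -q "$default_branch"

# The graft: pull Cargo.lock from the lockfile branch into the current
# worktree, then use its hash as the cache key suffix.
git -C demo checkout ci/cargo-lock -- Cargo.lock
key_suffix=$(sha256sum demo/Cargo.lock | cut -d' ' -f1)
echo "cargo-registry-$key_suffix"
```

In an actual workflow the hashing would more likely be done by the `hashFiles()` expression after the `git checkout` step.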
> for library crates, checking Cargo.lock into version control is discouraged.
This is true, but I both do and don't understand why that's the case. If you're caching the build anyway, you're (probably) caching a `Cargo.lock` anyway. Committing a `Cargo.lock` is good for libraries, imho, for the exact same reasons it's good for binaries.
> So there may not be enough data in the repository checkout so that the cache action could score the hit with the dependencies exactly right for the build, before `cargo fetch` or `cargo update` generates `Cargo.lock` locally. But these commands also fetch the dependencies, so they would benefit from some hopefully close snapshot of `~/.cargo/registry` to be restored from the cache already.
You probably want `cargo generate-lockfile` then. Also, general knowledge that I've absorbed is that `.cargo/registry` isn't worth caching, because 1) it costs a "similar" amount to download it from the cache as to download it via cargo, and 2) cargo will update an outdated registry anyway for most commands unless you use `--offline`.
So I think there are two useful "simple enough" behaviors here:

- Check in `Cargo.lock`. Use its hash as part of the cache key.
- Don't check in `Cargo.lock`, and always run CI against the latest versions of dependencies. Run `cargo generate-lockfile` first, then use the generated lockfile's hash as part of the cache key.

And the idealized version of the second case, which I don't think is (or will be in the near future) possible:

- Don't check in `Cargo.lock`, and always run CI against the latest versions of dependencies. Run `cargo generate-lockfile` first, then use the generated lockfile's hash as part of the cache key, restoring from the head's cache, but only persisting the new or reused partial compilation artifacts, so there are no unused artifacts bloating the cache.

Personally, I think checking in lockfiles is the better plan, even for libraries, because it makes CI builds more deterministic and less likely to break because of the outside world. I never want to be in a situation where CI passed on the head, but a PR fails through no fault of its own, just because a library updated in a minorly breaking way. Rather, I think the better model is to put the `Cargo.lock` into source control and have dependabot (or a similar bot) PR, and bors `r+`, any lockfile updates as libraries release new versions.
> You probably want `cargo generate-lockfile` then.
That command needs at least `~/.cargo/registry/index` to be up to date, so it too would benefit from a cache restore. However, that could well be a separate, smaller cache, because the primary key of the index state should be the head commit of the `rust-lang/crates.io-index` GitHub repo, not the `Cargo.lock` of any particular project.
> Also, general knowledge that I've absorbed is that .cargo/registry isn't worth caching because 1) it costs a "similar" amount to download that file from the cache or to download it via cargo,
A full restoration including the repository index takes long, and is getting longer as the index grows. Another motivation for caching is to avoid unnecessarily taxing the crates.io infrastructure.
> 2) cargo will update an outdated registry anyway for most commands unless you use `--offline`.
With a close enough state of the registry restored from the cache, there is not much to synchronize. A job in the workflow can perform `cargo fetch` or `cargo update` and store the perfect cache state for other jobs to pick up, which can then run cargo with `--frozen --offline`.
> - Don't check in `Cargo.lock`, and always run CI against the latest versions of dependencies. Run `cargo generate-lockfile` first, then use the generated lockfile's hash as part of the cache key, restoring from the head's cache, but only persisting the new or reused partial compilation artifacts, so there's no unused artifacts bloating the cache.
One current problem with this is the lack of an easy way to prune the outdated dependencies. There is a `cargo-cache` command where this could be implemented: https://github.com/matthiaskrgr/cargo-cache/issues/76.
> Personally, I think checking in lockfiles is the better plan, even for libraries, because it makes CI builds more deterministic, and less likely to break because of the outside world.
It may be good for reproducible CI builds, but for regular developer working copies of library code, a version-controlled `Cargo.lock` just gets in the way. There is no single golden state of the build for a library; everyone may want to build/test it locally against their own choice of dependency versions, and keep this preference locked in the `Cargo.lock` while they are working on it.
Checking the baseline lockfile into a CI-managed branch away from the branch(es) that get checked out by developers could be a way to get the best of both worlds.
For reference, here's a workflow using two stages of caching. First, a job runs `cargo generate-lockfile` and updates the crate cache keyed by the hash of `Cargo.lock`. Then a matrix of test jobs uses the cached dependency sources on all three supported OSes. The cargo registry filesystem, git checkouts, and crate sources archived on Linux seem to work fine when restored on Mac or Windows.
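A rough sketch of that two-stage layout, assuming `actions/cache@v2` multi-path support; job names, key prefixes, and the cached paths are illustrative, not the exact workflow described above:

```yaml
# Sketch (names illustrative): one job refreshes the dependency cache,
# then a matrix of test jobs restores the same cache by key.
jobs:
  update-deps:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: cargo generate-lockfile
      - uses: actions/cache@v2
        with:
          path: |
            ~/.cargo/registry/index
            ~/.cargo/registry/cache
            ~/.cargo/git
          key: cargo-deps-${{ hashFiles('Cargo.lock') }}
      - run: cargo fetch
  test:
    needs: update-deps
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - run: cargo generate-lockfile
      - uses: actions/cache@v2
        with:
          path: |
            ~/.cargo/registry/index
            ~/.cargo/registry/cache
            ~/.cargo/git
          key: cargo-deps-${{ hashFiles('Cargo.lock') }}
      - run: cargo test --locked
```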
Some follow-up on my earlier comments:
> A full restoration including the repository index is long and getting longer as the index is growing.
`cargo generate-lockfile` making a full clone of the crates.io index repository takes about 5 s in this workflow, which is comparable to the time it takes to store it in the cache. Restoration time is about 2 s.
> Another motivation for caching is to avoid unnecessarily taxing the crates.io infrastructure.
The crates.io index is hosted on GitHub, so access to it from GitHub CI should be fast and cheap (free?), and may use any backplane optimization magic invented by GitHub. The CDN hosting the crates themselves is currently on AWS, though.
More observations on `windows-latest`:

- A `~/.cargo` directory with a weeks-old clone of the repository index and some cached crate sources is present in the vanilla build environment (https://github.com/actions/virtual-environments/issues/895). `cargo fetch` on that state takes more than a minute to pull the index repository.
- Unpacking crates from `~/.cargo/registry/cache` into `~/.cargo/registry/src` can take quite some time. This can be measured by removing the `src` directory and running `cargo fetch --locked --offline`.

Looks like the npm package for making use of the cache actions from other actions has been released: https://github.blog/changelog/2020-05-27-github-actions-v2-cache-actions/.
Reaching out here, since I have created https://github.com/Swatinem/rust-cache and we have been successfully test-driving it for a few Sentry projects.
https://github.com/Swatinem/rust-cache/issues/3 was asking about maybe moving it into `actions-rs`.

So far it's very opinionated with regards to how I think caching should ideally work, but I'm open to ideas.
For tools specifically, https://github.com/actions/toolkit/tree/main/packages/tool-cache exists now and seems useful.
There are two things that can be improved with caching added:

1. `target/` and Cargo folders (see an example)
2. The `cargo install {pkgid}` command

Right now this affects the following workflows:

- `actions-rs/cargo` with the `use-cross: true` input enabled (`cross` is compiled on each call)

This issue can be solved by a proper implementation of the `actions-rs/core` `Cargo.installCached` function (see https://github.com/actions-rs/core/issues/31), but it is blocked by https://github.com/actions/cache/issues/55.

Right now this issue acts only as a tracking issue for the possibility of a proper caching implementation.
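Until such a `Cargo.installCached` helper exists, the `cargo install` case can be approximated in a workflow by caching the installed binary directly. This is a sketch only; the tool version in the key and the cached path are assumptions, and pinning the version is what makes the key safe to reuse:

```yaml
# Sketch (illustrative): cache the cross binary so it is not recompiled
# on every run; changing the pinned version invalidates the key.
- uses: actions/cache@v2
  with:
    path: ~/.cargo/bin/cross
    key: cargo-bin-cross-0.2.1
- run: command -v cross || cargo install cross --version 0.2.1
```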