andreasabel opened 2 years ago
I do not know that it is blessed, but v3 of actions/cache provides an example for Haskell using Stack here. That uses hashes of the `stack.yaml` and the `package.yaml` (it assumes the use of Hpack) in the keys. This repository, like many others, ignores `stack.yaml.lock` (there is a discussion here: https://github.com/commercialhaskell/stack/issues/4795).

However, that example - and the CI currently used in this repository - seems to me to assume that the operating system is Unix-like. The default `STACK_ROOT` is `System.Directory.getAppUserDataDirectory stackProgName` (see `Stack.Config.determineStackRootAndOwnership`). On Unix-like OSs, that is `~/.stack`. On Windows, it is `%APPDATA%\stack` (usually `C:\Users\<user>\AppData\Roaming\stack`).

On Unix-like operating systems, Stack stores GHC and other tools in a `programs` directory in the `STACK_ROOT`. On Windows, Stack stores those tools and MSYS2 in `%LOCALAPPDATA%\Programs\stack` (usually `C:\Users\<user>\AppData\Local\Programs\stack`).
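For illustration only, a hedged sketch of what a Windows-runner cache step could look like, using the paths described above (the key here is arbitrary and untested):

```yaml
- uses: actions/cache@v3
  if: runner.os == 'Windows'
  with:
    # STACK_ROOT and the programs/MSYS2 directory, per the paths above
    path: |
      ~\AppData\Roaming\stack
      ~\AppData\Local\Programs\stack
    key: ${{ runner.os }}-stack-${{ hashFiles('stack.yaml.lock') }}
```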
I will raise a separate issue for the lack of Windows caching in the current CI on this repository.
Thanks for the pointer!
> v3 of actions/cache provides an example for Haskell using Stack here.

I think this example is insufficient, because it does not show how to correctly place these cache actions into a bigger workflow. E.g. how do they interact with `stack update`?

I am particularly interested in a blessed scheme that would avoid CI breakages coming from upstream (`stack`, virtual environment...) as experienced in:
Can you elaborate on the `stack update` dimension? My understanding is that `stack update` will change the contents of `<STACK_ROOT>/pantry` (even if it is only to change `<STACK_ROOT>/pantry/hackage/timestamp.json`). However, changing the package index has no effect on the behaviour of Stack: see this FAQ.
> Can you elaborate on the `stack update` dimension? My understanding is that `stack update` will change the contents of `<STACK_ROOT>/pantry` (even if it is only to change `<STACK_ROOT>/pantry/hackage/timestamp.json`).

Ok, but then restoring a cached `<STACK_ROOT>` after `stack update` (like in the OP) might be slightly problematic, because it undoes the effect of `stack update`.

> However, changing the package index has no effect on the behaviour of Stack:

This cannot be meant literally, otherwise there would be no purpose in running `stack update` at all.
As I understand it, there is no substantive purpose in running `stack update`; see this FAQ. That is, if `stack` needs something that is not in the package index, it automatically updates the index and then tries again.
> if `stack` needs something that is not in the package index, it automatically updates the index and then tries again.
Ah, this is very interesting to know. Thanks for the pointer!
Excellent question @andreasabel. I've done quite a few CI/CD pipelines for Haskell, on CircleCI, BitBucket Pipelines, GitHub Actions... even Jenkins. I do have comments on this topic.
> v3 of actions/cache provides an example for Haskell using Stack here.
>
> I think this example is insufficient, [...]

Agreed :100: — these samples are "starters" at best.
Let me just criticize the first sample lines we're shown at the link — I can see 4 issues here right away:
```yaml
- uses: actions/cache@v3
  name: Cache ~/.stack
  with:
    path: ~/.stack
    key: ${{ runner.os }}-stack-global-${{ hashFiles('stack.yaml') }}-${{ hashFiles('package.yaml') }}
    restore-keys: |
      ${{ runner.os }}-stack-global-
```
No, you don't want to cache the entire `~/.stack`. Somewhat famously (#133), stack doesn't even try to clean up unused stuff from there; it doesn't even remove the .tar.xz archives of GHC post-unpacking. "Moved to wishlist", the issue says.

Caching `~/.stack` can also cause super-weird issues with a stale `config.yaml`. I had that.
Caching `~/.stack/pantry` should be done, but with a different cache-invalidation key than both `~/.stack/snapshots` and `~/.stack/programs`. Despite it being immutable and rebuilding-on-demand — a rebuild of the Pantry index takes quite a long while; so instead of burning pipeline time & CPU credits, I usually make a dedicated cache specifically for `~/.stack/pantry`.

Caching `~/.stack/programs` should be done (if you install GHC using Stack), but again with yet another cache-invalidation key. See below.
`hashFiles('stack.yaml')` — no; this is never correct. Use `hashFiles('stack.yaml.lock')` instead.

Why would you want to invalidate any cache on insignificant changes in stack.yaml? Whitespace changes, comments, package-list regrouping or reordering — none of these invalidate any of the pre-compiled artifacts. Stack will happily reuse those, if you allow it to, saving pipeline time & CPU credits. `hashFiles('stack.yaml')` has no place in a cache-invalidation key string.

Specifically for the cache of compiled dependency packages (see below) — the hash of the lockfile is an invalidation trigger of the correct granularity. I want it to change exactly when the dependency forest changes. `hashFiles('stack.yaml.lock')` does exactly that.
Non-insignificant modifications to the project's `stack.yaml` (adding non-Stackage deps, switching forks, updating the resolver snapshot, etc.) will also generate changes in `stack.yaml.lock`. Almost always in this scenario, you want to restore a previous, already-invalidated cache copy, because often the change in the dependency forest will be small; you'll get `using precompiled package` for most of the deps instead of a full rebuild. Partial reuse of an invalid/outdated cache is very much a thing — that's why `actions/cache` has that `restore-keys` option.
`hashFiles('package.yaml')` — again no, absolutely not; this is completely incorrect here. `~/.stack` has very little to do with your project's package.yaml (which is a cabal file in disguise).
Say you're compiling package `acme-app`, which depends on package `text` (within the Stackage snapshot) and package `acme-missiles` (not in Stackage, but on Hackage). The acme-app's `package.yaml` will declare that it needs these deps, perhaps with version bounds. But it's the stack.yaml (with its lockfile) that will define which specific source code will fulfill those deps. E.g. for `text`, it will pick the version implied by the `resolver` snapshot; for `acme-missiles`, a developer will be forced to specify an `extra-deps` entry... in stack.yaml.
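To make that concrete, here is a minimal sketch of the two files for this hypothetical acme-app (the versions and snapshot name are illustrative):

```yaml
# package.yaml -- declares *what* acme-app needs, possibly with version bounds:
dependencies:
  - text            # satisfied by the resolver snapshot
  - acme-missiles   # not in Stackage; needs an extra-dep

# stack.yaml -- decides *which* source code fulfills those deps:
resolver: lts-20.26
extra-deps:
  - acme-missiles-0.3
```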
Now. Compiled modules/bins/tests of acme-app will all go under its local `.stack-work`. Compiled dependencies will go under the global `~/.stack/snapshots`. Assuming we already properly cache `~/.stack/snapshots` by the lockfile, what are we buying with `hashFiles('package.yaml')`? The answer is nothing. Well, except for unnecessary full rebuilds caused by gratuitous cache invalidations from factoring in the package.yaml. That file doesn't matter for the validity of `~/.stack`.
There's a particularly gnarly type of issue in caching CI pipelines, once you start optimizing them. Cache bloat.
Variations and imperfections in the setup — e.g. caching too much, not invalidating correctly, re-caching no-longer-necessary parts of an outdated cache — will sometimes cause issues that are very difficult to pinpoint. There won't be related "recent changes" in git. For all you know, the pipeline worked "perfectly" just last year — but gradually, developers have become increasingly stern in their complaints about slow CI. You go check — and voila, the store/restore steps dominate the pipeline duration, dwarfing the actual compile... because there are tens of gigs in the caches.

Trust me, debugging these isn't impossible. But hell, it is tedious. It will easily consume days of work.

Against that background — I tend to always include a manual-override style of control in my pipelines, at the top level of most cache-invalidation-key structures. Examples below.
This is your "flush the cache now" button. Remember, CI runs in cloud... Some day, you or your successor will love having it.
One more: I've never used `runner.os` as a cache-invalidation-key component. (One web search later:) it seems unnecessary; on GHA you have to opt in to sharing caches across runner OSes — they aren't shared by default.
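For reference, if I recall correctly that opt-in is the `enableCrossOsArchive` input which `actions/cache` gained around v3.2; a minimal sketch:

```yaml
- uses: actions/cache@v3
  with:
    path: ~/.stack/pantry
    key: CI-pantry-${{ env.STACK_LTS }}
    # opt in to restoring caches that were saved on a different runner OS
    enableCrossOsArchive: true
```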
`id` on the step: you'll practically always have steps that "warm up" / rebuild an invalid (outdated) or missing cache. Unless these steps are idempotent and fast, you'll often want something like this on them:

```yaml
if: steps.ghcup.outputs.cache-hit != 'true'
```

For this to work, though, the preceding `actions/cache` step must say `id: ghcup` or similar.
Opinions will differ, and I'm not trying to change anyone's mind or win prizes; just sharing experience. Hopefully this is illuminating or at least helpful.
Here's my recipe for optimal Haskell CI-pipeline caching, in GH Actions snippets.
For big projects, I'll create four caches:

- the GHC installation;
- `~/.stack/pantry`;
- `~/.stack/snapshots`;
- `.o`'s and `.hi`'s under `.stack-work`.

Smaller projects (~a minute of `.stack-work`-only rebuild) may run well without the last one.
Near the top of the workflow YAML file, I'll set up my manual overrides (see above):
```yaml
env:
  #-- increment this if you think cache of GHC installation needs cold rebuild
  MANUAL_CACHE_RESET_COMPILER: v0
  #-- increment this if you think cache of .stack-work needs cold rebuild
  MANUAL_CACHE_RESET_PRODUCTS: v0
  #-- increment this to force-rebuild the cache of dependency packages
  MANUAL_CACHE_RESET_TESTDEPS: v0
  #-- should never be needed, as stackage snapshots are immutable
  # MANUAL_CACHE_RESET_SNAPSHOT: v0
```
In this case, the pipeline will compile & run tests — so I'll be building with `--test` — hence the `…_TESTDEPS`.
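For context, the eventual build-and-test step under this setup would be along these lines (a sketch; any extra flags are up to the project):

```yaml
- name: Build & test
  run: stack build --test
```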
There are many ways to ~~skin the cat~~ install GHC :grin: E.g. with `stack setup`:
```yaml
- name: Cache GHC installation
  uses: actions/cache@v3
  id: ghc
  env:
    MANUAL_RESET: ${{ env.MANUAL_CACHE_RESET_COMPILER }}
  with:
    path: ~/.stack/programs/*/ghc-*
    key: CI-ghc-${{ env.MANUAL_RESET }}--${{ env.STACK_LTS }}

- name: Install GHC using Stack
  if: steps.ghc.outputs.cache-hit != 'true'
  run: stack setup --install-ghc
```
With ghcup:
```yaml
- name: Cache GHC installation
  uses: actions/cache@v3
  id: ghcup
  env:
    MANUAL_RESET: ${{ env.MANUAL_CACHE_RESET_COMPILER }}
  with:
    path: |
      ~/.ghcup/bin/*
      ~/.ghcup/cache/*
      ~/.ghcup/config.yaml
      ~/.ghcup/ghc/${{ env.GHC_VERSION }}
    key: CI-ghcup-${{ env.MANUAL_RESET }}--${{ env.GHC_VERSION }}

- uses: haskell/actions/setup@v2
  if: steps.ghcup.outputs.cache-hit != 'true'
  with:
    ghc-version: ${{ env.GHC_VERSION }}
    enable-stack: true
    stack-version: "latest"
```
It might be difficult to specify a fixed `GHC_VERSION` in advance — `STACK_LTS` may substitute for it in the cache `key`, and is easy to grab from the stack.yaml `resolver`.
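One hedged way to grab it (this assumes a single top-level `resolver: lts-X.Y` line in stack.yaml; adjust the pattern otherwise):

```yaml
- name: Derive STACK_LTS from stack.yaml
  shell: bash
  run: echo "STACK_LTS=$(sed -n 's/^resolver: *//p' stack.yaml)" >> "$GITHUB_ENV"
```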
Pantry is perhaps the easiest:
```yaml
- name: Cache Pantry (Stackage package index)
  id: pantry
  uses: actions/cache@v3
  with:
    path: ~/.stack/pantry
    key: CI-pantry-${{ env.STACK_LTS }}

- name: Recompute Stackage package index
  if: steps.pantry.outputs.cache-hit != 'true'
  run: stack update # populates ~/.stack/pantry
```
It's immutable; nothing really invalidates it but time. We'll hook into our `acme-app` updating its `resolver` tag for invalidating/rebuilding/recaching the index; that'd be the exact moment we'll want to "pull" updates there. Hence the key.
I've never happened to need a cache_reset bust on this one.
```yaml
- name: Cache Haskell dependencies
  uses: actions/cache@v3
  env:
    MANUAL_RESET: ${{ env.MANUAL_CACHE_RESET_TESTDEPS }}
  with:
    #-- NOTE no, shouldn't cache the entire ~/.stack -- that'd be bad. just these 2:
    path: |
      ~/.stack/stack.sqlite3
      ~/.stack/snapshots
    #-- NOTE the caching key structure:
    #-- * fixed ID string -- to indicate scope & purpose, descriptive;
    #-- * manual reset -- on top level, stupid simple manual override;
    #-- * resolver version -- helps maintain sleek size of the cache;
    #-- * lockfile hashsum -- as invalidation trigger of the correct granularity.
    #-- Since this cache only stores built *dependency packages* (not project code!),
    #-- we should invalidate/reupload it on each change to the dependency forest (≈lockfile).
    #--
    #-- All this decides when cache gets REBUILT (invalidated & recreated):
    key: CI-testdeps-${{ env.MANUAL_RESET }}--${{ env.STACK_LTS }}--${{ hashFiles('stack.yaml.lock') }}
    #-- All this adds fallbacks to UNPACK stale cache copies, prefix-matched:
    restore-keys: |
      CI-testdeps-${{ env.MANUAL_RESET }}--${{ env.STACK_LTS }}--
      CI-testdeps-${{ env.MANUAL_RESET }}--
```
The "warming up" step for this one is conceptually stack build ... --only-dependencies
conditional on the cache-hit
— but thanks to correct cache-invalidation-key plus Stack's consistency with reproducible builds, there's no need to have that explicitly. It works well as is, across years of project/Stackage/GHC upgrades.
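If you did want that step explicitly, a minimal sketch could look like this (it assumes you also add `id: testdeps` to the cache step above; the recipe itself doesn't need it):

```yaml
- name: Pre-build dependencies (only on cache miss)
  if: steps.testdeps.outputs.cache-hit != 'true'
  run: stack build --test --only-dependencies
```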
```yaml
- name: Cache per-branch Haskell project buildstate
  uses: actions/cache@v3
  env:
    MANUAL_RESET: ${{ env.MANUAL_CACHE_RESET_PRODUCTS }}
  with:
    path: .stack-work
    key: CI-builddir-${{ env.MANUAL_RESET }}--${{ env.GHC_VERSION }}
```
As mentioned above, this one is optional. For smallish packages (~tens of modules) it may not give any benefit, once the dependencies & compiler have been handled properly.

I didn't find a spot-on computation to nail the cache-invalidation key for this one. The path structures under `.stack-work` won't let GHC reuse `.hi`'s written by other versions of itself; thus at the very least `GHC_VERSION` should factor into the key, to avoid gradual cache bloat as your project goes through GHC version upgrades. Whether that's also the "upper bound" (and therefore the exact solution) — I don't know yet.
It works very well in practice though. HTH
Thanks for this detailed description, @ulidtko ! I am trying to put this into practice now.
Let me raise some doubts about the key for the dependencies (snapshots):
```yaml
key: CI-testdeps-${{ env.MANUAL_RESET }}--${{ env.STACK_LTS }}--${{ hashFiles('stack.yaml.lock') }}
```
This key only accounts for changes in the resolver and other changes in `stack.yaml` (e.g. added `extra-deps`).

However, if my code requires a new dependency (added in the `.cabal` or `package.yaml` file), this will not be reflected in a change of the `stack.yaml.lock` file. The latter only adds SHAs to the resolver and extra-deps, but does not specify the build plan.

Consequently, the new dependency would be built but not saved to `$STACK_ROOT/snapshots`, because the key has not changed and no new cache is saved.

This means the cache rots, "accumulating" missing packages. This can ultimately degrade the build times, as those dependencies will always have to be rebuilt.

I think this key should have another component at the end that hashes the build plan.
I found that the output of `stack build --test --dry-run` contains the build plan listing all the dependencies (and their versions, but the versions are fixed by the resolver and extra-deps anyway). However, it is not complete either. E.g. if I specify a flag for a dependency in the `stack.yaml` file, it is represented neither in the output of `--dry-run` nor in `stack.yaml.lock`.

So maybe taking the `.cabal` file and the `stack.yaml` file into the key, contrary to your advice, is at least sound in the sense that different plans will have different keys. It might not be perfect, e.g. if someone adds a comment to the `.cabal` file, the key changes while the plan stays the same. But that is maybe not happening frequently, and the harm is little (a redundant cache save).
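Spelled out, that suggestion would amount to a key along these lines (a sketch; the glob patterns are illustrative and assume the `.cabal`/`package.yaml` files live in the repository):

```yaml
key: CI-testdeps-${{ env.MANUAL_RESET }}--${{ env.STACK_LTS }}--${{ hashFiles('stack.yaml', 'stack.yaml.lock', '**/*.cabal', '**/package.yaml') }}
```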
Hey @andreasabel, glad to get feedback. Yup, I see, good point!
Also I definitely remember seeing this happening in practice too. Good to finally realize why :sweat_smile:
Appending the cabal-file hash to the key of the deps cache is indeed a way to "solve" this (it feels conceptually wrong, but it will work), and it is not without its own drawbacks... The same goes for the build plan from `stack build --test --dry-run` — it appears to be stateful, producing output which depends on what's already in `~/.stack/snapshots`.

I simply didn't find anything better than stack's lockfile as "the perfect" value to hook cache invalidation onto. Perhaps an SQL query against `stack.sqlite3` could provide that?..
Minutiae like this are the ultimate reason I always have that `MANUAL_RESET` field. In the absence of a perfect-cut path, the caching policy must necessarily be either too optimistic or too pessimistic. Too optimistic will exhibit rotting, but yield faster builds more often. Too pessimistic will waste CI time, but the cache will be correct and "maintenance-free". In this tradeoff, `MANUAL_RESET` allows leaning to the "too optimistic" side, simultaneously reaping the "fast more often" benefit¹ — yet reducing the rotting aspect to a single-line "bump a number" commit push once every few months (if it gets bad enough in between resolver updates).

¹ rebuilding just the handful of missing deps is still much faster than a full recompile of the typical ~hundreds of deps
Is there blessed documentation on how to do caching with `stack` builds on GitHub Actions? If not, could we have one?

In particular, I wonder how to correctly cache stack builds from one run of the CI to another. The resources I queried recommend restoring the whole stack root (`.stack/`). I wonder whether this would overwrite parts that shouldn't be overwritten, in particular if `stack` was updated upstream in between. Note that for `cabal`, only the subdirectory `.cabal/store` is cached, not all of `.cabal/`.

For example, is the following workflow correct?

1. `stack update`
2. `stack build --dry-run`, generating the lock file
3. restore `.stack/` if the stack version and the lock file have not changed in comparison with the last run of CI
4. `stack build`
Context: