Closed cmcmA20 closed 1 year ago
@cmcmA20 : Thanks for the PR (with the lucky number 1000!).
I looked at the code and it seems like you want to cache 3 things:
1+2 share the same set of paths (~/.ghcup
and ~/.cabal
), 3 uses a different path (_build
).
Since 3 was already separate from 1+2, I only comment on the first 2 caches.
The key structure in terms of program versions you suggest is (leaving out os
because it is fixed to Linux
):
Why do you separate steps 1 + 2? You want to save the GHC installation in terms Agda fails to build? Shouldn't you assume that all your external dependencies (GHC/Cabal/Agda) build fine and you only worry about building the cubical library in the CI?
Note that the cache use of your new workflow is huge: short of 2.5 GB. With the total cache limit of 10GB, you could only accommodate master and 3 PRs. Anything beyond this would bump caches and make new commits to PRs much slower. https://github.com/cmcmA20/cubical-mini/actions/caches
cabal update
shouldn't do much more than that. And these are fetched from external locations where you do not have to pay with cache space.~/.ghcup
) fix it?Your original problem seemed to have been this one:
Old action failing to populate caches on unsuccessful library tests was quite infuriating.
I agree this problem needs addressing.
Maybe it can be fixed more conservatively by just adding a actions/cache/save
step for the Agda installation before trying to build the cubical library?
My suggestion would be to make a new PR with minimal changes to the current CI that achieve your goal (caching Agda before trying building the cubical library) and see if this already does the job (the bumps to GHC and Cabal versions should also be included, of course).
Shouldn't you assume that all your external dependencies (GHC/Cabal/Agda) build fine and you only worry about building the cubical library in the CI?
You're absolutely right! Regarding cache size, I was not aware of such a strict quota (10GB), my fault.
But caching GHC/Cabal was already part of the original CI. Was this working correctly or did your changes (like also caching ~/.ghcup) fix it?
If you set a different cabal/ghc version (e.g. 3.6.2.0 + 9.2.5), you can observe non-idempotence of current workflow, it works correctly only if there was a cache miss. ghcup is the culprit, it fails to find its' config (because it's not cached) and defaults to a wrong ghc version. Seems like it's using cabal version to automatically infer ghc version.
My suggestion would be to make a new PR with minimal changes to the current CI that achieve your goal (caching Agda before trying building the cubical library) and see if this already does the job (the bumps to GHC and Cabal versions should also be included, of course).
Thank you for guidance, Andreas, will ping you when it's done!
If you set a different cabal/ghc version (e.g. 3.6.2.0 + 9.2.5), you can observe non-idempotence of current workflow, it works correctly only if there was a cache miss. ghcup is the culprit, it fails to find its' config (because it's not cached) and defaults to a wrong ghc version. Seems like it's using cabal version to automatically infer ghc version.
I was suspecting something like that. Thanks for fixing this!
@andreasabel : I've managed to shrink the cache, it's 1.2GB now. Notice that the actual library cache sizes are ~120MB (for agda/cubical) and base branch caches are shared with child branches, so you can have (10240 - 1218) MB / 120 MB = ~75 different branches cached (including master). Cache usage is optimal now and the action is idempotent for any cache state. Please let me know whether GHPages deploys correctly.
GHCup config was not An Impostor. runhaskell
is used to generate Everything.agda
files so we actually need to have it in the cache. Binary is linked dynamically so all the deps are in for a ride too, and there are 37 of them. Singling them out for caching could save about 350 MB (+3 PRs) and bloat the action config file. Guess I'll save the candle for another game.
Thank you! One last fix though: cache
-> cache/restore
in library section. I see it works fine but better be safe than sorry. Re-running the action should take 2 minutes.
could save about 350 MB
I was wrong, you can potentially trim the external deps cache down to ~300MB but the config would still become a mess.
About the further procedure: @ecavallo @felixwellen @mortberg : Who is in charge of CI here? If you feel in charge, please assign yourself to this PR and handle the merge. I'd say this PR is ready, but I don't want to overstep the owners... @cmcmA20 : If you do not think this PR is ready, please communicate this (e.g. by stating it here or converting the PR to a draft).
@andreasabel I think this is only marginally better than a previous one, so let's wait for maintainers to say their word about possible directions regarding CI architecture. Until then, I'll experiment with Wen's setup/agda.
Hi, I was also independently working on a similar fix on my end, so let me share my thoughts:
AGDA_BRANCH
ref moving around should be an immediate problem (and it's not a regression with current behavior so it shouldn't be a blocker). But, if cubical wants to test against master at some point, the agda-setup action won't help either (from what I understand). This could be done in a future PR.Regarding the documentation not building, you'd need to set "Read and write permissions" for workflows under Settings → Actions. One other solution that would work out of the box would be to add the write permission to the action in the yaml file, but I'm not a fan of this either since it grants the rights to the whole job. Some refactoring into multiple jobs could probably be done as well, or use the new Beta for deploying pages directly via actions.
My commits were a bit more conservative, see https://github.com/agda/cubical/compare/master...jpoiret:cubical:ci-clean-up The goal was to just fix the failing CI builds for now, the rest could come later.
@jpoiret: Hi! Big thanks for the doc tip.
There's no reason we should cache GHC and Cabal, they can be installed quite fast
You'r right, tests show that the difference is negligible so it's a total waste of cache space. However, the CI flow can be decomposed into these jobs: (ghc and cabal) <+> (Agda <|> fix-whitespace) <+> test library <+> docs
. And then the third stage nontrivially depending on the binaries from the first one really bites. What would be a good way to handle this?
But, if cubical wants to test against master at some point, the agda-setup action won't help either
I hadn't much time to test this but skimming through the README gave me the same impression. Anyway I've just implemented this feature.
If you are thinking about more revolutionary changes of this CI, please also consider using binary installs of fix-whitespace
(rather than building from source). Here is an action I wrote to this end: https://github.com/andreasabel/fix-whitespace-action
@cmcmA20 wrote:
Documentation deploy doesn't work as intended
According to my own experiments, deployment only works if the CI is run on master
, because one usually does not want update documentation while WIP.
NB: Yesterday I set up a web page deployment using the new fashion (in beta stage), using actions/upload-pages-artifact@v1
and actions/deploy-pages@v2
: https://github.com/teach-plt/www/blob/79b4094c422fd8d89c0bcdb83304d55947cfdf46/.github/workflows/deploy.yml#L75-L105
This requires to change Settings/Pages to "using Github Actions" in the repo, though.
@cmcmA20
@andreasabel I think this is only marginally better than a previous one, so let's wait for maintainers to say their word about possible directions regarding CI architecture.
I thought the cache improvement (status as of 3 days ago) would be an incremental improvement worth merging, until a more thorough overhaul is in place. (Atm, though, the workflow is failing on octokit/request-action
.)
@andreasabel
workflow is failing
It's exactly what is says on the tin, "v.2.6.3" is not a valid branch name, though it is a valid tag name, sorry for overlooking that, I've pushed a fix. This raises a question: do we want to leave it as it is (the original action config was actually misleading about the nature of these things) or do we want to support both branches and tags?
For now I consider this change ready for merge but your fix-whitspace and doc deploy actions would be a great addition, so all the CI tweaks are grouped under the same PR. Sorry for dragging this, I was under the impression that doing more than was originally planned is fine. If you have specific preferences about PR decomposition, please let me know, so we can split it into a few fine-grained ones.
It's exactly what is says on the tin, "v.2.6.3" is not a valid branch name, though it is a valid tag name, sorry for overlooking that, I've pushed a fix. This raises a question: do we want to leave it as it is (the original action config was actually misleading about the nature of these things) or do we want to support both branches and tags?
I think so. Tags are fixed targets, branches are movable targets. There is no guarantee that release-2.6.3
will forever match v2.6.3
. In fact it even does not so atm. (And they might deviate further in the future if we push to release-2.6.3
e.g. to prepare 2.6.3.1, who knows?)
For now I consider this change ready for merge but your fix-whitspace and doc deploy actions would be a great addition, so all the CI tweaks are grouped under the same PR. Sorry for dragging this, I was under the impression that doing more than was originally planned is fine. If you have specific preferences about PR decomposition, please let me know, so we can split it into a few fine-grained ones.
Well, in terms of PR strategy, I think one should separate concerns.
AGDA_BRANCH
to a SHA rather than using it verbally. You started to work on this (which is great), but now we are again in the middle of something and cannot merge. Also, this is somewhat orthogonal to the cache improvement, it is more about getting the right version of Agda, which is another improvement.fix-whitespace
etc.All these are separate concerns and should go into separate commits, so then in case on of these points actually wasn't a strict improvement, it can be reverted easily without affecting the others.
If you agree, then let's work like this in small steps, each of which solves one topic and can be reviewed easily. (Sorry, I didn't communicate this clearly before.)
Thank you for the explanation! I've reverted the changes to the first stage where we got the caching right, so we can already reap the benefits. Everything else will go into a new folow-up PR.
Current state of affairs:
AGDA_BRANCH
doesn't really track the branchAlso rebased on the current master
Thanks for this journey, where we all learned new things.
Old action failing to populate caches on unsuccessful library tests was quite infuriating. Example usage: https://github.com/cmcmA20/cubical-mini/actions/runs/4527022375 Documentation deploy doesn't work as intended but I'm not sure if I haven't botched setting it up in my repo.
@andreasabel FYI