Open me-and opened 1 year ago
Thanks! This is awesome!
I vagely recall trying to do this initially but running into some problem, so thanks for adding this!
To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run.
Some numbers showing that restoring the cache is actually typically faster than fetching the packages again would help give me confidence that this is a good change.
This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!).
Unfortunately, setup allows it's package cache to grow without bound. Something which has needed fixing for literal decades...
I don't think that's blocking for this PR, since, yes, the likelhood of hitting the cache limit is small, but something you should be aware of.
To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum.
Presumably there's some analysis been done that all the commands used are safe because they are in 'base' installation, but a comment to that effect might be worthwhile.
Thanks! This is awesome!
I vagely recall trying to do this initially but running into some problem, so thanks for adding this!
np! This is about my fifth attempt at getting it working without getting stuck and giving up in disgust…
To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run.
Some numbers showing that restoring the cache is actually typically faster than fetching the packages again would help give me confidence that this is a good change.
Yep, very fair. I have some rough anecdata that says it helps, but I've not really stress tested it to prove it makes a difference for installations that are big enough for it to be substantial. I should be able to have a go at that shortly :)
This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!).
Unfortunately, setup allows it's package cache to grow without bound. Something which has needed fixing for literal decades...
I don't think that's blocking for this PR, since, yes, the likelhood of hitting the cache limit is small, but something you should be aware of.
I had been thinking about that. GitHub evicts caches that aren't used for 7+ days, but if a repo were getting rebuilds more regularly than that, the cache would just keep getting larger and larger. I think that's a good reason for it to not be default behaviour for this action (yet), and I should probably note it explicitly in the documentation, but I think it mostly means that anyone using this action might need to manually flush the cache every so often, to force a new smaller cache to be generated.
To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum.
Presumably there's some analysis been done that all the commands used are safe because they are in 'base' installation, but a comment to that effect might be worthwhile.
They're all provided as part of the Git Bash / Git for Windows installation on the GitHub Windows runners. I've not actually checked how this works for self-hosted runners, and I could believe it's not guaranteed to work there. I'll add some notes to that effect…
Presumably there's some analysis been done that all the commands used are safe because they are in 'base' installation, but a comment to that effect might be worthwhile.
They're all provided as part of the Git Bash / Git for Windows installation on the GitHub Windows runners. I've not actually checked how this works for self-hosted runners, and I could believe it's not guaranteed to work there. I'll add some notes to that effect…
Aha! I hadn't noticed that all this happens before we modify that PATH, so it'll never be using cygwin tools to do these things, it's assuming they are already provided by something in the VM image.
On the performance topic: initial numbers seem to show using a cache saves very roughly 4 minutes off a Cygwin installation that would otherwise take about 10 minutes in total. Restoring a cache of a few hundred MB seems to take only a few seconds, which clearly beats the download time from the Cygwin mirrors.
Currently the hold-up is the reason I raised #5: if Cygwin is in the PATH, the caching code will still try to use the Git for Windows tar
, but will fail. Using the Git for Windows tar
is hardcoded in the GitHub cache code. Removing Cygwin from PATH just for the caching steps doesn't seem to work, I don't think "you can't use caching if you add Cygwin to PATH" is a reasonable limitation, so I'm currently trying to find other ways to get the caching to work…
Currently the hold-up is the reason I raised #5: if Cygwin is in the PATH, the caching code will still try to use the Git for Windows
tar
, but will fail. Using the Git for Windowstar
is hardcoded in the GitHub cache code. Removing Cygwin from PATH just for the caching steps doesn't seem to work, I don't think "you can't use caching if you add Cygwin to PATH" is a reasonable limitation, so I'm currently trying to find other ways to get the caching to work…
I could live with "if you want to use this action multiple time in the same workflow, you must use package-cache: disabled or add-path: false"?
I'm hoping it might be something that can get fixed fairly soon; it looks like the problem was added relatively recently, in https://github.com/actions/cache/commit/9b0be58822, and based on actions/toolkit#1311 I'm not the only person to be seeing similar problems.
Another option – which I've not tested properly but is next on my idea list – is to try just using the parent of that commit in actions/cache. I created a simple test case to demonstrate the issue at https://github.com/me-and/repro-cygwin-cache-woe, and it looks like the parent commit might let the caching work as currently designed, but without this issue.
Otherwise, I agree: just documenting it as a limitation is probably going to be the best option for now!
To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run.
This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!).
To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum for speed, since we're just checking for unexpected changes in the cache, and we don't need cryptographic security.
I was originally working on this for my own builds, e.g. at https://github.com/me-and/Cygwin-inih/blob/fb7534276feeb2a5217df5013c679a370ee6013e/.github/workflows/ci.yml, but it occurred to me that this might be more useful here than in my own repos.