actions / cache

Cache dependencies and build outputs in GitHub Actions
MIT License

Cache saving is best-effort, restoring cache is not #1169

Open KFearsoff opened 1 year ago

KFearsoff commented 1 year ago

I stumbled upon this bug while using self-hosted runners.

Background

Cache version is derived with the compression tool in mind. So an identical file can be compressed with zstd or gzip, and that would yield two different cache entries (despite the file being identical!). This is expected and well-documented, but:

When you save the cache on linux-x86_64, the action optimistically chooses zstd as the compression method. If it can't find zstd on the host, it downgrades to gzip. If it can't find gzip either, it fails to save the cache.

When you restore the cache on linux-x86_64, the action optimistically expects zstd as the compression method. If it can't find zstd on the host, it downgrades to gzip. If it can't find gzip either, it fails to restore the cache.
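To make the fallback concrete, here is a minimal Python sketch of the selection order described above. It is not the action's actual code: `choose_compression`, `PREFERRED_METHODS`, and the injectable `which` lookup are names made up for illustration.

```python
import shutil
from typing import Callable, Optional

# Hypothetical preference order: zstd first, gzip as the fallback.
PREFERRED_METHODS = ("zstd", "gzip")


def choose_compression(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first preferred compression tool found on the host.

    `which` maps a tool name to its path (or None). It defaults to
    shutil.which and is injectable so hosts can be simulated.
    """
    for tool in PREFERRED_METHODS:
        if which(tool) is not None:
            return tool
    raise RuntimeError("neither zstd nor gzip found on this host")


# Simulated hosts: a dict's .get returns None for tools that are missing.
full_host = {"zstd": "/usr/bin/zstd", "gzip": "/usr/bin/gzip"}.get
gzip_only_host = {"gzip": "/usr/bin/gzip"}.get

print(choose_compression(full_host))       # zstd
print(choose_compression(gzip_only_host))  # gzip
```

The same function runs on both the save and the restore side, so the outcome depends only on what each host happens to have installed.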

The issue

Let's say we have two runners: Saver and Restorer. Saver only has gzip, so it caches the file with gzip. Restorer has both zstd and gzip, BUT it fails to restore the cache, because, seeing that zstd is available, it expects the cache to also be zstd-compressed. Seeing that there is no zstd-compressed cache (because Saver doesn't have zstd), it doesn't even try to check if there's gzip-compressed cache and says there is no cache at all.
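The Saver/Restorer scenario can be reproduced with a toy model. This is a sketch under assumptions: the hashing scheme in `cache_version`, the in-memory `store`, and the key `deps-v1` are all invented for illustration. The only property that matters is that the compression method feeds into the version.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def cache_version(paths: List[str], compression: str) -> str:
    """Toy version hash: the compression method is part of the hashed input."""
    payload = "|".join([*paths, compression])
    return hashlib.sha256(payload.encode()).hexdigest()


def pick_method(available: Set[str]) -> str:
    """Prefer zstd, fall back to gzip, fail if neither is installed."""
    for tool in ("zstd", "gzip"):
        if tool in available:
            return tool
    raise RuntimeError("neither zstd nor gzip is installed")


paths = ["~/.cache/deps"]
# Stand-in for the cache service: (key, version) -> archive bytes.
store: Dict[Tuple[str, str], bytes] = {}

# Saver only has gzip, so the entry is stored under the gzip version.
saver_method = pick_method({"gzip"})
store[("deps-v1", cache_version(paths, saver_method))] = b"archive"

# Restorer has both tools, picks zstd, and looks up only the zstd version.
restorer_method = pick_method({"zstd", "gzip"})
hit = store.get(("deps-v1", cache_version(paths, restorer_method)))

print(saver_method, restorer_method, hit)  # gzip zstd None
```

The key matches on both sides, which is why the entry looks restorable at a glance. The lookup still misses because the version differs.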

Seeing that not all Linux distributions provide zstd out of the box (especially older ones), this little caveat can take a lot of manhours to debug: it really is not trivial to trace.

Solutions

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

KFearsoff commented 8 months ago

This is still the case.

ruffsl commented 3 months ago

Hey @KFearsoff, just wanted to say thanks for doing the legwork for this and writing up such an informative bug report. I remember reading this last year, back when I was skimming through tickets prior to planning our team's migration to GitHub Actions, and recall thinking:

Wow, that's bizarre behavior! Also sounds like something that would have driven me nuts to debug.

Well, just yesterday I found I couldn't restore caches between jobs using container and non-container runners. The fact that I could see the paths and keys matching superficially in the Actions cache web UI on GitHub made it all seem super inconsistent, until I recalled your ticket: while my minimal container runner included tar and gzip, it probably didn't have zstd installed like the default runs-on: ubuntu-latest does. And voilà, one little apt install later via a Dockerfile and all was well:

RUN apt-get install -y zstd

so it will only fail if gzip is not installed; this will also provide a very nice and clear error message for debugging

Yes! Better error handling, transparency, and user feedback on why caches fail to restore would be so much appreciated.
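For illustration, the behavior in the quote above (fail only when gzip is missing, and say so clearly) might look roughly like the sketch below. It is not a proposed patch to the action: `restore_with_fallback`, the toy `cache_version` hash, and the in-memory `store` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple

Store = Dict[Tuple[str, str], bytes]


def cache_version(paths: List[str], compression: str) -> str:
    """Toy version hash that includes the compression method."""
    payload = "|".join([*paths, compression])
    return hashlib.sha256(payload.encode()).hexdigest()


def restore_with_fallback(
    store: Store, key: str, paths: List[str], available: Set[str]
) -> Optional[Tuple[str, bytes]]:
    """Try every compression method the host can decompress, zstd first.

    Returns (method, archive) on a hit, None on a genuine miss, and raises
    with an explicit message when no usable tool is installed.
    """
    candidates = [tool for tool in ("zstd", "gzip") if tool in available]
    if not candidates:
        raise RuntimeError(
            "cannot restore cache: neither zstd nor gzip is installed on this runner"
        )
    for tool in candidates:
        archive = store.get((key, cache_version(paths, tool)))
        if archive is not None:
            return tool, archive
    return None


paths = ["~/.cache/deps"]
# Entry saved by a gzip-only runner.
store: Store = {("deps-v1", cache_version(paths, "gzip")): b"archive"}

# A runner with both tools now finds the gzip entry instead of reporting a miss.
print(restore_with_fallback(store, "deps-v1", paths, {"zstd", "gzip"}))
# ('gzip', b'archive')
```

The cost is one extra lookup per additional method on a miss. In exchange, a runner with more tools installed can always read what a runner with fewer tools wrote.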

Related:

t3chguy commented 2 months ago

Thanks for this write-up, I just hit the same issue and the error was useless. Following @ruffsl's workaround resolved it for me.