actions / cache

Cache dependencies and build outputs in GitHub Actions
MIT License

Improve debug messages / docs / error handling about duplicate cache keys #1100

Closed davidegreenwald closed 1 year ago

davidegreenwald commented 1 year ago

Hi all, I have a similar issue to this one about multiple copies of cache keys.

The restrictions on cache scoping for branches and PRs are well documented, but how the cache action implements them is not clear from the repo docs or the debug logs. This has made it confusing to debug issues and to understand what the normal behavior is.

For example, a second branch with the same cache key will report "Cache not found for input keys:". This is misleading: a cache under this key exists, it is just not available to this branch. I suggest the message say something more like "Cache not found for input keys for this ref or default branch", which is the actual check being conducted.
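A minimal sketch of the check described above, to show why a cache can exist under a key and still be reported as missing. Everything here is hypothetical: `CacheEntry`, `find_cache`, and `describe_miss` are illustrative names, not part of the cache action, and the model is simplified to "current ref, then default branch" (it ignores pull request base branches and cache versions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    key: str
    ref: str  # the ref that created the cache, e.g. "refs/heads/main"


def find_cache(entries, key, current_ref, default_ref="refs/heads/main"):
    """Return an entry with this key that is visible to current_ref, else None.

    Assumption: only caches created on the current ref or on the default
    branch are visible, and the current ref is checked first.
    """
    for ref in (current_ref, default_ref):
        for entry in entries:
            if entry.key == key and entry.ref == ref:
                return entry
    return None


def describe_miss(entries, key, current_ref, default_ref="refs/heads/main"):
    """Build a message that separates 'no such key' from 'key out of scope'."""
    if find_cache(entries, key, current_ref, default_ref) is not None:
        return f"Cache restored from key: {key}"
    if any(entry.key == key for entry in entries):
        # The key exists, but only on a ref this run is not allowed to read.
        return f"Cache not found for input keys for this ref or default branch: {key}"
    return f"Cache not found for input keys: {key}"
```

With a single cache created on `refs/heads/feature-a`, a run on `refs/heads/feature-b` gets the longer message from `describe_miss`, while a key that exists nowhere gets the short one.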

Additionally, I had an error where a cache could not be uploaded because a previous job had been cancelled in the middle of the cache upload, leaving an incomplete copy of the cache with the same key that blocked future uploads. The text of this error was just "another job may be creating this cache". This was partially true, but no active job was doing this, and a more verbose message would have been helpful. It would be great if the message suggested manually deleting the cache when no other jobs are active.

Finally, this "another job" error did not actually behave as an exit 1 in the workflow, and the job continued as if the step had succeeded. A separate cache validation check is required to detect whether an error occurred here. For people using the cache to pass build artifacts between jobs, which is what GitHub recommends, this error is not just a harmless cache miss but can cause a broken deployment. It should fail and end the CI workflow to prompt human intervention (such as manually deleting the corrupt cache) and prevent possible harm.
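One way to picture the separate validation check mentioned above is a small script that a later job runs before deploying. This is only a sketch under stated assumptions: `missing_outputs` and `main` are hypothetical names, and "valid" is assumed to mean that each required path exists and is not empty. It does not inspect the cache service itself.

```python
import sys
from pathlib import Path


def missing_outputs(required):
    """Return the required paths that are absent, empty files, or empty directories."""
    missing = []
    for raw in required:
        path = Path(raw)
        if not path.exists():
            missing.append(raw)
        elif path.is_dir() and not any(path.iterdir()):
            missing.append(raw)
        elif path.is_file() and path.stat().st_size == 0:
            missing.append(raw)
    return missing


def main(paths):
    """Return a process exit code: 1 if any required output is unusable, else 0."""
    missing = missing_outputs(paths)
    if missing:
        print(f"Required build outputs missing or empty: {missing}", file=sys.stderr)
        return 1
    return 0
```

A workflow step could call `sys.exit(main(sys.argv[1:]))` with the restored paths as arguments, so that a silent upload or restore failure stops the job instead of reaching deployment.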

Thanks for such a helpful tool. I hope we can clear up some of these confusing bits.

lvpx commented 1 year ago

@davidegreenwald thank you for the valuable suggestions to improve the debug/info messages. The "another job may be creating this cache" error usually occurs when parallel jobs in a matrix strategy try to create a cache with the same key. It would help if I could see the run that received this error so I can debug this further.

davidegreenwald commented 1 year ago

@pdotl Unfortunately the run is in a private repo. But I can tell you the issue came from the Post Cache step being manually cancelled, and not from parallel runs. So it appears that the Post Cache step had begun uploading the cache under the key but didn't complete. This left a corrupt cache file occupying that key: it was visible for upload purposes, but not when the action initially checked for a key match.

lvpx commented 1 year ago

@davidegreenwald that helps. I'll try to reproduce this and investigate further.

smil2k commented 1 year ago

Can you also add an info-level message saying that the key was found but was not the right match?

This is the actual debug:

No matching cache found for cache key 'setup-java-Linux-maven-d37ab3c81c405b14edc923804f66de1fb978a181e3fac3d09caae691f725c092', version 'ac7b87d6b5c9f96107338c9c954925dce707a6dee9a64f7baaaf9e2f615ea597 and scope refs/heads/NLSS-1016-Authentication-for-backend. There exist one or more cache(s) with similar key but they have different version or scope

I would have been very happy to see it as a warning or something similar.

I feel that:

Cache not found for input keys: setup-java-Linux-maven-d37ab3c81c405b14edc923804f66de1fb978a181e3fac3d09caae691f725c092, setup-java-Linux-maven-

is a little thin, because in the GUI I can see that there is a cache with this branch and number, and it is still not used, because of the version.
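As an illustration of the more informative message requested here, the sketch below takes a list of stored caches and reports which part of the lookup (version or scope) failed to match. The `StoredCache` type and `explain_miss` function are hypothetical and are not how the cache service reports misses; they only mirror the three fields that appear in the debug line quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCache:
    key: str
    version: str
    scope: str  # e.g. "refs/heads/my-branch"


def explain_miss(stored, key, version, scopes):
    """Explain why a lookup by (key, version, allowed scopes) found nothing usable."""
    same_key = [c for c in stored if c.key == key]
    if not same_key:
        return f"No cache exists with key '{key}'."
    if any(c.version == version and c.scope in scopes for c in same_key):
        return f"Cache with key '{key}' matches."
    reasons = []
    if not any(c.version == version for c in same_key):
        reasons.append("version")
    if not any(c.scope in scopes for c in same_key):
        reasons.append("scope")
    if not reasons:
        # Some entry has the right version and some entry has the right scope,
        # but no single entry has both.
        return f"Cache with key '{key}' exists but no entry matches both version and scope."
    return f"Cache with key '{key}' exists but differs in {' and '.join(reasons)}."
```

For the case described in this comment (same key and branch, different version), `explain_miss` would name the version as the only mismatch, which is the detail the current info-level message leaves out.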

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 5 days since being marked as stale.