Closed grzleadams closed 1 year ago
Some additional info that might be useful (since there are other issues that are sort of like this, but with different circumstances):
So, it looks like retries are built in with SystemErrorRetryPolicy. Is the maximum number of retries configurable? In restore/index.js it looks like it's hard-coded to 3.
Closing this because it's not caused by the Action.
For posterity: we had an MTU mismatch, and applying the changes described here seems to have resolved the problem.
We regularly see the following error when either saving or restoring a cache using on-prem Runners (deployed in Kubernetes via ARC):
Checking the UI, it's clear the cache exists with the key we're using. Sometimes the save succeeds while the restore fails, or vice versa, with no obvious pattern as to what will happen on any given run. We've ruled out network/firewall issues on our end, and these failures don't necessarily correspond with any reported GitHub service outages, so I'm at a loss as to how to resolve this.
It seems like other actions (such as actions/upload-artifact) have dealt with this by implementing retries with exponential backoff. Is there any kind of retry mechanism built into the save/restore functions to deal with transient failures like this in this action?
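For reference, the kind of retry wrapper I have in mind would look roughly like the sketch below. This is only an illustration, not the actual implementation in actions/upload-artifact or actions/cache; the function name, attempt count, and delay values are all made up for the example.

```typescript
// Hypothetical sketch of retrying a transient operation with exponential
// backoff plus jitter. All names and defaults here are illustrative.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff: baseDelay * 2^(attempt-1), plus a little
        // random jitter so parallel runners don't retry in lockstep.
        const delayMs =
          baseDelayMs * 2 ** (attempt - 1) + Math.random() * 250;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed; surface the last error to the caller.
  throw lastError;
}
```

A save or restore call could then be wrapped like `retryWithBackoff(() => restoreCache(paths, key))`, so a one-off connection reset wouldn't fail the whole job.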