actions / cache

Cache dependencies and build outputs in GitHub Actions
MIT License

Clear cache #2

Closed Lauszus closed 2 years ago

Lauszus commented 5 years ago

Thanks for the action. It would be great if it were possible to clear the cache as well, either through the action or via the web interface.

Be-ing commented 3 years ago

Edit: even if you don’t add something to the UI, please at least add it to your official CLI.

Yeah, something like adding an invalidateCache input to the action might work.

burn2delete commented 3 years ago

We ran into an issue when using GitHub Packages: we deleted a package and re-deployed new packages (same versions, but different hashes), and now our cache is corrupted with no "official" way to clear it.

Since we are using the setup-node action, I don't believe the options above will work for us, because we don't have direct access to the cache action's key or restore-keys. Our only option is to disable caching in setup-node until we have moved past the package versions that are corrupted in the cache.

Be-ing commented 3 years ago

That is also why I stopped using the run-vcpkg action.

ahdbilal commented 3 years ago

Hello folks! We have finally started looking into this and will soon begin engineering work on it. We’d greatly appreciate it if you could complete a 2-minute survey to give us feedback on a few of the decision variables! Thank you

sideeffffect commented 3 years ago

Are the radio buttons in the forms for the questions working as intended?

ahdbilal commented 3 years ago

@sideeffffect the format is definitely not ideal. You just have to pick one response option per row and column.

Thanks to everyone who has completed the survey. Really appreciate it!

jvacek commented 2 years ago

It would be nice to have an option to auto-clear caches when an action fails, so that the next run starts "from scratch".

Rashmi-278 commented 2 years ago

Would love to have this feature

JavierSegoviaCordoba commented 2 years ago

Is there an ETA for this? The problem is not only with this action itself: other actions may use this action internally to manage caches with their own keys, so it can be hard to clear those caches without knowing the keys.

vsvipul commented 2 years ago

@JavierSegoviaCordoba The cache management experience is currently targeted for the end of Q3 2022, and the ability to view cache usage for the end of Q1 2022. Details are tracked on GitHub's public roadmap: https://github.com/orgs/github/projects/4247/views/1?filterQuery=cache
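Cache usage can also be read through the REST API. As a minimal sketch, assuming an authenticated gh CLI and the GET /repos/{owner}/{repo}/actions/cache/usage endpoint, a script can report how much of the 10 GB per-repository limit is in use. The helper names and the OWNER/REPO placeholder are hypothetical, and treating 10 GB as 10 GiB is an assumption.

```shell
#!/bin/sh
# Sketch: how much of the per-repository Actions cache limit is in use?
# Assumes gh is installed and authenticated. Helper names are hypothetical.

# "10 GB" interpreted as 10 GiB (assumption).
LIMIT_BYTES=$((10 * 1024 * 1024 * 1024))

# Print "<size_in_bytes> <count>" for a repository given as OWNER/REPO.
cache_usage() {
  gh api "/repos/$1/actions/cache/usage" \
    --jq '"\(.active_caches_size_in_bytes) \(.active_caches_count)"'
}

# Integer percentage of LIMIT_BYTES that the given byte count represents.
percent_of_limit() {
  echo $(( $1 * 100 / LIMIT_BYTES ))
}

# Usage (not executed here):
#   set -- $(cache_usage OWNER/REPO)
#   echo "$2 caches use $(percent_of_limit "$1")% of the limit"
```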

t2y commented 2 years ago

@vsvipul I tried to access your link, but I couldn't. Is that project private?

vsvipul commented 2 years ago

@t2y Yes, thanks for pointing that out. I've updated it to the public one now, but that doesn't include the management experience yet; it might be added later.

t2y commented 2 years ago

@vsvipul I can access. Thank you. I'm looking forward to using that feature. :smiley:

0x2b3bfa0 commented 2 years ago

GitHub will remove any cache entries that have not been accessed in over 7 days. There is no limit on the number of caches you can store, but the total size of all caches in a repository is limited to 10 GB. If you exceed this limit, GitHub will save your cache but will begin evicting caches until the total size is less than 10 GB.

Crystal clear, GitHub! :octocat: Inspired by https://github.com/actions/cache/issues/2#issuecomment-767721928

```yaml
on: workflow_dispatch
jobs:
  flush:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/cache@v2
        with:
          path: /tmp/flush
          key: ${{ github.run_id }}-${{ github.run_attempt }}
      - run: dd if=/dev/random of=/tmp/flush bs=1M count=10000
```

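Cache archives are compressed with zstd, so a zero-filled file created with fallocate(1) won't do the trick: it compresses to almost nothing. The file has to contain incompressible (random) data.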
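The point about compression is easy to check. A minimal sketch, using gzip as a stand-in for zstd (an assumption; both collapse long runs of zero bytes), compares 1 MiB of zeros, which is what a freshly allocated file reads back as, with 1 MiB of random data:

```shell
#!/bin/sh
# Sketch: why a zero-filled file cannot flood a compressed cache.
# gzip stands in for zstd here (assumption); both shrink zero runs dramatically.

# Compressed size in bytes of the first $2 KiB read from $1.
compressed_size() {
  echo $(( $(head -c "$(( $2 * 1024 ))" "$1" | gzip -c | wc -c) ))
}

zeros=$(compressed_size /dev/zero 1024)      # 1 MiB of zero bytes
random=$(compressed_size /dev/urandom 1024)  # 1 MiB of random bytes
echo "zeros: $zeros bytes, random: $random bytes"
```

The zero-filled input shrinks to roughly a kilobyte, while the random input stays slightly larger than its original size.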

adamant-pwn commented 2 years ago

Still nothing? :(

vsvipul commented 2 years ago

@adamant-pwn This is targeted for completion by the end of this quarter. You can track progress on the public roadmap at https://github.com/github/roadmap/issues/502

vsvipul commented 2 years ago

Good news, folks: we have shipped the delete-caches-by-key and delete-cache-by-ID APIs, which let you delete caches for a repository. They complement the existing API for listing a repository's caches. Please try them out and share any feedback in this issue. Official docs:

- List: https://docs.github.com/en/rest/actions/cache#list-github-actions-caches-for-a-repository
- Delete by ID: https://docs.github.com/en/rest/actions/cache#delete-a-github-actions-cache-for-a-repository-using-a-cache-id
- Delete by key: https://docs.github.com/en/rest/actions/cache#delete-github-actions-caches-for-a-repository-using-a-cache-key
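For reference, the three endpoints can be wrapped in a few shell functions. This is a minimal sketch that assumes an authenticated gh CLI (gh api supplies the API host and the auth header); the function names and the OWNER/REPO placeholder are hypothetical, and keys containing characters such as spaces, + or & would need to be percent-encoded before going into the query string.

```shell
#!/bin/sh
# Sketch of the list / delete-by-ID / delete-by-key endpoints via gh api.
# Function names are hypothetical; OWNER/REPO is passed as the first argument.

# List caches: prints "<id> <key> <size_in_bytes>" for each entry.
list_caches() {
  gh api --paginate "/repos/$1/actions/caches?per_page=100" \
    --jq '.actions_caches[] | "\(.id) \(.key) \(.size_in_bytes)"'
}

# Delete a single cache entry by its numeric ID.
delete_cache_by_id() {
  gh api --method DELETE "/repos/$1/actions/caches/$2"
}

# Delete every entry whose key matches exactly; the key goes in the query string.
delete_caches_by_key() {
  gh api --method DELETE "/repos/$1/actions/caches?key=$2"
}
```

A typical flow is to run list_caches OWNER/REPO, pick the stale entry, and pass its ID to delete_cache_by_id.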

P.S. This is currently not shipped to GHES and is expected to ship in GHES 3.7.

hovancik commented 2 years ago

If anyone creates an action with this, please share :)

Be-ing commented 2 years ago

Good news this was finally added to the GitHub API! Will it be added to the web UI too?

vsvipul commented 2 years ago

@Be-ing Eventually yes, this will be brought to UI as well, but we don't have an expected ship date for that. Keeping this issue open for that particular reason.

chenrui333 commented 2 years ago


@vsvipul Thanks for adding this endpoint, but I am not able to use CACHE_ID to prune the cache.

```
$ curl -X DELETE -H "Accept: application/vnd.github+json" -H "Authorization: token $GITHUB_CACHE_TOKEN" https://api.github.com/repos/xxx/xxxx/actions/caches/67233
{"message":"Not Found","documentation_url":"https://docs.github.com/rest/actions/cache#delete-a-github-actions-cache-for-a-repository-using-a-cache-id"}
```

Also, purging the cache by CACHE KEY seems a bit off.

chenrui333 commented 2 years ago

Actually, cache purging works as expected. I had used the gh CLI earlier in the same session, which confused me. Sorry about the noise.

However, the documentation for cache purge using CACHE KEY could still use some help.

jacksongoode commented 2 years ago


Has anyone figured out how to use this endpoint? I keep getting Missing required query parameter key (HTTP 422).

vsvipul commented 2 years ago

@jacksongoode key is a required query parameter. Can you share how you are passing it? Then I might be able to help more.

jacksongoode commented 2 years ago

Sure. Going by the docs, I'm not even sure where to put it:

```shell
gh api \
  --method DELETE \
  -H "Accept: application/vnd.github+json" \
  /repos/OWNER/REPO/actions/caches
```

I'm not sure where the key ought to go in this query?

vsvipul commented 2 years ago

@jacksongoode Try this:

```shell
gh api \
  --method DELETE \
  -H "Accept: application/vnd.github+json" \
  /repos/OWNER/REPO/actions/caches\?key\=YOUR_KEY_HERE
```

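The backslashes in \? and \= only stop shells such as zsh from treating ? as a glob; what matters is that key travels in the query string. According to gh api's help text, -f/-F fields are added to the request payload unless the method is GET, so they do not satisfy a required query parameter on a DELETE. Real cache keys often contain characters like / or +, so percent-encoding the key is safer. A minimal sketch, assuming ASCII keys and an authenticated gh CLI; urlencode and delete_by_key are hypothetical helpers:

```shell
#!/bin/sh
# Sketch: delete caches by key, with the key percent-encoded into the query string.
# urlencode and delete_by_key are illustrative helpers, not part of gh.

# Percent-encode every character outside the RFC 3986 unreserved set.
# Assumes an ASCII key; multi-byte characters are not handled here.
urlencode() {
  local s="$1" out="" c
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}   # first character of s
    s=${s#?}          # drop that character
    case $c in
      [A-Za-z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# $1 = OWNER/REPO, $2 = raw cache key
delete_by_key() {
  gh api --method DELETE \
    -H "Accept: application/vnd.github+json" \
    "/repos/$1/actions/caches?key=$(urlencode "$2")"
}
```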
jacksongoode commented 2 years ago

@vsvipul Thanks! Would be nice to have that mentioned in the docs as well :)

vsvipul commented 2 years ago

I'll look into getting this changed in the docs.

oddhack commented 2 years ago

Clarification request: can I use this API to force pulling a new image from Docker Hub for the image used in a job's 'container:' field? I'm currently having problems with an out-of-date cached image vs. a PR that requires the latest Docker Hub version.

wpbonelli commented 2 years ago

Thanks for this feature. Would it be possible to add the ability to match partial keys, like the cache action's restore-keys input does? It would be nice to be able to invalidate all cache entries matching a prefix. Naively, this seems compatible with the API's current implementation, since it already returns the deleted entries as a list (actions_caches).
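Until prefix deletion exists, it can be approximated with the existing endpoints: the list endpoint's key parameter is documented as accepting either an exact key or a prefix, and each returned entry can then be deleted by ID. A minimal sketch, assuming an authenticated gh CLI and a URL-safe prefix; the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: emulate "delete by key prefix" with list + delete-by-ID.
# Helper names are hypothetical; gh must be authenticated.

# IDs of all caches whose key starts with $2 (assumes a URL-safe prefix).
cache_ids_with_prefix() {
  gh api --paginate "/repos/$1/actions/caches?key=$2&per_page=100" \
    --jq '.actions_caches[].id'
}

# $1 = OWNER/REPO, $2 = key prefix
delete_caches_with_prefix() {
  local id
  for id in $(cache_ids_with_prefix "$1" "$2"); do
    echo "deleting cache $id"
    gh api --method DELETE "/repos/$1/actions/caches/$id"
  done
}
```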

alexklibisz commented 2 years ago

I would find it very useful to specify that the cache key can be overwritten from the post action.

Something like:

```yaml
with:
  overwrite-key: true
```

Example: I have a cache keyed on "${{ github.workflow }}.${{ github.job }}.${{ github.ref_name }}". Call it "CI.tests.main" for the CI workflow, running the tests job, on the main branch. My job pulls the cache and runs the tests, which results in some modifications to the cache (maybe from compiling new files). In the post-action, the current behavior results in a message like:

Cache hit occurred on the primary key CI.tests.main, not saving cache.

If my job made significant changes to the cache (e.g., from a big refactor), then I'd like to update the cache. Otherwise, the next time it runs, it's going to pull the 7-day-old copy, and it's going to re-do a lot of work.

The current workaround I've found is to key on the commit hash:

```yaml
- uses: actions/cache@v3
  with:
    key: ${{ github.workflow }}.${{ github.job }}.${{ github.event.after }}
    restore-keys: |
      ${{ github.workflow }}.${{ github.job }}.${{ github.event.before }}
    path: ...
```

The workflow tries to pull "CI.tests.<after SHA>", doesn't find it, pulls "CI.tests.<before SHA>" instead, and then creates an entry for "CI.tests.<after SHA>". The next time it runs, the new before is the old after, so it repeats the same process.

This works, but it's confusing and is going to create a lot of churn in the cache. There will basically be zero cache hits on the key.

cdce8p commented 2 years ago


I've implemented something similar to cache downloaded dependencies. However, instead of github.event.after, I use a timestamp. The result is the same, though: there is never an actual cache hit, and the action just restores the last saved cache.

```yaml
      - name: Generate partial restore key
        id: generate-key
        # Write a UTC timestamp to the step outputs ($GITHUB_OUTPUT replaces the deprecated ::set-output)
        run: >-
          echo "key=$(date -u '+%Y-%m-%dT%H:%M:%S')" >> "$GITHUB_OUTPUT"
      - uses: actions/cache@v3
        with:
          path: ${{ env.CACHE }}
          key: >-
            ${{ runner.os }}-${{ env.CACHE_VERSION }}-${{ steps.generate-key.outputs.key }}
          restore-keys: |
            ${{ runner.os }}-${{ env.CACHE_VERSION }}-
```

Since I only need the latest cache entry, it would save a lot of storage if all old cache entries (which didn't match) could be deleted, with a config option.
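Keeping only the newest entry for a prefix can be scripted against the list and delete-by-ID endpoints. A minimal sketch, assuming an authenticated gh CLI with a token allowed to delete caches, and a URL-safe prefix such as Linux-v1- (a stand-in for ${{ runner.os }}-${{ env.CACHE_VERSION }}-); prune_all_but_newest is a hypothetical helper. It asks the API to sort by created_at, newest first, and deletes everything after the first entry:

```shell
#!/bin/sh
# Sketch: delete all caches matching a key prefix except the most recently created one.
# prune_all_but_newest is hypothetical; $1 = OWNER/REPO, $2 = key prefix.

prune_all_but_newest() {
  gh api --paginate \
    "/repos/$1/actions/caches?key=$2&sort=created_at&direction=desc&per_page=100" \
    --jq '.actions_caches[].id' |
  tail -n +2 |                       # skip the newest entry
  while read -r id; do
    gh api --method DELETE "/repos/$1/actions/caches/$id"
  done
}
```

Run as a step in the same job, this keeps the entry that was just restored; the new entry is only saved afterwards by the action's post step. Note that it ignores branch scoping: adding ref=... to the query would restrict it to one branch.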

alexislefebvre commented 2 years ago

Check this announcement: Manage caches in your Actions workflows from Web Interface.

mkurz commented 2 years ago


Thanks; however, it is currently not possible to clear the whole cache of a repo, only single entries. That's why I created

Please vote for it, thanks!

vsvipul commented 2 years ago

@alexislefebvre @mkurz Thanks for adding the announcement here. https://github.blog/changelog/2022-10-20-manage-caches-in-your-actions-workflows-from-web-interface/ We have released the UI to manage your caches from web interface. Do check it out. I'm going to go ahead and close this issue now.

For bulk delete, we might support it in the future; I would suggest opening a separate issue for that as well, for more discussion. Thank you.

mkurz commented 2 years ago


Is https://github.com/orgs/community/discussions/36878 enough? Or do you mean in this repo here?

vsvipul commented 2 years ago

@mkurz That should be enough but if you want you can open an issue in this repo as well.

seh commented 2 years ago

It's worth noting that if you've been interpolating Actions secret values into your cache names so that you can forcibly invalidate them by changing the secret, the web interface now reveals those secret values, embedded in the displayed cache names.

Now, that was an abuse of the secret system, and the values were immaterial (such as "v2"), but it's still worth warning that GitHub doesn't protect against revealing these secret values here.

nulano commented 2 years ago

I believe the cache names were already visible in the log of the cache step, e.g.:

```
Run actions/cache@v3
Received 7637907 of 7637907 (100.0%), 16.7 MBs/sec
Cache Size: ~7 MB (7637907 B)
C:\Windows\System32\tar.exe -z -xf D:/a/_temp/06f09ba7-bec2-461a-a241-d11afa9480f0/cache.tgz -P -C D:/a/Pillow/Pillow
Cache restored successfully
Cache restored from key: fbc4ecc14e7c86fb662b1be66ad2567466c767e957ec2f89ba03ac810e92d716-b1599657002fbcb4f133085efc810231ed266be17f248422ad42405170e10ac3-C:\hostedtoolcache\windows\Python\3.10.8\x86-17.3.32929.385
```

seh commented 2 years ago


But weren’t the secret values masked, as they are in other logs from job steps using secret values?

nulano commented 2 years ago

Oh right, they would be masked in the log. I always forget about that detail as I find it very unintuitive. But yes, I would not expect a secret value to be used as part of the cache key.

jmgilman commented 1 year ago

Adding cache management to the UI doesn't address the use case where we need to overwrite the contents of a specific cache programmatically. We have a bit of code that does an extended evaluation, and we would like to cache the results across runs; however, trying to get the cache key unique enough to do the right thing has been non-trivial. It'd be preferable to maintain a single cache across runs and overwrite it as required, but the current implementation of the action does not support this.

For now, it seems the best solution is calling the API to delete the cache from within the action when we need to overwrite its contents.
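One way to do that, sketched under assumptions: delete the fixed key for the current branch in a run step, then save with a separate save step (for example actions/cache/save), because the combined actions/cache post step skips saving after an exact primary-key hit. The script assumes GH_TOKEN is set to a token with actions: write and that the key is URL-safe; delete_key_on_ref and CACHE_KEY are hypothetical names, while GITHUB_REPOSITORY and GITHUB_REF are standard runner variables.

```shell
#!/bin/sh
# Sketch for a "run:" step placed before a separate cache save step.
# delete_key_on_ref is hypothetical; $1 = OWNER/REPO, $2 = key, $3 = full Git ref.

delete_key_on_ref() {
  # key and ref go in the query string; ref limits the deletion to this branch.
  gh api --method DELETE \
    "/repos/$1/actions/caches?key=$2&ref=$3" >/dev/null 2>&1 ||
    echo "no existing cache for $2 on $3 (or deletion failed)"
}

# Example (CACHE_KEY is a hypothetical workflow-provided variable):
#   delete_key_on_ref "$GITHUB_REPOSITORY" "$CACHE_KEY" "$GITHUB_REF"
```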

Betristor commented 1 year ago

I tried to manage the caches using timestamps and looked for a way to erase outdated caches from within a workflow.

Motivation: I'm developing some optimization models, and I cache some base results so that every time I open a pull request I can compare the previous version with the current one.

Problem: The cached base results would not update when changes that affect them are introduced and approved.

Solution: Cache the results with timestamps and use the latest version. A new cache is created when a pull request is merged, since that means everything has been approved, including changes to the code and their effect on the optimization model results.

Only one thing left: The cache size will keep growing, and outdated caches will eat into the GitHub quota.

Fix: According to GitHub's official documentation, "GitHub will remove any cache entries that have not been accessed in over 7 days."

Everything done.

ihostage commented 1 year ago


@Betristor If you are sure that a cache entry is outdated, you can remove it yourself with the GitHub CLI.

For example, in Playframework we remove outdated cache entries with a simple script in a scheduled workflow: https://github.com/playframework/playframework/blob/3a78ae68acb10e614e8b1f1411ff4f7b93cd909e/.github/workflows/delete-caches.yml
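A scheduled cleanup along those lines can be sketched as follows. This is not the Playframework script itself: it is a minimal sketch that assumes an authenticated gh CLI and deletes entries whose last_accessed_at is older than a cutoff you pass in, comparing ISO 8601 UTC timestamps as strings (valid while they share the same format). delete_stale_caches and stale_cache_ids are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: delete caches not accessed since a cutoff timestamp.
# Helper names are hypothetical; $1 = OWNER/REPO, $2 = cutoff (ISO 8601, UTC).

# stdin: "<id> <last_accessed_at>" lines; stdout: IDs last accessed before $1.
stale_cache_ids() {
  awk -v cutoff="$1" '$2 < cutoff { print $1 }'
}

delete_stale_caches() {
  gh api --paginate "/repos/$1/actions/caches?per_page=100" \
    --jq '.actions_caches[] | "\(.id) \(.last_accessed_at)"' |
  stale_cache_ids "$2" |
  while read -r id; do
    echo "DEL $id" >/dev/null   # placeholder log line, replace with real logging if needed
    gh api --method DELETE "/repos/$1/actions/caches/$id"
  done
}

# Example with GNU date (not run here):
#   delete_stale_caches OWNER/REPO "$(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ)"
```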

Betristor commented 1 year ago


@ihostage That's quite an elegant approach for my situation, thanks a lot. For now I may just let GitHub evict old caches for me and add this cleanup later. Each time a new cache is generated, whether or not its contents changed, the previous ones are outdated for me. That's why I tagged them with timestamps.

AntoinePrv commented 1 year ago

Hi there, has anyone managed to call this from GitHub Actions? It works without any issue locally, but in CI I keep getting Resource not accessible by integration (HTTP 403). I feel like I've tried every possible combination of GITHUB_TOKEN, GH_TOKEN, secrets.GITHUB_TOKEN, github.token, permissions, and so on...

ihostage commented 1 year ago

@AntoinePrv You can look at the example I linked above. Two things in it are related to your question:

  1. You need to add the permission:

     ```yaml
     permissions:
       actions: write # this permission is needed to delete cache
     ```

  2. GITHUB_TOKEN works only in protected branches. If you want to delete a cache entry in a job for a PR, for example, you need to create a personal access token with the right permissions.
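Putting both points together, the script side of a cleanup job might look like the sketch below. Assumptions: the job grants actions: write, GH_TOKEN is exported to the step (gh reads it automatically) from a token that is allowed to delete caches in that context, and PR_NUMBER is a hypothetical variable set from the pull request event. Caches created by pull_request runs are scoped to refs/pull/<number>/merge, which is what the ref filter targets.

```shell
#!/bin/sh
# Sketch: delete all caches scoped to one pull request's merge ref.
# Helper names are hypothetical; gh picks up GH_TOKEN from the environment.

# Full Git ref that pull_request-triggered runs use for their caches.
pr_merge_ref() {
  echo "refs/pull/$1/merge"
}

# $1 = OWNER/REPO, $2 = full Git ref
delete_caches_for_ref() {
  gh api --paginate "/repos/$1/actions/caches?ref=$2&per_page=100" \
    --jq '.actions_caches[].id' |
  while read -r id; do
    gh api --method DELETE "/repos/$1/actions/caches/$id" >/dev/null &&
      echo "deleted $id"
  done
}

# Example:
#   delete_caches_for_ref "$GITHUB_REPOSITORY" "$(pr_merge_ref "$PR_NUMBER")"
```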