actions / cache

Cache dependencies and build outputs in GitHub Actions

Feature request: option to update cache #342

Open fsimonis opened 4 years ago

fsimonis commented 4 years ago

Problem Description

Currently, the cache action either restores an existing cache on cache-hit, or generates a missing cache on cache-miss. It does not update the cache on cache-hit.

This works well for caching static dependencies, but not for caching build artefacts.

Proposed Solution

Add an option allowing the user to enable cache updates. This should be false by default to retain backwards-compatibility.

uses: actions/cache@v2
with:
  path: ccache
  key: ${{ matrix.CONFIG }}-${{ matrix.CXX }}-${{ matrix.TYPE }}
  update: true   # <~~ explicitly request an update

Motivation

Some programming languages benefit greatly from build caching. C++ in conjunction with ccache is the prime example. Using caching commonly decreases compilation times by at least 70%. Medium-sized projects easily take 20 minutes to compile. ccache also manages the cache size itself and automatically removes obsolete entries, thus the cache won't explode with continuous updates.

It also saves time and money for both user and provider. The environment will be happy too.

eine commented 4 years ago

I'm having this issue when trying to use @actions/cache to reduce the update time of the built-in MSYS2 installation on windows-latest virtual environments. The virtual environments become outdated quite fast, and currently 5503.95 MiB need to be downloaded and installed on each job. It takes 8-10 min.

As commented in eine/setup-msys2#23, I'm trying to save /var/cache/pacman/pkg/. However, actions/cache@v2 only allows saving it once. Later executions skip it: https://github.com/eine/setup-msys2/runs/737062486?check_suite_focus=true#step:12:2

Post job cleanup.
Cache hit occurred on the primary key msys, not saving cache.

Then, I tried using the npm package: https://www.npmjs.com/package/@actions/cache. Unfortunately, it fails: https://github.com/eine/setup-msys2/runs/738298072?check_suite_focus=true#step:4:116

##[error]reserveCache failed: Cache already exists. Scope: refs/heads/tool-cache, Key: msys2-nokey, Version: 000d31344dacf74d63d9e122f85409f68c5697c2aa32c5626452e8301c5d0c66

As an alternative to updating an existing key, it would be feasible to remove it explicitly (ref #340).

davidsbond commented 4 years ago

This would also be super handy for persisting the test cache when using GitHub Actions in go projects. For example, I could download the latest test cache, run my tests then update the existing cache with the results of the tests in the current run.

This way, I can have a global test cache across all my workflow runs.

zen0wu commented 4 years ago

This also applies to things like the webpack loader cache: such caches are small, have their own key management, and need to be updated each time.

Mordil commented 4 years ago

This would be super valuable for monorepos where each subproject has its own dependencies it wants to cache and build so that upstream projects have faster build times.

I ran into the expectation that this was already the default behavior - so I wrote #392 under that assumption

potaito commented 4 years ago

Is there any workaround for forcing the update of the cache even on a hit? I can't think of any way... In my case it would reduce the compilation time from 20 minutes to 3 minutes if I could use ccache for Qt/C++.

Vampire commented 4 years ago

Maybe save as ccache-${{ github.run_id }} and restore with the restore key ccache-. github.run_id is a unique ID for the workflow run, so a new cache is saved every time. When restoring you will never have an exact match, but the ccache- restore key will restore the latest cache whose key starts with that string, and at the end of the job a new one is created with the current state.
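
A minimal sketch of this workaround as a workflow step. The ccache path comes from the original request; the v3 version tag and the step name are illustrative assumptions.

```yaml
# Sketch of the run_id workaround; path and version tag are assumptions.
- name: Cache ccache directory
  uses: actions/cache@v3
  with:
    path: ccache
    # Unique per workflow run, so the primary key never matches exactly
    # and the post-job step always saves a fresh entry.
    key: ccache-${{ github.run_id }}
    # Prefix match: restores the most recently created cache
    # whose key starts with "ccache-".
    restore-keys: |
      ccache-
```

Note that re-running the same workflow run keeps the same github.run_id, so a re-run would likely hit the exact key and skip the save. Appending github.run_attempt to the key is one possible way around that.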

potaito commented 4 years ago

@Vampire that's brilliant, thanks mate! You are right, forgot about the pattern matching that the cache finding does. Perhaps this is then a non-issue and your solution is the intended way of doing things?

Vampire commented 4 years ago

Nah, that's merely a work-around. It will fill up your 5 GiB of cache and then evict things that might not have been evicted if the cache had been updatable.

PathogenDavid commented 3 years ago

@HebaruSan It won't save anything at all.

https://github.com/actions/toolkit/blob/73d5917a6b5ea646ac3173cfceb727ee914ff6ed/packages/cache/src/cache.ts#L166-L175

HebaruSan commented 3 years ago

Duplicate of #171

akmjenkins commented 2 years ago

This is absolutely required. I use the cache to speed up eslint, and my eslintcache file always gets updated on a run even if it was restored. I need to force-save this file somehow after the run is complete.

WestonThayer commented 2 years ago

This would also help TypeScript's tsc --build --incremental (on by default with Project References), by caching the outDir and any *.tsbuildinfo files. We don't need to key these on any src files; we just need the results from a previous build. tsc will handle changes in src (well, for the most part: https://github.com/microsoft/TypeScript/issues/16057).

bityob commented 2 years ago

Any update?

bityob commented 2 years ago

@HebaruSan It won't save anything at all.

https://github.com/actions/toolkit/blob/73d5917a6b5ea646ac3173cfceb727ee914ff6ed/packages/cache/src/cache.ts#L166-L175

It has now been updated, and a new cache will override old ones.

Caching dependencies to speed up workflows

GitHub will remove any cache entries that have not been accessed in over 7 days. There is no limit on the number of caches you can store, but the total size of all caches in a repository is limited to 10 GB. If you exceed this limit, GitHub will save your cache but will begin evicting caches until the total size is less than 10 GB.

@PathogenDavid @HebaruSan

ulterzlw commented 1 year ago

If the workflow works with a large cache and the ${{ github.run_id }} approach mentioned above is not applicable, I found a custom pipeline of restore -> clear -> save to be another workaround. A simple example would be:

  - name: restore
    id: cache-restore
    uses: actions/cache/restore@v3
    with:
      path: path/to/cache
      key: $KEY
  - name: do stuff
    run: echo stuff
  - name: clear
    run: |
      gh extension install actions/gh-actions-cache
      if ${{ steps.cache-restore.outputs.cache-hit == 'true' }}; then
        gh actions-cache delete $KEY --confirm
      fi
  - name: save
    uses: actions/cache/save@v3
    if: always()  # save the cache even if an earlier step fails
    with:
      path: path/to/cache
      key: $KEY

It clears the old cache and replaces it with a new one. Though it is not a cache update, it should suit most cases.

andreasabel commented 1 year ago

@ulterzlw: Do you need to explicitly delete the existing cache? Wouldn't actions/cache/save overwrite an existing cache entry with the same key? (This is the behavior I would naturally expect.) Unfortunately, the README is silent on this crucial information.

Update: To answer my own question: Yes, you need to delete an existing cache entry, otherwise actions/cache/save does nothing but emit a warning like:

Failed to save: Unable to reserve cache with key ..., another job may be creating this cache. More details: Cache already exists.

andreasabel commented 1 year ago

@ulterzlw Thanks for your workaround! I used it with some modifications in the clear step:

  - name: restore
    id: cache-restore
    uses: actions/cache/restore@v3
    with:
      path: path/to/cache
      key: $KEY

  - name: do stuff
    run: echo stuff

  # only execute this step when cache was restored
  # do not fail hard here, as the $KEY might not exist; cache could have been restored from $KEY-something
  - name: clear
    if: ${{ steps.cache-restore.outputs.cache-hit }}
    shell: bash
    env:
      GH_TOKEN: ${{ github.token }}
    run: |
      gh extension install actions/gh-actions-cache
      gh actions-cache delete $KEY --confirm
    continue-on-error: true

  - name: save
    uses: actions/cache/save@v3
    if: always()  # save the cache even if an earlier step fails
    with:
      path: path/to/cache
      key: $KEY

Luc45 commented 1 year ago

I once again circle back to this issue, needing this feature. Is it possible to have an update on this by a maintainer? Will the option to save a cache on cache hit (force update) ever make it into this action?

ro0NL commented 1 year ago

:sob:

azu commented 1 year ago

With reference to the comments, I have created a sample repository of cache overrides that actually work.

name: Update Cache
on:
  workflow_dispatch:
permissions:
  contents: read
  # required to delete caches
  # https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-github-actions-caches-for-a-repository-using-a-cache-key
  actions: write
jobs:
  update:
    runs-on: ubuntu-latest
    env:
      # overwrite cache key
      cache-key: your-cache-key
    steps:
      # This job implements overwrite cache using restore + delete + save
      - name: Checkout
        uses: actions/checkout@v3 # the gh command requires a checked-out repository
      - name: Restore Cache
        id: cache-restore
        uses: actions/cache/restore@v3
        with:
          path: ./cache
          key: ${{ env.cache-key }}
      # Main Task
      - name: Main Task
        run: |
          # generate current time to ./cache/time
          mkdir -p ./cache
          previous_date=$(cat ./cache/time || echo "No previous date")
          current_date=$(date "+%Y-%m-%d %H:%M:%S")
          echo "Previous: $previous_date"
          echo "Current: $current_date"
          # Save current time to ./cache/time
          echo "$current_date" > ./cache/time
      # overwrite cache key: delete previous and save current
      - name: Delete Previous Cache
        if: ${{ steps.cache-restore.outputs.cache-hit }}
        continue-on-error: true
        run: |
          gh extension install actions/gh-actions-cache
          gh actions-cache delete "${{ env.cache-key }}" --confirm
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Save Cache
        uses: actions/cache/save@v3
        with:
          path: ./cache
          key: ${{ env.cache-key }}

The process of deleting the cache is complicated by the use of the gh command. Perhaps a good place to start is to consider adding actions/cache/delete.
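
One possible way to avoid installing the extension is to call the REST endpoint linked in the workflow above directly through gh api. The step below is only a sketch under that assumption: it reuses the cache-key env variable and the cache-restore step id from the example, introduces a hypothetical CACHE_KEY shell variable, and has not been verified against this workflow.

```yaml
# Sketch: delete the cache by key through the REST API instead of the gh extension.
- name: Delete Previous Cache
  if: ${{ steps.cache-restore.outputs.cache-hit }}
  continue-on-error: true  # the exact key may not exist
  run: |
    # Keys containing characters that are not URL-safe would need URL-encoding first.
    gh api --method DELETE "repos/${GITHUB_REPOSITORY}/actions/caches?key=${CACHE_KEY}"
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    CACHE_KEY: ${{ env.cache-key }}  # hypothetical helper variable for the shell
```

Because the repository is named in the request path, this variant should not need the checkout step. It still needs the actions: write permission.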

ro0NL commented 1 year ago

Perhaps a good place to start is to consider adding actions/cache/delete.

I'd prefer the proposed update: true

GuyAv46 commented 1 year ago

For anyone using the gh workaround who doesn't want to check out the repository in the job running it: you just need to provide a GitHub token and the repository name:

cache-sha:
  # Caches the SHA of the last successful build
  runs-on: ubuntu-latest
  steps:
    - name: Clear cache
      continue-on-error: true # Don't fail if the cache doesn't exist
      env:
        GH_TOKEN: ${{ github.token }} # required by gh
      run: |
        gh extension install actions/gh-actions-cache
        gh actions-cache delete ${{ env.CACHE_NAME }} --confirm -R ${{ github.repository }}

grosser commented 6 months ago

FYI: when using the cache deletion trick and getting "Error: Resource not accessible by integration", you have to enable "Read and write permissions" in the repo settings under Actions -> General -> Workflow permissions.
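
An alternative that may avoid changing the repository-wide default is to grant the permission in the workflow file itself, as the sample workflow earlier in this thread does. A sketch, assuming the deletion uses the workflow's default GITHUB_TOKEN:

```yaml
# Top-level (or job-level) permissions for the workflow's GITHUB_TOKEN.
permissions:
  contents: read
  # needed for deleting caches with the workflow token
  actions: write
```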

lcswillems commented 5 months ago

Hey @bethanyj28, what's the reason such a feature has not been shipped yet? It is built into GitLab, etc. It is the second most requested feature, and it seems doable (there are workarounds, but they are cumbersome).

romani commented 1 month ago

A mutable cache that exists only for a single run of a workflow would be an awesome option. It should not be the default behavior, but the possibility would be convenient: it would help to check out sources once and keep them from job to job (of single workflow) to let each job mutate/update file system as required and help following job to reuse it.

Our workaround was to create one more cache file.

assignUser commented 1 month ago

from job to job (of single workflow) to let each job mutate/update file system as required and help following job to reuse it.

@romani you can use artifacts for that too, just overwrite the existing one and cache it in the last job to make it permanent across workflows (if desired).
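
A rough sketch of what passing a mutable directory between jobs through an artifact could look like. The job names, the "workspace" artifact name, the build/ path, and the make targets are all hypothetical. The overwrite and retention-days inputs assume v4 of actions/upload-artifact.

```yaml
# Sketch only: names, paths, and commands are placeholders.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build
      - uses: actions/upload-artifact@v4
        with:
          name: workspace
          path: build/
          retention-days: 1  # keep the artifact only briefly after the run

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: workspace
          path: build/
      - run: make test
      - uses: actions/upload-artifact@v4
        with:
          name: workspace
          path: build/
          overwrite: true  # replace the artifact uploaded by the previous job
          retention-days: 1
```

The artifact still outlives the workflow run for the retention period, so this only shortens how long the files stay around. It does not make them private to the run.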

romani commented 1 month ago

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow#comparing-artifacts-and-dependency-caching

Use artifacts when you want to save files produced by a job to view after a workflow run has ended, such as built binaries or build logs.

The nuance is that in our case we do not want to share any files after the workflow has ended. We could hack everything to do what we need, but we try not to pollute anything outside of the workflow.