actions / cache

Cache dependencies and build outputs in GitHub Actions
MIT License

Add a parameter to ignore the existing cache but still store the cache at the end #481

Closed cecton closed 2 years ago

cecton commented 3 years ago

Feature Request

Description

Add a parameter "no-restore" that will ignore the existing cache but still store the cache at the end of the job.

Use Case

I'm building a Rust project and I try to keep a "main" cache that PRs fall back to when they have no cache of their own yet. In other words, newly created PRs use this cache as their "base".

Unfortunately the dependencies can change, the version of the compiler too, and the target (build) directory gets bigger over time.

I don't need a cache on the main branch, I only use it to accelerate the first build of the PR. So it would be nice to have a flag "no-restore" that would ignore the existing cache but still save and replace the cache at the end.

Workaround

In the workflow that builds the main branch:

      - name: Get main commit hash
        id: main_commit
        run: echo "::set-output name=hash::$(git rev-parse origin/main)"

      - name: Cache target, rustup and cargo registry
        id: cache-target
        uses: actions/cache@v2
        with:
          path: |
            ~/.rustup
            ~/.cargo/registry
            ~/.cargo/bin
            target
          key: main-${{ steps.main_commit.outputs.hash }}

In the workflow that builds the PR:

      - name: Get main commit hash
        id: main_commit
        run: echo "::set-output name=hash::$(git rev-parse origin/main)"

      - name: Cache target, rustup and cargo registry
        id: cache-target
        uses: actions/cache@v2
        with:
          path: |
            ~/.rustup
            ~/.cargo/registry
            ~/.cargo/bin
            target
          key: ${{ github.event.issue.number }}-${{ runner.os }}-pr
          restore-keys: main-${{ steps.main_commit.outputs.hash }}
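
Note for later readers: newer releases of this action ship separate `actions/cache/restore` and `actions/cache/save` sub-actions, which come close to the "no-restore" behaviour requested here. The following is only a sketch for the main-branch workflow, assuming a release that includes those sub-actions (the `@v4` tag is an assumption) and reusing the paths and key from the workaround above.

```yaml
      - name: Get main commit hash
        id: main_commit
        # Writes the step output via $GITHUB_OUTPUT, the replacement for ::set-output
        run: echo "hash=$(git rev-parse origin/main)" >> "$GITHUB_OUTPUT"

      # Build steps go here. No restore step runs, so the build starts
      # without any previously cached content.

      - name: Save target, rustup and cargo registry
        uses: actions/cache/save@v4   # assumed version tag
        with:
          path: |
            ~/.rustup
            ~/.cargo/registry
            ~/.cargo/bin
            target
          key: main-${{ steps.main_commit.outputs.hash }}
```

This skips the restore but does not replace anything: if an entry with the same key already exists, the save step leaves it as it is, so the key still has to change for new content to be stored.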

b-onc commented 3 years ago

We also have a need for this. In our iOS project we cache the Carthage folder, but sometimes the cached content turns out to be invalid even though the cache was a hit. In that case, we'd like to be able to invalidate the cache.

eyal0 commented 3 years ago

@b-onc #498 is a solution that I created that would fix this problem.

riccardoporreca commented 3 years ago

@cecton, @b-onc, @eyal0, This is surely an interesting situation.

Based on "Matching a cache key" in the GitHub Docs, I am not sure I fully understand @cecton's workaround and the way key and restore-keys are defined there, but I probably cannot see the full picture of your setup and usage of PRs.

In general, you want to construct a key based on the information that might affect the content of the paths you are caching. Also keep in mind that, if there is a hit on the primary key, the cache is not saved. This makes

key: ${{ github.event.issue.number }}-${{ runner.os }}-pr

perhaps not the best choice, since if you run the workflow multiple times for the same PR but on different commits updating the content of the cached paths, the cache would not be updated (this might not be always relevant). One possible approach if you cannot infer a good key from your code-base is to simply include the commit SHA in the key, e.g.:

key: ${{ runner.os }}-my-cache-${{ github.sha }}
restore-keys: ${{ runner.os }}-my-cache-
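
To check which case a given run fell into, the step's `cache-hit` output can be printed. A minimal sketch, where the step id `my-cache` and the cached path are hypothetical placeholders:

```yaml
      - name: Cache my paths
        id: my-cache
        uses: actions/cache@v2
        with:
          # Hypothetical path; replace with the directories you cache.
          path: path/to/cached/dir
          key: ${{ runner.os }}-my-cache-${{ github.sha }}
          restore-keys: ${{ runner.os }}-my-cache-

      - name: Report cache status
        # cache-hit is 'true' only for an exact match on the primary key
        run: echo "exact hit on primary key = '${{ steps.my-cache.outputs.cache-hit }}'"
```

When the output is `true`, nothing is saved at the end of the job; a restore through restore-keys does not count as an exact hit, so a new entry is saved under the primary key.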

In any case, here is my $0.02 on preventing cache hits when you want to "start afresh" in an ad-hoc, manually controlled way. Based on the same principle for constructing a key expressed above (and on the fact that you normally want to have stable workflow files across branches), you could include a file whose content you change whenever you want to avoid hitting any cache created with a different content of that file. Such a file could be something like .github/workflows/my-cache.lock and you would use it as follows (based on the example above):

key: ${{ runner.os }}-my-cache-${{ hashFiles('**/my-cache.lock') }}-${{ github.sha }}
restore-keys: ${{ runner.os }}-my-cache-${{ hashFiles('**/my-cache.lock') }}-

As soon as the content of my-cache.lock changes so will ${{ hashFiles('**/my-cache.lock') }} and as such no key or restore-keys produced with a different content will be matched. It will be up to you to decide when (and how) to change its content.

A hybrid approach is to leave -${{ hashFiles('**/my-cache.lock') }} out of the restore-keys: after the file changes, a run may still restore an existing cache through restore-keys, but not through a hit on the primary key, so the new content of the cached paths is saved.
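
A sketch of that hybrid variant as a full step. Dropping `github.sha` from the key is an assumption made here, so that only a change to my-cache.lock produces a new primary key; the cached path is a hypothetical placeholder.

```yaml
      - name: Cache my paths
        uses: actions/cache@v2
        with:
          # Hypothetical path; replace with the directories you cache.
          path: path/to/cached/dir
          # Primary key changes only when my-cache.lock changes.
          key: ${{ runner.os }}-my-cache-${{ hashFiles('**/my-cache.lock') }}
          # The fallback deliberately ignores the lock-file hash.
          restore-keys: ${{ runner.os }}-my-cache-
```

After my-cache.lock is edited, the next run misses the primary key, restores the most recently created entry whose key starts with `${{ runner.os }}-my-cache-`, and saves a new entry under the new key at the end of the job. Until the file changes again, later runs hit the primary key and save nothing.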

eyal0 commented 3 years ago

Putting the git SHA hash into the key means that you will be saving a new cache each time. That's way too often! Because the cache is limited in size, old stuff will expire too frequently!

Think about what functions we'd actually like from a cache:

  1. Fetch from cache, possibly not finding a hit.
  2. Save to cache if there is no entry.
  3. Save to cache overwriting an entry.

This GitHub action does the first two but there is no way to do the third. If the entry exists, you can't overwrite it. That's why you need a way to force an overwrite. That's what I did in my PR linked above.

You can mess around with the cache key and restore-keys but you'll not find a way to update an existing key. So either you can't refresh or you get a solution that will churn the cache entries.

riccardoporreca commented 3 years ago

Totally agree the SHA is a very aggressive and "extreme" approach, which I have been using for cases where the cache is very tiny (like for caching curl responses), and if there isn't much else one can do it might be better than having a key that is not really a key for the cached content. I used it here also to have a somewhat reasonable context-agnostic example.

I agree in general about the possible need for overwriting a cache in case of somehow corrupted content previously cached, and there is surely room for extensions like the one you propose to make the cache actions more workable: :+1: there!

I just think this might not be the best approach to work around situations where e.g. "the dependencies can change, the version of the compiler too", since such things should be captured in the key itself. Using cache overwriting "easily" for such things is a shortcut I would personally not recommend. It goes against the whole concept behind key-based caching, which might blow up for larger collaborative projects. Hence the suggestion: if you cannot construct a meaningful key from your code-base and the build environment, one that prevents both restoring an inconsistent cache from previous builds and saving a new inconsistent / redundant / ever growing cache for a new build, you can control it using a file in your repo.

It is a different perspective, and I think it is good to have a few options out there, so that people in the community can see what fits best their needs.

cecton commented 3 years ago

I switched to Swatinem/rust-cache@v1 just for simplicity.

I don't think it solves what I wanted, but I don't want to spend too much time on this at the moment. It works well enough.

So I think we can close the ticket unless someone else has interest in it.

Quoting @riccardoporreca: "perhaps not the best choice, since if you run the workflow multiple times for the same PR but on different commits updating the content of the cached paths, the cache would not be updated (this might not be always relevant)."

I didn't know that, and I think it is a real blocker for the change I asked for. I guess I had a different idea of how things work.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 365 days with no activity. Leave a comment to avoid closing this issue in 5 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 5 days since being marked as stale.