actions / cache

Cache dependencies and build outputs in GitHub Actions

Clear cache #2

Closed Lauszus closed 2 years ago

Lauszus commented 5 years ago

Thanks for the action. It would be great if it was possible to clear the cache as well either using the action or via the web interface.

chrispat commented 5 years ago

We will evict the least recently accessed cache automatically. Eventually we will add a way to clear out the cache from the UI.

stevencpp commented 5 years ago

Maybe it could be part of a feature that lists the individual caches and their sizes in the UI and allows clearing only specific caches?

chrispat commented 5 years ago

Caches will all expire eventually so I don’t think being able to clear specific ones is going to be very useful.

jcornaz commented 5 years ago

Caches will all expire eventually so I don’t think being able to clear specific ones is going to be very useful.

Please consider the following use case:

A gradle project caching the gradle caches like this:

    - name: Gradle caches
      uses: actions/cache@v1
      with:
        path: ~/.gradle/caches
        key: gradle-cache-${{ hashFiles('**/*.kt*') }}
        restore-keys: |
          gradle-cache-

Each build will restore the most recent cache via the restore key, add whatever new dependencies it needs, and then save a new, larger cache under a new key.

Which means the build cache will grow over time and will eventually reach the size limit.

But expiring the old caches doesn't help, because the build always re-uploads them.

Being able to manually clear the cache from time to time would definitely be nice and useful.

I would personally do it when:

stevencpp commented 5 years ago

My use case for clearing only specific caches is that I have a huge dependency that takes two hours to build, and while it may eventually expire, I really don't want to have to rebuild that unless absolutely necessary. I also have a bunch of other smaller caches, and in some cases getting the right hash at the end of the key to make it rebuild exactly when it needs to can be difficult to implement, so I'd like to just use a key either without a hash at the end, or with an imperfect hash, and manually clear only that specific cache when I know it needs to be rebuilt. My current workaround is to rename the key when I want to rebuild it, but it would be better to avoid those dummy commits and just do it from the UI.

eregon commented 4 years ago

In https://github.com/eregon/use-ruby-action/issues/7 we noticed an issue because using subtly different prebuilt Ruby binaries corrupts the cache. Being able to clear the cache when it's corrupted or contains errors would be very useful as a debugging mechanism.

Right now the only workaround seems to be to use another key for the cache action.

firefart commented 4 years ago

Same here. Dealing with a corrupted binary in a gem cache, and the only way to get rid of it is to use another key. A method to clear the cache would be nice.

adithyabsk commented 4 years ago

Thought I'd ping this as well. Running into cache issues after the repo's name was changed.

knubie commented 4 years ago

I need to clear the cache to get my yarn installation working again. An interface would be nice, but in lieu of that, how can I clear it manually?

dhadka commented 4 years ago

@knubie Change the key (and restore keys). For example, you could add -v2- to the key.
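
For instance, a minimal sketch (assuming a yarn cache; the path and step name here are only illustrative):

    - name: Cache yarn packages
      uses: actions/cache@v2
      with:
        path: ~/.cache/yarn
        # bump "-v2-" to "-v3-" (and so on) whenever you need to start from an empty cache
        key: ${{ runner.os }}-yarn-v2-${{ hashFiles('**/yarn.lock') }}
        restore-keys: |
          ${{ runner.os }}-yarn-v2-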

matks commented 4 years ago

Would be awesome to have a way to clear cache from the UI 👍

I am setting up a GitHub Action using Composer dependencies (using dev-master), so I need Composer to grab the very latest versions from my GitHub repo instead of the ones it has in cache.

ioquatix commented 4 years ago

Running into the same issue. I cached the wrong directory, so the cache archive is empty, and I cannot regenerate it.

kubukoz commented 4 years ago

This seems like a really obvious feature; I really wish it was available :)

Any way to clear the cache without editing the workflow files would be awesome!

beatngu13 commented 4 years ago

One can abuse secrets to clear the cache via the UI, i.e. without editing the workflow file. For instance, create a secret with the key CACHE_VERSION and, as its value, a Unix timestamp, a counter, a version string, or something else. Then use it as follows:

key: ${{ runner.os }}-mycache-${{ secrets.CACHE_VERSION }}-${{ hashFiles(...) }}

Whenever the secret is updated via the UI, the workflow will use a new cache.
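
A fuller sketch of such a step (the cache path and the hashFiles pattern are only illustrative):

    - name: Cache dependencies
      uses: actions/cache@v2
      with:
        path: ~/.m2/repository
        # updating the CACHE_VERSION secret in the repository settings forces a cache miss on the next run
        key: ${{ runner.os }}-mycache-${{ secrets.CACHE_VERSION }}-${{ hashFiles('**/pom.xml') }}
        restore-keys: |
          ${{ runner.os }}-mycache-${{ secrets.CACHE_VERSION }}-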

marco-schmidt commented 4 years ago

After a version upgrade, a plugin somehow did not properly update its internal database kept in the cache (NVD, rather large): https://github.com/jeremylong/dependency-check-gradle/issues/196

On my machine, rm -rf ~/.gradle/dependency-check-data/ was enough to solve this, forcing the plugin to recreate that database. My attempt to add that rm call to my GitHub action failed for some reason after less than a minute, with no idea what caused the cancellation:

##[error]The operation was canceled.

https://github.com/marco-schmidt/am/runs/1087579674?check_suite_focus=true

With Travis CI I could simply delete the cache, a new build then leads to the database being recreated and that solves the issue. It would be very helpful to be able to do this in GitHub actions as well.

dhadka commented 4 years ago

@marco-schmidt While it's not quite the same as deleting a cache, you can change the key to effectively start with a new cache. For example, it's common to add a version number, such as ${{ runner.os }}-gradle-v2-${{ hashFiles('**/*.gradle') }}. Please make sure to update the key as well as any restore-keys with the change.
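
A sketch of that in context (the cache path mirrors the Gradle example earlier in this thread):

    - name: Cache Gradle dependencies
      uses: actions/cache@v2
      with:
        path: ~/.gradle/caches
        # note that "-v2-" appears in both the key and the restore-keys
        key: ${{ runner.os }}-gradle-v2-${{ hashFiles('**/*.gradle') }}
        restore-keys: |
          ${{ runner.os }}-gradle-v2-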

As for "The operation was canceled", that's caused by the fail-fast behavior. Since the windows-latest job failed, it canceled the ubuntu-latest job. You can change this behavior by setting fail-fast to false. Here's an example: https://github.com/actions/cache/blob/main/.github/workflows/workflow.yml#L23
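
For reference, a minimal sketch of that setting (the job name and matrix values are illustrative):

    jobs:
      test:
        strategy:
          # keep the other matrix jobs running even if one of them fails
          fail-fast: false
          matrix:
            os: [ubuntu-latest, windows-latest]
        runs-on: ${{ matrix.os }}
        steps:
          - uses: actions/checkout@v2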

ahmadnassri commented 4 years ago

2 years in, still no manual cache control... because the team who built this clearly never uses it in their own work.

allowing users to clear the cache (like every other system on the planet) is bad ... yet charging every penny possible for storage of said cache is somehow okay?

beatngu13 commented 4 years ago

2 years in, still no manual cache control... because the team who built this clearly never uses it in their own work.

allowing users to clear the cache (like every other system on the planet) is bad ... yet charging every penny possible for storage of said cache is somehow okay?

Yes, the feature request should get more attention, but that comment is a bit harsh and unobjective IMO. Also, there are various practical workarounds, so it's not really a blocker.

ahmadnassri commented 4 years ago

Yes, that was harsh, but no less harsh and unobjective than the Staff Engineer for Actions stating:

Caches will all expire eventually so I don’t think being able to clear specific ones is going to be very useful.

All the while, GH continues to charge for storage without any indication of cache expiry rules (or control over them).

I've been steadily updating the $ limits in one account month over month as storage usage keeps going up, and I had to migrate away from caching big items (like docker image builds).

Also, somewhat related: I have some ghost workflows that were deleted months ago, yet they still show up in the UI... seemingly cached forever!

The only workaround is essentially changing the cache key, but that does not give you control over the cache.

filips123 commented 4 years ago

All the while, GH continues to charge for storage without any indication of cache expiry rules (or control over them).

What about this:

GitHub will remove any cache entries that have not been accessed in over 7 days. There is no limit on the number of caches you can store, but the total size of all caches in a repository is limited to 5 GB. If you exceed this limit, GitHub will save your cache but will begin evicting caches until the total size is less than 5 GB.

ahmadnassri commented 4 years ago

The 5 GB number doesn't square with what I'm seeing. I wish the billing page showed more of a breakdown (currently it's just a total cost of "shared storage between both Actions & Packages"); then I'd be able to share exactly how much storage is being used for caching, because it's clearly not all packages ... 🤷‍♂️

Thanks for sharing that, I clearly missed it.

dhadka commented 4 years ago

@ahmadnassri Cache storage is free and is not included in the "shared storage for Actions & Packages" shown on the billing page.

Also, somewhat related: I have some ghost workflows that were deleted months ago, yet they still show up in the UI... seemingly cached forever!

Can you still view the logs and any artifacts created by that workflow? I believe the record of the workflow will still exist (so links to old workflows still work), but old logs and artifacts will be deleted after the workflow expires.

If you don't need to keep artifacts around for the full 90 days, you can also consider using an action like gha-remove-artifacts to clean up artifacts sooner as a way to reduce storage costs.

I wish the billing page showed more of a breakdown (currently it's just a total cost of "shared storage between both Actions & Packages"); then I'd be able to share exactly how much storage is being used for caching, because it's clearly not all packages

I'll track down the team that works on billing and open an issue for this. Please also consider sending this request via contact us.

ahmadnassri commented 4 years ago

Funny how everything I posted is getting a reply except for the original topic: allowing the cache to be cleared.

ahmadnassri commented 4 years ago

Can you still view the logs and any artifacts created by that workflow?

No logs & artifacts, but the workflow entry and the execution history still show up, as well as a cached version of the workflow file (4+ months after they've been deleted).

I realize this particular issue is not related to this particular action though, so I can post more details elsewhere.

kcgen commented 4 years ago

@chrispat said: "Eventually we will add a way to clear out the cache from the UI."

Coming up on a year; @chrispat, can you please give us an update on this?

I'm not sure why something so trivial yet sought-after hasn't been implemented.

Forget the GUI for now (I don't have more years to wait for it). The MVP for this is a single API call that can be added to one's YAML to rm -rf the project's cache. Done.
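
Something along these lines would be enough. This is only a sketch, assuming a hypothetical cache-deletion endpoint of the form DELETE /repos/{owner}/{repo}/actions/caches?key=<key>; the key value is illustrative:

    - name: Clear the project's cache
      run: |
        # hypothetical endpoint: delete the caches matching the given key
        curl -X DELETE \
          -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
          "https://api.github.com/repos/${{ github.repository }}/actions/caches?key=gradle-cache-"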

Jared-Dev commented 3 years ago

As others have mentioned, we have a corrupt package that is failing the build. I appreciate the "hack" listed above, but this should really be added to the UI.

Could the "Re-run jobs" dropdown not have an option to "Re-run all jobs & clear cache" added?!

(screenshot of the "Re-run jobs" dropdown)

Borda commented 3 years ago

I guess the problem is that the cache is handled by another, independent action/bot, and as a user you can set it anywhere, so there is probably no simple way to tell which cache you want to drop... well, maybe drop them all? :]

ghost commented 3 years ago

Can confirm: this needs to be implemented at some point. :/

ellneal commented 3 years ago

+1

Just encountered a situation where I need to clear the cache to solve a build failure, and the lack of support for it is frustrating (especially given how long this issue has been open).

gustavopch commented 3 years ago

@ellneal You can use this workaround for now: https://github.com/actions/cache/issues/2#issuecomment-673493515

ThiefMaster commented 3 years ago

How is this still open over a year later?! The workarounds - even with the variable from secrets - are super ugly... I'm tempted to just run a dummy action that dumps 5GB of garbage in the cache to evict old data but that's ugly as well (and a waste of space).

Milo123459 commented 3 years ago

Is there a way to just clear it via the action? For example, use actions/cache@v2 with a clear input or something.

WolfgangFahl commented 3 years ago

My need derives from this: the original tests had an issue leading to a 3 GByte download. After a fix it's only 2.5 MByte. The download results seem to be in the cache - the machine fails before even trying the new download.

ocelotl commented 3 years ago

As others have mentioned, we have a corrupt package that is failing the build. I appreciate the "hack" listed above, but this should really be added to the UI.

Could the "Re-run jobs" dropdown not have an option to "Re-run all jobs & clear cache" added?!

(screenshot of the "Re-run jobs" dropdown)

Yes, please. This seems intuitive and practical; users would greatly appreciate this feature :+1:

bbimber commented 3 years ago

Having the ability to clear the cache has been an open issue forever with no change. What about allowing some kind of flag that simply tells the cache plugin "don't query the cache on this job"? I know that's not the same thing as clearing a bad cache, but if a given job was prevented from querying the existing cache, completed, and then re-populated the cache on success, this might be a pragmatic solution to clear that bad cache. Presumably implementing this kind of thing is at a different level of complexity than true cache clearing. It could probably be implemented in this plugin.

I don't know if there are more elegant ways to pass per-job environment or config, but the same concept as the 'hack' for ${{ secrets.CACHE_VERSION }} could be applied. This plugin could support something like the following (see the fuller sketch just below):

    key: ...
    restore-keys: ...
    skipCacheLoad: <SOME CONDITION or ${{ secrets.SKIP_QUERY_CACHE }}>
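
For example, something along these lines. This is purely a sketch: skipCacheLoad is a proposed input that does not exist in actions/cache, and the path and key are illustrative.

    - uses: actions/cache@v2
      with:
        path: ~/.m2/repository
        key: ${{ runner.os }}-deps-${{ hashFiles('**/pom.xml') }}
        restore-keys: |
          ${{ runner.os }}-deps-
        # proposed (non-existent) input: when truthy, skip restoring the cache but still save it on success
        skipCacheLoad: ${{ secrets.SKIP_QUERY_CACHE }}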

jpvajda commented 3 years ago

Adding to the pile of requests, this would be a very useful feature. 🙏

sjackman commented 3 years ago

I would scream with joy for this feature.

lavfilip commented 3 years ago

How is this still not done?

Milo123459 commented 3 years ago

Not sure, maybe the maintainers have lives?

kubukoz commented 3 years ago

How is this still not done?

@lavfilip pull requests are open.

NyCodeGHG commented 3 years ago

It would be awesome to have an easy way to clear the cache. I know there are some workarounds, but it would be easier with something in the UI.

WolfgangFahl commented 3 years ago

Ubuntu had bug #1. GitHub Actions has bug #2.

samwightt commented 3 years ago

Is this being actively considered / worked on? It's the second issue on the repo and one of the most requested features.

OmgImAlexis commented 3 years ago

@chrispat can we get an update on whether this is even being looked into? Your first comment suggested that you'd be adding it at some point. Well, we're over a year later and still nothing.

Edit: even if you don't add something in the UI, please at least get it added to your official CLI.

edaemon commented 3 years ago

While searching for a solution or workaround to this I came across this Stack Overflow answer which suggested including a secret in the cache key, e.g.:

key: ${{ runner.os }}-example-${{ secrets.CACHE_VERSION }}

This allows you to change the cache key without changing the workflow file. It isn't quite as nice as a native feature but it gets the job done; if you need to re-run a job with a fresh cache just update the CACHE_VERSION secret and re-run the job.

OmgImAlexis commented 3 years ago

While searching for a solution or workaround to this I came across this Stack Overflow answer which suggested including a secret in the cache key, e.g.:

key: ${{ runner.os }}-example-${{ secrets.CACHE_VERSION }}

This allows you to change the cache key without changing the workflow file. It isn't quite as nice as a native feature but it gets the job done; if you need to re-run a job with a fresh cache just update the CACHE_VERSION secret and re-run the job.

There's a big caveat to this vs. actually clearing the cache: if you keep doing this, you'll likely hit the cache size limit.

edaemon commented 3 years ago

There's a big caveat to this vs. actually clearing the cache: if you keep doing this, you'll likely hit the cache size limit.

That's true and worth noting. Is there much downside to hitting the limit, though? As far as I understand it, once you hit the limit your least-recently-used cache items are evicted, so I don't think workflows would be held up or impaired by reaching the cache size limit.

nulano commented 3 years ago

There's a big caveat to this vs. actually clearing the cache: if you keep doing this, you'll likely hit the cache size limit.

Isn't that a good thing? IIUC when you hit the cache limit, the least recently used cache entry is deleted, so this should just clear the cache entry you want to get rid of.


key: ${{ runner.os }}-example-${{ secrets.CACHE_VERSION }}

It's probably better to put the secret at the front so that you can use something like:

key: ${{ secrets.CACHE_VERSION }}-example-${{ runner.os }}-${{ hashFiles('example') }}
restore-keys: |
  ${{ secrets.CACHE_VERSION }}-example-${{ runner.os }}-
  ${{ secrets.CACHE_VERSION }}-example-

gugaiz commented 3 years ago

What if you do something like this?

name: manual deploy with invalidate cache

on:
  workflow_dispatch:
    inputs:
      invalidate_cache:
        description: Invalidate GitHub Action cache
        required: true
        default: 'false'

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Cache Libs
        id: cache-libs
        uses: actions/cache@v2
        with:
          path: |
            node_modules
            public
          key: cache-node-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            cache-node-

      - name: Invalidate cache
        if: ${{ github.event.inputs.invalidate_cache != 'false' && steps.cache-libs.outputs.cache-hit == 'true' }}
        run: |
          rm -rf node_modules
          rm -rf public

beatngu13 commented 3 years ago

There's a big caveat to this vs. actually clearing the cache: if you keep doing this, you'll likely hit the cache size limit.

That's true and worth noting. Is there much downside to hitting the limit, though? As far as I understand it, once you hit the limit your least-recently-used cache items are evicted, so I don't think workflows would be held up or impaired by reaching the cache size limit.

If, for instance, you maintain different caches for multiple scopes such as branches, you might evict valid caches due to the overall size limit of 5 GB.

It depends on your project/workflow whether that is actually a "big caveat", but currently I'm not aware of any workaround other than enforcing a cache miss by changing the key(s).