dataaxiom / ghcr-cleanup-action

GitHub Container Registry Cleanup Action
BSD 3-Clause "New" or "Revised" License
22 stars 5 forks source link

[Feature request] Keep only n tagged images (and ignore specific tag) #1

Closed ManiMatter closed 2 months ago

ManiMatter commented 6 months ago

Hi,

Fantastic work, thank you so much for solving this headache. I have a feature request to keep only n-tagged images, and to always keep a specific tag as well.

Background: I have a setup where I have two types of containers: 1) One "dev" container 2) Ongoing versioned containers (ie. latest)

Whenever I push to "dev" branch, it will create a new container tagged "dev", and untag the old "dev" container; this helps me to test the image before releasing it.

When I then merge to the "main" branch, a new versioned (e.g. 4.2.1) container is created.

Proposal What I would love to do is the following:

rohanmars commented 6 months ago

Thanks! That's a good idea. Let me look into it.

Something like?

    steps:
      - uses: dataaxiom/ghcr-cleanup-action@v1.0.0
        with:
          keep-n-tagged: 10
          exclude-tags: dev
          token: ${{secrets.GITHUB_TOKEN}}
ManiMatter commented 6 months ago

I think your proposal is perfect

rohanmars commented 6 months ago

I've implemented the feature. Please test it out on the v1.0.1 or v1 tag/release. I added a dry-run option also if you want to test with that first to simulate everything

ManiMatter commented 6 months ago

Hey, super cool that you built it so quickly, much appreciated.

I let it run with these settings, and got the following error

Settings:

      - name: "Clean up docker images"
        uses: dataaxiom/ghcr-cleanup-action@v1
        with:
          keep-n-tagged: 10
          exclude-tags: dev
          dry-run: true
          token: ${{secrets.GITHUB_TOKEN}}

Error:

Error: Cannot read properties of undefined (reading 'id')

Job: https://github.com/ManiMatter/decluttarr/actions/runs/9083315844/job/24961774859

ManiMatter commented 6 months ago

Btw - "cherry on top" - does the script print out at the very end how many containers it removed? Would be cool to know (makes one feel good to know how much crap was removed) 🧹

rohanmars commented 6 months ago

Yes, it outputs the following kind of log currently:

deleting tagged images, keeping 0 versions
deleting package id: 216128070 digest:sha256:a6d2b38300ce017add71440577d5b0a90460d0e57fd7aec21dd0d1b0761bbfb2 tag:ubuntu
deleting package id: 216128003 digest:sha256:2af372c1e2645779643284c7dc38775e3dbbc417b2d784a27c5a9eb784014fb8 tag:architecture amd64
deleting package id: 216128018 digest:sha256:2f63021dc56651000aa1e250d42c3aa898a5cd61120aeb8daf9e7e0fd20b84e5 tag:architecture arm
deleting package id: 216128036 digest:sha256:462e829de9164b6c066246cddc265a936071744f689f0ea73daa92b4f9feb47e tag:architecture arm64
deleting package id: 216128051 digest:sha256:6250b8edd7248ca0764e8c10069113ac1c837becd6e1e5a92991dfa14dce842f tag:architecture ppc64le
deleting package id: 216128067 digest:sha256:4b24be9d94475438fe8313d8772be9c94e7c89d4e2b2d2a7570dcb3a7f51ee80 tag:architecture s390x

I've just found some further optimizations which I'll push and release right now, it includes a wildcard matching feature for the tag/exclude tags

It doesn't yet print out total number of images deleted.

ManiMatter commented 6 months ago

cheers. will test it when the next version is ready, let me know

rohanmars commented 6 months ago

I've push the latest changes, v1 or v1.0.2.
I've verified the image order and some a bunch of different tests, however the code equality test is different between keep-n-tagged and keep-n-untagged which I'm still investigating/testing.

rohanmars commented 6 months ago

Can you trigger your pipeline again and see if the id error still exists?

rohanmars commented 6 months ago

I think I can harden then platform image deletion.I think you might have some manifests that don't link to actual images. I'll make the change right now.

ManiMatter commented 6 months ago

re-ran the job right now, ended up in the same place: Error: Cannot read properties of undefined (reading 'id')

https://github.com/ManiMatter/decluttarr/actions/runs/9083315844/job/24964895577

Not sure I understand this one?

I've verified the image order and some a bunch of different tests, however the code equality test is different between keep-n-tagged and keep-n-untagged which I'm still investigating/testing.

rohanmars commented 6 months ago

I'm testing a fix. I've added some extra guards to handled corrupt manifests.

On the image order I have to sort the remaining "images" once all the processing is done, what I found was the sort function had to be reversed between keep-n-tagged and keep-n-untagged logic. Which I wouldn't expect, but my test showed it needed it. (Still need to verify it more)

rohanmars commented 6 months ago

Thanks for the PR!!!!! I've made some changes on main branch. Can you retest using main?

uses: dataaxiom/ghcr-cleanup-action@main
ManiMatter commented 6 months ago

pleasure.

Now it passes, but I see these warnings:

Warning: couldn't find image digest sha256:b25c425cb7b29392a93342a9f9b2d3f3655bf1bba0626d4949ccb29d9a2f2d94 in repository, skipping Warning: couldn't find image digest sha256:9290ba706a8173b39325236be6e0803cbb0835b7cf4022aa63315a3709628539 in repository, skipping Warning: couldn't find image digest sha256:45db222b16bd5b922b6cb9e7b08ea6486b9b72cf9d6a321041ec688348114a07 in repository, skipping Warning: couldn't find image digest sha256:30d7d0a3727d7a82ccfb835281495bf45d96829efc1cb043308917fceab4c71d in repository, skipping deleting package id: 195021580 digest:sha256:2d49d6d1a3fa6a9188d4f23ce3e4a1b1987ea57aa29dcb97e8fd4c9c64bcfab5 tag:v1.31.0 Warning: couldn't find image digest sha256:bb1dd8101ad4e1d372470cd5451ee20b9d46b7cfc680df682970eaddb8f49b80 in repository, skipping Warning: couldn't find image digest sha256:f5715c3bb6f7886d201ebc7278e58e674a5eaa3f92fa07dfe9bdb5e0d7819213 in repository, skipping Warning: couldn't find image digest sha256:1f9ee2ae5b0eb06322458d3e8c506e696f2ddbf72996cd909630071081e3c79a in repository, skipping Warning: couldn't find image digest sha256:e272e95bbfd56c14318d38d36719b7a7d85055cf0f1c904a0259acb1e15a3c13 in repository, skipping deleting package id: 195021156 digest:sha256:9273a4198824a8a15d1e0856e56e498a6f6f938116121e756dd9e2148ceb4fac tag:v1.30.0 Warning: couldn't find image digest sha256:7a4ba8b4397c604529532a4c945984e93d66dd1524a1a690b75494d9baf4b0aa in repository, skipping Warning: couldn't find image digest sha256:7e9b63975b3803fe421b628987fd942f351985496f13355e15bb6911a2984ef0 in repository, skipping Warning: couldn't find image digest sha256:0a39d1fa1509f0885a9a990a3515f42b69c59e9cef87fe295f0cc5cc82bead8b in repository, skipping Warning: couldn't find image digest sha256:890a89fca7d0507d484f96d1f742a87d84f180331f149dd491136256e768f5cb in repository, skipping deleting package id: 197541388 digest:sha256:07fd4058a1c233662ebdbc6c5bbd008fde13e1e9494451197ec96811dbf8e1f3

https://github.com/ManiMatter/decluttarr/actions/runs/9085371392/job/24968536043

rohanmars commented 6 months ago

thats great. yes I added those warnings, you have some corrupt image manifests, but all good. It just skips over them. Any way to tell if the action keep the most recent tags?

ManiMatter commented 6 months ago

number of multi architecture images deleted = 70 total number of images deleted = 357

nice one! 🥇

From what I see, the following are not removed, which looks good. I see that the next version v1.35.0 is removed, so everything looks right.

  1. latest/v1.42.0
  2. v1.41.2
  3. v1.41.1
  4. v1.41.0
  5. v1.40.0
  6. v1.39.0
  7. v1.38.0
  8. v1.37.0
  9. v1.36.0
  10. v1.35.1

Job: https://github.com/ManiMatter/decluttarr/actions/runs/9085371392/job/24968536043

Images: https://github.com/ManiMatter/decluttarr/pkgs/container/decluttarr/versions?filters%5Bversion_type%5D=tagged

On the warnings regarding the corrupt image manifests, would these remain there after the job has run? Or asking differently, will these never be cleaned up? If there was a way to kill those automatically, that'd be great (else the user sees the same message over and over again)

Warning: couldn't find image digest sha256:b25c425cb7b29392a93342a9f9b2d3f3655bf1bba0626d4949ccb29d9a2f2d94 in repository, skipping

rohanmars commented 6 months ago

That's great to hear it's all calculating the cleanup correctly. I would expect those warnings to be on only that job. The image manifests which have corruptions are deleted on that run so it won't loop through that manifest file again in a future job. When new tags come up for deletions on new jobs it would asses those manifests then for processing, which would be new errors if there are any.

I'm making a few minor changes (for stats output) and then I'll release 1.0.3 later today and retag v1

Thanks so much for the feature request and testing. It made me fix a bunch of things with the keep-n scenarios.

ManiMatter commented 6 months ago

On the warnings regarding the corrupt image manifests, would these remain there after the job has run? Or asking differently, will these never be cleaned up? If there was a way to kill those automatically, that'd be great (else the user sees the same message over and over again)

I think I can answer my own question.

Here is what's happening. This github action doesn't properly work with multi-arch images, and kills the untagged images belonging to multi-arch images.

Here's an example: https://github.com/ManiMatter/decluttarr/pkgs/container/decluttarr/195679913

This image has 3 architectures, but they were deleted by the github action. Multi Arch Package: sha256:502b0b859a36e03643941232530b4a6ccf3a8a2bca1cb90700207f83b2b784c3

Your script correctly says that the 3 latter ones can't be found (no wonder, they were deleted). This error goes away, if the parent package gets deleted (which it will, if by the settings it is in scope of deletion, for instance as in my case when it's older then 10 versions).

I would have a suggestion though here: Why not add another setting "remove-broken-multi-arch-images" (or something along these lines, or just do the following additional cleanup without even having an explicit setting for it)? ->Remove any multi arch package where the underlying manifests are not found anymore. My thinking: A multi-arch package that points nowhere is useless... Any my take is that many people switching over from the other github action to yours will probably suffer from that problem.

What do you think?

Update: Just seen that you posted this while I wrote:

The image manifests which have corruptions are deleted on that run so it won't loop through that manifest file again in a future job.

Does this mean that you have already baked the removal in (without user having to specifically put a setting), or you are removing them "by chance" because they may fall into scope (e.g. because they are older than 10 versions)?

rohanmars commented 6 months ago

Yeah, that's exactly what is likely happening. It's the main reason I wrote this version, none of the existing cleanup actions (even dedicated container ones supports multiarch images) which corrupted some of my images so I disabled the actions.

That's a interesting idea to proactively validate the image manifest. I almost think that should be a different standalone mode producing a report. As multi arch image may be corrupt but still work for a given platform, if that underlying image still exists the pull would likely work on a given host. In your case it looked like all the underlying images were gone. This issue did make me think of a scenario where maybe multiple multi-architecture manifests may point to the same image digest. I need to walk through the code on that potential.

I was thinking this morning another standalone mode could be a "recovery" mode whereby you could specify the package id's to recover any packages that were accidentally deleted from the action directly (unrelated to the issue above). Otherwise you have to jig some web api call manually to recover them.

Does this mean that you have already baked the removal in (without user having to specifically put a setting), or you are removing them "by chance" because they may fall into scope (e.g. because they are older than 10 versions)?

Yes, that's correct for the images that fall into scope on each run. But your idea would proactively valid the 10 other versions in your scenario, but it probably shouldn't touch them.

ManiMatter commented 6 months ago

you're right, there could be multi-arch containers that miss some manifests, but not others.

Would this be a valid approach?

        with:
          keep-n-tagged: 10
          exclude-tags: dev
          dry-run: true
          token: ${{ secrets.GITHUB_TOKEN }}
          partial-manifest-removal: true

1) By default, all multi-arch containers are removed where all underlying manifests are invalid (no setting is needed for that, imo should be default behavior of this action) 2) If partial-manifest-removal is true, also multi-arch containers are removed, where only some manifests are invalid These two rules are executed before any other rules (for instance, selecting the keep-n-tagged)

In the part where the images are deleted, I would then not show the warning that the respective manifest is missing (simply skip that)

Thoughts?

rohanmars commented 6 months ago

Good suggestions. It agree removing the warning for in scope deletions is probably a good idea. I'll make that change.

remove multi arch containers that do not have a single valid underlying manifest (e.g. if i have 15 images, and nr. 9 is completly broken, it would take 1-8, 10&11, rather than 1-10 when selecting 10 images to keep)

That's a good idea for fully broken images in scope. Out of scope is a bit more problematic for the other cleanup modes (not keep-n-tagged) Let me review the code.

spit out a warning for multi arch containers that have partially missing manifests (but some are valid), and keep the container

For out of scope processing that might be more challenging as the current different modes: delete by untagged, delete by tag and keep-n-untagged operate a bit different to the keep-n-tagged. In my case I'm currently using the delete untagged (default), and expect the total number of images to grow without ever deleting released images. Maybe a "verify" option could be used to enable out of scope scanning/reporting, which then could be turned off when the images 'healthy'

You are probably right there are a lot of broken manifests out there in ghcr.io considering the lack of multi-arch support, so this kind of functionality would be useful for projects that switch to this action.

rohanmars commented 6 months ago

Let me think about your latest suggestion more, I'll get back to you. As long as it can support all the cleanup modes I agree some better validation should be used.

ManiMatter commented 6 months ago

just wanted to again thank you - fab work. really enjoing the exchange with you here on this :)

rohanmars commented 6 months ago

thanks for being my first external user/tester. it's validated that it's currently needed.

ManiMatter commented 5 months ago

I've seen this new option:

Ghost Images

Multi architecture images which have no underlying platform packages are automatically removed for the keep-n-untagged and keep-n-tagged modes and not include in their count. Partially corrupt images are not removed by default, use the validate option to be able to identify and then fix them.

with "fix them", does this mean it will remove those partially broken platform packages as we discussed above with the "partial-manifest-removal: true" flag? Or does this simply mean that the logs spit out which packages are broken, so that I can then go in and fix them myself?

Many thanks for clarifying :)

rohanmars commented 5 months ago

Hey I haven't yet implemented a partial-manifest-removal mode, I feel like it pulls into scope images that you may not want to be deleted (especially for the other non- keep-n-tagged modes) (but it still may be a valid use case). So currently you would have to then go and put the invalid images in for deletion on a next workflow run after running the validate mode.

I did add a few extra features:

I planning to build some automated tests next, it's quite a manual process to test all these scenarios. Once that is done then there is other options which could be added: deleting by date, by download count, package restoration etc

ManiMatter commented 5 months ago

I'm impressed by how many features you have already added to this action in so little time!

Based on your this statement, my understanding is that packages that are fully broken (no working underlying manifests) are removed already.

Removal of ghost images in scope for (keep-n-tagged/keep-n-untagged modes. So it doesn't include them in the count and they get auto removed

However, I am not sure I understand how the partial-manifest-removal mode may conflict with the different action modes.

I feel like it pulls into scope images that you may not want to be deleted (especially for the other non- keep-n-tagged modes)

I understand there are essentially 4 modes. 1) Remove all untagged and keep all tagged 2) Remove n untagged and keep all tagged 3) Remove all untagged an keep n tagged 4) Remove specific tags

In all options, I would expect that if partial-manifest-removal is on, packages that are partially broken would be removed in any case. Meaning: Even if they are listed as "excluded" or if there are 10 untagged items of which one is partially broken and we say keep 100, it would still reduce the number to 9. E.g., the "fixing" should supersede everything.

I was thinking first whether something that is "excluded" should also be excluded from partial-manifest-removal. However, if we clearly document that partial-manifest-removal takes priority over exclusion, then I think it's fine that partial-manifest-removal supersedes. If we were to give priority to exclusion over partial-manifest-removal, it would start getting tricky because of wildcard support. If somebody is not OK with exclusion being ignored for partial-manifest-removal, then I guess the user should not use partial-manifest-removal, but instead use the "validate" option and remove these items manually.

What do you think?

I am probably missing an important aspect that you have in mind, when cautioning that the partial-manifest-removal may conflict with the other modes. Could you elaborate please? Maybe we can brainstorm it out together :)

Cheers

rohanmars commented 5 months ago

I currently have coded that exclusion takes priority over everything, and it currently supports wildcards. I have a pre processing step that process all the tags for exclusion. I think that should stay that way, while ironically it's a cleanup action it should take a super cautious approach to deletion and cleanup.

Your point about partial-manifest-removal in all cases for the other modes is where it gets tricky. As from a user perspective it kind of implies including into scope all images as you point out. But the untagged and tags modes currently leave all out of scope images alone. So it might lead to more user error unless there is more general approach about deleting.

I would propose changing the partial-manifest-removal to a delete-partial-manifests. I was thinking yesterday that there may be a number of delete- options which are additive. Like delete-older-then, delete-downloads-less-then and delete-partial-manifests, which get processed before the keep and tags modes. In this setup strangely mode 1 is really an explicit delete-untagged option, might might be default option.

So the core logic would be to process excludes, process delete options, then process the keep options.

Which then got me thinking the project needs automated testing asap. So that these other functions can be added easily and without breaking all the other modes. haha

rohanmars commented 5 months ago

makes me think that maybe the tags option should really be delete-tags instead to be clearer and in line with the other delete- options

ManiMatter commented 5 months ago

agree to all you said :)

typescript / github actions etc are not my forte at all, and i'm actually studying how you are coding it, and am learning a lot from it. thank you

rohanmars commented 3 months ago

Hi ManiMatter I've finally added a delete-partial-images option to the action (and a delete-ghost-images) Both of these options are set to false by default (previous the ghost image support was built in (and true). If delete-partial-images is set it will delete an images which are not complete before it process any other rules. I'm still testing and plan to put in a test case (or 2) before I release it. R.

ManiMatter commented 3 months ago

Great news! I‘m excited to test it. Hope to find time tmrw to do so!

thanks so much for your work and for considering this feature suggestion!

rohanmars commented 3 months ago

I've also removed the delete untagged mode from the keep-n-tagged, so you'll need to add --delete-untagged to your setup if you want to delete untagged images. I've changed the setup so no you can have all the modes --keep-n-tagged --keep-n-untagged --delete-untagged --delete-tags now at the same time.

ManiMatter commented 3 months ago

thanks for the info!

Maybe stupid question: how can I test this action before you will have released it?

rohanmars commented 3 months ago

You can reference the main branch like this in your action setup, which will allow you test before release.

uses: dataaxiom/ghcr-cleanup-action@main

ManiMatter commented 3 months ago

I've let it run on "dry-run", looks promising

cleanup statistics: multi architecture images deleted = 111 total images deleted = 570

I'm glad I won't have to delete those manually.

You mentioned that you want to add a few more checks and features before releasing v1.0.8. Let me know when you think you are ready, I will then 1) let it run in dry-run again and see if I get above number still 2) let it run for real and let you know if the outcome looks right (a.k.a me volunteering my app as guinea pig)

rohanmars commented 2 months ago

I've released v1.0.8/v1 ff99a6e, so closing this issue.

BIG thanks for your feedback/help.

ManiMatter commented 2 months ago

.. thanks so much for your help, I let it run productively... so happy I didn't have to do it manually.

One observation: The Dry-run number and the actual deletes differed. Could it be that dry-run double counts certain things? Actual stats; multi architecture images deleted = 104 total images deleted = 503

Dry-run; multi architecture images deleted = 111 total images deleted = 570

rohanmars commented 2 months ago

That's very weird. I can t see how looking at the code, so I'll need to run some tests and analyze more Are you 100% sure that there was no change in the package repository between these runs?

ManiMatter commented 2 months ago

I changed from the @main to @v1.0.8, remove dry-run and pushed the change, which would (based on my git flow) create 4 new containers (multi arch + 3 versions) and then clean up the packages.

so if anything, I‘d have expected the nr of packages go up (by 4) rather than down.

the only other 2 explanations I can think of is that what I tested a few days back (@main) and what i ran now (v1.0.8) are different, or that the dry run has some sort of double counts…

not a big issue though: the result looks correct after the cleanup

thanks again for your great work on this

rohanmars commented 2 months ago

I think the reduced count is likely as you commented out the keep-n-tagged, which then switched to cleaning up only untagged images. So that didn't delete any tagged images (excepted 2 it looks like which were ghost images). https://github.com/ManiMatter/decluttarr/commit/7ae0ed513472b3ad878b11c5bc7c5e803ce93670

ManiMatter commented 2 months ago

arghhhhhhhhhhh You are 100% right. Sorry for wasting your time on the false alert

rohanmars commented 2 months ago

All good. It was good to verify. Thanks for all your help from the beginning. I think the 1.0.8 version is pretty solid, we'll see, haha.

ManiMatter commented 2 months ago

Yes, it seems to work like a charm. The only feature I‘m sad about is the one that considers #downloads, ie what we dont get on the API by github.

did you raise a feature request on their end by any chance?

rohanmars commented 2 months ago

Ah, downloads, that's true. I noticed in your repo you have some images in the 5k mark, so it will become and issue. I've yet to look to see if there is an existing issue raised with GitHub. Seems strange that the info isn't available through the API. I saw some posts from people who scrap the webpage, but that that's not a good idea for this action I feel.

ManiMatter commented 2 months ago

Yeah if we can avoid scrapping we should at all cost; would make the solution brittle and if scrapping incorrectly (and there are no safety measure to detect that) may lead to involuntary deletion.

Threshold-wise I would even set it lower in my case, 500 or 1000.. guess that should be parametrizable. That‘s a detail though, and irrelevant if we can‘t get hold of the download stats in the first place

If we were able to solve it, I was thinking of using it in combination with keep-n-tagged. Exclude anything > download threshold and from the rest keep the latest 10 versions.