goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0
23.98k stars, 4.75k forks

Garbage collection does not respect Retention Policy #17230

Open dioguerra opened 2 years ago

dioguerra commented 2 years ago

Garbage collection with 'Allow garbage collection on untagged artifacts' enabled does not respect the Tag Retention policy set on the repository's project.

Expected behavior and actual behavior: When running GC with 'Allow garbage collection on untagged artifacts' enabled, untagged artifacts should be cleaned up, but only while respecting the Tag Retention policy of the project they belong to.

Steps to reproduce the problem: Create a repository containing an image with a single tag that has been overwritten several times, so that untagged artifacts exist. [image]

Set up a Tag Retention policy rule to keep the most recently pushed n artifacts, including untagged artifacts. [image]

Now enable 'Allow garbage collection on untagged artifacts' in the Garbage Collection settings and run 'GC NOW'. [image]

In the end, despite the Tag Retention policy, the untagged artifact is gone. [image]

OPTION: Maybe 'Allow garbage collection on untagged artifacts' should instead be labeled 'FORCE garbage collection on untagged artifacts'?

Versions: Version v2.4.1-c4b06d79

wy65701436 commented 4 weeks ago

The current behavior is by design. GC cannot determine whether an untagged artifact would be retained by any policy. It would be a good idea to change the option's wording to FORCE and inform users of this intentional behavior.

If we wanted GC to check the tag status, it would need to execute the retention policies for each project individually to gather the results, and then let the sweeper decide whether to remove each untagged artifact. This would fundamentally disrupt the current GC design, significantly complicate the logic, and require more time and resources for each GC run.
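
To make the trade-off concrete, here is a minimal sketch of the two behaviors discussed above. It is not Harbor's code (Harbor is written in Go, and its GC works differently internally). Every name in it (`Artifact`, `keep_most_recent`, `sweep_current`, `sweep_retention_aware`) is hypothetical, and the "keep the most recently pushed n artifacts" rule is modeled only roughly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Artifact:
    """Hypothetical, simplified artifact record (not Harbor's data model)."""
    digest: str
    project: str
    tags: Tuple[str, ...] = ()
    push_order: int = 0  # larger value = pushed more recently


# A retention policy takes one project's artifacts and returns the digests to keep.
RetentionPolicy = Callable[[List[Artifact]], Set[str]]


def keep_most_recent(n: int) -> RetentionPolicy:
    """Rough model of 'keep the most recently pushed n artifacts', untagged included."""
    def policy(artifacts: List[Artifact]) -> Set[str]:
        newest = sorted(artifacts, key=lambda a: a.push_order, reverse=True)[:n]
        return {a.digest for a in newest}
    return policy


def sweep_current(artifacts: List[Artifact]) -> Set[str]:
    """Behavior described in this thread: every untagged artifact is removed."""
    return {a.digest for a in artifacts if not a.tags}


def sweep_retention_aware(
    artifacts: List[Artifact], policies: Dict[str, RetentionPolicy]
) -> Set[str]:
    """Hypothetical alternative: evaluate each project's policy before sweeping."""
    by_project: Dict[str, List[Artifact]] = {}
    for artifact in artifacts:
        by_project.setdefault(artifact.project, []).append(artifact)

    # Extra pass over every project: this is the added cost mentioned above.
    retained: Set[str] = set()
    for project, items in by_project.items():
        policy = policies.get(project)
        if policy is not None:
            retained |= policy(items)

    # The sweeper removes only untagged artifacts that no policy retained.
    return {a.digest for a in artifacts if not a.tags and a.digest not in retained}


if __name__ == "__main__":
    artifacts = [
        Artifact("sha256:aaa", "demo", (), 1),
        Artifact("sha256:bbb", "demo", (), 2),
        Artifact("sha256:ccc", "demo", ("latest",), 3),
    ]
    policies = {"demo": keep_most_recent(2)}
    print(sorted(sweep_current(artifacts)))
    print(sorted(sweep_retention_aware(artifacts, policies, )))
```

In this model the retention-aware sweep has to evaluate a policy for every project on every run, which illustrates the extra time and complexity the comment above refers to.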