Open dmakeienko opened 1 month ago
Hi @dmakeienko ,
Same issue: Harbor reports a global size of 4 GB while my S3 bucket is 170 GB.
The issue is that if GC fails for any reason, the database and the registry can end up out of sync (database state != registry storage state), and subsequent GC runs won't fix it because GC takes the database as the only source of truth. Maybe something like an "Extensive GC" could be implemented that scans all files in the registry backend and brings them back in sync with the database?
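The core of such an "Extensive GC" would be a reconciliation step. This is only a hypothetical sketch of that idea, not anything Harbor implements: given the set of blob digests the database knows about and the set actually present in the storage backend, it reports blobs that exist only in storage (orphans, safe GC candidates) and blobs only in the database (images that will fail to pull). How the two sets are obtained (a DB query, an S3 listing) is deployment-specific and not shown.

```python
# Hypothetical reconciliation step for an "Extensive GC" (not part of Harbor).
def reconcile(db_digests: set[str], storage_digests: set[str]) -> tuple[set[str], set[str]]:
    """Return (orphaned_in_storage, missing_from_storage)."""
    orphans = storage_digests - db_digests   # in storage, unknown to the DB: GC candidates
    missing = db_digests - storage_digests   # in the DB, gone from storage: broken pulls
    return orphans, missing

# Example: one orphan blob left behind by a failed GC run.
db = {"sha256:aaa", "sha256:bbb"}
storage = {"sha256:aaa", "sha256:bbb", "sha256:ccc"}
orphans, missing = reconcile(db, storage)
print(orphans)   # {'sha256:ccc'}
print(missing)   # set()
```

Deleting the `orphans` set (after a verification pass) would converge storage back to the database state without touching anything the database still references.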
A workaround would be to replicate all images to another Harbor instance, then remove all project contents from the original instance, purge its registry storage, and replicate everything back.
Yes, that is exactly what I did. Replication is the only way to do it now. A few problems I encountered along the way: after my "manual GC", some images became broken and couldn't be replicated by pull or push rules. BUT in some cases I was able to pull such a "broken" image to a local machine and push it into another Harbor. I believe that is because I had some of the manifests/layers cached locally.
My question is related to https://github.com/goharbor/harbor/issues/20598 (which I created). We found out that in 99% of cases Harbor didn't clean up blobs from S3, while GC shows in the UI that everything is OK and X amount of space was freed. So I tried to delete blobs manually: I ran GC in dry-run mode, got the blobs' sha256 digests, and deleted them from the bucket. However, while doing so I discovered the next issue: the removed blobs affected images and manifests, and docker pull/build failed with a manifest error.
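For anyone attempting the same, the tricky part is translating a digest from the GC dry-run log into the object key to delete. A sketch, assuming the standard layout used by the upstream distribution registry that Harbor ships (blobs live at `<root>/blobs/sha256/<first two hex chars>/<digest>/data`); the `docker/registry/v2` root prefix is the common default but may differ in your configuration:

```python
# Map a sha256 digest (as printed in GC logs) to the distribution registry's
# blob object key. The root prefix is an assumption; verify against your bucket.
def blob_key(digest: str, root: str = "docker/registry/v2") -> str:
    algo, _, hexdigest = digest.partition(":")
    if algo != "sha256" or len(hexdigest) != 64:
        raise ValueError(f"unexpected digest: {digest!r}")
    # The two-character fan-out directory is the first two chars of the digest.
    return f"{root}/blobs/sha256/{hexdigest[:2]}/{hexdigest}/data"
```

Note that deleting only the `data` object still leaves the surrounding directory markers and, more importantly, any `link` files in repositories that reference the blob, which is exactly how the broken-manifest situation described below can arise.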
When I looked into S3 for that manifest, I found that it is present, but the `link` file appeared to be empty. So, how exactly can I delete data from S3 according to the GC logs? And is there any safe way to delete old data from Harbor storage that Harbor doesn't even know about?
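For context on the empty `link` file: in the distribution registry layout, a `link` file is a tiny object whose entire content is just the digest string (e.g. `sha256:<64 hex chars>`) pointing at the actual blob. An empty one makes the manifest unresolvable even if the blob data still exists. A small sketch of a sanity check one could run over downloaded link files before trusting any manual cleanup (the classification strings are my own, not Harbor's):

```python
import re

# A valid registry link file contains exactly one digest string and nothing else.
_DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")

def check_link(content: bytes) -> str:
    """Classify the content of a registry 'link' file."""
    text = content.decode("utf-8", errors="replace").strip()
    if not text:
        return "broken: empty link file"
    if _DIGEST_RE.match(text):
        return "ok"
    return "broken: unexpected content"

print(check_link(b""))                        # broken: empty link file
print(check_link(b"sha256:" + b"a" * 64))     # ok
```

Running something like this across `_manifests/**/link` keys in the bucket would at least enumerate which manifests are already broken before deleting anything further.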