Closed IvanDev closed 2 months ago
Thanks for the contribution! I agree we should have some kind of garbage collection to remove dangling layers. However, I might have some concerns with having garbage collection done by a DO alarm or asynchronously.
AFAIU, there is a small chance that a new manifest is uploaded while garbage collection is running, and its new layers might be removed. I think this is something we will have to accept in some way to keep the registry implementation simple. If we go this route, users of the registry who use garbage collection really need to know that it comes with some risks.
I am going to propose some things and let me know what you think (just throwing some stuff here for discussion):
IMO I'd go for the GC endpoint and just let anybody who needs GC call it explicitly.
If we are feeling fancy with DOs we could do these passes in there, but with the Workers Unbound plan + ctx.waitUntil() we can already extend computation time a bit.
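To make the endpoint idea concrete, here is a minimal sketch of handing the collection to `ctx.waitUntil()` so the caller gets a response right away. The `GcContext` and `GcResponse` shapes, the `handleGcRequest` name, and the 202 status are assumptions made for illustration, not the registry's actual code:

```typescript
// Minimal stand-in for the Workers execution context; only waitUntil is used here.
interface GcContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical response shape for this sketch, not the registry's real response type.
interface GcResponse {
  status: number;
  body: string;
}

// Handle an explicit GC request: answer immediately and let the collection
// keep running in the background through waitUntil.
function handleGcRequest(
  ctx: GcContext,
  collect: () => Promise<number>,
): GcResponse {
  ctx.waitUntil(collect());
  return { status: 202, body: "garbage collection started" };
}
```

The trade-off is that the caller does not learn whether the collection succeeded; awaiting `collect()` inside the handler would report that, at the cost of a longer request.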
Hope you find this overall feedback fair! I am open to having GC, I just want to see your point of view on these 😄.
Here are my 2 bits:
I agree with your concerns, and I'm also a bit afraid of over-engineering :) I would prefer to keep the implementation simple and introduce complexity only when we really need it.
I like the date change. I'd suggest we remove the asynchronous garbage collection if we have the endpoint. And in that case I do not think DO is necessary (trying to stay within R2 as much as possible here).
Sure, let's keep it safe. I had to remove the date checks since they're useless now. Let's keep users responsible for their own actions :)
I think you will have to run the formatter on this PR and the other https://github.com/cloudflare/serverless-registry/pull/30/files
Sure!
upvote on this
Hello folks! Due to popular demand for garbage collection, I have cherry-picked @IvanDev's commit, rebased it, and added a bit of concurrency safety that I thought would be nice. Thank you for the feedback. https://github.com/cloudflare/serverless-registry/pull/48
The cherry-picked commit from this PR has been merged in https://github.com/cloudflare/serverless-registry/pull/48. Thank you @IvanDev for the implementation and the discussion!
This PR introduces a garbage collector. When we remove an image or tag from the repository, blobs referenced by the deleted manifests are not removed from R2. With this PR, we schedule garbage collection after any modifying operation. GC waits 10 minutes after any modification to the repository; this ensures we do not garbage-collect freshly uploaded blobs that do not yet have a parent manifest.
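As a rough illustration of the idea (not the code in this PR), the pass can be thought of as mark-and-sweep over in-memory data, gated by the 10-minute wait. The `Manifest` shape and the `isQuiet` and `findDanglingBlobs` names are hypothetical; the real registry reads manifests and blobs from R2:

```typescript
// Hypothetical in-memory shape for this sketch; the real registry stores these in R2.
interface Manifest {
  layers: string[]; // digests of the blobs this manifest references
}

const GC_DELAY_MS = 10 * 60 * 1000; // the 10-minute wait described above

// True once the repository has gone unmodified for the full delay.
function isQuiet(lastModifiedMs: number, nowMs: number): boolean {
  return nowMs - lastModifiedMs >= GC_DELAY_MS;
}

// Mark every digest still referenced by a manifest, then return the blobs left over.
function findDanglingBlobs(manifests: Manifest[], blobs: string[]): string[] {
  const referenced = new Set<string>();
  for (const manifest of manifests) {
    for (const digest of manifest.layers) {
      referenced.add(digest);
    }
  }
  return blobs.filter((digest) => !referenced.has(digest));
}
```

A caller would check `isQuiet(...)` first and only then delete what `findDanglingBlobs(...)` returns. Note that the wait narrows the window for removing a freshly uploaded blob but does not close it.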
We have 2 modes for the garbage collector, unreferenced and untagged:
Users can leave the GARBAGE_COLLECTOR_MODE variable unset, which disables GC.
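For illustration, reading the variable could look roughly like the sketch below. The `parseGarbageCollectorMode` name is hypothetical, and treating an empty string as unset and rejecting unknown values are assumptions of this sketch rather than behavior described here:

```typescript
type GarbageCollectorMode = "unreferenced" | "untagged";

// Returns the configured mode, or null when GC is disabled because the variable is unset.
function parseGarbageCollectorMode(
  value: string | undefined,
): GarbageCollectorMode | null {
  // Assumption: an empty string is treated the same as an unset variable.
  if (value === undefined || value === "") return null;
  if (value === "unreferenced" || value === "untagged") return value;
  // Assumption: fail loudly on a typo instead of silently disabling GC.
  throw new Error(`unknown GARBAGE_COLLECTOR_MODE: ${value}`);
}
```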
Some considerations: