distribution / distribution

The toolkit to pack, ship, store, and deliver container content
https://distribution.github.io/distribution
Apache License 2.0

HA setup and running GC on all three nodes (GC enabled by default) #3937

Closed MindTooth closed 1 year ago

MindTooth commented 1 year ago

Hi,

We are running the registry in an HA setup using Anycast and three nodes. I tried to understand the Load balancing considerations section of the docs, but after some problems using S3 as a storage driver, I started to wonder whether GC should be enabled on all three nodes, or whether that could cause conflicts, meaning only one node should run GC.
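For context, all three nodes point at the same bucket, roughly like this (bucket, region, credentials, and secret below are placeholders, not our real values):

```sh
# Sketch of the config every node runs with; all three share one S3 bucket.
cat > /etc/docker/registry/config.yml <<'EOF'
version: 0.1
storage:
  s3:
    region: us-east-1
    bucket: registry-storage    # the one bucket shared by all three nodes
    accesskey: REDACTED
    secretkey: REDACTED
http:
  addr: :5000
  secret: shared-http-secret    # identical on every node behind the LB
EOF
```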

What are your thoughts on this? If only one node should run GC, I can open a PR to update the docs.

deleteriousEffect commented 1 year ago

@MindTooth GC is a separate process from the registry server. Garbage collection isn't something that a running registry instance will do periodically.

Garbage collection as implemented isn't something that can be run concurrently at all; only one garbage collection process should be running against a given storage backend at a time.
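To make that concrete: the collector is a subcommand of the same `registry` binary, pointed at the same config file the server reads (the paths here are just examples):

```sh
# GC is a separate invocation of the registry binary, not a background task
# of the server. --dry-run reports what would be deleted without deleting it.
registry garbage-collect --dry-run /etc/docker/registry/config.yml

# The real run. Only one of these may ever run against a given storage
# backend at a time.
registry garbage-collect /etc/docker/registry/config.yml
```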

MindTooth commented 1 year ago

So, if I understand correctly: running three instances sharing an HTTP secret should not pose an issue. And GC is opt-in, run with the same binary against a read-only registry?

On a side note, GC should be run from time to time, I guess? And the registry needs to be put in read-only mode first?

Thanks for the quick reply.

deleteriousEffect commented 1 year ago

running three instances sharing an HTTP secret should not pose an issue. And GC is opt-in, run with the same binary against a read-only registry?

@MindTooth Don't think about running GC against a registry; it's more accurate to think of running the GC command against the underlying storage. The GC process needs exclusive write access to that storage. You can't, for instance, switch one of the registry instances to read-only mode and run GC on that node: the other nodes still writing to the underlying storage bucket will cause data inconsistency.
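For completeness: if you want to use the registry's built-in read-only switch instead of stopping the servers entirely, it has to be flipped on every node that shares the bucket, not just one. A sketch (the restart mechanism is deployment-specific; systemd is just an example):

```sh
# On EVERY node sharing the bucket, add this under the existing `storage:`
# section of config.yml, then restart so writes are rejected before GC starts:
#
#   storage:
#     maintenance:
#       readonly:
#         enabled: true
#
# systemd is just one way to restart; use whatever your deployment uses.
systemctl restart registry
```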

GC should be run from time to time, I guess?

There's no perfect cadence. Generally, the more data written between GC runs, the longer a run takes. However, the mark phase has to enumerate every image being kept regardless of how much is deleted, so even if you were to run two GC commands back-to-back, the second would likely take a comparable amount of time to complete.

If you want the most out of any particular GC run, make sure that unneeded images have been deleted and/or untagged first, so that as much storage as possible can be reclaimed.
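For example (the repository and tag below are hypothetical, and manifest deletion must be enabled with `storage.delete.enabled: true` in the config), a manifest is deleted by digest through the API. Newer releases also have a `--delete-untagged` flag on the garbage-collect command if you'd rather not remove manifests one by one:

```sh
# Resolve a tag to its digest via the Docker-Content-Digest response header.
DIGEST=$(curl -sI \
  -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
  https://registry.example.com/v2/myapp/manifests/old-tag \
  | tr -d '\r' | awk -F': ' 'tolower($1)=="docker-content-digest" {print $2}')

# Delete the manifest by digest. This only drops the reference; the blobs
# stay in storage until the next garbage-collect run sweeps them.
curl -X DELETE "https://registry.example.com/v2/myapp/manifests/${DIGEST}"
```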

MindTooth commented 1 year ago

Thanks again for the long explanation. 😁 I think I'm starting to get it. My assumption was that GC was part of the same flow as the registry, but it now makes more sense why the registry needs to be in read-only mode.

Will a GC run clean anything up even if we don't manually mark images for deletion? Just trying to keep storage from getting out of hand.

This API removes references to the target and makes them eligible for garbage collection. It also makes them unable to be read via the API.

Does this mean that if an item is ever deleted, we need to run GC before it can be made available again? If one were to pull the same image in the future without a GC run, would the pull fail because the item is still on disk but marked for deletion, or something like that?

Sorry for asking so many questions. It's just that I did some GC runs against the live registry (three nodes against S3) and things broke. I'm trying to understand why. 😅


Going forward, and more specifically, to run garbage collection correctly:

  1. Shut down all registries that write to the same data storage.
  2. Run the same binary in garbage-collection mode; when it finishes:
  3. Start the registries back up.

This is at least how I understand it so far (sketched as a script below).
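Roughly, as a script; the hostnames, service name, and paths are placeholders for whatever the deployment actually uses:

```sh
#!/bin/sh
# Sketch of a GC maintenance window; every name here is a placeholder.
set -e

# 1. Stop every registry instance that writes to the shared bucket.
for node in reg1 reg2 reg3; do
  ssh "$node" systemctl stop registry
done

# 2. Run garbage collection exactly once, from a single host, against the
#    same config (and therefore the same bucket) the servers use.
registry garbage-collect /etc/docker/registry/config.yml

# 3. Bring the registries back up.
for node in reg1 reg2 reg3; do
  ssh "$node" systemctl start registry
done
```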