[DO NOT MERGE] Use OCI store as blob cache

qweeah commented 1 year ago

Instead of caching based on the fully qualified reference of an artifact, this PR provides another implementation of caching:

Remote registries are cached behind a local OCI layout proxy. Anything pulled from the registry will be cached into the proxy.
Manifest and layer blobs are persistently cached in the OCI layout proxy.
Tag resolutions are cached in-memory (TODO: add expiration).
Blob can be mounted across repositories.

To make the exmaples runnable, this PR renames some of the dependent charts so it cannot be merged.

vsoch commented 1 year ago

@qweeah this looks really interesting! I'm not familiar with stores - can you walk me through the differences, pros and cons for each approach?

qweeah commented 1 year ago

@qweeah this looks really interesting! I'm not familiar with stores - can you walk me through the differences, pros and cons for each approach?

Sure. This PR utilizes the oci.Store provided via oras-go, which is an implementation of OCI layout spec.

The main improvement is on network traffic: In your implementation, files (layer blob with org.opencontainers.image.title) are cached based on the reference, which is a combination of names for registry, repository and tag. If two files in different artifacts have same content, they will still be fetched from remote registry twice, causing extra network overhead. Implementation in this PR utilize an OCI store to do content deduplication, blobs with same digest are guaranteed to be fetched only once from the remote registry.

~~Still, there are drawbacks, the files need to be unzipped from blob every time when it's requested.~~ Still, there are drawbacks, the files need to be copied from the OCI store every time when it's requested.

Will add some examples to make my point clearer.

vsoch commented 1 year ago

If two files in different artifacts have same content, they will still be fetched from remote registry twice, causing extra network overhead.

What is the real chance of that? Also with the store - are we then mounting them the same file? That would have issue if, for example, you wanted to change one (but not the other) or if it's just crossing a user namespace.

Still, there are drawbacks, the files need to be unzipped from blob every time when it's requested.

Hmm and that doesn't seem ideal.

@sajayantony what do you think?

qweeah commented 1 year ago

What is the real chance of that?

Well, considering how container images are built with layer blobs shared, the chances are high (if we don't want to exclude caching image artifacts from the scope this project).

Also with the store - are we then mounting them the same file? That would have issue if, for example, you wanted to change one (but not the other) or if it's just crossing a user namespace.

OCI store acts like a read-only cache, cache user provides digest and get what they want from the cache and don't need to modify files within the OCI store.

Hmm and that doesn't seem ideal.

Yes, but it's just copying the file and I guess it unavoidable if user wante to change their own copy but not effecting others.

vsoch commented 1 year ago

OCI store acts like a read-only cache, cache user provides digest and get what they want from the cache and don't need to modify files within the OCI store.

So this would only support read only?

qweeah commented 1 year ago

OCI store acts like a read-only cache, cache user provides digest and get what they want from the cache and don't need to modify files within the OCI store.

So this would only support read only?

OCI store is readonly and not accessible to user, what is mounted to the consumer pod is copied from the readonly store and safe to be modified.

vsoch commented 1 year ago

Gotcha. I can't say I have a strong opinion about this change. I don't like needing to untar archives every time, and I don't like the added complexity, but I see what you are saying about having a sort of shared cache. That shared cache would still need to be unique unique namespace, so I'm not sure it would give us that much.

My preference tends to be to keep things simple until more complexity is afforded. Let's wait to hear what @sajayantony has to say! I do really appreciate that you opened this and started the discussion - let's make sure we decide on a good path forward!

qweeah commented 1 year ago

@vsoch Just to clarify, by the files need to be unzipped from blob every time when it's requested I mean the work of copying the blob from the OCI store to artifact root (see related code here)

I didn't read through the code of pullBlob and I thought it was doing unzip for the blob.

vsoch commented 1 year ago

I didn't read through the code of pullBlob and I thought it was doing unzip for the blob.

If a blob isn't an archive it shouldn't need to be unzipped? But maybe I'm missing a detail.

qweeah commented 1 year ago

If a blob isn't an archive it shouldn't need to be unzipped? But maybe I'm missing a detail.

I didn't read through the code of pullBlob and I thought it was doing unzip for the blob. I know I was wrong now.

In you implementation: 1) fetched the manifest; 2) readout manifest content, for all named blobs: 2.* fetch and save to artifact root

I used to think 2.* contains unzip, but turn out it's just simple read and save with verification.

Just to clarify that there is no unzip in both implementations. Updated my original comment to fill the gap.

converged-computing / oras-csi

[DO NOT MERGE] Use OCI store as blob cache #27