NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

OCI registry as binary cache #8400

Open akiroz opened 1 year ago

akiroz commented 1 year ago

I currently host an OCI registry for containers but the OCI registry API is basically a generic blob store.

It would be nice if I could use my existing OCI registry as a nix binary cache as well.

As far as I know, nix currently supports using SSH or S3 as binary caches so I wonder if OCI registries would be considered for nix binary caching as well. As in, if I submit a PR for supporting OCI binary caching, would it be considered for merging?

thufschmitt commented 1 year ago

As in, if I submit a PR for supporting OCI binary caching, would it be considered for merging?

YES (at least from my end).

Getting OCI registry support is an old dream of mine, among other things because it would make the caching situation much better for GitHub runners (you'd get a very fast binary cache nearly for free for open-source projects) and a bunch of other cloud scenarios.

YorikSar commented 1 year ago

I think this would be a great idea (because I also had it)! With OCI registry we could:

By the way, there is https://oras.land, which seems to provide quite a flexible CLI for storing arbitrary artifacts in OCI registries.

Ericson2314 commented 1 year ago

FWIW, OCI registries also came up in https://github.com/theupdateframework/taps/pull/156, which is associated with a GSoC project I am about to mentor.

akiroz commented 1 year ago

After looking around the codebase, it seems that using OCI registries as a proper Nix store is more suitable than using them as a binary cache: OCI blobs are accessed by SHA256 hash, but the cache interface only works with opaque paths.
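To illustrate the mismatch described above, here is a minimal Python sketch. It assumes only that a registry addresses a blob by the SHA256 digest of its bytes, while a binary cache is asked for files by name. The blob contents, the key `example-key.narinfo`, and the `index` dictionary are hypothetical placeholders, not anything taken from the Nix codebase.

```python
import hashlib


def oci_digest(blob: bytes) -> str:
    """Return an OCI-style digest string ('sha256:<hex>') for the given bytes."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Placeholder bytes standing in for a file that a binary cache would serve.
blob = b"example blob contents"

# A cache is queried by file name, so some name -> digest index is needed
# before the blob can be requested from the registry by its digest.
index = {"example-key.narinfo": oci_digest(blob)}
```

Where that index would live (for example in a manifest or behind a tag) is exactly the design question discussed in the rest of the thread.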

Ericson2314 commented 1 year ago

@akiroz part of https://github.com/NixOS/rfcs/pull/133 is that I want content-addressing stores that need not be aware of Nix.

I wanted it for Git, but there is no reason it can't work for flat files and nars too (the remote side just sees a nar as yet another flat file).

I would comment on that use case over there so you can in fact use OCI blobs. It will also remind me to return to that RFC! :)

akiroz commented 1 year ago

Okay, I see 2 ways of going about this now.

  1. The easy one: simply using the OCI registry as a generic blob store through the existing BinaryCacheStore interface.
  2. The smart one: leveraging the OCI registry as a CAS by putting whole path closures in the manifests, with a completely new Store implementation.

Since there's already an RFC for generic CAS-backed stores I think I'll try to go for the first option in the meantime and see how the RFC develops before trying anything too clever.

YorikSar commented 1 year ago

using the OCI registry as a generic blob store

I don't think many registries will allow storage of "orphaned" blobs. I'd expect them to require a manifest to keep a blob alive.

andreabedini commented 1 year ago

I'd love to see this idea pushed further. We had a chat with @thufschmitt and @YorikSar a few weeks ago and I wrote a PoC script. Let me share the whole brain-dump :-)

Initially, I wasn't really sure how to map Nix's concepts to OCI's. I think my latest approach (and what @YorikSar describes above) was to map paths (i.e. the output of nix-store --dump) into layers and closures (generically) into images, defined with an OCI manifest.

An image would be a closed collection of paths (perhaps in topological order?) and could include all the metadata we like (derivation path, system? ...). Layers within the image can also be annotated (e.g. with their path, list of references, signatures, ...). Note that the annotations on the layers are independent of the blob being content-addressed, since they only appear in an image manifest.

We can then tag the image with its input-addressed hash so we can find it again. You can have multiple tags per image, corresponding to different derivations leading to the same closure.
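The mapping described above (one layer per path, one image manifest per closure, metadata as layer annotations) could be sketched roughly as follows in Python. This is an illustration under assumptions, not the PoC script: the layer media type `application/vnd.example.nix.nar` and the annotation keys under `org.example.nix.*` are made up for the example, and the empty config descriptor follows my reading of the OCI image spec.

```python
import hashlib

# Hypothetical media type for a NAR layer; not a registered type.
NAR_MEDIA_TYPE = "application/vnd.example.nix.nar"


def descriptor(media_type, blob, annotations=None):
    """Build an OCI content descriptor for the given bytes."""
    desc = {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }
    if annotations:
        desc["annotations"] = annotations
    return desc


def closure_manifest(paths):
    """Build an image manifest for a closure.

    `paths` is a list of dicts with keys 'path' (store path),
    'references' (list of store paths) and 'nar' (bytes of the NAR dump).
    """
    layers = [
        descriptor(
            NAR_MEDIA_TYPE,
            p["nar"],
            {
                # Hypothetical annotation keys. They live in the manifest,
                # so they do not affect the digest of the layer blob.
                "org.example.nix.path": p["path"],
                "org.example.nix.references": " ".join(p["references"]),
            },
        )
        for p in paths
    ]
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        # Empty JSON config, since the image carries no runtime configuration.
        "config": descriptor("application/vnd.oci.empty.v1+json", b"{}"),
        "layers": layers,
    }
```

Two closures that share a path produce the same layer digest for it, which is what would let a registry deduplicate the blob.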

I expect a registry to share and dedup identical layers (since they are content-addressed) but I am not sure if this is per-repository or per-registry (the terminology seems to be registry/namespace/image:tag, with registry/namespace/image also referred to as the image repository).

Nix would never use a path without its closure, so fetching an image sounds like the right thing to do. Layers can be GC'ed when no image refers to them anymore (i.e. when they are not part of any closure). Adding native support to Nix might require some code changes because, IIRC, fetching paths from a cache is done path by path, whereas with this approach it would be done by closure.

If we point nix to a registry/namespace, the path /nix/store/hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3 would be uploaded as registry/namespace/nix-2.13.3:hbr5j4sd0rrdlh9hybzsiglvnf2j80la.
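As a small Python sketch of this naming convention (the function name `image_ref` is hypothetical), the store path's base name is split at the first hyphen into the hash and the name, which become the tag and the image name respectively:

```python
def image_ref(store_path: str, registry_namespace: str) -> str:
    """Map a store path to '<registry/namespace>/<name>:<hash>'."""
    # Take the last path component, e.g. '<hash>-nix-2.13.3'.
    base = store_path.rsplit("/", 1)[-1]
    # The hash part contains no hyphen, so the first '-' separates hash and name.
    hash_part, _, name = base.partition("-")
    return f"{registry_namespace}/{name}:{hash_part}"


ref = image_ref(
    "/nix/store/hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3",
    "registry/namespace",
)
# ref == "registry/namespace/nix-2.13.3:hbr5j4sd0rrdlh9hybzsiglvnf2j80la"
```

The sketch does not check whether the resulting name is acceptable to a registry; names with characters a registry rejects in repository names would need some escaping scheme.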

The script linked above uses a different convention for the destination, registry/namespace/hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix:2.13.3 (which seems less optimal, since every different derivation will map to a different image, and it doesn't make use of the derivation name, which is a well-defined concept in Nix).

Lastly, I don't think this necessarily needs to build on top of CA support in Nix. That would allow fetching images by their hash, but fetching them by their input address is also fine.

akiroz commented 1 year ago

I expect a registry to share and dedup identical layers (since they are content-addressed) but I am not sure if this is per-repository or per-registry (the terminology seems to be registry/namespace/image:tag, with registry/namespace/image also referred to as the image repository).

I think using an OCI repository as the whole store makes more sense from an operational standpoint, since registry access control is on a per-repository basis and some registry providers have per-registry billing / registry count limits.

andreabedini commented 1 year ago

I think using an OCI repository as the whole store makes more sense from an operational standpoint, since registry access control is on a per-repository basis and some registry providers have per-registry billing / registry count limits.

🤔 but in that case the image is fixed and you cannot use names right?

/nix/store/hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3 would be pushed to registry/namespace/image:hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3.

TBH the derivation names sometimes don't really convey any information (e.g. /nix/store/*-source), so perhaps it's OK to ignore them.

akiroz commented 1 year ago

🤔 but in that case the image is fixed and you cannot use names right?

Yeah, basically every store path would be a tag under that image. In private registries you'd only have to grant permission to that one "image" instead of a huge list of permissions for every name.

/nix/store/hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3 would be pushed to registry/namespace/image:hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3.

Yep.
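A minimal Python sketch of this single-repository convention follows, with every store path becoming a tag under one fixed image. The function name `single_repo_ref` is hypothetical, and the tag pattern is an assumption based on my reading of the OCI distribution spec (at most 128 characters from `[a-zA-Z0-9._-]`, not starting with `.` or `-`), so store path names containing characters such as `+` would need an escaping scheme that the sketch does not attempt.

```python
import re

# Assumed tag grammar, as I read the OCI distribution spec.
TAG_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9._-]{0,127}")


def single_repo_ref(store_path: str, repository: str) -> str:
    """Map a store path to '<repository>:<hash>-<name>' under one fixed image."""
    tag = store_path.rsplit("/", 1)[-1]
    if not TAG_RE.fullmatch(tag):
        # Refuse rather than silently produce a reference a registry may reject.
        raise ValueError(f"store path name is not usable as a tag: {tag!r}")
    return f"{repository}:{tag}"


ref = single_repo_ref(
    "/nix/store/hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3",
    "registry/namespace/image",
)
# ref == "registry/namespace/image:hbr5j4sd0rrdlh9hybzsiglvnf2j80la-nix-2.13.3"
```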

thufschmitt commented 1 year ago

I was recently made aware of https://github.com/linyinfeng/oranc (linked from https://github.com/zhaofengli/attic/issues/63). I haven't seriously tried it (beyond “I can get it to work”), but it's definitely worth looking at (cc @linyinfeng).