nzhang-zh opened this issue 5 years ago
There are multiple ways to construct an artefact with different levels of sharing. Four approaches are considered at this point, shown below from least shared to most shared.
Full copy
This is how dockerTools.buildImage creates new image layers.
When a new layer is built with packages from nixpkgs, the file contents of the packages are copied into the layer's directory. The closure of the layer is then copied from the host Nix store to create a local Nix store within the layer.
$ find -maxdepth 3
.
./lib
./lib/bash
...
./bin
./bin/bash
./bin/sh
./nix
./nix/store
./nix/store/n2hjbpkf4c0m48945ivxs3lmsczzw2rg-bash-4.4-p23
./nix/store/wcg2bimzqhqza2vz86hbf2q904pkfy85-glibc-2.27
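The copying described above can be illustrated with a toy sketch. The store path name (n2hjb-bash) and directory layout here are hypothetical stand-ins, not real store paths:

```shell
# Toy sketch of the full-copy approach: package contents are copied to
# the layer root, and the closure is copied again into a local /nix/store
# inside the layer.
mkdir -p store/n2hjb-bash/bin layer
touch store/n2hjb-bash/bin/bash
cp -r store/n2hjb-bash/. layer/           # file contents at the layer root
mkdir -p layer/nix/store
cp -r store/n2hjb-bash layer/nix/store/   # local nix store within the layer
find layer
```

Every file thus exists twice per layer, once at the layer root and once in the layer's local store, which is why this approach shares the least.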
When a base image is specified, its content is copied into a new tarball and placed in the image. The image below is built with busybox as the base image. The tarball 8a788232037eaf17794408ff3df6b922a1aedf9ef8de36afdae3ed0b0381907b.tar is a full copy of the busybox image.
$ tree
.
├── 8a788232037eaf17794408ff3df6b922a1aedf9ef8de36afdae3ed0b0381907b
│ ├── json
│ ├── layer.tar -> ../8a788232037eaf17794408ff3df6b922a1aedf9ef8de36afdae3ed0b0381907b.tar
│ └── VERSION
├── 8a788232037eaf17794408ff3df6b922a1aedf9ef8de36afdae3ed0b0381907b.tar
├── a9b02606b84128209b6087cc97710d8f9d17c71ac694f5e7994abddd4e6e1053
│ ├── json
│ ├── layer.tar
│ └── VERSION
├── eb0b7ee2b0093056c8435cea85aed9522a31bba2ce1fb9529029d0aba35bdb4b.json
├── manifest.json
└── repositories
Shared image layers
Since images themselves are built up from layers, it would make sense to share some common base layers between images.
When pulling images or building images sharing the same layers, these layers could be stored as nix store outputs and shared between images.
However, this level of sharing is limited because what goes into a layer can be arbitrary. An image layer containing coreutils and moreutils and another image layer containing bash and coreutils will not share anything, as a layer is the smallest sharable unit in this case.
This is still a good way to share common layers if most of the images we use are pulled from an external image registry.
Shared package layers
Further extending the idea of sharing image layers, we can take advantage of Nix to maximise sharing between images, as seen in dockerTools.buildLayeredImage.
The simplest approach would be to construct a layer for each Nix package in the dependency graph, giving us perfect sharing.
However, since Docker does not support an unlimited number of layers, layers that are less likely to be cache hits should be combined. By prioritising frequent and deep dependencies, we can derive the popularity of each dependency, give each popular package its own layer, and merge the less popular ones into a single layer.
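The popularity heuristic can be sketched with plain shell tools. The three image closures below (img1–img3) and their package lists are made-up inputs for illustration; a real implementation would derive them from the dependency graph:

```shell
# Hypothetical closures of three images, one package name per line.
printf 'glibc\nbash\ncoreutils\n' > img1.txt
printf 'glibc\nbash\nmoreutils\n' > img2.txt
printf 'glibc\ntree\n' > img3.txt

# Popularity = how many closures reference each package. The most
# popular packages would each get a dedicated layer; the long tail
# of count-1 packages would be merged into a single layer.
popularity=$(sort img1.txt img2.txt img3.txt | uniq -c | sort -rn)
echo "$popularity"
```

Here glibc (referenced by all three images) would get its own layer, while the packages referenced only once would be merged.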
Complete sharing
For Matrix, our main concern is building filesystem bundles conforming to the OCI runtime specification that can be executed by runc as a container. Building OCI images is not required for our use case. (But we may add support for exporting OCI images at a later stage.)
With purely Nix-based artefacts, we could simply create symbolic links to outputs in the Nix store and mount the host Nix store into an automaton. This approach provides the highest level of sharing.
In addition, we do not need a serialised image archive to transfer between nodes. The Nix derivation used to build an artefact is capable of rebuilding the artefact on a different node. This is still lightweight because building an artefact is essentially creating directories of symbolic links. It does, however, require every node we deploy to to have the Nix package manager available.
Alternatively, we could serialise the closure of an artefact and deliver it along with the artefact to nodes where Nix is unavailable.
runc
requires a container to be organised as a filesystem bundle, as shown below.
./rootfs/
config.json
./rootfs/ is the container's root filesystem. This is what an artefact should directly or indirectly contain.
config.json is a required configuration file used by runc. For high composability, this configuration file should be kept separate from the artefact.
We could directly construct rootfs in the output of our Nix derivation by mirroring a merged directory structure of all input packages with symbolic links.
For example, an artefact with bash and tree as inputs would have the following output
$ tree .
.
├── bin
│ ├── bash -> /nix/store/czx8vkrb9jdgjyz8qfksh10vrnqa723l-bash-4.4-p23/bin/bash
│ ├── sh -> /nix/store/czx8vkrb9jdgjyz8qfksh10vrnqa723l-bash-4.4-p23/bin/sh
│ └── tree -> /nix/store/bskfav26x2xify79w2kc824k3fiwyika-tree-1.7.0/bin/tree
├── lib
│ └── bash
│ ├── basename -> /nix/store/czx8vkrb9jdgjyz8qfksh10vrnqa723l-bash-4.4-p23/lib/bash/basename
│ ├── dirname -> /nix/store/czx8vkrb9jdgjyz8qfksh10vrnqa723l-bash-4.4-p23/lib/bash/dirname
│ ├── finfo -> /nix/store/czx8vkrb9jdgjyz8qfksh10vrnqa723l-bash-4.4-p23/lib/bash/finfo
│ ├── ...
└── share
└── man
└── man1
└── tree.1.gz -> /nix/store/bskfav26x2xify79w2kc824k3fiwyika-tree-1.7.0/share/man/man1/tree.1.gz
Notice how bash, sh and tree (as well as the bash libs) are all symbolic links into the Nix store, which will be bind mounted by runc.
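The mirroring itself is straightforward to sketch. The package directories below (pkg-bash, pkg-tree) stand in for real store paths; the point is just walking each package tree and linking every file into the merged rootfs:

```shell
# Toy sketch: build a rootfs of symbolic links by mirroring the merged
# directory structure of all input packages (hypothetical package trees).
mkdir -p pkg-bash/bin pkg-tree/bin rootfs
touch pkg-bash/bin/bash pkg-bash/bin/sh pkg-tree/bin/tree
for pkg in pkg-bash pkg-tree; do
  (cd "$pkg" && find . -type f) | while read -r f; do
    mkdir -p "rootfs/$(dirname "$f")"     # recreate the directory structure
    ln -sf "$PWD/$pkg/$f" "rootfs/$f"     # link each file to its package
  done
done
find rootfs -type l
```

A real derivation would do the equivalent over the closure of its inputs, which is why building an artefact amounts to creating directories of symlinks.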
Docker
This is the method shown in the runc documentation
# export busybox via Docker into the rootfs directory
docker export $(docker create busybox) | tar -C rootfs -xvf -
pkgs.dockerTools
dockerTools.exportImage {
fromImage = dockerTools.pullImage {
imageName = "busybox";
imageDigest = "sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812";
sha256 = "<hash>";
};
};
Note this outputs a tar archive.
skopeo
skopeo --override-os "linux" --override-arch "amd64" copy "docker://busybox@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812" "oci:busybox:<tag>"
This gives us an OCI image layout that needs to be further unpacked.
$ tree .
.
├── blobs
│ └── sha256
│ ├── 90e01955edcd85dac7985b72a8374545eac617ccdddcc992b732e43cd42534af
│ ├── b4f1424d3f12ec809dc268c6c87d25d8b3869813aea92c9dfaf7429de802030e
│ └── d98834fba17e4121dc21f65d6ddf2f648da119c86d72ffea8145e496bda621fd
├── index.json
└── oci-layout
To unpack, we could use oci-image-tool unpack (or create), or dockerTools.runWithOverlay.
Bind mounting the Nix store directly into the container may not work well with applications that require write access to their own directory.
For example, by default nginx tries to write log files and temp files under /nix/store/<nginx-store-path>/.
Nginx does support changing the default directory prefix, but this involves preparing a minimal directory structure elsewhere.
Instead, we could mount the host Nix store with a storage driver when preparing the root filesystem. With OverlayFS, the host Nix store becomes a lowerdir, and the application can make changes to the local Nix store on the merged filesystem.
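In runc terms, this would be an overlay entry in the bundle's config.json mounts array. The upperdir and workdir paths below are hypothetical placeholders for per-container writable directories:

```json
{
  "mounts": [
    {
      "destination": "/nix/store",
      "type": "overlay",
      "source": "overlay",
      "options": [
        "lowerdir=/nix/store",
        "upperdir=/var/lib/containers/app/upper",
        "workdir=/var/lib/containers/app/work"
      ]
    }
  ]
}
```

Writes from the container land in the upperdir, leaving the host Nix store untouched.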
That issue exists on NixOS anyway; no Nix packages should be configured to write to /nix/store, as it is always immutable. So I don't think this is unique to bind mounting in containers.
Provide the capability to build both OCI-image-based artefacts and Nix-based artefacts.