NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Allow some derivations to hardlink to other files in the store #1272

Open Ekleog opened 7 years ago

Ekleog commented 7 years ago

Context

I am currently writing a NixOS module that makes it easy to generate VMs, and I need a way to pass each guest its own store and nothing more (without full access to the host store, so that the guest cannot see secrets that may be in there).

I could have gone with mount --bind, as is done for derivation building, but making this a permanent choice with ~1k bind-mounts per VM seems really unsustainable.

So I chose to generate the VM's store in a derivation, and to give this derivation to the guest as though it were its store (this being the least bad of the approaches I could think of).

Issue

To do this, I would have liked to simply hardlink the required derivations, instead of copying everything and waiting for nix-store --optimise to come along and replace the copies with hardlinks that I could have created from the beginning.

This would reduce wasted disk space, and far less time would be spent copying things that will be hardlinked later anyway.

However, derivations seem to be built in an environment where their buildInputs are bind-mounted (mount --bind), which makes hardlinks impossible: the kernel refuses to link across mount points, even when they sit on the same underlying filesystem.
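To make the failure mode concrete, here is a minimal sketch of "try to hardlink, otherwise copy". The helper name link_or_copy is made up for illustration and is not part of Nix. It assumes Linux semantics, where link() reports EXDEV across mount points and may report EPERM when kernel policy forbids linking to files you do not own.

```python
import errno
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Hardlink src to dst; fall back to a full copy if linking is refused.

    Returns "linked" or "copied" so the caller can see which path was taken.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        # EXDEV: src and dst are on different mounts (this includes two bind
        # mounts of the same filesystem).
        # EPERM: the filesystem or kernel policy refuses the link.
        # EMLINK: the link-count limit of the source file was reached.
        if exc.errno not in (errno.EXDEV, errno.EPERM, errno.EMLINK):
            raise
        shutil.copy2(src, dst)  # copies contents plus mode and timestamps
        return "copied"
```

Inside a build that sees its inputs through bind mounts, this would be expected to take the "copied" branch every time, which is exactly the cost described above.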

Proposed solution

Add a derivation option that requests direct access to /nix/store, not through a mount --bind "sandbox" (I tried both with nix.useSandbox = true; and nix.useSandbox = false;, and it seems to happen anyway, so I guess that's not what's called sandbox in nix vernacular).

What do you think about this? Is it too narrow a use case to deserve such a change?

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

Ekleog commented 3 years ago

Still important

thufschmitt commented 3 years ago

Might be a wrong trail, but at a glance it looks like https://github.com/grahamc/netboot.nix is solving a pretty similar issue using recursive-nix. Might be worth a try

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

VanCoding commented 1 year ago

I think it would be really good to have this feature. Sometimes we want to copy files from one derivation to a dependent derivation. At that point, we already know that the content will be completely identical, so we could just make a hardlink right there instead of copying first and then waiting for nix to figure this out by itself.

One use-case that I currently have in mind is building something like a "nix-native" version of PNPM. PNPM makes heavy use of hardlinks to be fast and disk-space efficient. In Node.js, you often need the same package, but wired up with different versions of its dependencies, and the only way to not copy the whole package is to hardlink all of its files. Symlinks won't work for this.

I have 2 additional ideas for how we could achieve hardlinking support:

1) Nix could provide a special hard-link command that could be called to create hardlinks.
2) We could write all the hardlinks that we want created as a list to a file inside the derivation. After the derivation is built, nix looks at that file and then creates these hardlinks on its own. Or maybe there's some way to pass this list to nix directly, without writing it into a file.

Both of these approaches would have the benefit of nix knowing exactly which files of a derivation hardlink to other derivations. This could be useful information when copying around store paths (but maybe this is already handled very well).
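As a rough sketch of the second idea: nothing like this exists in Nix today, and the manifest format (one "relative-destination<TAB>absolute-source" pair per line) and the function name apply_link_manifest are invented purely for illustration. A post-build step could verify that each listed file is byte-identical to its source and only then swap it for a hardlink:

```python
import filecmp
import os


def apply_link_manifest(manifest_path: str, out_root: str) -> list:
    """Replace files named in a (hypothetical) manifest with hardlinks.

    Each manifest line is "<relative-dest>\t<absolute-source>".
    Returns the relative destinations that now share an inode with their source.
    """
    linked = []
    with open(manifest_path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if not line:
                continue
            rel_dest, source = line.split("\t", 1)
            # Refuse destinations that would escape the output directory.
            if os.path.isabs(rel_dest) or ".." in rel_dest.split("/"):
                raise ValueError(f"unsafe destination: {rel_dest!r}")
            dest = os.path.join(out_root, rel_dest)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            if os.path.exists(dest):
                if os.path.samefile(source, dest):
                    linked.append(rel_dest)  # already the same inode
                    continue
                # Only link when the bytes really match; a real implementation
                # would also have to compare the executable bit.
                if not filecmp.cmp(source, dest, shallow=False):
                    raise ValueError(f"content mismatch for {rel_dest!r}")
            tmp = dest + ".linktmp"
            os.link(source, tmp)
            os.replace(tmp, dest)  # atomically swap the copy for the link
            linked.append(rel_dest)
    return linked
```

Because the step runs outside the build, it would not need the builder to see the store without bind mounts, but it still assumes the output and the source live on one filesystem.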

What do you think?

Dessix commented 1 year ago

Wait, Nix doesn't do this on its own already? Is there some other pattern we're supposed to be using for the moment, like symlink-trees to parent derivations? I assumed derivations that add or remove one file were effectively just UnionFS-like projections, to allow for lightweight dependent derivations.

flokli commented 11 months ago

I don't think this is actually a beneficial feature to have. Whether something is using the same inode or not is an implementation detail of the filesystem, and --optimise creating hardlinks is an implementation detail as well.

Even when the inodes are different, your filesystem might already have decided to deduplicate the underlying data internally (--reflink style).

Inside the build, you shouldn't make any assumptions about being on the same filesystem as your other store paths, and during substitution, there is no way to signal that a file points to the same data as something elsewhere either.

I'd leave this up to the nix store layer, it could do some deduplication post-build, but I would not expose / use more builder sandbox internals.

VanCoding commented 11 months ago

@flokli I agree that it may not be a good idea to allow creating actual links, because in the build we should not make any assumptions about how this all is going to be stored.

But it could still be beneficial to be able to tell nix "hey, I want to put a file here that's exactly the same as the file from this other derivation". Then the store layer could use this information to improve performance upfront, because it could save on hashing or unnecessary copying and comparing the contents.

flokli commented 11 months ago

> Then the store layer could use this information to improve performance upfront, because it could save on hashing or unnecessary copying and comparing the contents.

I don't think it matters. The build exposes a filesystem that the build process can write to; post-build, we must feed all contents in the right order into sha256 to calculate the narhash, so we need to traverse all contents anyway.
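To illustrate why shared inodes don't save the traversal, here is a simplified stand-in for that hashing step. It is not Nix's actual NAR serialization (it handles regular files only and ignores symlinks and the executable bit), and the name tree_digest is made up. It walks the tree in sorted order and feeds every path and every byte into sha256:

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Hash a directory tree deterministically (simplified, not real NAR)."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the order os.walk descends in
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            # Length-prefix the contents so file boundaries are unambiguous.
            h.update(f"{rel}\0{os.path.getsize(path)}\0".encode())
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 16), b""):
                    h.update(chunk)
    return h.hexdigest()
```

A tree containing two independent copies of a file and a tree containing one file plus a hardlink to it produce the same digest here, and both cost the same number of bytes read.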

If you're copying files from another store path and want to make it easy for the filesystem to deduplicate, the best you can do is probably copy with cp --reflink=auto. That performs a lightweight copy if source and destination are on the same filesystem and the filesystem supports it, and falls back to a regular copy otherwise.
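For a builder that isn't a shell script, roughly the same behaviour can be sketched by hand. This is a Linux-only sketch, not what cp does internally: the helper name reflink_or_copy is made up, and the FICLONE request number below is the value from <linux/fs.h> on x86 and ARM (a few architectures encode ioctl numbers differently).

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE, i.e. _IOW(0x94, 9, int).
# Assumption: x86/ARM encoding; some architectures use a different value.
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> str:
    """Clone src to dst via reflink if possible, else do a plain copy.

    Returns "reflinked" or "copied".
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's data blocks with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            how = "reflinked"
        except OSError:
            # Unsupported filesystem, different mounts, etc.: copy the bytes.
            shutil.copyfileobj(fsrc, fdst)
            how = "copied"
    shutil.copymode(src, dst)  # keep the permission bits (e.g. executable)
    return how
```

Either way the destination is an ordinary, independent file from the build's point of view, so the result does not depend on how the store happens to be laid out.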

VanCoding commented 11 months ago

@flokli I see... but for calculating the sha256 it's only required to read the file, not to write it. But yeah, having to re-hash the files is not optimal. In theory, if the files that are being linked are all known upfront, before the build of the derivation even starts, it'd be sufficient to feed a list of their paths into the narhash, no?

I do see that for a lot of scenarios it would be better to solve this outside of nix, but for some scenarios, like a PNPM-like package manager that uses the nix store, it could be useful. At least if we don't want to tell everybody which filesystem or store layer to use.

flokli commented 11 months ago

There's no primitive to copy things around that is not a build - other than maybe builtins.filterSource, though that's another usecase and doesn't allow moving things.

For everything that is a build, the opportunistic reflink copy seems the least annoying method, and requires no changes in Nix itself.