NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Copy local flakes to the store lazily #3121

Open edolstra opened 5 years ago

edolstra commented 5 years ago

Currently flakes are evaluated from the Nix store, so when using a local flake, it's first copied to the store. This means that

$ cd /path/to/nixpkgs
$ nix build .#hello

is a lot slower than the non-flake alternative

$ nix build -f . hello

Ideally, we would copy the flake to the store only when its outPath attribute is evaluated. However, we also need to ensure that it's not possible to access untracked files (i.e. we need to check every file against git ls-files).
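A minimal shell sketch of the "is this file tracked?" check mentioned above. It assumes the tracked-file list comes from git ls-files run at the repository root, so paths are relative to that root. The helper name is_tracked is hypothetical and not part of Nix; it only illustrates the per-file check, not how Nix would hook it into evaluation.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a path may be read during evaluation by checking it
# against the list of tracked files (one path per line on stdin), e.g. the
# output of "git ls-files". is_tracked is a hypothetical helper, not a Nix API.
is_tracked() {
  local path="$1"
  # Normalise a leading "./" so "./flake.nix" matches "flake.nix".
  path="${path#./}"
  # -F: fixed string, -x: whole-line match, -q: report only via exit status.
  grep -Fxq -- "$path"
}

# Usage (inside a Git checkout):
#   git ls-files | is_tracked flake.nix && echo tracked
```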

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/flakes-without-git-copies-entire-tree-to-nix-store/10743/2

hmenke commented 3 years ago

Still important.

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/my-painpoints-with-flakes/9750/20

L-as commented 3 years ago

It would be nice if Nix could take advantage of the filesystem's native CoW functionality (if present) in order to speed up copying. We discussed this briefly in #offtopic:nixos.org.

nixos-discourse commented 2 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-it-possible-to-make-a-flake-that-has-no-source-tree/16037/2

lilyball commented 2 years ago

I just hit this. My first attempt at a workaround was to remove the self arg from outputs (as without self I can't access the source tree at all). It turns out that makes it copy the tree and then throw an error about how the outputs function doesn't take a self arg.

Lazily copying the flake only when outPath is evaluated would be ideal, but being able to just drop the self arg to suppress the copy would be a great first step.

@L-as Taking advantage of CoW would be nice but it's not doable on macOS where the Nix store lives on a separate volume (separate volume group even).

lilyball commented 2 years ago

Also for context, in my case the flake was not in a git repo, it was just in a folder. Copying to the nix store is unacceptable because the folder contains multiple git repos along with all their build artifacts. Copying a git repo to the Nix store at least would avoid copying untracked files, but in my case it had hundreds of thousands of files and multiple gigabytes of data to look at and copy.

Atemu commented 2 years ago

@lilyball @L-as Taking advantage of CoW doesn't work on Linux either due to a VFS limitation: https://github.com/NixOS/nix/issues/5513

One thing I'd like to understand in this issue is why a local flake can't be evaluated "directly" just like the old default.nix-style file evaluation.
I know copying has benefits for hermetic evaluation and such but I don't need that, like, at all.
Sure, remote flakes should be copied to the Nix store and that's really great functionality but I see no point whatsoever in doing the same for local flakes that are already in the FS and not expected to change without the user's knowledge.

TLATER commented 2 years ago

@Atemu from what I've read, it's to help enforce hermetic evaluation and avoid impurities. Presumably it also has advantages for code simplicity, because you don't need to write something separate for local flakes.

I agree it's not great UX for those of us who use flakes just to keep track of a dev shell, of course :)

Atemu commented 2 years ago

it's to help enforce hermetic evaluation and avoid impurities.

And that's great but I don't see any point in hermetic eval on local files.

it's not great UX for those of us who use flakes just to keep track of a dev shell

It's also bad UX for anyone working on nix-built projects.
Correct me if I'm wrong here, but if I'm hacking on Nixpkgs to solve some bug in NixOS with a dirty tree (because obviously, I'm hacking), Nix copies the entire 313 MiB Nixpkgs checkout to the Nix store every time I eval.
Not only does that take quite a while (even on an SSD it's multiple seconds), but it also causes unnecessary writes. After 70 Nixpkgs evals (70 × 313 MiB ≈ 21.4 GiB), you've exhausted the expected daily writes to an SSD. That can't be good for endurance.

Is it just me or is that insane?

TLATER commented 2 years ago

but I don't see any point in hermetic eval on local files.

You might not realize you're using local files, accidentally sneak in state, and then be surprised when it doesn't evaluate in deployment (and be all "wait, isn't nix supposed to prevent this?"). Even with fully local files, I'd expect things still to work if I move my directory to a new computer from a restored backup. While I've personally learned when and where local state might happen, it's still a safety net that I consider nice to have.

Of course, giant copies for the tiniest delta is way too much of a cost to incur for that, but this is why we're here - to make sure that flakes don't blow up SSDs all over the place when they finally become non-experimental ;)

Atemu commented 2 years ago

You might not realize you're using local files, accidentally sneak in state, and then be surprised when it doesn't evaluate in deployment (and be all "wait, isn't nix supposed to prevent this?").

I don't understand what you mean by that.

How is copying the accidentally added state over to the Nix store first and then evaling it any better than just evaling it directly?

Even with fully local files, I'd expect things still to work if I move my directory to a new computer from a restored backup. While I've personally learned when and where local state might happen, it's still a safety net that I consider nice to have.

How is the location of the directory related to any of this? A direct eval of the same state of a directory in another location will have the same result. How should copying improve anything?

andir commented 2 years ago

IIRC files that are already tracked with git (and changed) are staged and then copied to the store. I can see how this ensures that at least the files are tracked and marked as updated (by staging them). I also kind of agree that this is the wrong solution to the problem, or perhaps a solution in search of a problem? Most of the time it is very expensive to copy my working directory into the store.

Since I can see why that feature is useful, I'd argue that it should be configurable whether your flake repos are copied to the store or not. As far as I know, the hash of the path that is added to the store is also currently used for the eval caching.

Perhaps the current implementation is a nice PoC of what more proper hermetic eval could look like and what it gives us in terms of capabilities (caching, ...).

Atemu commented 2 years ago

I can see how this ensures that at least the files are tracked and marked as updated (by staging them).

That sounds like a good reason, but I can't see why that wouldn't also be possible with direct eval.

@edolstra could we get some insight from you here?

Kha commented 2 years ago

It's definitely possible, just more work as described in the initial post:

However, we also need to ensure that it's not possible to access untracked files (i.e. we need to check every file against git ls-files).

Nix already has an "eval may only access these store paths" logic, but no "may only access tracked files of this Git checkout" logic yet, so using the former was the simplest solution, I assume.

Atemu commented 2 years ago

Another important point @rnhmjoj mentioned in Discourse is security. A user can easily unknowingly expose private/secret information globally on a system by building a local flake.

thufschmitt commented 2 years ago

However, we also need to ensure that it's not possible to access untracked files (i.e. we need to check every file against git ls-files)

I guess this could also be implemented by creating a shallow copy of the flake directory (by creating a forest of symlinks to the original source tree rather than really copying it). That could already make things notably faster (not entirely free, but cheap-enough in most cases), and might be simpler to implement.
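A rough shell sketch of the symlink-forest idea, assuming the set of files to expose is the tracked-file list (one relative path per line on stdin, e.g. from git ls-files). make_symlink_forest is a hypothetical helper; Nix does not do this today. It only illustrates why the approach is cheap: each file costs one symlink instead of a full copy, and untracked files never appear in the result.

```shell
#!/usr/bin/env bash
# Sketch: build a "shallow copy" of a source tree as a forest of symlinks.
# Only paths listed on stdin (e.g. from "git ls-files") are linked, so
# untracked files never appear in the destination.
# make_symlink_forest is a hypothetical helper, not something Nix provides.
make_symlink_forest() {
  local src dest rel
  src="$(cd "$1" && pwd)"   # absolute path, so the links resolve from anywhere
  dest="$2"
  mkdir -p "$dest"
  while IFS= read -r rel; do
    [ -n "$rel" ] || continue
    # Recreate the parent directory, then link the file itself.
    mkdir -p "$dest/$(dirname "$rel")"
    ln -s "$src/$rel" "$dest/$rel"
  done
}

# Usage (inside a Git checkout):
#   git ls-files | make_symlink_forest . /tmp/forest
```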

nixos-discourse commented 2 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-nix-2-4-significantly-slower/16218/3

nixos-discourse commented 2 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/locally-excluding-nix-flakes-when-using-nix-independenly-of-upstream/16480/17

yajo commented 2 years ago

Linking here yesterday's chat, which I think is relevant.

tl;dr: IMHO nix develop should be the exception: impure and not-in-nix-store by default. Just like nix-shell.

bmabsout commented 2 years ago

I believe local impure flakes are also very useful if you're constantly editing a file that is in your repo but you don't want the flake to be re-evaluated and the file constantly copied to the store. Eventually, when you're done, you can just stop passing a --local flag (which could imply --impure as well, I guess) or something similar. One use case I have in mind is a flake where a Python package is in scope but is also editable. I can drop into an appropriate shell with nix develop, but it requires the path to the package, which is a relative path. Of course, for hermetic evaluation you want the whole package source to be copied to the store, but I want to edit the package as I develop and use it at the same time. This means I either have to hardcode the absolute path of my local package on my system, which makes things very non-reproducible, or I have to update the flake inputs constantly, which keeps copying the thing to the store.

Atemu commented 2 years ago

What @bmabsout just said precisely outlines the whole reason I still haven't adopted flakes yet.

Hermetic eval is very useful for general building etc., but not when I'm in the middle of hacking on things. Nix flakes need to offer the same speed and convenience that e.g. nixos-rebuild -I nixpkgs=... -I nixos-config=... provides.

yipengsun commented 2 years ago

I also just encountered this problem. We store a large amount of data (several TBs) with git annex (see our project at https://github.com/umd-lhcb/lhcb-ntuples-gen). Today we annexed ~100 GB of new data (so the local repo grew to around 100 GB, without downloading any other previously annexed data), and it took a whopping 8 minutes to finish a nix develop command, without any changes in flake.nix.

If I'm reading correctly, reverting back to a nix-shell based approach with flake-compat would mitigate our problem until the lazy copy lands. Is that right?

Atemu commented 2 years ago

git-annex and LFS are an interesting case here. Should large files be available in flake eval?

edolstra commented 2 years ago

@yipengsun Yes, that's correct.

L-as commented 2 years ago

Even outside nix develop, this seems highly problematic. Couldn't you make use of Git's information to detect what has changed? We have a Merkle tree of the files after all.

edolstra commented 2 years ago

@L-as This behavior being problematic is why this issue exists...

yipengsun commented 2 years ago

git-annex and LFS are an interesting case here. Should large files be available in flake eval?

I'd say no, unless the flake itself is used as an output. We currently have no such use case, but it would be nice to have something like a .flakeignore file to explicitly forbid copying files in certain paths.

yipengsun commented 2 years ago

I did a bit more investigation and found out that the slowness of nix develop was due to us accidentally adding large files directly to git; copying these files took a long time.

Also, I tried to set up a minimal flake repo to test the availability of the annexed files:

flake.nix:

{
  description = "test";

  inputs = {
    nixpkgs.url = "nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs { inherit system; };
      in
      {
        devShell = pkgs.mkShell {
          name = "test-git-annex";
          buildInputs = with pkgs; [
            git-annex
          ];
        };
      }
    );
}

I generated a large file (~100 MB) with dd, added it with git annex add, and then ran nix develop.

After that, I inspected the flake's copy in /nix/store:

❯ ls -l
total 18
-r--r--r-- 3 root root 1001 Dec 31  1969 flake.lock
-r--r--r-- 4 root root  538 Dec 31  1969 flake.nix
lrwxrwxrwx 2 root root  202 Dec 31  1969 my_big_file.bin -> .git/annex/objects/05/12/SHA256E-s104857600--f6e654508eac102f1efecae5248ca66ea5193d5edf86c895843188d06deff947.bin/SHA256E-s104857600--f6e654508eac102f1efecae5248ca66ea5193d5edf86c895843188d06deff947.bin

The symbolic link is broken, because, well, files inside .git folders are not copied over, which is to be expected.

I then tried to unlock the file (see here for more info) with git annex unlock followed by git commit. Now the store copy looks like this:

❯ ls -la
total 16583
dr-xr-xr-x    2 root root       5 Dec 31  1969 .
drwxrwxr-t 6826 root nixbld 30212 Feb 26 00:18 ..
-r--r--r--    4 root root    1001 Dec 31  1969 flake.lock
-r--r--r--    5 root root     538 Dec 31  1969 flake.nix
-r--r--r--    2 root root     104 Dec 31  1969 my_big_file.bin

And now my_big_file.bin is a git-annex pointer file:

/annex/objects/SHA256E-s104857600--f6e654508eac102f1efecae5248ca66ea5193d5edf86c895843188d06deff947.bin

To conclude, I think annexed files will NEVER be available for flake eval
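The two states observed above can be told apart mechanically. A small shell sketch, assuming the layouts shown in the listings: a locked annexed file is a symlink into .git/annex/objects/, and an unlocked one is a small pointer file whose content starts with /annex/objects/. annex_state is a hypothetical helper, not part of git-annex or Nix.

```shell
#!/usr/bin/env bash
# Sketch: classify a file in a (copied) flake source tree as one of
#   locked   - symlink whose target lies under .git/annex/objects/
#   unlocked - pointer file whose content starts with /annex/objects/
#   regular  - anything else (real content)
# annex_state is a hypothetical helper written for illustration.
annex_state() {
  local f="$1"
  if [ -L "$f" ]; then
    case "$(readlink "$f")" in
      *.git/annex/objects/*) echo locked; return 0 ;;
    esac
  elif [ -f "$f" ] && head -c 15 "$f" | grep -q '^/annex/objects/'; then
    # Only the first 15 bytes ("/annex/objects/") are read, so checking a
    # large real file stays cheap.
    echo unlocked
    return 0
  fi
  echo regular
}
```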

yipengsun commented 2 years ago

I think it would be similar for git-lfs: the files would not be available for flake eval, because one of the main goals of both git-annex and git-lfs is to NOT add large files directly to git, and only the git part of the flake gets copied.

yajo commented 2 years ago

Still trying to work around this issue, I just found out that all nix subcommands feature an --impure flag.

What if, when running under nix develop --impure, nix resolves paths such as ./. to the local flake directory instead of copying it to the store and resolving it to there?

That would solve all problems here, and still be predictable because we're passing --impure explicitly.

Of course, this behaviour should be the same for all other nix commands...

WDYT?

erikarvstedt commented 2 years ago

Edit: The feature requested below has been implemented in https://github.com/NixOS/nix/pull/6530/commits/cbade16f9ef1e06b40b379863556157b6222a13b. See also: https://github.com/NixOS/nix/pull/6530#issuecomment-1262580303.

A way to greatly improve the flake developer experience is to allow evaluation from local sources not only for the main flake, but also for arbitrary local inputs.

Very often, local Nix development is spread out over multiple flakes. Examples:

In these cases, the quickest way to evaluate changes in a library flake via a client flake is:

nix eval/build client-flake#output --override-input my-lib /dev/my-lib

But even if the client flake is evaluated from the local source, as proposed by this issue, the library flake would still be copied to the store.

Possible solution

In addition to fixing this issue, add a flag like --local-input <flake input path> to enable the same local evaluation mode for flake inputs that have a local source.

Example:

nix build --local-input my-nixos-modules .#homeserver.vm

yajo commented 2 years ago

That's already supported with --override-input. I usually do, for example:

nix flake check --override-input 'poetry2nix' ../poetry2nix/

However, it is indeed a valid use case for lazily moving to the store.

Atemu commented 2 years ago

Their point is that it shouldn't be moved to the store lazily but that it shouldn't be moved at all.

When I'm hacking on something, I want Nix to evaluate the flakes as they are on-disk (just like the nix- tools do), not the current state of the on-disk git repo copied to the Nix store.

Yes, that's not an idealistic pure hermetic evaluation. I don't need or want that when I'm hacking; I need quick feedback cycles.
When actually deploying things productively (e.g. nixos-rebuild switch or boot), I want hermetic eval and don't mind actually committing my stuff or waiting for a copy to be made etc. (Though even there I might just want to quickly deploy the state on-disk to a test profile.)

erikarvstedt commented 2 years ago

@yajo, --override-input is indeed much more convenient than the methods I described. I've updated my post.

rnhmjoj commented 2 years ago

I don't need or want that when I'm hacking

Or when using git-crypt or similar systems. In this case I introduce an impurity into the local checkout (decrypting the secrets), but I'm totally fine with it because I obviously don't want anyone else to reproduce my secrets. Copying the local checkout to the Nix store is also no good because it would expose the secrets to everyone on the system.

My initial expectation for Nix flakes was that they would take the environment (NIX_PATH, channels, etc.) out of the picture, making packages/NixOS configurations self-contained (in a file/directory). They actually go further than that by tying the evaluation to version control, which makes sense in most cases, but it has several unintended consequences, as this issue shows. I don't know how feasible it is to implement this mode of evaluation, but it seems a necessity. Personally, I'll never be able to move my configurations to flakes without this.

arashsm79 commented 2 years ago

Here is a simple approach that I have been using for a while to avoid copying the whole project to the nix store every time I commit a change to the project and want to enter a development shell.

First, create a new directory in the project, called .nix for example, and add this directory to the .gitignore of the main project. Then, create hard links of the outer project's flake files in the .nix folder and initialize a new git repository there. Now every time you want a simple development shell for hacking, you can cd into the .nix directory and run nix develop there (bringing the necessary tools into the environment), then just cd .. back to the main project and do the development from there. Depending on the project and the files referenced in flake.nix, some other files like Cargo.toml or requirements.txt might also need to be hard-linked into the .nix directory.

.
├── Cargo.toml
├── Cargo.lock
├── flake.nix
├── flake.lock
├── .nix
│   ├── .git
│   ├── Cargo.toml
│   ├── Cargo.lock
│   ├── flake.nix
│   └── flake.lock
├── .gitignore
├── .git
├── src
│   ├── foo
│   └── bar
│       └── baz

I don't often change my project's flake.nix files across branches, but you could easily set up a post-checkout hook in your main project's .git/hooks directory with the following content for example:

#!/nix/store/dnd.../bin/bash

ln -f flake.nix .nix/flake.nix

Now every time you change to a branch that has a different flake.nix, the flake.nix hard link in your .nix directory also gets updated. A similar procedure can be done for other types of hooks.
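A slightly more general sketch of the same hook idea, assuming the .nix layout above: one function that (re)creates the hard links for every file the shadow flake needs, usable for the initial setup and from a post-checkout hook. link_shadow_files and the example file list are illustrative assumptions; the one-time git init inside .nix and the .gitignore entry are still done by hand as described.

```shell
#!/usr/bin/env bash
# Sketch: (re)create hard links from the main project into the .nix shadow
# directory described above. link_shadow_files is a hypothetical helper; pass
# it whichever files your flake.nix actually needs. Run from the project root.
link_shadow_files() {
  local shadow=.nix f
  for f in "$@"; do
    # Skip files that do not exist on the current branch.
    [ -e "$f" ] || continue
    mkdir -p "$shadow/$(dirname "$f")"
    # Git typically replaces a file on checkout rather than rewriting it in
    # place, so an old hard link keeps the previous content; -f relinks it.
    ln -f "$f" "$shadow/$f"
  done
}

# Possible post-checkout hook body:
#   link_shadow_files flake.nix flake.lock Cargo.toml Cargo.lock
```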

Atry commented 2 years ago

Is there any way to let nix build -f . hello support git submodules? Currently nix build -f . hello does not have access to files in a submodule.

yajo commented 2 years ago

Regarding https://github.com/NixOS/nix/issues/3121#issuecomment-1120151921, here's my workaround:

src = builtins.path {
  path = ./.;
  name = "something";
  # Filter out nix files to avoid unnecessary rebuilds
  filter = (path: type: builtins.match ".*[.]nix" (builtins.baseNameOf path) == null);
};

This way I only pass this modified src to derivations and avoid constant rebuilds when hacking on flake.nix.

nixos-discourse commented 2 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-31/19481/1

Atemu commented 2 years ago

Relevant link from the discourse post above: https://github.com/edolstra/nix/tree/lazy-trees

https://github.com/NixOS/nix/pull/6530

nigelgbanks commented 2 years ago

@yajo I can't figure out how https://github.com/NixOS/nix/issues/3121#issuecomment-1122322745 is actually used in practice. I have a flake in a folder (not a git repository), and there does not seem to be a way to apply this workaround to prevent the folder from being copied into the Nix store for every nix run command (the flake does not use any sources in the directory).

Atemu commented 2 years ago

If you really need a fix now, you could try the experimental branch https://github.com/NixOS/nix/pull/6530

nixos-discourse commented 2 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/setting-up-a-new-project-with-nix-use-niv-or-flakes/22427/4

DavHau commented 2 years ago

Could someone edit the original post and add a link to https://github.com/NixOS/nix/pull/6530. It's a bit hard to find otherwise.

Et7f3 commented 1 year ago

You might not realize you're using local files, accidentally sneak in state,

The current situation solves half of this issue: files known to git (even if only added with git add -N) are made available in full, not just their staged content. So one can still commit and then get a build failure. It is also still slow.

So either:

If we go the second option, we will need to change the copy to match what will be evaluated:

AleXoundOS commented 1 year ago

There should be at least a warning that the whole directory contents are going to be put into /nix/store. I was surprised when nix develop ended up eating my drive space.

irisjae commented 1 year ago

As of now, given that Nix flakes still work by copying to the store (as far as I'm aware), is there any way to make Nix do the copying with hardlinks instead?

lilyball commented 1 year ago

Hardlinks wouldn't work: the Nix store needs to consist of read-only, immutable files, and hardlinks share permissions and content, so editing one file edits both. If your filesystem supports copy-on-write then that should help, but it won't work if your Nix store is on a separate volume (though hardlinks wouldn't work in that case either).

Atemu commented 1 year ago

In newer versions of Linux, you can reflink across VFS mount boundaries as long as both sides are on the same superblock. Btrfs, for example, supports this, but ZFS does not.

Though I think the big problem with copying is mostly metadata and the associated random access, not the content.
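For anyone experimenting with this, a sketch of a CoW-friendly tree copy in shell, assuming GNU coreutils cp: --reflink=auto clones file data when the filesystem supports it (e.g. Btrfs, or XFS with reflink enabled) and falls back to a regular copy otherwise. This is not what Nix does internally, and cow_copy_tree is a hypothetical helper; as noted above, it does nothing about the per-file metadata cost.

```shell
#!/usr/bin/env bash
# Sketch: copy a source tree, sharing data blocks via reflinks when possible.
# Requires GNU cp (coreutils); --reflink=auto falls back to a normal copy when
# cloning is not possible (e.g. across filesystems).
cow_copy_tree() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # -a preserves modes, timestamps and symlinks; "$src/." copies the
  # directory's contents rather than the directory itself.
  cp -a --reflink=auto "$src/." "$dest/"
}
```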