NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Unexpected remote builder behavior; `builders-use-substitutes = true` not working? #8101

Open stuser81 opened 1 year ago

stuser81 commented 1 year ago

I have two machines on my LAN:

My goals:


This is what happens on the laptop without remote building setup (i.e. without any ~/.config/nix/nix.conf and without any modifications to /etc/nix/nix.conf). I have truncated the output, but in fact everything is already available on cache.nixos.org so it gets downloaded from there.

user@laptop:~$ nix-collect-garbage
finding garbage collector roots...
deleting garbage...
deleting unused links...
note: currently hard linking saves -0.00 MiB
0 store paths deleted, 0.00 MiB freed
user@laptop:~$ nix-shell -p ghc
these 46 paths will be fetched (233.20 MiB download, 2585.36 MiB unpacked):
  /nix/store/09ybh8g6bhhq7h3lrq98rmvjcdw94iz8-gmp-with-cxx-stage4-6.2.1
  /nix/store/1zsc48wwlplpkzms83m7zr94xnfalq2q-glibc-2.35-224-dev
  /nix/store/31an24ard60kip3iaxkmd1rnflb55zfp-binutils-2.40-lib
  /nix/store/45ncc133v5ncn8ivb1lkfv0wzfab9lx2-gawk-5.2.1
  /nix/store/4vq879kpg8b3ni6awk3dphzsipkf5vdm-ghc-9.2.7-doc
  ...
copying path '/nix/store/fa0byasfkms4k570jm47b7sb9lkrj73v-linux-headers-6.2' from 'https://cache.nixos.org'...
copying path '/nix/store/y5kskikzjzv84169aa81kg6b24qq1q5p-libunistring-1.1' from 'https://cache.nixos.org'...
copying path '/nix/store/izcs0br6mfbx7rqs5ngmg47fwpwycbl1-libidn2-2.3.2' from 'https://cache.nixos.org'...
copying path '/nix/store/8bmp6r3a0xfha3wj36phlc47clh9w81l-glibc-2.35-224' from 'https://cache.nixos.org'...
copying path '/nix/store/pcslyy22s9piz2n3pckqia0k5i4ysi12-attr-2.5.1' from 'https://cache.nixos.org'...
...

Let's now set up remote building:

build-users-group = nixbld

extra-experimental-features = nix-command flakes
builders-use-substitutes = true
trusted-substituters = ssh://192.168.1.80
substituters = ssh://192.168.1.80
max-jobs = 0
builders = ssh://192.168.1.80 - - 10

builders-use-substitutes = true asks the desktop to grab from cache.nixos.org if possible, max-jobs = 0 makes sure no building takes place on the laptop, and builders = ssh://192.168.1.80 - - 10 makes sure 10 cores are used on the desktop.

Note: On my laptop I ran sudo systemctl stop nix-daemon.service and sudo systemctl start nix-daemon.service but I'm not sure if that was needed. However, I noticed I indeed had to modify /etc/nix/nix.conf on my laptop. Modifying ~/.config/nix/nix.conf on my laptop wasn't causing the remote builder to get registered at all. (This is possibly a separate bug altogether.)

The problem:

Check out what happens now. As you can see, it's building things on the desktop machine! This is unexpected because as we already saw, cache.nixos.org should already have everything needed. (I have not modified the default substituters on the desktop NixOS machine.) It seems that builders-use-substitutes = true is not working properly.

user@laptop:~$ nix-collect-garbage 
finding garbage collector roots...
deleting garbage...
deleting '/nix/store/2ymzqhzn0bayy8sgvppw38dqffk5yxx3-shell.drv'
...
user@laptop$ nix-shell -p ghc
these 655 derivations will be built:
  /nix/store/5syi8n4h7dn97ldhrjw1vbpmr2prypcp-glibc-2.35.tar.xz.drv
  /nix/store/1zpsgbsjy7spp9b2y1gmycp0yvq6mp59-linux-6.2.tar.xz.drv
  /nix/store/20d5pi1a5i9jj041i0gvr9zcs7bjbw46-binutils-2.40.tar.bz2.drv
  /nix/store/pg6zn3qqhy869kh4gwhpfyf36q0d121z-zlib-1.2.13.tar.gz.drv
  /nix/store/4lcnmk8h9jkhiqa815716rvagnli57j7-zlib-1.2.13.drv
...
copying path '/nix/store/v28dv6l0qk3j382kp40bksa1v6h7dx9p-bash-5.2.tar.gz' from 'ssh://192.168.1.80'...
building '/nix/store/a68j9bys24cr3m1bixy4bz92q27bmx7k-bash52-005.drv' on 'ssh://192.168.1.80'...
building '/nix/store/f9hs49y4q8bvg4ffdiycbafd5r1gb13r-bash52-008.drv' on 'ssh://192.168.1.80'...
building '/nix/store/sjlm8agj6m3cpglc5v11d40cj7j6kin2-fix-static.patch.drv' on 'ssh://192.168.1.80'...
warning: ignoring substitute for '/nix/store/dyamyflq6pvvnhzsj5ldzpwf31g4r5vm-bootstrap-stage0-stdenv-linux' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
warning: ignoring substitute for '/nix/store/8znp434sp8m46633mfcvql5czccakcsx-bootstrap-stage2-stdenv-linux' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
building '/nix/store/jxhwnvkpxrbl5rrf01zf3cf2p7v215pb-findutils-4.9.0.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/5syi8n4h7dn97ldhrjw1vbpmr2prypcp-glibc-2.35.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/6v6ld5igw5f9rw3gdc7w7s14cfpfq63c-gzip-1.12.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/4h6k2a3b62nkgsfjf8s53dqlay7kywwx-libidn2-2.3.2.tar.gz.drv' on 'ssh://192.168.1.80'...
...
waiting for a machine to build '/nix/store/8yhywccj73wdxv579rz5lk45dpjfgn2w-tar-1.34.tar.xz.drv'...
warning: ignoring substitute for '/nix/store/59cfn9z8i5230r5gr5wh46ki67lrvl7s-bootstrap-stage1-stdenv-linux' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
warning: ignoring substitute for '/nix/store/93yhirq3wghn0hp4d6yn17l4i4wgf6q8-expand-response-params' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
waiting for a machine to build '/nix/store/rwkpw9zf33qgwciyrg6yd4kmrcg5k4wj-gcc-12.2.0.tar.xz.drv'...
waiting for a machine to build '/nix/store/a5alapfd1s4rf9dbwb1h0mhlvanl3qvz-gmp-6.2.1.tar.bz2.drv'...
waiting for a machine to build '/nix/store/x026yqw2ch0lhcyd548qra2i6gxi2whp-libunistring-1.1.tar.gz.drv'...
...
copying 0 paths...
copying 0 paths...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/1dydp86d00qzjbncpi80sdsndf33lc5j-fix-static.patch' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 0 paths...
copying 0 paths...
...
copying path '/nix/store/lwjp1v1f0dry15flc4klvag2asx9sn5f-Python-3.10.10.tar.xz' from 'ssh://192.168.1.80'...
copying path '/nix/store/y3yiminrckvhf35fh9q42vjwi0npznji-acl-2.3.1.tar.gz' from 'ssh://192.168.1.80'...
copying path '/nix/store/mz6mc8s7mrvvzjpl9322agmsq00cyrw5-attr-2.5.1.tar.gz' from 'ssh://192.168.1.80'...
...
copying 1 paths...
copying path '/nix/store/447hvnlzzi9myri1iq3bijxgx6v6b592-patchelf-0.15.0.tar.bz2' from 'ssh://192.168.1.80'...
copying path '/nix/store/slpdqm3wlhwbkzyijjz3xpifa219ac0x-bzip2-1.0.8.tar.gz' from 'ssh://192.168.1.80'...
building '/nix/store/kg5bghxkplbz38wgavcl9gd3c46bz14b-bootstrap-stage0-stdenv-linux.drv' on 'ssh://192.168.1.80'...
building '/nix/store/fk7xz02i1l0rsjrpvv94gcj0qvs6w0pp-bootstrap-stage1-stdenv-linux.drv' on 'ssh://192.168.1.80'...
building '/nix/store/sl3ypk7flwfdb6630whq2slfa11k0cs1-bootstrap-stage2-stdenv-linux.drv' on 'ssh://192.168.1.80'...
copying 1 paths...
copying path '/nix/store/33hyq1pcdw473p8r4fyqp6h9n8r6lxvj-linux-6.2.tar.xz' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/dyamyflq6pvvnhzsj5ldzpwf31g4r5vm-bootstrap-stage0-stdenv-linux' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/59cfn9z8i5230r5gr5wh46ki67lrvl7s-bootstrap-stage1-stdenv-linux' from 'ssh://192.168.1.80'...
copying path '/nix/store/p769cp9mdy7yswdhhiwdq75y03x14199-bootstrap-stage0-glibc-bootstrap' from 'ssh://192.168.1.80'...
copying path '/nix/store/f45dpx8vxhfckg2mbbns9dy1l82i74jz-coreutils-9.1.tar.xz' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/8znp434sp8m46633mfcvql5czccakcsx-bootstrap-stage2-stdenv-linux' from 'ssh://192.168.1.80'...
building '/nix/store/xlijxhdmvypilgznjg1mz46a9mkzfzw1-bootstrap-stage0-glibc-iconv-bootstrap.drv' on 'ssh://192.168.1.80'...
building '/nix/store/4lwbbznxlz8didx8103ljafarii97p5q-python-setup-hook.sh.drv' on 'ssh://192.168.1.80'...
copying 1 paths...
copying path '/nix/store/jn9f2mr2jdm9yn5hi0pws44nbfrah8d3-bash52-008' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/z9a8wa9j4z2fk5f3hp2wcyl1dywnmz17-bootstrap-stage0-glibc-iconv-bootstrap' from 'ssh://192.168.1.80'...
'/nix/store/z9a8wa9j4z2fk5f3hp2wcyl1dywnmz17-bootstrap-stage0-glibc-iconv-bootstrap/include/iconv.h' -> '/nix/store/p769cp9mdy7yswdhhiwdq75y03x14199-bootstrap-stage0-glibc-bootstrap/include/iconv.h'
copying 0 paths...
copying path '/nix/store/m8mylclf924bhpsv529hy11llq30psyb-bootstrap-stage0-binutils-wrapper-' from 'ssh://192.168.1.80'...
copying path '/nix/store/qa2hk6c245wq0lzwm3g4k6j26125xg5r-diffutils-3.9.tar.xz' from 'ssh://192.168.1.80'...
copying 1 paths...
copying 1 paths...
copying path '/nix/store/iwddjcddxwrrxjq3476fzig56ln37awj-python-setup-hook.sh' from 'ssh://192.168.1.80'...
copying path '/nix/store/z76vsdh69cvwkwhwg69k7d1znwjmx6hf-bash52-005' from 'ssh://192.168.1.80'...
copying path '/nix/store/5ghhrws7rqx4bfk5wpv3gz44vg4arqcm-bootstrap-stage1-gcc-wrapper-' from 'ssh://192.168.1.80'...
...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/93yhirq3wghn0hp4d6yn17l4i4wgf6q8-expand-response-params' from 'ssh://192.168.1.80'...
copying path '/nix/store/jjp9cm8wkglic54jk52kfv27d6233afp-gawk-5.2.1.tar.xz' from 'ssh://192.168.1.80'...
copying path '/nix/store/xgaqv1fn6jdrskv7kwkcfimpsv7h1kbs-gnum4-1.4.19' from 'ssh://192.168.1.80'...
...
copying 1 paths...
copying path '/nix/store/y954pl28vm03qfhvqrgyspwwv28b5lyi-findutils-4.9.0.tar.xz' from 'ssh://192.168.1.80'...
copying 1 paths...
copying path '/nix/store/0avnvyc7pkcr4pjqws7hwpy87m6wlnjc-make-4.4.1.tar.gz' from 'ssh://192.168.1.80'...
copying 1 paths...
building '/nix/store/3yar2pnvz7ll79z3jlzx09qnhrsi7zj5-automake-1.16.5.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/20d5pi1a5i9jj041i0gvr9zcs7bjbw46-binutils-2.40.tar.bz2.drv' on 'ssh://192.168.1.80'...
building '/nix/store/xviwx1gm25j77g6fr3crfp4m0a3jggd1-curl-7.88.1.tar.bz2.drv' on 'ssh://192.168.1.80'...
...
waiting for a machine to build '/nix/store/x026yqw2ch0lhcyd548qra2i6gxi2whp-libunistring-1.1.tar.gz.drv'...
waiting for a machine to build '/nix/store/5ggyc87vq92i999462r1qm5l1myi9sqr-libxcrypt-4.4.33.tar.xz.drv'...
waiting for a machine to build '/nix/store/mxiibcd5b5v63fas7pfg2zavgp5bi5fk-lzip-1.23.tar.gz.drv'...
...
copying path '/nix/store/506rq7p13pk2v7a63wmsv11l0ir21ab2-krb5-1.20.1.tar.gz' from 'ssh://192.168.1.80'...
copying 0 paths...
unpacking sources
unpacking source archive /nix/store/9vm1ihdg1ysmrjdbb80g834iizzxb4yk-bison-3.8.2.tar.gz
source root is bison-3.8.2
setting SOURCE_DATE_EPOCH to timestamp 1632561040 of file bison-3.8.2/src/parse-gram.output
patching sources
configuring
configure flags: --disable-dependency-tracking --prefix=/nix/store/d99iw47x8x3kfb5x4laic5a1raswyqdr-bison-3.8.2 --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /nix/store/370ldp1qzc2zfl0kspcp137dvjmdhpsh-bootstrap-tools/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /nix/store/370ldp1qzc2zfl0kspcp137dvjmdhpsh-bootstrap-tools/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
...

SuperSandro2000 commented 1 year ago

Note: On my laptop I ran sudo systemctl stop nix-daemon.service and sudo systemctl start nix-daemon.service but I'm not sure if that was needed.

If you change the nix.conf file you need to restart the daemon.

builders = ssh://192.168.1.80 - - 10 makes sure 10 cores are used on the desktop.

The 10 there is maxJobs (the number of jobs run in parallel on that machine), not a core count. I would recommend using https://search.nixos.org/options?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=nix.buildMachine to avoid confusion.
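To make the field order concrete, here is a minimal sketch that splits a `builders` entry into named fields. The entry is the one from the working config posted later in this thread. The variable names are my own labels, not Nix identifiers, and the field order is my reading of the Nix manual's remote-builder format, so double-check it against the manual for your Nix version.

```shell
#!/bin/sh
# Split one `builders` entry into its whitespace-separated fields.
# A "-" in any position means "use the default for this field".
entry='ssh://192.168.1.80  x86_64-linux  -  10  2  benchmark,big-parallel,kvm,nixos-test  -  -'

# Turn off globbing while splitting so a stray "*" would not be expanded.
set -f
set -- $entry
set +f

# Variable names below are illustrative labels only.
uri=$1
systems=$2
ssh_key=$3
max_jobs=$4            # parallel build jobs on the builder, not CPU cores
speed_factor=$5
supported_features=$6
mandatory_features=$7
host_key=$8

printf 'uri=%s\n' "$uri"
printf 'systems=%s\n' "$systems"
printf 'max_jobs=%s\n' "$max_jobs"
printf 'speed_factor=%s\n' "$speed_factor"
printf 'supported_features=%s\n' "$supported_features"
```

Read this way, the original `ssh://192.168.1.80 - - 10` line sets only the URI and maxJobs and leaves the system type and SSH key at their defaults.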

Modifying ~/.config/nix/nix.conf on my laptop wasn't causing the remote builder to get registered at all. (This is possibly a separate bug altogether.)

No, it is not. The builders are read by the daemon, which only reads /etc/nix/nix.conf.

This is unexpected because as we already saw, cache.nixos.org should already have everything needed. (I have not modified the default substituters on the desktop NixOS machine.) It seems that builders-use-substitutes = true is not working properly.

There is likely something else happening, too. Is there any log indicating that the substitution failed? What are the substituters on the build machine? Is the command working as expected if you run it directly on the build machine?

stuser81 commented 1 year ago

What are the substituters on the build machine? Is the command working as expected if you run it directly on the build machine?

@SuperSandro2000 I have not changed the default substituters on the build machine (it's still cache.nixos.org there). Yes, nix-shell -p ghc works fine on the desktop build machine (just like it did on the laptop before I made the nix.conf changes).

I did a bit more digging:

Here is a wild guess (take everything below with a grain of salt because it's just a guess):

Does this guess have any merit?

SuperSandro2000 commented 1 year ago

  • Why not make the desktop build machine always try to grab things from cache.nixos.org?

I recently misconfigured my substituters setting and, as a result, built everything on remote builders, which in fact downloaded the derivations from cache.nixos.org.

  • Any ideas what could be causing this?

Not really. How are you installing Nix? Are you using the installer? Can you try it with a NixOS machine?

stuser81 commented 1 year ago

Can you try it with a NixOS machine?

@SuperSandro2000 I just tried. I noticed that on NixOS, nix.settings.substituters = ["ssh://192.168.1.80"] actually results in https://cache.nixos.org/ getting appended to the end automatically (as you can see with nix show-config). This does not happen with my Nix multi-user install on Debian.

So I suspect you were hitting https://cache.nixos.org/ through the laptop machine, not the desktop build machine.

So I believe we've been comparing apples and oranges.

Is there any way to get NixOS to not do this strange automatic appending? Related: https://github.com/NixOS/nixpkgs/issues/158356 It seems people want this automatic appending, which feels strange to me. What about people who also want to offload the cache downloading to the build machine?

stuser81 commented 1 year ago

I finally got it to work. These are the main changes I made since last time:

build-users-group = nixbld

extra-experimental-features = nix-command flakes
builders-use-substitutes = true
trusted-substituters = ssh://192.168.1.80
require-sigs = false
substituters = ssh://192.168.1.80
max-jobs = 0
builders = ssh://192.168.1.80  x86_64-linux  -  10  2  benchmark,big-parallel,kvm,nixos-test  -  -

Things started working after that. (I can't guarantee I did nothing else important but I doubt it.) I don't think the root vs. regular user thing was the real issue. The real issue was maybe that I didn't have any trusted-users before (which would have been nix.settings.trusted-users = ["root"]; while I was still accessing root) but I'm too lazy to verify it now. Also require-sigs = false avoided some warnings about untrusted substituters, which was possibly very important. (The builders change was likely only a minor change to fix an error about big-parallel being missing for one of the packages.)

TODO, room for improvement: replace require-sigs = false with something that only trusts the cache.nixos.org public key (as the NixOS build server, or any Nix installation, does by default). It makes sense that downstream machines, the laptop client machine in my case, also need to trust that key if they want packages that came from it.
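A possible shape for that TODO, as an untested sketch: keep signature checking on and list the accepted key explicitly. The script only writes a fragment to a scratch file (the `NIX_CONF_SKETCH` variable and the `nix.conf.sketch` filename are made up for this example). Merging it into /etc/nix/nix.conf and restarting the daemon stays a manual step. The key shown is the published cache.nixos.org key, which I believe a stock install already trusts by default, so this may not silence the warnings above on its own: paths the desktop built itself carry no cache.nixos.org signature and would still be ignored as substitutes.

```shell
#!/bin/sh
# Write a nix.conf fragment to a scratch file for review.
# Assumption: the output path is a hypothetical scratch location, not a file Nix reads.
out=${NIX_CONF_SKETCH:-./nix.conf.sketch}

cat > "$out" <<'EOF'
# Keep signature checking enabled instead of require-sigs = false.
require-sigs = true
# Accept only paths signed by the official binary cache key.
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
EOF

# Show the fragment so it can be copied into /etc/nix/nix.conf by hand.
cat "$out"
```

With this in place, substitutes offered by ssh://192.168.1.80 should only be accepted if the desktop originally fetched them from cache.nixos.org, since those keep their signature. Anything else would fall back to a remote build request.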

Worth noting:

1.

When the client machine shows you tons of lines like

building '/nix/store/zsydnl4207d2vaa9n8kzksccqvv37npq-foo-3.0.0.drv' on 'ssh://192.168.1.80'...

it does not show whether the build server is actually building it or downloading it from a substituter (e.g. cache.nixos.org). To find out which, I kept bmon (network monitor) and top (CPU monitor) open at the same time on the build server. I did see downloads (no top activity) when I was grabbing something cached online, and real building (top activity and build machine fan noise) when building something custom.
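Besides watching bmon and top, the signatures on a store path give a hint after the fact. The sketch below classifies one line of `nix path-info --sigs` output: as far as I know, that command prints the store path followed by `ultimate` for paths built on the machine itself, and then any signatures. The function name and the three labels are my own, and the sample line in the usage note is hypothetical, not real output.

```shell
#!/bin/sh
# classify_sigs LINE
# LINE is one line of `nix path-info --sigs` output: a store path, then
# whitespace-separated markers and signatures. Prints one of:
#   built-here   - the path is marked "ultimate" (built on this machine)
#   substituted  - the path carries a cache.nixos.org-1 signature
#   unknown      - neither marker is present
classify_sigs() {
  # Split the line into words without glob expansion.
  set -f
  set -- $1
  set +f
  # Drop the store path so a package name cannot trigger a false match.
  [ "$#" -gt 0 ] && shift
  case " $* " in
    *" ultimate "*)            echo "built-here" ;;
    *" cache.nixos.org-1:"*)   echo "substituted" ;;
    *)                         echo "unknown" ;;
  esac
}

# Hypothetical sample line, for illustration only.
classify_sigs '/nix/store/aaaa-example-1.0 cache.nixos.org-1:FAKESIGNATURE=='
```

To use it, run `nix path-info --sigs` on the build server for the output path in question and pass each output line to `classify_sigs`. Note that `nix path-info` needs the nix-command experimental feature enabled on that machine.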

2.

The inability of NixOS clients to leave cache.nixos.org out of substituters (which I can do with Nix on Debian) remains a real issue, but it's a separate issue.

colemickens commented 1 year ago

Yep, just started doing remote builds again recently and I'm seeing this again too.

edit: I don't mean to be too whiny, but after years, it's disappointing how little confidence I have in many scenarios around remote building.

colemickens commented 1 year ago

And now I can't tell (after adding a regular "trusted-user") if it's working or not because it's still copying sources up. It would be great if there was a way to do a remote build with NIX_STORE=ssh-ng:// that acted like a local build and didn't do any copying.

EDIT: this is probably the result of me copying derivations to the remote, and the fact that some of my config uses IFD (import from derivation)?

Animeshz commented 1 year ago

I'm still not able to get the remote builder to pull packages from the cache directly; it always goes through my local machine. Are there exact steps from anybody who got it right?

EDIT: Turns out I didn't have substituters set up on the remote build machine; adding a nix-channel and updating fixed it.

teto commented 6 months ago

I see this as well. This makes remote builds longer than building locally :s I've checked my local and remote configs and everything looks sensible to me; I have no idea why I have to push stuff from my machine :s It doesn't help that the nix-daemon is very quiet. Is there any flag to make it more verbose (--help is not that helpful either)? Also, the nix client says: copying /nix/store/.... to ssh://builder, which is a bit ambiguous since you don't know if it's copying from the cache or locally. bandwhich showed me that my machine was uploading to the builder.