NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Copying lots of paths fails with "too many root sets" #7359

Open lheckemann opened 1 year ago

lheckemann commented 1 year ago

Describe the bug

$ nix copy ./*.drv --to ssh-ng://root@54.229.150.174 --derivation
warning: error: SQLite database '/nix/var/nix/db/db.sqlite' is busy
[1147 copied (17.0 MiB), 5.2 MiB DL] copying 44430 pathsToo many root sets
Aborted (core dumped)

[linus@geruest:~/nixpkgs/master/to-build]$ terminate called after throwing an instance of 'nix::EndOfFile'
  what():  error: unexpected end-of-file

Steps To Reproduce

  1. Get lots of drvs (546, all the NixOS test drvs for a nixpkgs checkout in my case)
  2. Try copying them to a remote machine using nix copy --to ssh-ng://root@$host ./*.drv or similar

Expected behavior

A successful copy, or at least a useful error message.

nix-env --version output

nix-env (Nix) 2.12.0pre20221116_561440b

rickynils commented 1 year ago

I ran into this issue as well. Copying works in Nix 2.9.1 and 2.10.3 but fails in 2.11.1 and 2.12.0.

abbec commented 1 year ago

This is the call stack at the point of the abort:

#0  0x00007f341028abc7 in __pthread_kill_implementation () from /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6
#1  0x00007f341023db46 in raise () from /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6
#2  0x00007f34102284b5 in abort () from /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6
#3  0x00007f34114e0da4 in GC_add_roots_inner () from /nix/store/n68j305pcfac37770hcz09iwz36xbbqf-boehm-gc-8.2.2/lib/libgc.so.1
#4  0x00007f34114f355e in GC_add_roots () from /nix/store/n68j305pcfac37770hcz09iwz36xbbqf-boehm-gc-8.2.2/lib/libgc.so.1
#5  0x00007f34112f3687 in nix::BoehmGCStackAllocator::allocate() () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixexpr.so
#6  0x00007f3410b95c65 in nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixutil.so
#7  0x00007f3410b95179 in nix::Source::drainInto(nix::Sink&) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixutil.so
#8  0x00007f3410f0751c in std::_Function_handler<void (nix::Sink&), nix::RemoteStore::addMultipleToStore(std::vector<std::pair<nix::ValidPathInfo, std::unique_ptr<nix::Source, std::default_delete<nix::Source> > >, std::allocator<std::pair<nix::ValidPathInfo, std::unique_ptr<nix::Source, std::default_delete<nix::Source> > > > >&, nix::Activity&, nix::RepairFlag, nix::CheckSigsFlag)::{lambda(nix::Sink&)#1}>::_M_invoke(std::_Any_data const&, nix::Sink&) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixstore.so
#9  0x00007f3410b96224 in void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, nix::VirtualStackAllocator, boost::coroutines2::detail::pull_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::control_block::control_block<nix::VirtualStackAllocator, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}>(boost::context::preallocated, nix::VirtualStackAllocator&&, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}&&)::{lambda(boost::context::fiber&&)#1}> >(boost::context::detail::transfer_t) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixutil.so
#10 0x00007f341108618f in make_fcontext () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libboost_context.so.1.79.0
#11 0x0000000000000000 in ?? ()

Seems to have been introduced here: https://github.com/NixOS/nix/pull/6612

abbec commented 1 year ago

@thufschmitt It seems that draining all paths here inside the same lambda is a bit too much for the garbage collector. I'm not sure what the correct fix is, though. Batching?
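One client-side way to experiment with batching, separate from any fix inside Nix itself, is to split the path list and run one nix copy per group. The sketch below assumes store paths contain no whitespace or quotes; copy_in_batches and the COPY_CMD override are hypothetical names, not part of Nix. Whether smaller batches actually stay under the collector's root-set limit is untested here, since the closure of each group is still copied in a single call.

```sh
#!/bin/sh
# copy_in_batches BATCH_SIZE DESTINATION PATH...
# Runs one `nix copy --derivation` per group of at most BATCH_SIZE paths.
# COPY_CMD is a hypothetical hook for substituting another command
# (for example a stub when trying the function out without Nix).
copy_in_batches() {
    batch_size=$1
    destination=$2
    shift 2

    # Nothing to copy: avoid running the command with no paths.
    [ "$#" -gt 0 ] || return 0

    # Assumption: paths contain no whitespace or quotes, so plain
    # newline-separated input is safe for xargs to split.
    # COPY_CMD is left unquoted on purpose so "nix copy" splits into two words.
    printf '%s\n' "$@" |
        xargs -n "$batch_size" ${COPY_CMD:-nix copy} --derivation --to "$destination"
}
```

With the command from the report this would be called as: copy_in_batches 50 ssh-ng://root@$host ./*.drv (the batch size of 50 is an arbitrary choice).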

thufschmitt commented 1 year ago

I'm not sure what causes the crash, since that code should be streaming properly 🤔

@edolstra since you're the original author of that “low-latency ssh copying”, any idea what might be going wrong? I must confess I'm not entirely clear on how it works.

MrFoxPro commented 1 year ago

This also happens when I'm using deploy-rs, which copies store paths via ssh-ng://.

colemickens commented 1 year ago

This is something I'm hitting on a nearly daily basis as nixos-unstable moves. I have a number of systems that don't really get cache hits, causing large rebuilds and thus numerous derivations to copy. Unfortunately it's causing enough noise that I'm getting close to writing another nix wrapper that looks for the crash string and just retries the copy, but that's not ideal.
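A wrapper along those lines could look like the following sketch. It reruns a command only when its stderr contains the "Too many root sets" string shown in the reports above. The function name retry_on_root_sets and the attempt limit are hypothetical, and retrying only helps if each attempt gets further than the previous one, which this thread does not confirm.

```sh
#!/bin/sh
# retry_on_root_sets MAX_ATTEMPTS COMMAND [ARGS...]
# Reruns COMMAND while it fails with the crash string from this issue.
# Any other failure, or running out of attempts, returns the last exit status.
retry_on_root_sets() {
    max_attempts=$1
    shift
    attempt=1
    stderr_log=$(mktemp) || return 1

    while :; do
        # Capture stderr so it can be searched for the crash string.
        # Side effect: stderr is only shown after each attempt finishes.
        if "$@" 2>"$stderr_log"; then
            status=0
        else
            status=$?
        fi
        cat "$stderr_log" >&2

        # Stop on success.
        [ "$status" -ne 0 ] || break
        # Stop on failures that are not the crash from this issue.
        grep -q 'Too many root sets' "$stderr_log" || break
        # Stop once the attempt budget is used up.
        [ "$attempt" -lt "$max_attempts" ] || break

        attempt=$((attempt + 1))
    done

    rm -f "$stderr_log"
    return "$status"
}
```

For example: retry_on_root_sets 5 nix copy --derivation --to ssh-ng://root@$host ./*.drv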

(Thanks to those investigating / looking at fixes!)

MrFoxPro commented 10 months ago

This is especially annoying on a slow internet connection, as it prevents building on the remote machine.

siriobalmelli commented 7 months ago

Running into this with nixos-anywhere using --build-on-remote:

$ nix run github:nix-community/nixos-anywhere -- --flake .#MACHINE-NAME --build-on-remote root@MACHINE-IP
...
[0 copied (514946.1 MiB)] copying 17178 pathsToo many root sets
/nix/store/1dymvajkvj3kwj2xpjz5ccab49ry6paj-nixos-anywhere-1.0.0/bin/.nixos-anywhere-wrapped: line 196: 93359 Abort trap: 6           NIX_SSHOPTS="-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i $ssh_key_dir/nixos-anywhere ${ssh_args[*]}" nix copy "${nix_options[@]}" "${nix_copy_options[@]}" "$@"
$  nix --version
nix (Nix) 2.18.1

sysedwinistrator commented 6 months ago

Since the OP and other commenters only reported getting the error when using the ssh-ng:// protocol, I'd like to mention that I'm getting the error when copying a lot of derivations from an HTTP cache (Minio S3) to the local store in a GitLab CI job (which runs in a Docker container but uses the host's Nix store via the daemon).

Background: I've been getting this error since I refactored my pipeline a few months ago. It now pushes all derivations to the cache after the initial eval, and the build jobs pull those derivations from the cache so that they don't have to re-evaluate them.
This only occurs on my remote ARM build machine, whose store is automatically garbage-collected due to limited disk space, meaning it sometimes has to refetch all derivations for a NixOS config. The other build machine (x86_64) is also the eval machine, so it never has to fetch the derivations.

geoffreygarrett commented 2 days ago

Setting sshOpts = [ "-o" "ProxyCommand=none" ] partially solved it for me.