Open lheckemann opened 1 year ago
I ran into this issue also. Works in Nix 2.9.1 and 2.10.3 but not in 2.11.1 and 2.12.0.
This is the callstack of the error:
#0 0x00007f341028abc7 in __pthread_kill_implementation () from /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6
#1 0x00007f341023db46 in raise () from /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6
#2 0x00007f34102284b5 in abort () from /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6
#3 0x00007f34114e0da4 in GC_add_roots_inner () from /nix/store/n68j305pcfac37770hcz09iwz36xbbqf-boehm-gc-8.2.2/lib/libgc.so.1
#4 0x00007f34114f355e in GC_add_roots () from /nix/store/n68j305pcfac37770hcz09iwz36xbbqf-boehm-gc-8.2.2/lib/libgc.so.1
#5 0x00007f34112f3687 in nix::BoehmGCStackAllocator::allocate() () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixexpr.so
#6 0x00007f3410b95c65 in nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixutil.so
#7 0x00007f3410b95179 in nix::Source::drainInto(nix::Sink&) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixutil.so
#8 0x00007f3410f0751c in std::_Function_handler<void (nix::Sink&), nix::RemoteStore::addMultipleToStore(std::vector<std::pair<nix::ValidPathInfo, std::unique_ptr<nix::Source, std::default_delete<nix::Source> > >, std::allocator<std::pair<nix::ValidPathInfo, std::unique_ptr<nix::Source, std::default_delete<nix::Source> > > > >&, nix::Activity&, nix::RepairFlag, nix::CheckSigsFlag)::{lambda(nix::Sink&)#1}>::_M_invoke(std::_Any_data const&, nix::Sink&) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixstore.so
#9 0x00007f3410b96224 in void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, nix::VirtualStackAllocator, boost::coroutines2::detail::pull_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::control_block::control_block<nix::VirtualStackAllocator, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}>(boost::context::preallocated, nix::VirtualStackAllocator&&, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}&&)::{lambda(boost::context::fiber&&)#1}> >(boost::context::detail::transfer_t) () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libnixutil.so
#10 0x00007f341108618f in make_fcontext () from /nix/store/ksb0p7wj3l5i6m8g7yhzn0593z9x3910-nix-2.14.0pre20230131_dirty/lib/libboost_context.so.1.79.0
#11 0x0000000000000000 in ?? ()
Seems to have been introduced here: https://github.com/NixOS/nix/pull/6612
@thufschmitt It seems that doing the work of draining all paths here inside the same lambda is a bit too much for the garbage collector. Not sure what the correct fix is though, batching?
I'm not sure what causes it to crash since that should be properly streaming :thinking:
@edolstra since you're the original author of that “low-latency ssh copying”, any idea what might go wrong? I must confess I'm not entirely clear on how it works
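The batching question above is about a fix inside Nix itself. As a user-side stopgap, the same idea can be approximated by splitting the top-level paths across several `nix copy` invocations. This is a minimal, untested sketch: the function name `copy_in_batches` and the `NIX` override variable are hypothetical, and whether smaller batches actually avoid the crash is an assumption. Note that each `nix copy` still copies the closure of its arguments, so the first batch may transfer far more than `batch_size` paths.

```shell
#!/bin/sh
# Hypothetical helper: copy paths to a destination store in fixed-size batches.
# Usage: copy_in_batches BATCH_SIZE DEST PATH...
copy_in_batches() {
    batch_size="$1"
    dest="$2"
    shift 2
    # Nothing to copy: avoid running the command with no paths.
    [ "$#" -gt 0 ] || return 0
    # xargs -n N runs the command once per group of at most N arguments.
    # Input is one path per line; this assumes paths without whitespace,
    # quotes, or backslashes, which holds for ordinary store paths.
    # NIX is a hypothetical override so the command can be swapped out.
    printf '%s\n' "$@" | xargs -n "$batch_size" "${NIX:-nix}" copy --to "$dest"
}
```

For example, `copy_in_batches 100 "ssh-ng://root@$host" ./*.drv` would issue one `nix copy --to` per 100 derivation files. If any batch fails, `xargs` reports a non-zero exit status after attempting the remaining batches.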
This happens when I'm using deploy-rs (it copies store paths via ssh-ng://)
This is something I'm hitting on a nearly daily basis as nixos-unstable moves. I have a number of systems that don't really get cache hits, causing large rebuilds and thus numerous derivations to copy. Unfortunately it's causing enough noise that I'm getting close to writing another nix wrapper that looks for the crash string and just retries the copy, but that's not ideal.
(Thanks to those investigating / looking at fixes!)
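A minimal sketch of the kind of retry wrapper described above, assuming the crash can be recognised by the `Too many root sets` message that appears in the abort output reported in this thread. The function name `retry_on_root_sets` and the attempt limit are hypothetical. It is a stopgap, not a fix, and it buffers the command's output until each attempt finishes, so progress is not shown live.

```shell
#!/bin/sh
# Hypothetical wrapper: re-run a command while it fails with the
# "Too many root sets" abort, up to MAX_ATTEMPTS times.
# Usage: retry_on_root_sets MAX_ATTEMPTS COMMAND [ARGS...]
retry_on_root_sets() {
    local max_attempts="$1"
    shift
    local attempt=1
    local output status
    while :; do
        # Capture stdout and stderr together so the crash string can be matched.
        output="$("$@" 2>&1)" && status=0 || status=$?
        printf '%s\n' "$output"
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        # Retry only on the specific crash; pass other failures through.
        case "$output" in
            *"Too many root sets"*) ;;
            *) return "$status" ;;
        esac
        if [ "$attempt" -ge "$max_attempts" ]; then
            return "$status"
        fi
        attempt=$((attempt + 1))
    done
}
```

Usage would look like `retry_on_root_sets 5 nix copy --to "ssh-ng://root@$host" ./*.drv`. Since paths that already arrived are not sent again, each retry should have less left to copy, though that behaviour is assumed here rather than verified.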
Especially annoying when dealing with slow internet, as it prevents building on the remote.
Running into this with nixos-anywhere using --build-on-remote:
$ nix run github:nix-community/nixos-anywhere -- --flake .#MACHINE-NAME --build-on-remote root@MACHINE-IP
...
[0 copied (514946.1 MiB)] copying 17178 pathsToo many root sets
/nix/store/1dymvajkvj3kwj2xpjz5ccab49ry6paj-nixos-anywhere-1.0.0/bin/.nixos-anywhere-wrapped: line 196: 93359 Abort trap: 6 NIX_SSHOPTS="-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i $ssh_key_dir/nixos-anywhere ${ssh_args[*]}" nix copy "${nix_options[@]}" "${nix_copy_options[@]}" "$@"
$ nix --version
nix (Nix) 2.18.1
Since the OP and other commenters only reported getting the error when using the ssh-ng:// protocol, I'd like to mention that I'm getting the error when copying a lot of derivations from an HTTP cache (Minio S3) to the local store in a GitLab CI job (which is running in a Docker container but uses the host's Nix store via the daemon).
Background:
I've been getting this error since I refactored my pipeline a few months ago to push all derivations to the cache after the initial eval and then pull those derivations from the cache in the build job in order to avoid having to re-evaluate the derivations in the build jobs.
This only occurs on my remote ARM build machine whose store is auto GC'ed due to limited disk space, meaning it sometimes has to refetch all derivations for a NixOS config. The other x86_64 build machine is also the eval machine, so it does not even have to fetch the derivations.
sshOpts = [ "-o" "ProxyCommand=none" ]
partially solved it for me
Describe the bug
Steps To Reproduce
nix copy --to ssh-ng://root@$host ./*.drv
or similar
Expected behavior
Successful copy, or useful error message
nix-env --version output
nix-env (Nix) 2.12.0pre20221116_561440b