NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Multithreaded evaluator #10938

Open · edolstra opened this pull request 2 weeks ago

edolstra commented 2 weeks ago

Motivation

This PR makes the evaluator thread-safe. Currently, only nix search and nix flake show use multi-threaded evaluation to achieve a speedup on multicore systems.

Unlike the previous attempt at a multi-threaded evaluator, this one locks thunks to prevent them from being evaluated more than once. The life of a thunk is now:

Also, there is now a tFailed value type that stores an exception pointer to represent the case where thunk evaluation throws an exception. In that case, every thread that forces the thunk should get the same exception.
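The "evaluate at most once, share the failure" behaviour described above can be illustrated with a small sketch. This is not Nix's implementation (Nix is C++ and uses its own value types); the Thunk class and the state names UNEVALUATED, PENDING, DONE and FAILED are hypothetical, with FAILED loosely playing the role of tFailed. The first thread to force a thunk claims it, evaluates it outside the lock, and wakes any waiters. Later threads either wait or read the stored result or exception.

```python
import threading

# Hypothetical state names for illustration only; not Nix's internal value tags.
UNEVALUATED, PENDING, DONE, FAILED = "unevaluated", "pending", "done", "failed"


class Thunk:
    """A lazily computed value that is evaluated at most once, even under concurrency."""

    def __init__(self, compute):
        self._compute = compute          # zero-argument function producing the value
        self._state = UNEVALUATED
        self._value = None
        self._error = None               # stored exception, analogous to tFailed
        self._cond = threading.Condition()

    def force(self):
        with self._cond:
            # Another thread is already evaluating this thunk: wait until it finishes.
            while self._state == PENDING:
                self._cond.wait()
            if self._state == DONE:
                return self._value
            if self._state == FAILED:
                # Every forcing thread sees the same stored exception object.
                raise self._error
            # We are first: claim the thunk so no other thread evaluates it.
            self._state = PENDING
        try:
            value = self._compute()      # evaluate outside the lock
        except Exception as e:
            with self._cond:
                self._state, self._error = FAILED, e
                self._cond.notify_all()  # wake waiters so they re-raise e
            raise
        with self._cond:
            self._state, self._value = DONE, value
            self._cond.notify_all()      # wake waiters so they return the value
        return value
```

The sketch only shows the once-only semantics. It does not detect a thunk that forces itself (that thread would wait on itself forever), and because of CPython's global interpreter lock it gives no real parallel speedup.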

To enable multi-threaded evaluation, you need to set the NR_CORES environment variable to the number of threads to use. You can also set NIX_SHOW_THREAD_STATS=1 to get some debug statistics.

Some benchmark results on a Ryzen 5900X with 12 cores and 24 hyper-threads:

Note: it's advisable to set GC_INITIAL_HEAP_SIZE to a high value, because stop-the-world garbage collection pauses every evaluator thread and is therefore expensive.

To do:

Context

Priorities and Process

Add :+1: to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

RossComputerGuy commented 4 days ago

Tested this on my M1 Pro MBP running NixOS 24.05:

Before

14.61user 2.75system 0:23.08elapsed 75%CPU (0avgtext+0avgdata 7967760maxresident)k
17952inputs+0outputs (63major+498626minor)pagefaults 0swaps

After

28.69user 2.54system 0:05.35elapsed 583%CPU (0avgtext+0avgdata 7744096maxresident)k
0inputs+0outputs (6major+515584minor)pagefaults 0swaps

roberth commented 4 days ago

The duration numbers are good, but the throughput on @RossComputerGuy's Mac looks a little worrying: in terms of user time it appears to be half as efficient (14.61s before, 28.69s after). This could be partly explained by the evaluation being memory bound, except that the M1 is supposed to have amazing memory bandwidth. It'd also be interesting to compare against the multi-threaded build with NR_CORES=1, as well as other values, to see how it scales.
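The efficiency concern can be checked directly from the two GNU time outputs in the previous comment. In this small sketch the TimeSample helper is hypothetical, and the figures are copied from RossComputerGuy's measurements. It compares wall-clock speedup with the growth in user and total CPU time.

```python
from dataclasses import dataclass


@dataclass
class TimeSample:
    user: float      # seconds of user-mode CPU time
    system: float    # seconds of kernel-mode CPU time
    elapsed: float   # wall-clock seconds

    @property
    def cpu(self) -> float:
        return self.user + self.system

    @property
    def cpu_percent(self) -> float:
        # The quantity GNU time reports as "%CPU".
        return 100 * self.cpu / self.elapsed


before = TimeSample(user=14.61, system=2.75, elapsed=23.08)
after = TimeSample(user=28.69, system=2.54, elapsed=5.35)

speedup = before.elapsed / after.elapsed   # wall-clock improvement
user_ratio = after.user / before.user      # extra user CPU spent on the same work
cpu_ratio = after.cpu / before.cpu         # extra total CPU spent on the same work

print(f"speedup    {speedup:.2f}x")
print(f"user ratio {user_ratio:.2f}x")
print(f"cpu ratio  {cpu_ratio:.2f}x")
```

With these figures, wall-clock time drops by about 4.3x while user CPU time rises by about 1.96x (total CPU by about 1.8x). That is the "half as efficient" observation. Repeating the calculation for runs with different NR_CORES values would show where the extra CPU time starts to appear.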

RossComputerGuy commented 4 days ago

@roberth Yeah, I'm not sure why it was like that. I recently did an update which might have moved the kernel from 6.8.9-asahi to something newer, but I didn't reboot. I ran the first command mentioned by Eelco. I could do a fresh boot, re-measure, and run the other command as well. It could always be that the Asahi kernel doesn't have the memory timing as well optimized as macOS does; I could boot macOS and see what happens. I also have a 64-core Ampere machine on the way, so I could give that a try when it arrives.

roberth commented 3 days ago

Note that Determinate Systems has already blogged about this, which is fine (that's just part of content marketing, and it's more than ok to be excited), but let's recognize that more work needs to be done to:

Especially don't underestimate the first point. Nix is a critical component of users' systems, so we mitigate risks carefully. I think part of that is limiting this to comparatively non-critical use cases such as nix search, but that also means that we should all expect a delay in the delivery of this feature. To set expectations, if nix build enables this within a year, Eelco and DetSys must have done a stellar job on this.

Planning, opinion, Nix team

Also note that the Nix team is unlikely to prioritize this work, because other areas need attention, such as fixes and interface stability, which essentially means _finishing_ things instead of starting new projects, especially without consulting the team. All progress is good, and if this amplifies our pace of development because it aligns with DetSys and/or other contributors, that'd be great; but if it's zero-sum, it'd be inefficient not to focus on other work that's already in flight.