Open · yajo opened this issue 1 year ago
Given that containers usually run on some restricted subset of the host machine, have you played with the max-substitution-jobs setting in the container?
Perhaps something in the container can't cope with the increased number of tasks/processes/jobs from the 16 substitutions?
Or perhaps you had more than 16 concurrent downloads before because you have a higher build-job count, and the new limit is now actually slowing you down by restricting the former 30+ jobs to half of that?
https://nixos.org/manual/nix/stable/command-ref/conf-file.html#conf-max-substitution-jobs
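For reference, the setting can be passed per invocation (as in the commands below) or set persistently in nix.conf; a hypothetical fragment, with example values chosen only for experimentation:

```
# /etc/nix/nix.conf (example values to experiment with, not recommendations)
max-substitution-jobs = 16   # concurrent substitutions; 16 is the default discussed in this thread
max-jobs = 8                 # local build parallelism is limited separately
```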
I have checked with 1 and 2 max-substitution-jobs:
```shell
time podman container run --rm -it docker.io/nixos/nix:2.16.0 nix --extra-experimental-features 'flakes nix-command' --accept-flake-config --max-substitution-jobs 1 build gitlab:moduon/mrchef/009d93eb7f0a9eabd3b248f5c10c7523422a7f56#devShells.x86_64-linux.default
time podman container run --rm -it docker.io/nixos/nix:2.16.0 nix --extra-experimental-features 'flakes nix-command' --accept-flake-config --max-substitution-jobs 2 build gitlab:moduon/mrchef/009d93eb7f0a9eabd3b248f5c10c7523422a7f56#devShells.x86_64-linux.default
```
Both attempts rebuild nodejs and take forever, while the 2.15.1 image still works perfectly and downloads everything from the binary cache.
Could you attempt to reproduce locally? The commands I posted are quite reproducible.
With a single substitution job I indeed see a massive slowdown. With the original 16 I do not see any slowdown compared to running Nix without Docker (I tested in Docker, not Podman), and the concurrent downloads saturate my downstream.
I cannot see any noticeable slowdown in the build of nodejs itself.
I will post timings of both runs once I have them available; this can take quite a while, as I let the first build finish before starting the next, and I am leaving for a doctor's appointment now.
I can reproduce your issue, though this looks much more like a reproducibility issue than a substitution issue:
```console
$ docker container run --rm -it docker.io/nixos/nix:2.15.1 nix --extra-experimental-features 'flakes nix-command' --accept-flake-config eval gitlab:moduon/mrchef/009d93eb7f0a9eabd3b248f5c10c7523422a7f56#devShells.x86_64-linux.default.outPath
warning: refname 'HEAD' is ambiguous.
"/nix/store/p060fx46vy1sywd1i3hbhxhcr715480q-devshell"
$ docker container run --rm -it docker.io/nixos/nix:2.16.0 nix --extra-experimental-features 'flakes nix-command' --accept-flake-config eval gitlab:moduon/mrchef/009d93eb7f0a9eabd3b248f5c10c7523422a7f56#devShells.x86_64-linux.default.outPath
warning: refname 'HEAD' is ambiguous.
"/nix/store/ywpr4b5d28gsvngq2q6bj755ambcasa9-devshell"
```
I did a slight cleanup of the output, as my terminal messed it up during resize. I hope I haven't removed relevant things…
Anyway, there is a warning that concerns me and which might also be related: `warning: refname 'HEAD' is ambiguous.`
I currently do not have the time to dig through your code to figure out where the problem arises.
Therefore I also cannot say, nor quickly check, whether this reproducibility issue is actually your fault or Nix's.
I identified the commit that causes this:

```
commit e4aebccf209c12714f5547ec5acf08b1fd7f51cc
Author: Eelco Dolstra <edolstra@gmail.com>
Date:   Sat May 27 17:53:30 2023 +0200

    Restore Nix 2.3 behaviour for {__impure,__contentAddressed} = false

    Fixes #8405.

 src/libexpr/primops.cc | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)
```
As the commit message says, it changed how the `__impure` and `__contentAddressed` arguments in a drv work; it is very likely that you have one of those somewhere in your dependency graph with an explicit `= false`.
Given #8405, the current behaviour is correct and 2.15 (and earlier) behaved incorrectly.
From my understanding it is impossible to fix both #8405 and this issue at the same time. One of them will have to be closed as a "wont-fix", sorry.
Not sure whom to ping here to receive confirmation.
Repeating what NobbZ said: unless you're setting `__contentAddressed = false;` manually somewhere, the issue should really only show up if you use one of the nixpkgs commits from after nodejs was explicitly set to not being content-addressed, but before this bug was worked around in nixpkgs.
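For illustration, an explicit opt-out looks roughly like this inside a derivation (a hypothetical sketch, not taken from the affected nixpkgs code):

```nix
# Hypothetical derivation; the explicit `= false` is the pattern being
# discussed: Nix 2.15 and 2.16 evaluate it differently, yielding
# different store paths and therefore binary-cache misses.
stdenv.mkDerivation {
  pname = "example";
  version = "1.0";
  src = ./.;
  __contentAddressed = false; # explicit opt-out, not equivalent to omitting it on affected versions
}
```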
Though, manually looking at all three of your nixpkgs revisions (why are there three?), I don't instantly see a revision in this situation. If you can run `nix --extra-experimental-features nix-command show-derivation -r` on both versions and find out whether it really is a `__contentAddressed` (or `__impure`?), that'd probably let you figure it out.
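As a rough sketch of that check (the .drv path and JSON below are made-up placeholders; on a real system you would pipe the actual `show-derivation -r` output, which is JSON keyed by .drv path, instead of the sample string):

```shell
# Made-up, heavily reduced sample of `nix show-derivation -r` output;
# grep it for explicit content-addressing / impurity markers.
json='{"/nix/store/aaaaaaaa-example.drv":{"env":{"__contentAddressed":"","name":"example"}}}'
echo "$json" | grep -o '__contentAddressed\|__impure'
```

Any hit means some derivation in the closure carries one of these explicit attributes, which would explain the differing out paths between 2.15 and 2.16.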
### Describe the bug
Some of my CI pipelines started to break after switching to Nix 2.16, without any other change. They break because the jobs last more than 1 hour, hitting their time limit. Downgrading to 2.15.1 fixed the issue.
### Steps To Reproduce
Try to build the devshell with Nix 2.16. It will take forever, probably because it has to compile nodejs-18.15.0. I run it inside a container to better imitate how those CI jobs work, and to keep the local Nix store from polluting fetch and build times.
My goal was to paste how long this takes to build, but after a couple of hours I got tired. It's too much anyway.
### Expected behavior
Nix 2.15.1 was much faster. It didn't have to build anything, as everything is in binary caches.
### Additional context
Here you have a CI job running on the Docker image `docker.io/nixos/nix`, which resolved to `sha256:01ab413500cf58ee2291607debcec6d1fc62a1ad2cb31f7022e8c1c25db45662` (2.15) some weeks ago; it finished in 3:05 minutes: https://gitlab.com/moduon/mrchef/-/jobs/4308352194

I retried the same job today, which resolves the image to `sha256:18daab29b75942fe338afdb4198eb7df9944df7919a5a455c2d051f4c0394f78` (2.16); I stopped the job manually after 11:04 because it was already taking too long: https://gitlab.com/moduon/mrchef/-/jobs/4461501535

### Priorities
Add :+1: to issues you find important.