Hydra: nixos/release-20.03 and unstable fails to evaluate

FRidh commented 4 years ago

Describe the bug

The nixos/release-20.03 jobset fails to evaluate:

hydra-eval-jobs returned signal 9:
(no output)

I've tried several times to trigger an evaluation, yet every time it fails.

cc @disasm @worldofpeace @grahamc @vcunat

vcunat commented 4 years ago

Over the last few weeks it has felt like we're slowly making the big nixos eval too expensive (again). Maybe not just nixos: I've also seen an increase in out-of-memory failures in jobs like tarball, but perhaps it was just a feeling, as I see no significant increase in these graphs: https://hydra.nixos.org/job/nixpkgs/trunk/metrics#tabs-charts

Disasm commented 4 years ago

cc @disassembler

worldofpeace commented 4 years ago

Noticed this as well; we can't open ZHF (Zero Hydra Failures) until there's an eval on the jobset.

grahamc commented 4 years ago

Eelco's thinking is that the growth of NixOS tests is causing memory pressure. Each VM in the tests adds a few hundred MB of RAM consumption to Hydra's evaluator.

worldofpeace commented 4 years ago

These were added: https://github.com/NixOS/nixpkgs/commit/7a625e745346fbc87952a7af23ae6c88ade80ede and https://github.com/NixOS/nixpkgs/commit/bf49181373643d3e2df1a06cb6e1efde1a2dec3e

grahamc commented 4 years ago

It feels bad to be "within 5 tests" of being unable to move forward. :(

grahamc commented 4 years ago

To clarify, @edolstra's short-term suggestion is to remove some of the tests. For example, those keymap tests were commented out for a very long time. It would be sad to drop them again, but it may be the best short-term solution. Long-term, there is a branch for a more precise GC, and possibly some optimisation work could be done on how NixOS is evaluated.

But I don't know if either of these longer-term options is possible today.

That said, I'm 100% not the right person for this problem; @LnL7, @samueldr, @fpletz, or @Ma27 may have advice on how to tune Hydra's evaluator.

flokli commented 4 years ago

@grahamc does it just run out of memory, or is there some other reason it fails to evaluate?

LnL7 commented 4 years ago

For different reasons, I've been tracking the stdenv requisite size for quite a while now. This could be totally unrelated, but it had a rather large jump recently.

Update: the jump happened between fa7445532900f2555435076c1e7dce0684daa01a and d453c2f5d8e211a6f45f59f67da8a22625627b86. At first glance, the libidn2 change is the most likely cause.

flokli commented 4 years ago

@LnL7 could we bisect around that?

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-20-03-feature-freeze/5655/32

LnL7 commented 4 years ago

Does anybody know why this only occurs for 20.03 and not trunk-combined? Evaluation for those should be equivalent (except for stableBranch, but that is, or should be, purely metadata).

andir commented 4 years ago

I've seen killed trunk-combined tasks earlier today while trying to trigger an eval.

LnL7 commented 4 years ago

@flokli it looks like 447edaa32fcee706be24db4389f4759fad68a785 introduced python (and not a minimal build) into the stdenv. A minimal python would bring it down from ~270 to ~240.

flokli commented 4 years ago

If that's the case, we might just want to remove that reference; I don't really see a reason why python should become part of glibc's runtime closure.

vcunat commented 4 years ago

I'm not sure how the closure sizes are relevant to this thread, but I can't see a significant increase in the (runtime) closure size of the stdenv output path on x86_64-linux (and python is not in it).

jonringer commented 4 years ago

stdenv size didn't change much:

[13:37:37] jon@jon-workstation ~/projects/nixpkgs (master)
$ nix path-info -Sh ./result
/nix/store/5gc1hyqbxwfwcw7l1bs7gy6rw9zbnc09-stdenv-linux     231.6M
[13:39:35] jon@jon-workstation ~/projects/nixpkgs (release-19.09)
$ nix path-info -Sh ./result
/nix/store/qghrkvk86f9llfkcr1bxsypqbw1a4qmw-stdenv-linux     224.4M

and python is not in the runtime closure:

[13:40:47] jon@jon-workstation ~/projects/nixpkgs (master)
$ nix-store -q --tree ./result | grep python
[13:40:58] jon@jon-workstation ~/projects/nixpkgs (master)
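To put "didn't change much" into a number, here is a small shell sketch (not part of the original thread) that computes the relative growth between two nix path-info -Sh lines like the ones above. It assumes both sizes carry the same unit suffix, so the suffix is simply dropped rather than converted.

```shell
#!/usr/bin/env bash
# Sketch: relative growth (in percent) between two "nix path-info -Sh" output lines.
# Assumption: both lines use the same unit suffix (e.g. "M"), so no unit conversion is done.
set -euo pipefail

closure_growth() {
  # $1 = older line, $2 = newer line, each "<store path>   <size><unit>".
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { old = $2 + 0 }  # "224.4M" + 0 keeps only the leading number
    NR == 2 { new = $2 + 0 }
    END { printf "%.1f\n", (new - old) / old * 100 }'
}

# release-19.09 (older) vs master (newer), using the sizes reported above.
closure_growth \
  "/nix/store/qghrkvk86f9llfkcr1bxsypqbw1a4qmw-stdenv-linux     224.4M" \
  "/nix/store/5gc1hyqbxwfwcw7l1bs7gy6rw9zbnc09-stdenv-linux     231.6M"   # prints 3.2
```

A growth of about 3% in the runtime closure is consistent with the observation that the stdenv output itself barely changed.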

LnL7 commented 4 years ago

@vcunat It could be something totally different, but given that nixos instances will evaluate pkgs multiple times, it's something that increases the evaluation cost of each test.

jonringer commented 4 years ago

I did notice that the Hydra jobsets for "trunk" now take over 100 seconds to evaluate, whereas they used to take significantly less when I first started watching Hydra more than 6 months ago.

FRidh commented 4 years ago

The evaluator dies with "hydra-eval-jobs returned signal 9", but random builds also fail with signal 9. Would the evaluator kill remote jobs when it runs out of memory? Or could those be builds that happen to run on the evaluator machine?

vcunat commented 4 years ago

No, I believe there are no such connections.

edolstra commented 4 years ago

Ouch, having glibc depend on python is really unfortunate.

vcunat commented 4 years ago

It was an upstream decision to use python in the build process (a build-time-only dependency). I don't think we can do much about that. EDIT: using some minimal python could be nice, though.

flokli commented 4 years ago

@vcunat you could probably switch that occurrence to python3Minimal, introduced in https://github.com/NixOS/nixpkgs/pull/66762, which should have a smaller build and runtime closure, provided you don't rely on things like libreadline or SSL support.

vcunat commented 4 years ago

OK, I submitted #80112, but I still can't see how it's relevant to this thread.

LnL7 commented 4 years ago

Based on the GC stats from Nix, the memory needed to evaluate e.g. hello increased from 26 MB to 29 MB with the glibc update (this has now doubled compared to 18.03, by the way). This indeed isn't a big deal, since it's a flat cost per architecture. However that's not the case for nixos instances, since each test imports its own instance of nixpkgs.
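The scaling argument can be made concrete with a back-of-the-envelope sketch (an illustration, not a measurement): if a fixed cost per nixpkgs import grows from 26 MB to 29 MB, and that cost is paid once per imported instance, the extra memory grows linearly with the number of instances. The instance counts below are hypothetical inputs, and whether each test really imports its own nixpkgs is disputed in the thread.

```shell
#!/usr/bin/env bash
# Sketch: extra evaluator memory if each NixOS test instance re-imports nixpkgs.
# 26 MB -> 29 MB is the per-evaluation figure for "hello" quoted above;
# the instance counts passed in are hypothetical.
set -euo pipefail

extra_mb() {
  local before_mb="$1" after_mb="$2" instances="$3"
  # A flat per-import increase is multiplied by the number of imports.
  echo $(( (after_mb - before_mb) * instances ))
}

extra_mb 26 29 1     # a single shared instance: prints 3
extra_mb 26 29 200   # 200 independent imports (hypothetical): prints 600
```

This is why sharing one pkgs instance across test VMs, rather than importing nixpkgs per machine, is the lever that matters for the NixOS jobsets.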

I can't evaluate everything on my machine with the current settings, but evaluating just the tests seems to use between 600 MB and 1.5 GB more than with that commit reverted. With the way evaluation currently works, that's a problem if it bumps memory usage up enough to require a larger heap.

I don't know how much memory the Hydra evaluator has available, but with GC_INITIAL_HEAP_SIZE=20G both 20.03 and older releases evaluated without issues. The larger heap size does, however, result in higher average memory usage, which might be a problem for concurrent evaluations.
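For anyone who wants to repeat the GC_INITIAL_HEAP_SIZE=20G experiment locally, a minimal sketch is below. It assumes the working directory is a nixpkgs checkout and uses the tested aggregate job of nixos/release-combined.nix as an example target; Hydra itself evaluates through hydra-eval-jobs, so this only approximates its workload. GC_INITIAL_HEAP_SIZE is read by the Boehm GC that the Nix evaluator uses, and NIX_SHOW_STATS=1 makes Nix print evaluation statistics when it exits.

```shell
#!/usr/bin/env bash
# Sketch: evaluate a large NixOS job locally with a pre-sized evaluator heap.
# Assumptions: run from the root of a nixpkgs checkout with Nix installed;
# "-A tested" is an example target, not Hydra's exact invocation.
set -euo pipefail

eval_with_heap() {
  local heap="${1:-20G}"
  # GC_INITIAL_HEAP_SIZE: initial heap for the Boehm GC used by the evaluator.
  # NIX_SHOW_STATS=1: print evaluation statistics (including GC totals) on exit.
  GC_INITIAL_HEAP_SIZE="$heap" NIX_SHOW_STATS=1 \
    nix-instantiate nixos/release-combined.nix -A tested
}

# Only run when the prerequisites are present, so the sketch is harmless elsewhere.
if command -v nix-instantiate >/dev/null 2>&1 && [ -f nixos/release-combined.nix ]; then
  eval_with_heap 20G
fi
```

The trade-off mentioned above applies: a large initial heap avoids repeated heap growth early in the evaluation, but it raises the baseline memory of every evaluator process, which adds up when several evaluations run at once.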

vcunat commented 4 years ago

If I look correctly, using python3Minimal recovers only a small fraction of this increase.

LnL7 commented 4 years ago

Yeah, I'm not sure there's a good solution for this other than trying to reduce the memory "enough" without more fundamental changes.

I took a quick look at how tests are evaluated. This probably isn't the right place to make the change, and I think it would break tests that use overlays as well as those for multiple architectures. But something similar might reduce the overhead for tests quite significantly.

diff --git a/nixos/lib/build-vms.nix b/nixos/lib/build-vms.nix
index 1bad63b9194..8da2504bea9 100644
--- a/nixos/lib/build-vms.nix
+++ b/nixos/lib/build-vms.nix
@@ -36,6 +36,7 @@ rec {
       baseModules =  (import ../modules/module-list.nix) ++
         [ ../modules/virtualisation/qemu-vm.nix
           ../modules/testing/test-instrumentation.nix # !!! should only get added for automated test runs
+          { key = "nixpkgs-pkgs"; nixpkgs.pkgs = pkgs; }
           { key = "no-manual"; documentation.nixos.enable = false; }
           { key = "qemu"; system.build.qemu = qemu; }
           { key = "nodes"; _module.args.nodes = nodes; }

vcunat commented 4 years ago

However that's not the case for nixos instances, since each test imports its own instance of nixpkgs.

My reading of that part is that pkgs is passed through and not re-imported.

The idea for VM tests seems intriguing. Overlays appear to be considered at a quick glance.

vcunat commented 4 years ago

I tried your patch with evaluation of just a pair of tests at once, and it decreased gc.totalBytes by ~22%
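A comparison like this can be scripted roughly as follows: run the same evaluation with NIX_SHOW_STATS=1 before and after the patch, pull gc.totalBytes out of each stats dump, and compute the relative decrease. The grep-based extraction is a shortcut that assumes "totalBytes" appears once in the stats JSON, and the two stats snippets below are made-up placeholders, not vcunat's measurements.

```shell
#!/usr/bin/env bash
# Sketch: compare gc.totalBytes from two NIX_SHOW_STATS dumps (before/after a patch).
# Assumption: "totalBytes" occurs once in the stats JSON; this is not a general JSON parser.
set -euo pipefail

total_bytes() {
  # Read stats JSON on stdin and print the numeric value of "totalBytes".
  grep -o '"totalBytes": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

pct_reduction() {
  # Relative decrease from $1 (before) to $2 (after), in percent.
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

# Placeholder stats standing in for two real runs (values chosen only for illustration).
before=$(echo '{"gc": {"heapSize": 4294967296, "totalBytes": 10000000000}}' | total_bytes)
after=$(echo '{"gc": {"heapSize": 4294967296, "totalBytes": 7800000000}}' | total_bytes)
pct_reduction "$before" "$after"   # prints 22.0
```

totalBytes counts everything allocated during the evaluation, so it tracks evaluation work more directly than heap size, which also depends on GC timing.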

LnL7 commented 4 years ago

Yeah, I linked the wrong thing.

Overlays appear to be considered at a quick glance.

That looks promising; threading pkgs for the correct system through to buildVM, instead of just pkgs (which is always x86_64-linux), might be an option then. I won't have time to look into this further for a few days, however.

vcunat commented 4 years ago

I don't know these parts of the code well, but I looked around and I still can't see any problem with that patch. I tried it on Hydra, but it's still getting killed: https://hydra.nixos.org/jobset/nixos/nixos-test-expensive-eval (2/2 eval attempts killed)

vcunat commented 4 years ago

When I restricted it to just x86_64-linux, it succeeded on the second attempt. I'm hoping to use this approach for now. Note that 20.03 was also created just for x86_64-linux and couldn't get an evaluation even after cutting some tests in ceb90b08e... at least until a while ago (not sure what's changed).

Therefore I still expect that the patch helped significantly, though I'd check the diff in test failures before using it for real.

grahamc commented 4 years ago

It looks like Eelco has whipped up a miracle and got evaluations passing, and in less time too.

vcunat commented 4 years ago

Bought a better server? :-) In any case, it will be nice to know how he managed it, as it's a never-ending problem. EDIT: I suspect it was some kind of cheating, as we no longer have the aggregate tested job in either trunk-combined or release-20.03.

For long-term solutions of RAM consumption I have high hopes for https://github.com/NixOS/hydra/issues/715

jonringer commented 4 years ago

Evaluation 1570647 of jobset nixos:nixos-test-expensive-eval

This evaluation was performed on 2020-02-15 00:59:23. Fetching the dependencies took 3s and evaluation took 1109s

oof, 20mins for an eval. That's rough

vcunat commented 4 years ago

That seems like quite a normal number for our big jobsets like trunk-combined, IIRC.

vcunat commented 4 years ago

OK, let me ask explicitly about that miracle: how are channels going to work when we have no tested job anymore? Perhaps I just don't understand the intentions.

edolstra commented 4 years ago

The tested job is back (it was never gone but it did have an evaluation error). We'll need to backport 2de3caf01109891cfc2645b0ad07ac36aedadd1e and 895042956f279ae8ebc9fd026664cea8198f71ec to the 20.03 branch.

vcunat commented 4 years ago

Great :heart: I pushed 20.03 backports.

I believe the issue is fixed and shouldn't re-appear anytime soon. Possible TODOs:

[ ] backport to 19.09. It will probably keep evaluating without it, but we could make it cheaper (for the several remaining months). It surely doesn't apply cleanly, but it should be a mechanical change.
[ ] still consider the approach from LnL; perhaps we can get even better performance thanks to that.

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-20-03-beta/5935/1

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/firefox-not-up-to-date/5941/2

vcunat commented 4 years ago

No good, even the small channels are blocked now: https://github.com/NixOS/hydra/issues/715#issuecomment-587693274

vcunat commented 4 years ago

Resolved, and today all channels even got updated.

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-20-03-beta/5935/7

NixOS / nixpkgs

Hydra: nixos/release-20.03 and unstable fails to evaluate #79907