NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

nixos.tests.installer.simpleUefiSystemdBoot.x86_64-linux is broken #79304

Closed lovesegfault closed 4 years ago

lovesegfault commented 4 years ago

Describe the bug: Test nixos.tests.installer.simpleUefiSystemdBoot.x86_64-linux is broken on master.

To Reproduce Steps to reproduce the behavior:

  1. nix build -f nixos/release.nix tests.installer.simpleUefiSystemdBoot.x86_64-linux

Expected behavior: The test passes.

Additional context: I've bisected the problem to this commit:

[27dfbd55f82d0bfd4ed73fcae5557ef0f6fdbe07] rsyslog: 8.1911.0 -> 8.2001.0

worldofpeace commented 4 years ago

Thank you for opening this :sparkles:

timokau commented 4 years ago

In that case let's revert it.

jtojnar commented 4 years ago

That is weird. How does simpleUefiSystemdBoot depend on rsyslog?

lovesegfault commented 4 years ago

This test is kind of flaky, so there's some chance I bisected it incorrectly.

lovesegfault commented 4 years ago

I'm bisecting again starting at 05962c4ad558b6802e64e8992aebf2dfc946d34e

worldofpeace commented 4 years ago

Going to plug @timokau's nix-bisect: https://discourse.nixos.org/t/nix-bisect-bisect-nix-builds/5584 :grin:

At a glance I don't see rsyslog having any relation, and if the tests are flaky enough, bisecting likely can't give reliable results.

lovesegfault commented 4 years ago

I didn't use that tool because I was bisecting by running the test three times and only considering a commit bad if all three runs fail :P
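
The rule described here (run the test three times, call the commit bad only if every run fails) could be scripted for `git bisect run` roughly as follows. This is a minimal sketch, not the script actually used in this thread: the names `run_build` and `classify` are made up for illustration, and it relies only on the documented `git bisect run` convention that exit code 0 means good and 1 means bad.

```python
"""Retry wrapper sketch: a commit counts as bad only if every attempt fails."""
import subprocess
import sys

# Build command taken from the reproduction steps in this issue.
BUILD_CMD = [
    "nix", "build", "-f", "nixos/release.nix",
    "tests.installer.simpleUefiSystemdBoot.x86_64-linux",
]


def run_build():
    """Run the test once and return its exit code."""
    return subprocess.run(BUILD_CMD).returncode


def classify(attempt, max_tries=3):
    """Return 0 (good) as soon as one attempt succeeds, 1 (bad) if all fail.

    `attempt` is any zero-argument callable that returns an exit code,
    which keeps the retry logic separate from the actual build.
    """
    for n in range(1, max_tries + 1):
        print(f">>>> Attempt No. {n}", file=sys.stderr)
        if attempt() == 0:
            return 0
    return 1

# To drive a bisect with this, end the file with:
#     sys.exit(classify(run_build))
```

Saved under a hypothetical name such as retry_build.py outside the checkout, with the final `sys.exit(...)` line added, it could be used as `git bisect run python3 /path/to/retry_build.py`. Stopping at the first success saves time on good commits, at the cost of assuming that a single pass is trustworthy.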

lovesegfault commented 4 years ago

cc. @timokau maybe add a --max-tries flag to nix-bisect :P

timokau commented 4 years ago

That kind of unforeseen use case is exactly why nix-bisect is not just a tool, but a library ;) So you could write your own bisect script in Python (a much more pleasant experience than doing so in bash).

I may add that feature to the CLI at some point though, since it's probably a pretty common use case. --force-builds would probably be a better name, since we really want a minimum number of tries. Feel free to file an issue against https://github.com/timokau/nix-bisect :)

lovesegfault commented 4 years ago

I tried bisecting again, but I think this result is also nonsense:

git bisect start
# good: [05962c4ad558b6802e64e8992aebf2dfc946d34e] Merge master into staging-next
git bisect good 05962c4ad558b6802e64e8992aebf2dfc946d34e
# bad: [51ac6731e0c370d47750f59072d67dc6537157df] Merge pull request #79354 from r-ryantm/auto-update/spdk
git bisect bad 51ac6731e0c370d47750f59072d67dc6537157df
# good: [45158b5c65ea064688d1bc0700420ef26f70116a] poetry2nix: 1.3.0 -> 1.4.0
git bisect good 45158b5c65ea064688d1bc0700420ef26f70116a
# good: [1e5cfc9cc88b3f11bca636069ea3aa1525a9c579] cointop: 1.4.1 -> 1.4.4
git bisect good 1e5cfc9cc88b3f11bca636069ea3aa1525a9c579
# bad: [d3bef573f97689dc5857554165e68be5fdaaf753] poppler: make note to check texlive before merging updates also
git bisect bad d3bef573f97689dc5857554165e68be5fdaaf753
# bad: [afc3d258249c3682992b094e4b7180a63cbf4c86] nixosTests.buildbot: Port to python
git bisect bad afc3d258249c3682992b094e4b7180a63cbf4c86
# good: [c9d6dee9e469fe752bacf3fb885e23e4e01e7b8f] nixos/locate: don't create /var/cache
git bisect good c9d6dee9e469fe752bacf3fb885e23e4e01e7b8f
# good: [19515cf7581be7578809bfae4b1ce104ef7d6f71] Merge pull request #79059 from r-ryantm/auto-update/python2.7-pdftotext
git bisect good 19515cf7581be7578809bfae4b1ce104ef7d6f71
# bad: [d63542035d9f035bce7c40869b895298d1ccc5f1] Merge pull request #79070 from r-ryantm/auto-update/python3.7-nest_asyncio
git bisect bad d63542035d9f035bce7c40869b895298d1ccc5f1
# bad: [20e0ec99428fb392960bdf46bf1a1b6cd44c4946] Merge pull request #79039 from r-ryantm/auto-update/python3.7-asyncpg
git bisect bad 20e0ec99428fb392960bdf46bf1a1b6cd44c4946
# good: [7088659bc3113317d506f5d15785ca4c371ba707] Merge pull request #79049 from r-ryantm/auto-update/python3.7-aiorun
git bisect good 7088659bc3113317d506f5d15785ca4c371ba707
# good: [f148eccf0de2d9b6f96d8ef01d862eff63bf2d10] Merge pull request #79048 from r-ryantm/auto-update/python2.7-django-picklefield
git bisect good f148eccf0de2d9b6f96d8ef01d862eff63bf2d10
# good: [fdd23ae6ef2060eec266160cb6c1a6efbedf237f] Merge pull request #78829 from r-ryantm/auto-update/SunVox
git bisect good fdd23ae6ef2060eec266160cb6c1a6efbedf237f
# bad: [3b4fc5b60fa1abfb4ed60361efca1cc9fa14e1e2] python37Packages.asyncpg: 0.20.0 -> 0.20.1
git bisect bad 3b4fc5b60fa1abfb4ed60361efca1cc9fa14e1e2
# first bad commit: [3b4fc5b60fa1abfb4ed60361efca1cc9fa14e1e2] python37Packages.asyncpg: 0.20.0 -> 0.20.1

lovesegfault commented 4 years ago

Core of the issue seems to be this:

vm-test-run-installer-simpleUefiSystemdBoot> machine: must succeed: nixos-install < /dev/null >&2
vm-test-run-installer-simpleUefiSystemdBoot> machine# > > > > > > > > > > > > > > > > > > > > > > > > > building the configuration in /mnt/etc/nixos/configuration.nix...
vm-test-run-installer-simpleUefiSystemdBoot> [0/286 built, 1/59/304 copied (63.8/937.5 MiB)] copying db-5.3.28 from local[   28.268765] nscd[864]: 864 checking for monitored file `/etc/netgroup': No such file or directory
vm-test-run-installer-simpleUefiSystemdBoot> [201 built, 318 copied (945.3 MiB)]B)] building nixos-system-nixos-20.03.git.df87e37it.df87e37med by relative paths in udev rules exist in /nix/store/8705j9qncplrgd7njqj5gw2d3bjrp21r-systemd-243.4/lib/udev... OKraid-creating.rules'n1iwxvv5l9k-udev-rules/95-dm-notify.rules'les'
vm-test-run-installer-simpleUefiSystemdBoot> machine# copying channel...
vm-test-run-installer-simpleUefiSystemdBoot> machine# installing the boot loader...
vm-test-run-installer-simpleUefiSystemdBoot> machine# setting up /etc...
vm-test-run-installer-simpleUefiSystemdBoot> Initializing machine ID from random generator.
vm-test-run-installer-simpleUefiSystemdBoot> machine# Failed to open file system "/dev/block/253:1": No such file or directory
vm-test-run-installer-simpleUefiSystemdBoot> machine# Traceback (most recent call last):
vm-test-run-installer-simpleUefiSystemdBoot> machine#   File "/nix/store/h8csrdkrs25rig4sp4rxxw960xwidnq2-systemd-boot-builder.py", line 240, in <module>
vm-test-run-installer-simpleUefiSystemdBoot> machine#     main()
vm-test-run-installer-simpleUefiSystemdBoot> machine#   File "/nix/store/h8csrdkrs25rig4sp4rxxw960xwidnq2-systemd-boot-builder.py", line 199, in main
vm-test-run-installer-simpleUefiSystemdBoot> machine#     subprocess.check_call(["/nix/store/8705j9qncplrgd7njqj5gw2d3bjrp21r-systemd-243.4/bin/bootctl", "--path=/boot", "--no-variables", "install"])
vm-test-run-installer-simpleUefiSystemdBoot> machine#   File "/nix/store/dkm6jsa0apc7vlpdkslpvsxw1qr7kxh2-python3-3.7.6/lib/python3.7/subprocess.py", line 363, in check_call
vm-test-run-installer-simpleUefiSystemdBoot> machine#     raise CalledProcessError(retcode, cmd)
vm-test-run-installer-simpleUefiSystemdBoot> machine# subprocess.CalledProcessError: Command '['/nix/store/8705j9qncplrgd7njqj5gw2d3bjrp21r-systemd-243.4/bin/bootctl', '--path=/boot', '--no-variables', 'install']' returned non-zero exit status 1.
vm-test-run-installer-simpleUefiSystemdBoot> machine: exit status 1
vm-test-run-installer-simpleUefiSystemdBoot> machine: output:
vm-test-run-installer-simpleUefiSystemdBoot> (273.58 seconds)
vm-test-run-installer-simpleUefiSystemdBoot> error: command `nixos-install < /dev/null >&2' did not succeed (exit code 1)
vm-test-run-installer-simpleUefiSystemdBoot> (286.62 seconds)
vm-test-run-installer-simpleUefiSystemdBoot> command `nixos-install < /dev/null >&2' did not succeed (exit code 1)
vm-test-run-installer-simpleUefiSystemdBoot> cleaning up
vm-test-run-installer-simpleUefiSystemdBoot> killing machine (pid 9)
vm-test-run-installer-simpleUefiSystemdBoot> (0.00 seconds)
vm-test-run-installer-simpleUefiSystemdBoot> vde_switch: EOF on stdin, cleaning up and exiting
vm-test-run-installer-simpleUefiSystemdBoot> vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty

vcunat commented 4 years ago

I don't know – locally I'm getting this :arrow_up: error even on a21c2fa3 which succeeded on Hydra (I tried several times).

vcunat commented 4 years ago

Careful (manual) bisect led me to a surprising result:

first bad commit: [af808bd826c54b13a39e6538d7b5b655de0f3ae3] linux config: add support for xdp sockets and ebpf jit

I'll try a few more experiments now.

Many of my steps failed for other reasons (e.g. libselinux not building), so I suspect that's where you were led astray. I did 2-3 attempts on every step to be more certain, but I found no non-determinism on my machine.
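
One way to keep unrelated build failures from being recorded as bad is to have the bisect script skip them. The sketch below assumes the bootctl traceback quoted earlier in this thread is a usable signature of the real failure, and that the full build log is available as text (for example by running `nix build -L` and capturing its output). The function names are hypothetical. It uses the documented `git bisect run` convention that exit code 125 means "skip this commit".

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify_outcome(returncode, log_text):
    """Map one build result to a bisect verdict.

    A failure only counts as BAD if the log shows the bootctl error from
    this issue; any other failure (e.g. a dependency that does not build)
    is reported as SKIP so it cannot mislead the bisect.
    """
    if returncode == 0:
        return GOOD
    # Assumption: these two substrings, taken from the log in this thread,
    # identify the failure being hunted.
    if "bootctl" in log_text and "returned non-zero exit status" in log_text:
        return BAD
    return SKIP


def run_and_classify(cmd):
    """Run `cmd`, merge stderr into stdout, and classify the result."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return classify_outcome(proc.returncode, proc.stdout)
```

A script built on this would end with `sys.exit(run_and_classify([...]))`. Too many skipped commits can leave `git bisect` unable to name a single first bad commit, so the result may be a range of candidates instead.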

lovesegfault commented 4 years ago

I had quite a few steps where the first attempt at building failed, but the second one succeeded. I was bisecting manually with this:

for run in {1..3}; do echo ">>>> Attempt No. $run"; nix build -f nixos/release.nix tests.installer.simpleUefiSystemdBoot.x86_64-linux; done

vcunat commented 4 years ago

I re-confirmed that reverting that commit atop master (0a901b4a) fixes the test for me. (Still no idea why so far.)

jtojnar commented 4 years ago

cc @magenbluten for the linux config change

vcunat commented 4 years ago

Narrowed down just to the BPF_JIT_ALWAYS_ON kernel option. I don't know... maybe the JIT has some bad interaction with the VM.
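
To check whether a given kernel was built with that option, its config can be inspected. The helper below is a rough sketch with hypothetical function names. It parses kernel config text in the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` format. Reading the running kernel's config from /proc/config.gz only works if the kernel exposes it (CONFIG_IKCONFIG_PROC), which is an assumption here.

```python
import gzip


def kernel_option(config_text, name):
    """Return the value of CONFIG_<name> (e.g. 'y' or 'm'), or None if unset."""
    key = f"CONFIG_{name}"
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {key} is not set":
            return None
        if line.startswith(key + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config; requires CONFIG_IKCONFIG_PROC."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

Inside the test VM, `kernel_option(read_running_config(), "BPF_JIT_ALWAYS_ON")` would then return "y" on an affected kernel and None once the option is reverted.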

vcunat commented 4 years ago

For now I suppose I'll push a revert of that single option, unless someone suggests a better idea very soon.

lovesegfault commented 4 years ago

Sounds good to me