NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Go 1.19 binaries that use `@resources` SystemCallFilter crashing on startup due to SECCOMP failure #197443

Closed. tomfitzhenry closed this issue 1 year ago

tomfitzhenry commented 1 year ago

Describe the bug

On 95aeaf83c247b8f5aa561684317ecd860476fcd6 (nixos-unstable), services.dnscrypt-proxy2 is crashing (core dumping) on startup, due to SECCOMP error.

Steps To Reproduce

Steps to reproduce the behavior:

  1. nix-build -A driverInteractive nixos/tests/dnscrypt-proxy2.nix && ./result/bin/nixos-test-driver
  2. start_all()
  3. test_script()

The test succeeds (https://hydra.nixos.org/build/196222051), but dnscrypt-proxy2 actually fails to start.

Excerpt from VM log:

client # [    7.179763] systemd[1]: Started Process Core Dump (PID 979/UID 0).
client # [    7.334395] systemd-coredump[980]: Process 974 (dnscrypt-proxy) of user 62396 dumped core.
client # 
client # Module linux-vdso.so.1 with build-id 8aef1613db87d17abfb3f09dccb11abfed4e95da
client # Module ld-linux-x86-64.so.2 with build-id 2d2d543cedf2d81d841c434bb7546559079cb6c2
client # Module libc.so.6 with build-id 28c673fe00b56ef505b898287c2654db0def666b
client # Module libpthread.so.0 with build-id cb028b537f0fdd26c58d2ef187ac92d0286066d3
client # Module dnscrypt-proxy without build-id.
client # Stack trace of thread 974:
client # #0  0x000000000040432e runtime/internal/syscall.Syscall6 (dnscrypt-proxy + 0x432e)
client # #1  0x00000000004b14fb syscall.RawSyscall (dnscrypt-proxy + 0xb14fb)
client # #2  0x00000000004b03cf syscall.Setrlimit (dnscrypt-proxy + 0xb03cf)
client # #3  0x00000000004da08e os.init.1 (dnscrypt-proxy + 0xda08e)
client # #4  0x00000000004472a6 runtime.doInit (dnscrypt-proxy + 0x472a6)
client # #5  0x00000000004471f1 runtime.doInit (dnscrypt-proxy + 0x471f1)
client # #6  0x00000000004471f1 runtime.doInit (dnscrypt-proxy + 0x471f1)
client # #7  0x0000000000439fd3 runtime.main (dnscrypt-proxy + 0x39fd3)
client # #8  0x00000000004683e1 runtime.goexit.abi0 (dnscrypt-proxy + 0x683e1)
client # ELF object binary architecture: AMD x86-64
client # 
client # [    7.350894] systemd[1]: dnscrypt-proxy2.service: Main process exited, code=dumped, status=31/SYS

From dmesg:

client # [  144.939671] Oct 23 22:19:27 client audit[669]: SECCOMP auid=4294967295 uid=62396 gid=62396 ses=4294967295 subj=unconfined pid=669 comm="dnscrypt-proxy" exe="/nix/store/fxb09q2lsswl9yzns32mjm4zhflmmwxp-dnscrypt-proxy2-2.1.2/bin/dnscrypt-proxy" sig=31 arch=c000003e syscall=160 compat=0 ip=0x40432e code=0x80000000
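The audit record above can be decoded by hand. Here is a minimal Python sketch that does so, assuming x86-64 Linux numbering. The lookup tables are small hand-written excerpts, and the function name `decode_seccomp_record` is made up for illustration. The sample record is a shortened copy of the dmesg line.

```python
import re

# Hand-written excerpts for x86-64 Linux, not complete tables.
X86_64_SYSCALLS = {97: "getrlimit", 160: "setrlimit", 302: "prlimit64"}
SIGNALS = {31: "SIGSYS"}
AUDIT_ARCHES = {"c000003e": "x86_64"}


def decode_seccomp_record(line):
    """Extract key=value fields from an audit SECCOMP record and name them."""
    # Values are either double-quoted strings or runs of non-space characters.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    syscall_nr = int(fields["syscall"])
    signal_nr = int(fields["sig"])
    return {
        "comm": fields["comm"].strip('"'),
        "arch": AUDIT_ARCHES.get(fields["arch"], fields["arch"]),
        "signal": SIGNALS.get(signal_nr, str(signal_nr)),
        "syscall": X86_64_SYSCALLS.get(syscall_nr, str(syscall_nr)),
    }


# Shortened copy of the record from dmesg.
record = (
    'audit[669]: SECCOMP auid=4294967295 uid=62396 gid=62396 pid=669 '
    'comm="dnscrypt-proxy" sig=31 arch=c000003e syscall=160 compat=0 '
    'ip=0x40432e code=0x80000000'
)
print(decode_seccomp_record(record))
```

Read this way, the record says the seccomp filter killed dnscrypt-proxy with SIGSYS when it issued setrlimit, which matches `status=31/SYS` and the `syscall.Setrlimit` frame in the stack trace.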

Expected behavior

dnscrypt-proxy2 should start up, and listen for DNS requests.

Notify maintainers

@joachifm

tomfitzhenry commented 1 year ago

https://github.com/NixOS/nixpkgs/pull/197379 looks like it could be a fix for this? (Update: Confirmed this fixes it.)

Also, we should look into:

tomfitzhenry commented 1 year ago

why the test passes, despite dnscrypt-proxy2 failing to start.

I think dnscrypt-proxy2.service does reach the active state briefly, so client.wait_for_unit("dnscrypt-proxy2") succeeds, but then the binary crashes when it runs the Setrlimit syscall.

https://github.com/NixOS/nixpkgs/blob/f36801e4052c4b50c4d1df591d28fe9e1992a54f/nixos/tests/dnscrypt-proxy2.nix should have stronger assertions, e.g. that dnscrypt-proxy2 manages to listen on port 43 (localPortProxy). I've raised a PR for this: https://github.com/NixOS/nixpkgs/pull/197450.
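To illustrate what a stronger assertion looks like, here is a minimal, self-contained Python sketch of a "is something listening on this port" check. It is not the code from the PR, and the helper name `is_tcp_port_open` is hypothetical. It only probes TCP. Inside the NixOS test driver the built-in `wait_for_open_port` machine method would be the more natural tool.

```python
import socket


def is_tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, etc.
        return False


if __name__ == "__main__":
    # 43 is the port mentioned above; adjust to your setup.
    print(is_tcp_port_open("127.0.0.1", 43))
```

Unlike waiting for the unit to become active, a check like this only passes if the process survived its startup and actually bound the socket.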

tomfitzhenry commented 1 year ago

what triggered this issue? I see no recent changes to dnscrypt-proxy2 pkg or service.

Hypothesis: A recent Go runtime update that now calls setrlimit? (Update: Reverting 0c7a6a0832b9531217d724a60ddd4377e841d68d didn't stop the issue occurring.)

MidAutumnMoon commented 1 year ago

It was probably introduced in Go 1.19, because I didn't find any significant changes on the systemd side. We should expect this kind of issue to pop up in the near future.

tomfitzhenry commented 1 year ago

> It was probably introduced in Go 1.19, because I didn't find any significant changes on the systemd side. We should expect this kind of issue to pop up in the near future.

Confirmed. Changing dnscrypt-proxy2 to use Go 1.18 fixes this (but allowing @resources syscalls is the better fix, as MidAutumnMoon has proposed).

https://github.com/golang/go/commit/8427429c592588af8c49522c76b3e0e0e335d270 introduces the setrlimit syscall in an init function (matching the stack trace), released in Go 1.19.
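As far as I understand that commit, the new init function raises the soft open-file limit to the hard limit at startup. Here is a rough Python analogue of that behaviour, offered as a sketch rather than a translation of the Go code. The function name `raise_nofile_soft_limit` is made up, and the syscall Python ends up issuing may differ from the raw setrlimit that Go uses.

```python
import resource


def raise_nofile_soft_limit():
    """Try to raise the soft RLIMIT_NOFILE to the hard limit; return the limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Best effort: keep the existing limits if the kernel refuses.
            pass
    # Re-read so the caller sees what is actually in effect.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit())
```

The error handling here only helps when the call returns an error. In this issue the seccomp filter kills the process with SIGSYS instead, so the program has no chance to fall back, and the service dumps core during startup.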

This issue should be closed once the following are merged:

xanderio commented 1 year ago

This also affects miniflux.

MidAutumnMoon commented 1 year ago

Some Go programs crashed but some didn't.

For example shiori has ~@resources set but still runs pretty fine. (However its tests failed for unknown reasons.)

MidAutumnMoon commented 1 year ago

cc @minijackson Could you take a look at shiori's tests?

MidAutumnMoon commented 1 year ago

cc @techknowlogick dex-oidc tests failed on my machine. Could you take a look?

MidAutumnMoon commented 1 year ago

cc @ehmry Could you take a look at yggdrasil's tests?

MidAutumnMoon commented 1 year ago

I think I've caught 'em all.

SuperSandro2000 commented 1 year ago

So, anything left?

MidAutumnMoon commented 1 year ago

> So, anything left?

Nothing :)

tomfitzhenry commented 1 year ago

Great work @MidAutumnMoon for searching for all the occurrences of this, and fixing them before users noticed!

MidAutumnMoon commented 1 year ago

And thanks to @tomfitzhenry for sorting out this issue and reviewing the changes.