NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

ananicy-cpp service failure on hardened kernel #327382

Closed MrQubo closed 3 months ago

MrQubo commented 3 months ago

Describe the bug

The ananicy-cpp systemd service fails. Here's the log from the journal:

lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config apply_ioclass: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config apply_sched: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config cgroup_load: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config apply_ionice: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config apply_oom_score_adj: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config apply_latnice: false
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config log_applied_rule: false
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config apply_nice: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config apply_cgroup: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config type_load: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config rule_load: true
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config cgroup_realtime_workaround: false
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config loglevel: warn
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.509] [info] Config check_freq: 60
lip 15 16:11:27 work ananicy-cpp[1484]: Ananicy Cpp 1.1.1
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.597] [warning] Cgroups are not available on this platform (or are not enabled)
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.597] [warning] Cgroups are not available on this platform (or are not enabled)
lip 15 16:11:27 work ananicy-cpp[1484]: [2024-07-15 16:11:27.597] [warning] Cgroups are not available on this platform (or are not enabled)
lip 15 16:11:27 work ananicy-cpp[1484]: failed to attach BPF programs
lip 15 16:11:27 work systemd[1]: ananicy-cpp.service: Main process exited, code=dumped, status=11/SEGV

The service gets auto-restarted after that failure and the next error is different:

lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config apply_ioclass: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config apply_sched: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config cgroup_load: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config apply_ionice: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config apply_oom_score_adj: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config apply_latnice: false
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config log_applied_rule: false
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config apply_nice: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config apply_cgroup: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config type_load: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config rule_load: true
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config cgroup_realtime_workaround: false
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config loglevel: warn
lip 15 16:11:37 work ananicy-cpp[2625]: [2024-07-15 16:11:37.547] [info] Config check_freq: 60
lip 15 16:11:37 work ananicy-cpp[2625]: Ananicy Cpp 1.1.1
lip 15 16:11:37 work ananicy-cpp[2625]: Ananicy Cpp is already running!
lip 15 16:11:37 work systemd[1]: ananicy-cpp.service: Main process exited, code=exited, status=1/FAILURE

Also, here's the stack trace from coredump:

#0  0x00000030a76218d1 in bpf_program_init_events ()
#1  0x00000030a7610481 in ProcessQueue::init() ()
#2  0x00000030a75d0c8a in main ()

Steps To Reproduce

boot.kernelPackages = pkgs.linuxPackages_hardened;
services.ananicy = {
  enable = true;
  package = pkgs.ananicy-cpp;
  rulesProvider = pkgs.ananicy-rules-cachyos;
};

Notify maintainers

@Artturin @JohnRTitor @diniamo

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.32-hardened1, NixOS, 24.05 (Uakari), 24.05.2780.53e81e790209`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.4`
 - channels(nix): `""`
 - channels(root): `"home-manager-24.05.tar.gz, nixos-24.05, nixos-hardware, nixos-unstable, nur"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`


MrQubo commented 3 months ago

Works fine on non-hardened linux.

MrQubo commented 3 months ago

This also works fine:

boot.kernelPackages = pkgs.linuxPackages_hardened;
services.ananicy = {
  enable = true;
  package = pkgs.ananicy;
};

Seems like the issue is with ananicy-cpp.

JohnRTitor commented 3 months ago

Could you try with the latest kernel? Though I am not sure how a kernel change could be impacting this.

EDIT: it is failing to load BPF programs; using the latest kernel (6.9+ as of now) should fix things.

MrQubo commented 3 months ago

I can't check 6.9 because I'm on zfs. With 6.8 hardened the error is still there.

JohnRTitor commented 3 months ago

You could try chaotic-nyx's cachyos hardened kernel. That is on 6.9 AFAIK.

s0me1newithhand7s commented 3 months ago

> You could try chaotic-nyx's cachyos hardened kernel. That is on 6.9 AFAIK.

On regular linux-cachyos (6.9) it works fine. I suppose it will work on cachyos-hardened as well, but if you don't mind, I'll test it.

s0me1newithhand7s commented 3 months ago

core dumped..

s0me1newithhand7s commented 3 months ago

> I can't check 6.9 because I'm on zfs. With 6.8 hardened the error is still there.

So, after some tests we came to the conclusion that this is a hardened kernel issue: [image] Luis and I tested it out. On cachyos-6.9.9 nothing went wrong on my end; the benchmark finished and ran well. As @JohnRTitor suggested, you can try linux-cachyos from the chaotic-nyx flake, but not the hardened one. :D

JohnRTitor commented 3 months ago

To be honest any non-hardened kernel should work.

MrQubo commented 3 months ago

Yeah, works fine with the default (non-hardened) pkgs.linuxPackages.

I think it's possible to build with netlink instead of BPF. Maybe this would work on hardened? But I don't know cmake too well, so I'm not sure how to change this flag. Never mind, it's -DUSE_BPF_PROC_IMPL=OFF: https://gitlab.com/ananicy-cpp/ananicy-cpp/-/blob/097d79fd14607d3bce1021aa8b08a49c82c3222d/CMakeLists.txt#L192-200

MrQubo commented 3 months ago

> > I can't check 6.9 because I'm on zfs. With 6.8 hardened the error is still there.
>
> So, after some tests we came to the conclusion that this is a hardened kernel issue: [image] Luis and I tested it out. On cachyos-6.9.9 nothing went wrong on my end; the benchmark finished and ran well. As @JohnRTitor suggested, you can try linux-cachyos from the chaotic-nyx flake, but not the hardened one. :D

That's not it. On nixos I have CONFIG_BPF_SYSCALL=y in /proc/config.gz with linuxPackages_6_8_hardened.

s0me1newithhand7s commented 3 months ago

> > > I can't check 6.9 because I'm on zfs. With 6.8 hardened the error is still there.
> >
> > So, after some tests we came to the conclusion that this is a hardened kernel issue: [image] Luis and I tested it out. On cachyos-6.9.9 nothing went wrong on my end; the benchmark finished and ran well. As @JohnRTitor suggested, you can try linux-cachyos from the chaotic-nyx flake, but not the hardened one. :D
>
> That's not it. On nixos I have CONFIG_BPF_SYSCALL=y in /proc/config.gz with linuxPackages_6_8_hardened.

'k!

MrQubo commented 3 months ago

diff 6.8 6.8.hardened

*BTF* ones might be important?

```diff
3c3
< # Linux/x86_64 6.8.12 Kernel Configuration
---
> # Linux/x86_64 6.8.11-hardened1 Kernel Configuration
231a232
> # CONFIG_USER_NS_UNPRIVILEGED is not set
257c258
< CONFIG_UID16=y
---
> # CONFIG_UID16 is not set
260c261
< CONFIG_SYSFS_SYSCALL=y
---
> # CONFIG_SYSFS_SYSCALL is not set
449,450d449
< CONFIG_X86_16BIT=y
< CONFIG_X86_ESPFIX64=y
470d468
< CONFIG_ARCH_PROC_KCORE_TEXT=y
526,527c524,525
< CONFIG_LEGACY_VSYSCALL_XONLY=y
< # CONFIG_LEGACY_VSYSCALL_NONE is not set
---
> # CONFIG_LEGACY_VSYSCALL_XONLY is not set
> CONFIG_LEGACY_VSYSCALL_NONE=y
529c527
< CONFIG_MODIFY_LDT_SYSCALL=y
---
> # CONFIG_MODIFY_LDT_SYSCALL is not set
634c632
< CONFIG_ACPI_CUSTOM_METHOD=m
---
> # CONFIG_ACPI_CUSTOM_METHOD is not set
892c890
< CONFIG_ARCH_MMAP_RND_BITS=28
---
> CONFIG_ARCH_MMAP_RND_BITS=32
894c892
< CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
---
> CONFIG_ARCH_MMAP_RND_COMPAT_BITS=16
913c911
< # CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
---
> CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y
946c944
< # CONFIG_GCC_PLUGIN_LATENT_ENTROPY is not set
---
> CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y
960c958,959
< # CONFIG_MODVERSIONS is not set
---
> CONFIG_MODVERSIONS=y
> CONFIG_ASM_MODVERSIONS=y
1100c1099
< CONFIG_SLAB_MERGE_DEFAULT=y
---
> # CONFIG_SLAB_MERGE_DEFAULT is not set
1102a1102
> CONFIG_SLAB_CANARY=y
1108c1108
< # CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
---
> CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
1144c1144
< CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
---
> CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
1289,1293c1289
< CONFIG_INET_DIAG=m
< CONFIG_INET_TCP_DIAG=m
< CONFIG_INET_UDP_DIAG=m
< CONFIG_INET_RAW_DIAG=m
< CONFIG_INET_DIAG_DESTROY=y
---
> # CONFIG_INET_DIAG is not set
1316a1313
> # CONFIG_TCP_SIMULT_CONNECT_DEFAULT_ON is not set
1350d1346
< CONFIG_INET_MPTCP_DIAG=m
1711d1706
< CONFIG_INET_DCCP_DIAG=m
1733d1727
< CONFIG_INET_SCTP_DIAG=m
2397c2391
< # CONFIG_RESET_ATTACK_MITIGATION is not set
---
> CONFIG_RESET_ATTACK_MITIGATION=y
4520c4514
< CONFIG_LEGACY_TIOCSTI=y
---
> # CONFIG_LEGACY_TIOCSTI is not set
4629c4623
< CONFIG_DEVMEM=y
---
> # CONFIG_DEVMEM is not set
4631c4625
< CONFIG_DEVPORT=y
---
> # CONFIG_DEVPORT is not set
9495,9496c9489,9490
< # CONFIG_IOMMU_DEFAULT_DMA_STRICT is not set
< CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
---
> CONFIG_IOMMU_DEFAULT_DMA_STRICT=y
> # CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
10632a10627
> # CONFIG_OVERLAY_FS_UNPRIVILEGED is not set
10679c10674
< CONFIG_PROC_KCORE=y
---
> # CONFIG_PROC_KCORE is not set
10808d10802
< CONFIG_NFS_DEBUG=y
10940c10934,10936
< # CONFIG_SECURITY_DMESG_RESTRICT is not set
---
> CONFIG_SECURITY_DMESG_RESTRICT=y
> CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y
> CONFIG_SECURITY_TIOCSTI_RESTRICT=y
10970c10966
< # CONFIG_SECURITY_SAFESETID is not set
---
> CONFIG_SECURITY_SAFESETID=y
10997,10999c10993,10999
< # CONFIG_GCC_PLUGIN_STACKLEAK is not set
< # CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
< # CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
---
> CONFIG_GCC_PLUGIN_STACKLEAK=y
> # CONFIG_GCC_PLUGIN_STACKLEAK_VERBOSE is not set
> CONFIG_STACKLEAK_TRACK_MIN_SIZE=100
> # CONFIG_STACKLEAK_METRICS is not set
> # CONFIG_STACKLEAK_RUNTIME_DISABLE is not set
> CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
> CONFIG_INIT_ON_FREE_DEFAULT_ON=y
11001c11001,11003
< # CONFIG_ZERO_CALL_USED_REGS is not set
---
> CONFIG_ZERO_CALL_USED_REGS=y
> CONFIG_PAGE_SANITIZE_VERIFY=y
> CONFIG_SLAB_SANITIZE_VERIFY=y
11008c11010
< # CONFIG_BUG_ON_DATA_CORRUPTION is not set
---
> CONFIG_BUG_ON_DATA_CORRUPTION=y
11011c11013
< CONFIG_RANDSTRUCT_NONE=y
---
> # CONFIG_RANDSTRUCT_NONE is not set
11013c11015,11017
< # CONFIG_RANDSTRUCT_PERFORMANCE is not set
---
> CONFIG_RANDSTRUCT_PERFORMANCE=y
> CONFIG_RANDSTRUCT=y
> CONFIG_GCC_PLUGIN_RANDSTRUCT=y
11032d11035
< CONFIG_CRYPTO_SIG=y
11524d11526
< CONFIG_DEBUG_INFO_BTF=y
11527,11528d11528
< CONFIG_DEBUG_INFO_BTF_MODULES=y
< CONFIG_MODULE_ALLOW_BTF_MISMATCH=y
11535a11536
> # CONFIG_DEBUG_WRITABLE_FUNCTION_POINTERS_VERBOSE is not set
11551c11552
< CONFIG_DEBUG_FS_ALLOW_ALL=y
---
> # CONFIG_DEBUG_FS_ALLOW_ALL is not set
11553c11554
< # CONFIG_DEBUG_FS_ALLOW_NONE is not set
---
> CONFIG_DEBUG_FS_ALLOW_NONE=y
11557c11558,11568
< # CONFIG_UBSAN is not set
---
> CONFIG_UBSAN=y
> CONFIG_UBSAN_TRAP=y
> CONFIG_CC_HAS_UBSAN_BOUNDS_STRICT=y
> CONFIG_UBSAN_BOUNDS=y
> CONFIG_UBSAN_BOUNDS_STRICT=y
> CONFIG_UBSAN_SHIFT=y
> # CONFIG_UBSAN_DIV_ZERO is not set
> CONFIG_UBSAN_BOOL=y
> CONFIG_UBSAN_ENUM=y
> CONFIG_UBSAN_SANITIZE_ALL=y
> CONFIG_TEST_UBSAN=m
11599c11610
< # CONFIG_DEBUG_VIRTUAL is not set
---
> CONFIG_DEBUG_VIRTUAL=y
11615a11627
> CONFIG_KFENCE_BUG_ON_DATA_CORRUPTION=y
11624,11626c11636,11638
< # CONFIG_PANIC_ON_OOPS is not set
< CONFIG_PANIC_ON_OOPS_VALUE=0
< CONFIG_PANIC_TIMEOUT=0
---
> CONFIG_PANIC_ON_OOPS=y
> CONFIG_PANIC_ON_OOPS_VALUE=1
> CONFIG_PANIC_TIMEOUT=-1
11688,11690c11700,11702
< # CONFIG_DEBUG_PLIST is not set
< # CONFIG_DEBUG_SG is not set
< # CONFIG_DEBUG_NOTIFIERS is not set
---
> CONFIG_DEBUG_PLIST=y
> CONFIG_DEBUG_SG=y
> CONFIG_DEBUG_NOTIFIERS=y
11766d11777
< CONFIG_PROBE_EVENTS_BTF_ARGS=y
11795,11796c11806
< CONFIG_STRICT_DEVMEM=y
< CONFIG_IO_STRICT_DEVMEM=y
---
> # CONFIG_STRICT_DEVMEM is not set
```

VeilSilence commented 3 months ago

In order for ananicy-cpp to function, make sure that the kernel parameter "debugfs=off" is not set.

MrQubo commented 3 months ago

Thanks @VeilSilence!

debugfs=off is the default on hardened. Adding debugfs=on to cmdline fixes the issue. Doesn't work with debugfs=no-mount.
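
On NixOS the kernel command line is set through `boot.kernelParams`, so the workaround above could be written roughly as follows. This is only a sketch that assumes the same hardened kernel as in the reproduction steps; note that it re-enables debugfs, which the hardened kernel disables by default.

```nix
{ pkgs, ... }:
{
  boot.kernelPackages = pkgs.linuxPackages_hardened;

  # Override the hardened default (debugfs=off) so that ananicy-cpp
  # can attach its BPF programs. debugfs=no-mount was reported not to work.
  boot.kernelParams = [ "debugfs=on" ];
}
```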

MrQubo commented 3 months ago

Compiling with cmake flag -DUSE_BPF_PROC_IMPL=OFF also makes it work (with the default debugfs=off).

pkgs.ananicy-cpp.overrideAttrs (prevAttrs: {
  cmakeFlags =
    (lib.remove "-DUSE_BPF_PROC_IMPL=ON" prevAttrs.cmakeFlags)
    ++ [ "-DUSE_BPF_PROC_IMPL=OFF" ];
})

MrQubo commented 3 months ago

Seems like debugfs is not required, only tracefs, which can be enabled with

fileSystems."/sys/kernel/tracing" = {
  device = "tracefs";
  fsType = "tracefs";
};

VeilSilence commented 3 months ago

> Seems like debugfs is not required, only tracefs, which can be enabled with
>
> fileSystems."/sys/kernel/tracing" = {
>   device = "tracefs";
>   fsType = "tracefs";
> };

Good to know. Maybe I'll disable debugfs once again.

MrQubo commented 3 months ago

So I think the fix should be to disable "-DUSE_BPF_PROC_IMPL=ON" on hardened kernel?

MrQubo commented 3 months ago

I've realized that we cannot check whether the kernel is hardened in the package itself. I was thinking that the best course of action would be to create a NixOS module for ananicy-cpp.

s0me1newithhand7s commented 3 months ago

> I've realized that we cannot check whether the kernel is hardened in the package itself. I was thinking that the best course of action would be to create a NixOS module for ananicy-cpp.

Sounds good actually, I agree with it. :D

JohnRTitor commented 3 months ago

Please test #330488, just override bpfSupport = false;

MrQubo commented 3 months ago

@JohnRTitor I disabled debugfs and tracefs, and tested it like this:

services.ananicy = {
  enable = true;
  package = 
    let
      ananicy-cpp = (import (builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/refs/pull/330488/head.tar.gz") {}).ananicy-cpp;
    in ananicy-cpp.override ({ withBpf = false; });
  rulesProvider = pkgs.ananicy-rules-cachyos;
};

No more errors.

MrQubo commented 3 months ago

The problem with debugfs/tracefs is that there seems to be no NixOS way to check for debugfs/tracefs.

I was thinking of adding an option services.ananicy-cpp.withBpf, which would default to false on hardened and true otherwise. With a boot.tracefs option we could add an assert for withBpf -> (boot.tracefs.enabled || boot.debugfs.enabled).
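
A rough sketch of what such an option and assertion might look like is given below. Since `boot.tracefs` and `boot.debugfs` do not exist as options, the sketch approximates them by looking for a tracefs entry in `fileSystems` (as in the mount shown earlier in the thread) or `debugfs=on` in `boot.kernelParams`. The option name `services.ananicy-cpp.withBpf` follows the proposal above and is hypothetical, as are the local names `tracefsMounted` and `debugfsOn`; the hardened-dependent default is left out.

```nix
{ config, lib, ... }:
let
  # Hypothetical helper: true if any declared filesystem is a tracefs mount.
  tracefsMounted =
    lib.any (fs: fs.fsType == "tracefs") (lib.attrValues config.fileSystems);

  # Hypothetical helper: true if debugfs was explicitly re-enabled on the cmdline.
  debugfsOn = lib.elem "debugfs=on" config.boot.kernelParams;
in
{
  # Hypothetical option, not part of the existing services.ananicy module.
  options.services.ananicy-cpp.withBpf = lib.mkOption {
    type = lib.types.bool;
    default = true;
    description = "Whether ananicy-cpp uses its BPF backend to watch processes.";
  };

  config = lib.mkIf (config.services.ananicy.enable && config.services.ananicy-cpp.withBpf) {
    assertions = [
      {
        assertion = tracefsMounted || debugfsOn;
        message = "services.ananicy-cpp.withBpf needs tracefs or debugfs to be available.";
      }
    ];
  };
}
```

The check is only a heuristic: it sees what the configuration declares, not what the running kernel actually exposes.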

JohnRTitor commented 3 months ago

No need to add an additional module. You can check whether services.ananicy.package's pname == pkgs.ananicy-cpp's pname AND config.boot.kernelPackages.isHardened == true; if so, the override should be applied.

You could send the patch here and I'll commit it for you, or send another PR after I merge this.
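
The check described in this comment could be sketched roughly as below. It is written as a standalone function for illustration, not as the actual module code; the attribute path `config.boot.kernelPackages.isHardened` is copied from the comment and has not been verified here, and `withBpf` is the override argument used in the test earlier in the thread. The local names `isCpp` and `hardened` are made up for the example.

```nix
# Sketch only: returns the package the ananicy service should run.
{ config, pkgs }:
let
  cfg = config.services.ananicy;

  # Compare pnames so that an overridden ananicy-cpp is still recognised.
  isCpp = (cfg.package.pname or "") == pkgs.ananicy-cpp.pname;

  # Attribute path as written in the comment (unverified assumption);
  # `or false` keeps evaluation from failing if the attribute is absent.
  hardened = config.boot.kernelPackages.isHardened or false;
in
if isCpp && hardened
then cfg.package.override { withBpf = false; }
else cfg.package
```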

MrQubo commented 3 months ago

I think it would be better to expose withBpf as a module option so it can be documented properly for the users.

JohnRTitor commented 3 months ago

#331722 should disable BPF in ananicy-cpp by default if a hardened kernel is in use.