chaotic-cx / nyx

Nix flake for "too much bleeding-edge" and unreleased packages (e.g., mesa_git, linux_cachyos, firefox_nightly, sway_git, gamescope_git). And experimental modules (e.g., HDR, duckdns).
MIT License

[Bug] Error when starting schedulers manually #763

Closed Jaage closed 4 months ago

Jaage commented 4 months ago

What happens?

When attempting to start any scheduler manually, e.g. with sudo scx_rusty, I receive this error:

jjh@nixos:~/ > sudo scx_lavd
Error: Failed to attach struct ops

Caused by:
    bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL

jjh@nixos:~/ > sudo scx_rusty
01:09:32 [INFO] NUMA[00] mask= 0b1111111111111111
01:09:32 [INFO]   DOM[00] mask= 0b1111111111111111
Error: Failed to attach struct ops

Caused by:
    bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL

What is expected to happen?

╰─λ sudo scx_rusty
21:38:53 [INFO] CPUs: online/possible = 24/32
21:38:53 [INFO] DOM[00] cpumask 00000000FF03F03F (20 cpus)
21:38:53 [INFO] DOM[01] cpumask 0000000000FC0FC0 (12 cpus)
21:38:53 [INFO] Rusty Scheduler Attached

If possible, please attach logs

No response

More information

Here are the relevant parts of my configuration.nix:

boot.kernelPackages = pkgs.linuxPackages_cachyos;
chaotic.scx.enable = true;
chaotic.scx.scheduler = "scx_lavd";

And my flake.nix:

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    nixpkgs-mozilla.url = "github:mozilla/nixpkgs-mozilla";
    nil.url = "github:oxalica/nil";
    xremap-flake.url = "github:xremap/nix-flake";
    chaotic.url = "github:chaotic-cx/nyx/nyxpkgs-unstable";
#    stylix.url = "github:danth/stylix";

  outputs = {
  } @ inputs: {
    nixosConfigurations.nixos = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      specialArgs = {inherit inputs;};
      modules = [
#        inputs.stylix.nixosModules.stylix
        inputs.xremap-flake.nixosModules.default {
          system.stateVersion = "23.11";
          services.xremap.yamlConfig = ''
              - name: Caps Lock to Right Control for shortcut purposes
                  CapsLock: rightctrl
              - name: Miscellaneous Shortcuts
                  rightctrl-i: up
                  rightctrl-j: left
                  rightctrl-k: down
                  rightctrl-l: right
                  rightctrl-o: backspace
                  rightctrl-u: delete
                  rightctrl-f: home
                  rightctrl-semicolon: end

PedroHLC commented 4 months ago

In your configuration you're starting lavd as a service, is it properly running? (systemctl status scx)

Jaage commented 4 months ago

It appears to be:

jjh@nixos:~/ > systemctl status scx
● scx.service - scheduler daemon
     Loaded: loaded (/etc/systemd/system/scx.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-06-19 18:05:43 PDT; 13h ago
   Main PID: 1045 (scx_lavd)
         IP: 0B in, 0B out
         IO: 4.3M read, 14.0M written
      Tasks: 1 (limit: 18989)
     Memory: 3.4M (peak: 28.0M swap: 20.4M swap peak: 20.4M zswap: 1.4M)
        CPU: 5.971s
     CGroup: /system.slice/scx.service
             └─1045 /nix/store/4i0h79jz9z7w10zm9907v02clhf5pwk2-scx-unstable-20240429-b1bb2a5c5/bin/scx_lavd

Jun 20 07:32:12 nixos scx_lavd[1045]: 14:32:12 [INFO] |     48373 |     8750 | IPC:CServiceEng   |    1 |   -1 |         0 |         0 |    3000000 |         6 |       21 |      42 |      37 |      43 |      48 |           20 |       1 |  >
Jun 20 07:32:13 nixos scx_lavd[1045]: 14:32:13 [INFO] |     48374 |      644 | systemd-journal   |   14 |   -1 |         0 |         0 |    3000000 |        31 |       22 |      42 |      40 |      43 |      47 |           20 |       2 |  >
Jun 20 07:32:14 nixos scx_lavd[1045]: 14:32:14 [INFO] |     48375 |      644 | systemd-journal   |    9 |   -1 |         0 |         0 |    3000000 |         4 |       21 |      42 |      33 |      43 |      48 |           20 |       1 |  >
Jun 20 07:32:15 nixos scx_lavd[1045]: 14:32:15 [INFO] |     48376 |      644 | systemd-journal   |    9 |   -1 |         0 |         0 |    3000000 |       805 |       20 |      42 |      22 |      43 |      48 |           20 |       0 |  >
Jun 20 07:32:16 nixos scx_lavd[1045]: 14:32:16 [INFO] |     48377 |      644 | systemd-journal   |    2 |    2 |   6015000 |    143846 |     911519 |      6127 |       19 |      42 |       5 |      43 |      48 |           20 |      -1 |  >
Jun 20 07:32:17 nixos scx_lavd[1045]: 14:32:17 [INFO] |     48378 |      644 | systemd-journal   |    2 |    2 |   6015000 |    103936 |    3000000 |      2509 |       19 |      42 |      17 |      43 |      49 |           20 |      -1 |  >
Jun 20 07:32:18 nixos scx_lavd[1045]: 14:32:18 [INFO] |     48379 |      644 | systemd-journal   |    2 |    2 |   6015000 |    209484 |    3000000 |      2843 |       19 |      42 |      26 |      43 |      50 |           20 |      -1 |  >
Jun 20 07:32:19 nixos scx_lavd[1045]: 14:32:19 [INFO] |     48380 |      644 | systemd-journal   |    2 |    2 |   6015000 |    331403 |    3000000 |      2566 |       19 |      42 |      17 |      43 |      49 |           20 |      -1 |  >
Jun 20 07:32:20 nixos scx_lavd[1045]: 14:32:20 [INFO] |     48381 |      644 | systemd-journal   |   12 |   -1 |         0 |         0 |    3000000 |       595 |       20 |      42 |      25 |      43 |      51 |           20 |       0 |  >
Jun 20 07:32:21 nixos scx_lavd[1045]: 14:32:21 [INFO] |     48382 |      644 | systemd-journal   |   12 |   -1 |         0 |         0 |    3000000 |       142 |       22 |      42 |      40 |      43 |      49 |           20 |       2 |  >

s0me1newithhand7s commented 4 months ago

attempt to reproduce by hand7s:

**error**:

## System:
[screenshot 240620_17-34-1718894081]

## `flake.nix`:
[screenshot 240620_17-30-1718893824]

## `chaotic.nix` with `scx`:
[screenshot 240620_17-31-1718893866]

## Logs:
```shell
~ 𝑠𝑛𝑜𝑤𝑦 𝑝𝑙𝑎𝑐𝑒, 𝑓𝑢𝑙𝑙 𝑜𝑓 𝑓𝑙𝑎𝑘𝑒𝑠! ❄️
╭──╴ on NixOS 24.11.0 ❄️
┆ ~
╰─> sudo systemctl stop scx ❲✓ ❳ at ❗ [17:32]
[sudo] password for hand7s:
╭──╴ on NixOS 24.11.0 ❄️
┆ ~
╰─> sudo scx_lavd took 1s ❲✓ ❳ at ❗ [17:32]
14:32:37 [INFO] scx_lavd scheduler is initialized
14:32:37 [INFO] Note that scx_lavd currently is not optimized for multi-CCX/NUMA architectures.
14:32:37 [INFO] Stay tuned for future improvements!
14:32:37 [INFO] scx_lavd scheduler starts running.
14:32:38 [INFO] | mseq | pid | comm | cpu | vtmc | vddln_ns | elglty_ns | slice_ns | grdy_rt | lat_prio | avg_lc | static_prio | lat_bst | slice_bst | run_freq | run_tm_ns | wait_freq | wake_freq | perf_cri | avg_pc | cpu_util | sys_ld |
14:32:38 [INFO] | 1 | 18529 | kworker/u50:5 | 11 | -1 | 0 | 0 | 250000 | 0 | 20 | 0 | 20 | 0 | 0 | 1 | 15000000 | 0 | 0 | 24 | 0 | 0 | 0 |
14:32:39 [INFO] | 2 | 18529 | kworker/u50:5 | 6 | -1 | 0 | 0 | 15000000 | 0 | 15 | 36 | 20 | -5 | 0 | 92 | 109435 | 368 | 0 | 33 | 34 | 46 | 0 |
14:32:40 [INFO] | 3 | 18529 | kworker/u50:5 | 3 | -1 | 0 | 0 | 15000000 | 409 | 18 | 40 | 20 | -2 | 0 | 1764 | 38230 | 2685 | 379 | 43 | 39 | 25 | 0 |
14:32:41 [INFO] | 4 | 18529 | kworker/u50:5 | 7 | -1 | 0 | 0 | 15000000 | 321 | 18 | 40 | 20 | -2 | 0 | 1874 | 30852 | 2402 | 5313 | 47 | 39 | 6 | 0 |
14:32:42 [INFO] | 5 | 18528 | kworker/u50:4 | 0 | -1 | 0 | 0 | 15000000 | 35 | 20 | 40 | 20 | 0 | 0 | 17 | 4768573 | 153 | 255 | 38 | 39 | 8 | 0 |
14:32:43 [INFO] | 6 | 18528 | kworker/u50:4 | 3 | 5 | 24210000 | 3906 | 4223287 | 48152 | 18 | 40 | 20 | -2 | 0 | 62 | 2685530 | 7571 | 7519 | 36 | 40 | 18 | 0 |
^C14:32:44 [INFO] | 7 | 169 | kworker/u49:6 | 0 | 7 | 19290000 | 0 | 15000000 | 1729 | 17 | 40 | 20 | -3 | 0 | 2645 | 32953 | 2342 | 0 | 33 | 39 | 13 | 3 |
EXIT: Scheduler unregistered from user space
╭──╴ on NixOS 24.11.0 ❄️
┆ ~
╰─> sudo scx_rusty took 7s ❲✓ ❳ at ❗ [17:32]
14:32:48 [INFO] NUMA[00] mask= 0b111111111111
14:32:48 [INFO] DOM[00] mask= 0b000111000111
14:32:48 [INFO] DOM[01] mask= 0b111000111000
14:32:48 [INFO] Rusty Scheduler Attached
14:32:50 [INFO] cpu= 1.25 bal=0 numa_load_avg= 0.12 dom_load_avg= 0.06 task_err=0 lb_data_err=0 proc=0ms
14:32:50 [INFO] tot= 4140 wsync= 6.91 prev_idle=74.20 greedy_idle= 3.14 pin= 0.00
14:32:50 [INFO] dir=12.68 dir_greedy= 0.00 dir_greedy_far= 0.00
14:32:50 [INFO] dsq= 3.04 greedy_local= 0.02 greedy_xnuma= 0.00
14:32:50 [INFO] kick_greedy= 0.00 rep= 0.00
14:32:50 [INFO] dl_clamped= 0.24 dl_preset= 2.83
14:32:50 [INFO] slice_length=20000us
14:32:50 [INFO] direct_greedy_cpumask=0b111111111111
14:32:50 [INFO] kick_greedy_cpumask=0b111111111111
14:32:50 [INFO] NODE[00] load=0.12 imbal=+0.00 load_delta=+0.00
14:32:50 [INFO] DOMAIN[00] load=0.05 imbal=-0.01 load_delta=+0.01
14:32:50 [INFO] DOMAIN[01] load=0.07 imbal=+0.01 load_delta=-0.01
14:32:52 [INFO] cpu= 0.63 bal=2 numa_load_avg= 0.11 dom_load_avg= 0.05 task_err=0 lb_data_err=0 proc=0ms
14:32:52 [INFO] tot= 2382 wsync=11.42 prev_idle=57.09 greedy_idle= 1.43 pin= 0.00
14:32:52 [INFO] dir=22.46 dir_greedy= 0.21 dir_greedy_far= 0.59
14:32:52 [INFO] dsq= 6.72 greedy_local= 0.08 greedy_xnuma= 0.00
14:32:52 [INFO] kick_greedy= 0.17 rep= 0.04
14:32:52 [INFO] dl_clamped= 0.97 dl_preset= 5.84
14:32:52 [INFO] slice_length=20000us
14:32:52 [INFO] direct_greedy_cpumask=0b111111111111
14:32:52 [INFO] kick_greedy_cpumask=0b111111111111
14:32:52 [INFO] NODE[00] load=0.11 imbal=+0.00 load_delta=+0.00
14:32:52 [INFO] DOMAIN[00] load=0.06 imbal=+0.00 load_delta=-0.01
14:32:52 [INFO] DOMAIN[01] load=0.05 imbal=-0.00 load_delta=+0.01
^CEXIT: Scheduler unregistered from user space
╭──╴ on NixOS 24.11.0 ❄️
┆ ~
╰─>
```

## Status of `scx`
[screenshot 240620_17-33-1718894033]

PedroHLC commented 4 months ago

> It appears to be:

I'm unable to reproduce, like @s0me1newithhand7s (thank you).

@Jaage are you stopping the service before manually starting scx_rusty?
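
Only one sched_ext scheduler can be attached at a time, so it can help to confirm that none is attached before starting one by hand. The following is a minimal sketch, not part of the nyx module: it assumes the running kernel exposes the sched_ext state under `/sys/kernel/sched_ext/state` (older sched_ext patch sets may not), and the helper names `scx_state` and `scx_is_free` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether a sched_ext scheduler is already attached.
# ASSUMPTION: the kernel exposes its sched_ext state in
# /sys/kernel/sched_ext/state; older patch sets may lack this file.
# SCX_SYSFS can be overridden, e.g. for testing against a fake directory.
SCX_SYSFS="${SCX_SYSFS:-/sys/kernel/sched_ext}"

# Print the sched_ext state, or "unknown" if the state file is not readable.
scx_state() {
  if [ -r "$SCX_SYSFS/state" ]; then
    cat "$SCX_SYSFS/state"
  else
    echo "unknown"
  fi
}

# Succeed unless the kernel reports that a BPF scheduler is enabled.
# Note: an "unknown" state also succeeds, because nothing can be inferred.
scx_is_free() {
  [ "$(scx_state)" != "enabled" ]
}
```

With these helpers sourced, a manual start could look like `sudo systemctl stop scx && scx_is_free && sudo scx_rusty`, which skips the manual start if another scheduler is still attached.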

Jaage commented 4 months ago

I did test stopping the service with systemctl stop scx and then starting the scheduler manually, to no avail.

I am not able to reproduce this on my desktop, only on my Lenovo Legion laptop. Perhaps that has something to do with it. I won't have it with me for a while.

Desktop specs where it works:

          ▜███▙       ▜███▙  ▟███▛             -----------
           ▜███▙       ▜███▙▟███▛              OS: NixOS 24.11.20240605.e8057b6 (Vicuña) x86_64
            ▜███▙       ▜██████▛               Host: Z690 AORUS MASTER (-CF)
     ▟█████████████████▙ ▜████▛     ▟▙         Kernel: Linux 6.9.3-cachyos
    ▟███████████████████▙ ▜███▙    ▟██▙        Uptime: 6 mins
           ▄▄▄▄▖           ▜███▙  ▟███▛        Packages: 1750 (nix-system)
          ▟███▛             ▜██▛ ▟███▛         Shell: zsh 5.9
         ▟███▛               ▜▛ ▟███▛          Display (XB273U): 2560x1440 @ 144Hz (as 1280x720)
▟███████████▛                  ▟██████████▙    Display (LG TV SSCR2): 3840x2160 @ 120Hz (as 1920x1080) []
▜██████████▛                  ▟███████████▛    DE: KDE Plasma
      ▟███▛ ▟▙               ▟███▛             WM: KWin (Wayland)
     ▟███▛ ▟██▙             ▟███▛              WM Theme: plastik
    ▟███▛  ▜███▙           ▝▀▀▀▀               Theme: Breeze (GentlyColorDarkCyan) [QT]
    ▜██▛    ▜███▙ ▜██████████████████▛         Icons: breeze-dark [QT], breeze-dark [GTK2/3/4]
     ▜▛     ▟████▙ ▜████████████████▛          Font: Noto Sans (10pt) [QT], Noto Sans (10pt) [GTK2/3/4]
           ▟██████▙       ▜███▙                Cursor: breeze (24px)
          ▟███▛▜███▙       ▜███▙               Terminal: foot 1.17.2
         ▟███▛  ▜███▙       ▜███▙              Terminal Font: monospace (8pt)
         ▝▀▀▀    ▀▀▀▀▘       ▀▀▀▘              CPU: 13th Gen Intel(R) Core(TM) i9-13900K (32) @ 5.80 GHz
                                               GPU 1: NVIDIA GeForce RTX 4090 [Discrete]
                                               GPU 2: Intel UHD Graphics 770 @ 1.65 GHz [Integrated]
                                               Memory: 3.56 GiB / 94.06 GiB (4%)
                                               Swap: 0 B / 953.00 MiB (0%)
                                               Disk (/): 279.21 GiB / 3.64 TiB (8%) - xfs
                                               Local IP (enp6s0): *
                                               Locale: en_US.UTF-8

PedroHLC commented 4 months ago

> only my legion lenovo laptop

When you have it, make sure to update it (scx 0.1.10 and kernel 6.9.5). I can see from your logs that you're using older versions (scx unstable from before 0.1.10 and kernel 6.9.3).

On 0.1.10, this commit is in: -- and it throws an error about running multiple schedulers instead of a backtrace that looks identical to yours.
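
A quick way to tell whether a machine still runs a kernel older than the 6.9.5 mentioned above is to compare version strings. This is only a sketch: `version_ge` and `kernel_base` are hypothetical helper names, and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/usr/bin/env bash
# Sketch: compare the running kernel version against a minimum.
# ASSUMPTION: `sort -V` (version sort) is available, as in GNU coreutils.

# version_ge A B: succeed when version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Strip everything from the first "-" on, e.g. "6.9.3-cachyos" -> "6.9.3".
kernel_base() {
  printf '%s\n' "${1%%-*}"
}
```

For example, `version_ge "$(kernel_base "$(uname -r)")" 6.9.5 || echo "kernel is older than 6.9.5"` prints a warning on the 6.9.3-cachyos kernel shown earlier in the thread.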

Jaage commented 4 months ago

Fixed after updating the system.