NixOS / nixpkgs


New switch-to-configuration implementation not consistent with perl implementation for failed units #312297

Open jmbaur opened 1 month ago

jmbaur commented 1 month ago

Describe the bug

@nyabinary reported that the new (opt-in) switch-to-configuration implementation flagged systemd-networkd-wait-online.service as failed, while the Perl implementation did not. Output below:

restarting sysinit-reactivation.target
the following new units were started: sysinit-reactivation.target, systemd-tmpfiles-resetup.service
warning: the following units failed: systemd-networkd-wait-online.service
× systemd-networkd-wait-online.service - Wait for Network to be Configured
     Loaded: loaded (/etc/systemd/system/systemd-networkd-wait-online.service; enabled; preset: enabled)
    Drop-In: /nix/store/nlj3x6lpwgvkmnpj5zd8k7z6ynrgr5q1-system-units/systemd-networkd-wait-online.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Thu 2024-05-16 09:35:39 EDT; 266ms ago
       Docs: man:systemd-networkd-wait-online.service(8)
    Process: 5664 ExecStart=/nix/store/9cxd17xnmw0bi8n4nf722ysqj2bjlh8s-systemd-255.4/lib/systemd/systemd-networkd-wait-online --timeout=120 (code=exited, status=1/FAILURE)
   Main PID: 5664 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 4ms

May 16 09:33:39 nyan systemd[1]: Starting Wait for Network to be Configured...
May 16 09:35:39 nyan systemd-networkd-wait-online[5664]: Timeout occurred while waiting for network connectivity.
May 16 09:35:39 nyan systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
May 16 09:35:39 nyan systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
May 16 09:35:39 nyan systemd[1]: Failed to start Wait for Network to be Configured.
warning: error(s) occurred while switching to the new configuration

Steps To Reproduce

Steps to reproduce the behavior:

  1. Enable networking.useNetworkd and system.switch.enableNg
  2. Perform switch (with degraded network @nyabinary ?)
  3. ...

Expected behavior

Both implementations are consistent with one another.

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
output here


jmbaur commented 1 month ago

One thing I noticed while implementing the Rust switch-to-configuration is that the Perl switch-to-configuration's reporting of failed units is flawed. It only catches units that fail between the time the script starts/restarts/reloads units and the time it queries systemd for failed units. If a unit fails later, even just a moment after the query, the failure is not reported. I left a note about this here. Is the current behavior actually preferred? Having a unit fail and go unnoticed seems unfortunate.
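
To make the timing window concrete, here is a rough toy model in Python. It is only a sketch of the race described above, not the actual Perl or Rust code: the `Unit` class, the `failed_units` helper, the `quick-failure.service` and `healthy.service` names, and the 5 second query time are all made up for illustration. The only figure taken from the log is the 120 second timeout of systemd-networkd-wait-online.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """Hypothetical stand-in for a systemd unit in this toy model."""
    name: str
    # Seconds after the switch begins at which the unit fails; None = never.
    fails_at: Optional[float]


def failed_units(units: List[Unit], now: float) -> List[str]:
    """Return the names of units that are already failed at time `now`."""
    return [u.name for u in units if u.fails_at is not None and u.fails_at <= now]


units = [
    Unit("quick-failure.service", fails_at=1.0),  # hypothetical unit
    # Fails only once its --timeout=120 expires, as in the log above.
    Unit("systemd-networkd-wait-online.service", fails_at=120.0),
    Unit("healthy.service", fails_at=None),  # hypothetical unit
]

# Assumption: the single "which units failed?" query happens 5 s into the switch.
query_time = 5.0
reported = failed_units(units, now=query_time)

# Everything that ends up failed if we wait indefinitely.
eventually_failed = failed_units(units, now=float("inf"))
missed = sorted(set(eventually_failed) - set(reported))

print("reported at query time:", reported)
print("failed later, never reported:", missed)
```

In this model a one-shot query reports only the unit that failed before the query; the unit that times out afterwards ends up failed but is never mentioned. Whether this is exactly what explains the difference in the output above is something I have not confirmed.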