NixOS / nixpkgs

Nix Packages collection & NixOS

Kea logging fails due to ignored environment variable in systemd service and inability to find lockfile. #265826

Closed kniteli closed 9 months ago

kniteli commented 11 months ago

Suspected cause and symptom

A recent change pointed kea to separate lockfiles for the various individual services:

https://github.com/NixOS/nixpkgs/blob/85f1ba3e51676fa8cc604a3d863d729026a6b8eb/nixos/modules/services/networking/kea.nix#L331-L334

It results in this symptom in logs and no logging from kea:

Nov 06 00:45:37 hostname kea-dhcp4[27517]: Unable to use interprocess sync lockfile (No such file or directory): /var/run/kea/logger_lockfile
Nov 06 00:45:37 hostname kea-dhcp4[27517]: Unable to use interprocess sync lockfile (No such file or directory): /var/run/kea/logger_lockfile
Nov 06 00:45:37 hostname kea-dhcp4[27517]: Unable to use interprocess sync lockfile (No such file or directory): /var/run/kea/logger_lockfile
Nov 06 00:45:37 hostname kea-dhcp4[27517]: Unable to use interprocess sync lockfile (No such file or directory): /var/run/kea/logger_lockfile
Nov 06 00:45:37 hostname kea-dhcp4[27517]: Unable to use interprocess sync lockfile (No such file or directory): /var/run/kea/logger_lockfile
Nov 06 00:45:37 hostname kea-dhcp4[27517]: Unable to use interprocess sync lockfile (No such file or directory): /var/run/kea/logger_lockfile

I did some investigating and can verify the following:

  1. The env variable being set, KEA_LOCKFILE_DIR, is correct according to both the documentation and the actual kea source (see here).
  2. Looking at /proc/<PID>/environ confirms the process was started with the env variable in place.
  3. According to the kea source above, the fallback when that variable is not available is the StateDirectory setting, which I believe is currently /var/lib/kea, and which is where it is actually looking according to the logs. This implies the environment variable is empty or unset at the time a lock is attempted.

I spent some time looking for the root cause, but nothing stood out to me and I'm not familiar with the idiosyncrasies of systemd. A simple workaround is to override the StateDirectory of each service to match the new change. My approach to that:

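{ config, pkgs, lib, ... }:
let
  kea = config.services.kea;
in
{
  imports = [];
  options = {};
  # Apply the overrides only when at least one kea service is enabled.
  config = lib.mkIf (kea.ctrl-agent.enable || kea.dhcp4.enable || kea.dhcp6.enable || kea.dhcp-ddns.enable) (lib.mkMerge [
    # Each enabled service gets its own StateDirectory, forced over the module's value.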
    (lib.mkIf kea.ctrl-agent.enable {
      systemd.services.kea-ctrl-agent.serviceConfig.StateDirectory = lib.mkForce "kea-ctrl-agent";
    })
    (lib.mkIf kea.dhcp4.enable {
      systemd.services.kea-dhcp4-server.serviceConfig.StateDirectory = lib.mkForce "kea-dhcp4";
    })
    (lib.mkIf kea.dhcp6.enable {
      systemd.services.kea-dhcp6-server.serviceConfig.StateDirectory = lib.mkForce "kea-dhcp6";
    })
    (lib.mkIf kea.dhcp-ddns.enable {
      systemd.services.kea-dhcp-ddns-server.serviceConfig.StateDirectory = lib.mkForce "kea-dhcp-ddns";
    })
  ]);
}

mweinelt commented 11 months ago

Feel free to send a PR and request my review.

jcollie commented 10 months ago

I don't think that changing the StateDirectory to be different for each service is the proper fix. What worked for me is changing the RuntimeDirectory to be "kea" for all the Kea services. I also set RuntimeDirectoryPreserve=true so that systemd doesn't delete the runtime directory if only one service restarts.
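
For reference, the approach described above could be sketched roughly as the following override module. This is only a sketch under assumptions: the unit names are copied from the workaround earlier in this issue, `sharedRuntimeDir` is a name made up for illustration, and `lib.mkForce` is used on the assumption that the module already sets RuntimeDirectory itself. It is not necessarily what the eventual PR does.

```nix
{ config, lib, ... }:
let
  kea = config.services.kea;

  # Hypothetical helper: one runtime directory shared by every kea unit.
  # RuntimeDirectory = "kea" makes systemd create /run/kea, and
  # RuntimeDirectoryPreserve keeps it when a single unit stops or restarts.
  sharedRuntimeDir = {
    RuntimeDirectory = lib.mkForce "kea";
    RuntimeDirectoryPreserve = lib.mkForce true;
  };
in
{
  config = lib.mkMerge [
    # Only touch units whose service is actually enabled.
    (lib.mkIf kea.ctrl-agent.enable {
      systemd.services.kea-ctrl-agent.serviceConfig = sharedRuntimeDir;
    })
    (lib.mkIf kea.dhcp4.enable {
      systemd.services.kea-dhcp4-server.serviceConfig = sharedRuntimeDir;
    })
    (lib.mkIf kea.dhcp6.enable {
      systemd.services.kea-dhcp6-server.serviceConfig = sharedRuntimeDir;
    })
    (lib.mkIf kea.dhcp-ddns.enable {
      systemd.services.kea-dhcp-ddns-server.serviceConfig = sharedRuntimeDir;
    })
  ];
}
```

With a shared directory, all kea services resolve the same lockfile location, and preserving it avoids one service's restart removing the directory out from under the others.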

mweinelt commented 10 months ago

@jcollie That sounds like a better approach. Will send a PR in a minute.