acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0
4.01k stars 736 forks source link

One-off symlink resolution causes fragile setups on NixOS #2393

Closed ctheune closed 8 months ago

ctheune commented 8 months ago

Describe the bug

NixOS uses a lot of symlinks to scripts and binaries that will have their underlying targets change relatively often. This can result in an active configuration referring to a path like /run/current-system/sw/bin/myscript being resolved on startup/reload then refer to something like /nix/store/<cryptographichash>-myscript/bin/myscript where the latter might be changing every now and then and get cleaned up from garbage collection a while later.

From a theoretical standpoint one could (and maybe should) perfectly manage those dependencies and ensure that resolved store paths are inserted into the config file and then this automatically becomes a trigger to execute a config reload.

Unfortunately, doing something perfect in this case doesn't prohibit one to (human brains have limits) accidentally insert an implicit dependency that works and then suddenly fails a few months later without a chance of noticing that it broke.

So, from a sysadmin perspective, keepalived robs me of the ability to use symlinks even when I'm ensuring that they aren't writable by non-root users. Disabling the script_security flag doesn't help either.

The security aspect of what keepalived is doing here is interesting because if we perform the script security checks but do not statically resolve symlinks, then the question whether something in the path is writable by non-root can change over time. However, that sounds like a sysadmin failure on a different level where countering this within keepalived kills orthogonality/composability of basic Unix features.

To Reproduce Any steps necessary to reproduce the behaviour:

Expected behavior A clear and concise description of what you expected to happen.

Keepalived version

Keepalived v2.2.2 (01/01,1970)

Copyright(C) 2001-1970 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 5.12.0
Running on Linux 5.10.105 #1-NixOS SMP Fri Mar 11 11:11:55 UTC 2022
Distro: NixOS 21.05 (Okapi)

configure options: --disable-dependency-tracking --prefix=/nix/store/zf5mhr350m4r4hsprgw8ak2gy4kqv4ds-keepalived-2.2.2 --enable-sha1 --enable-snmp PKG_CONFIG=pkg-config PKG_CONFIG_PATH=/nix/store/sqpkyjykx8rd1axgzip6kxkkfvfrbzyk-libnfnetlink-1.0.1/lib/pkgconfig:/nix/store/xkwg4p2cf4z5c2n2x8c0a11qxmiayiy1-libnl-3.5.0-dev/lib/pkgconfig:/nix/store/i27radiw7zyxx1h5j926hyz50yvrvdkb-net-snmp-5.9-dev/lib/pkgconfig:/nix/store/j3ag5anfrqgy6zwgb885q4jrpv39552w-openssl-1.1.1n-dev/lib/pkgconfig CC=gcc

Config options:  LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT SNMP_VRRP SNMP_CHECKER INIT=SYSV

System options:  VSYSLOG MEMFD_CREATE IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK

Distro (please complete the following information):

Details of any containerisation or hosted service (e.g. AWS) If keepalived is being run in a container or on a hosted service, provide full details

Configuration file:

global_defs {
  notification_email { <redacted> }
notification_email_from <redacted>
smtp_server mail.gocept.net
smtp_connect_timeout 30
router_id <redacted>
script_user root

}

vrrp_script check_default_route_v4 {
      script "/run/current-system/sw/bin/check-default-route-v4"
      interval 1
      fall 2
      rise 2
}

vrrp_script check_default_route_v6 {
      script "/run/current-system/sw/bin/check-default-route-v6"
      interval 1
      fall 2
      rise 2
}

track_file check_stop_file {
  # Allow admins to locally send keepalive into FAIL state by adding
  # a non "0" entry
  file /etc/keepalived/stop
  # 0 as default causes FAIL if anything != 0 is in the file
  weight 0
  init_file 0
}

vrrp_sync_group router {
    group { mgm srv fe }

    notify_master "/run/current-system/sw/bin/fc-manage -v activate-configuration -s primary"
    notify_backup "/run/current-system/sw/bin/fc-manage -v activate-configuration --base-system"
    notify_fault "/run/current-system/sw/bin/fc-manage -v activate-configuration --base-system"
    notify_stop "/run/current-system/sw/bin/fc-manage -v activate-configuration --base-system"

    track_script {
        check_default_route_v4 weight 10
        check_default_route_v6 weight 5
    }

    track_file {
      check_stop_file
    }
}

vrrp_instance mgm {
    interface ethmgm
    state BACKUP
    virtual_router_id 51
    priority 100
    advert_int 1
    garp_master_refresh 30

    virtual_ipaddress {
        <redacted v6 address> preferred_lft 0
    }

    virtual_ipaddress_excluded {
        172.20.1.1/24
    }
}

vrrp_instance fe {
    interface ethfe
    state BACKUP
    virtual_router_id 51
    priority 100
    advert_int 1
    garp_master_refresh 30

    virtual_ipaddress {
        <redacted v6 address>
    }

    virtual_ipaddress_excluded {
        172.20.2.1/24
    }

}

vrrp_instance srv {
    interface ethsrv
    state BACKUP
    virtual_router_id 51
    priority 100
    advert_int 1
    garp_master_refresh 30

    virtual_ipaddress {
        <redacted v6 address>
    }

    virtual_ipaddress_excluded {
        172.20.3.1/24
        172.30.3.1/24
    }
}

Notify and track scripts

The actual scripts aren't relevant AFAICT and the referenced fc-manage script is a somewhat larger piece of software ... ;)

System Log entries

Mar 12 09:27:04 Keepalived_vrrp[1703695]: (mgm) Entering FAULT STATE
Mar 12 09:27:04  Keepalived_vrrp[1703695]: VRRP_Group(router) Syncing instances to FAULT state
Mar 12 09:27:04  Keepalived_vrrp[193847]: Error exec-ing command '/nix/store/frm892zjlvqsrgmp9mblxnyy34ljn4nb-python3.10-fc-agent-1.0/bin/fc-manage', error 2: No such file or directory

Did keepalived coredump?

nope

pqarmitage commented 8 months ago

When I implemented checking the security of scripts, I was fully aware that the check was only valid for the state of the system when keepalived starts, and that permissions could subsequently be changed. However, for such a change to occur it requires elevated privileges, and therefore the setup could not be modified by an unprivileged user, which was the primary objective. This was necessary because keepalived runs as root, although some people do now run keepalived as a non root user with specific elevated privileges granted, but these privileges need to be fairly extensive (an alternative approach is to use SELinux to restrict what keepalived can do, and in particular limit from what directories keepalived can execute scripts).

I think it would be wrong to alter now the default way keepalived handles scripts, but I think we could add a new global keyword use_symlink_path or something like that, so that keepalived uses the originally configured path (including symlinks) for scripts (this would apply for both VRRP notify scripts and IPVS check_misc scripts). keepalived would still do the same security checks as it currently does, but would execute the scripts via the original specified path (i.e. including any symlinks) rather than resolved path.

Would this suit your requirements?

ctheune commented 8 months ago

Yeah, I agree with the overall analysis and with the suggestion.

To clarify: this would result in the checks being performed the moment before calling the script as well, right?

pqarmitage commented 8 months ago

It wasn't my intention to perform the checks immediately before calling the script. My view is that keepalived will have checked all parts of the path (both the real patch and the path via symlinks) are non-root writeable at startup; it therefore requires someone with root privileges to modify the script, and we must assume that they know what they are doing (after all a root user can simply modify the script).

ctheune commented 8 months ago

Sure. I guess this is an aspect of why I consider the original check somewhat moot, but - just to make sure - I'm not arguing about that aspect and just wanted to understand what you're suggesting.

So: yeah, I'd appreciate using the original path when calling as you suggested. Much appreciated!

pqarmitage commented 8 months ago

Commit ebf37b9 should resolve this issue. The new keyword is use_symlink_paths, which is slightly different from what I suggested earlier.

ctheune commented 8 months ago

Thanks! I'll pass this over to my colleague for testing!