internal/wglinux: dealing with network namespaces

A situation came up in IRC this morning where a user (@bitkeks) would like to enter a network namespace in their own code using setns(2) and then manipulate WireGuard devices using wgctrl.

The goroutine with a modified namespace called into wgctrl and ultimately netlink, but because of the way my netlink package spins up an internal goroutine for dealing with system calls, that internal goroutine was not part of the same network namespace as its calling goroutine. To remedy this, I've added https://github.com/mdlayher/netlink/pull/141 which enables the caller to explicitly opt-in to setting the namespace of the calling thread on the internal goroutine/syscall thread.

On IRC, we had also discussed that EPERM from setns(2) (due to lack of CAP_SYS_ADMIN/root) could be non-fatal, but then we can end up in a state where the caller attempted to configure a namespace but the namespace was not actually applied. Yet, no error would be returned in this case.

In order to integrate these ideas into wgctrl, we have a few options. At this point, I'm leaning toward option 2, but I'd be happy to hear from others as well.

/cc @bitkeks @zx2c4

1) set netlink.CallingThreadNetNS in genetlink.Dial config and do nothing else

This means that the calling and netlink socket goroutines will share the same network namespace, but either root or CAP_SYS_ADMIN is now required at all times to set the network namespace. root or CAP_NET_ADMIN are already required to manipulate WireGuard devices.

I consider this the least favorable option, although it is the simplest.

2) expose a config struct for wgctrl.New and add an option to use the calling thread's namespace

This option is explicit and requires the caller to actually opt-in to calling setns(2) under the hood, so root/CAP_SYS_ADMIN is only required if the user explicitly says so.

This pattern is pretty common in Go applications, and could look something like:

package wgctrl

type Config struct {
    // Make Client retrieve the network namespace of the calling thread and use it
    // internally, so WireGuard devices in that namespace can be manipulated.
    LinuxCallingThreadNetNS bool

    // Alternative: actually specify a namespace FD for the netlink socket to enter,
    // with a constant for "calling thread". If set to zero, nothing happens.
    // LinuxNetNS int

    // Room for future options and extensibility...
    // OpenBSDFoo int
}

Then the caller can either use an explicit config, or nil for none:

// Use all defaults.
c, err := wgctrl.New(nil)

// Configure only what is needed. Empty fields are defaults.
c, err := wgctrl.New(&wgctrl.Config{
    LinuxCallingThreadNetNS: true,
})

I'm currently in favor of doing this because it is simple to implement and explicit. It is slightly unfortunate that:

it would break existing callers, but it won't be hard for them to replace wgctrl.New() with wgctrl.New(nil) (however, we make zero API stability guarantees at this time anyway)
the user has to explicitly opt-in to this to get the behavior they might assume would already work, if they're already using setns(2) (as @bitkeks is)

3) set netlink.CallingThreadNetNS and add additional logic in package netlink to figure out if we're already in the right namespace in both the calling thread and internal syscall thread

If we create the internal sycall goroutine and lock OS thread, then determine it has an identical network namespace to the calling thread, there's no need to ever invoke setns(2). This could mean that the code could continue to work without privileges, unless a network namespace was explicitly set in either the calling thread or configuration. If the calling thread was able to set a namespace, package netlink will already have permission to do so as well.

There is the potential to use something like kcmp(2) to see if the namespaces match, but it appears to have some caveats according to the manpage:

       Note the kcmp() is not protected against false positives which may occur if the processes are currently running.   One  should  stop  the
       processes by sending SIGSTOP (see signal(7)) prior to inspection with this system call to obtain meaningful results.

       This system call is available only if the kernel was configured with CONFIG_CHECKPOINT_RESTORE.  The main use of the system call  is  for
       the  checkpoint/restore  in  user  space  (CRIU) feature.  The alternative to this system call would have been to expose suitable process
       information via the proc(5) filesystem; this was deemed to be unsuitable for security reasons.

I did a couple of quick experiments with this and didn't seem to come up with a meaningful result, although perhaps I'm doing it wrong.

func kcmp(pid1, pid2, typ, idx1, idx2 int) (int, error) {
    r0, r1, errno := unix.Syscall6(
        unix.SYS_KCMP,
        uintptr(pid1),
        uintptr(pid2),
        uintptr(typ),
        uintptr(idx1),
        uintptr(idx2),
        0,
    )
    log.Println("kcmp", r0, r1, errno)

    if errno != 0 {
        return 0, os.NewSyscallError("kcmp", errno)
    }

    return int(r0), nil
}

            pid := os.Getpid()
            const kcmpFile = 0
            res, err := kcmp(pid, pid, kcmpFile, netNS, origNetNS)
            if err != nil {
                errC <- err
                return
            }
            log.Println("KCMP:", res)

$ go run main.go
2019/06/04 13:26:40 CALLER NETNS FD: 3
2019/06/04 13:26:40 SYSCALL NETNS FD: 5
2019/06/04 13:26:40 kcmp 2 0 errno 0
2019/06/04 13:26:40 KCMP: 2
2019/06/04 13:26:40 &os.SyscallError{Syscall:"setns", Err:0x1}
2019/06/04 13:26:40 failed to open wgctrl: setns: operation not permitted
exit status 1

Perhaps this merits more investigation, but due to the caveats listed on the manpage and potential complexity, I'm still leaning toward option 2.

For reference, here's the code snippet that expected the netns to stay the same. While wc.Device(ifName) would return a new, empty device (might be another bug), wc.ConfigureDevice(ifName, *config) failed with file not found. This was caused by the underlying netlink goroutine, which would query the device by name in the wrong netns, in which the device created by dnl.NetworkLinkAdd(ifName, "wireguard") does not exist.

import (
    "net"
    "runtime"

    "github.com/vishvananda/netns"
    dnl "github.com/docker/libcontainer/netlink"
    "golang.zx2c4.com/wireguard/wgctrl"
    "golang.zx2c4.com/wireguard/wgctrl/wgtypes"
)

func (s *sobj) modifyNamespace() {
    // Lock thread to prevent switching of namespaces
    runtime.LockOSThread()
    defer runtime.UnlockOSThread()

    nsCurrent, _ := netns.Get()
    defer nsCurrent.Close()

    nsContainer, err := netns.GetFromPid(s.containerPid)
    if err != nil {
        s.logger.Fatal(err)
    }
    netns.Set(nsContainer)
    defer nsContainer.Close()

    // Switch back to the original namespace on return
    defer netns.Set(nsCurrent)

    ifaces, _ := net.Interfaces()
    s.logger.Debugf("Interfaces: %+v", ifaces)

    // Create new wireguard client
    wc, err := wgctrl.New()
    if err != nil {
        s.logger.Error(err)
        return
    }
    defer wc.Close()

    devs, err := wc.Devices()
    if err != nil {
        s.logger.Error(err)
        return
    }

    if len(devs) == 0 {
        s.logger.Warn("No WireGuard interfaces in this namespace!")
        var ifName string = "wg66"

        // Create new link with type wireguard
        err := dnl.NetworkLinkAdd(ifName, "wireguard")
        if err != nil {
            s.logger.Error(err)
        }

        // Generate a new private key
        key, err := wgtypes.GeneratePrivateKey()

        // Create config for new wireguard interface
        var port int = 6666
        var mark int = 66
        config := &wgtypes.Config{
            PrivateKey: &key,
            ListenPort: &port,
            FirewallMark: &mark,
            ReplacePeers: true,
            Peers: nil,
        }
        s.logger.Debugf("New config: %+v", config)

        devs, err = wc.Devices()
        if err != nil {
            s.logger.Error(err)
            return
        }

        for _, dev := range devs {
            s.logger.Debugf("WireGuard device: %s %d %v", dev.Name, dev.ListenPort, dev.Peers)
        }

        dev, err := wc.Device(ifName)
        if err != nil {
            s.logger.Errorf("Device: %s", err)
            return
        } else if dev.Name == "" {
            s.logger.Errorf("Found device has no name, assuming namespace error during query for %s", ifName)
            return
        }

        s.logger.Debugf("Found device: %+v", dev)

        // Apply the config to the interface
        err = wc.ConfigureDevice(ifName, *config)
        if err != nil {
            s.logger.Error(err)
            return
        }

        s.logger.Debug("WireGuard device configured!")

        // Re-query the list of interfaces
        devs, err = wc.Devices()
        if err != nil {
            s.logger.Error(err)
            return
        }
    }

    for _, dev := range devs {
        s.logger.Debugf("WireGuard device: %s %d %v", dev.Name, dev.ListenPort, dev.Peers)
    }
}

WireGuard / wgctrl-go

internal/wglinux: dealing with network namespaces #58