JonathonReinhart / linux-netns-sysctl-verify

Linux network namespace sysctl safety verifier.

linux-netns-sysctl-verify

Ensure that net sysctls are network-namespace-safe.

Usage

usage: verify.py [-h] [-v]

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  Verbose output

Currently, this must be run as root so that it can use CLONE_NEWNET.

$ sudo ./verify.py -v
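
For readers unfamiliar with CLONE_NEWNET, the following is a minimal sketch of how a Python process can move itself into a fresh network namespace by calling unshare(2) through ctypes. The helper name enter_new_netns is hypothetical and is not taken from verify.py; the constant value comes from the kernel's <linux/sched.h>. Without sufficient privileges the call is expected to fail with EPERM, which is why the tool needs root.

```python
import ctypes
import os

# Flag value from <linux/sched.h>; requests a new network namespace.
CLONE_NEWNET = 0x40000000


def enter_new_netns():
    """Move the calling process into a new network namespace.

    Hypothetical helper for illustration only. Raises OSError
    (PermissionError for EPERM) if unshare(2) fails.
    """
    # CDLL(None) loads the symbols already linked into the process (libc on Linux).
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(CLONE_NEWNET) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Python 3.12 and later also provide os.unshare() together with os.CLONE_NEWNET, which avoids the ctypes call on those versions.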

Theory of Operation

The premise behind this tool is simple:

Anything in the parent which changed as a result of manipulations in the child is considered a "leak".
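
The premise above can be sketched as a snapshot-and-compare step. This is only an illustration under stated assumptions, not the code in verify.py: the function names snapshot_sysctls and find_leaks are hypothetical, and the sketch assumes the parent's sysctls are read once before and once after the child namespace makes its changes.

```python
import os


def snapshot_sysctls(root="/proc/sys/net"):
    """Return {path: contents} for every readable file under root."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as f:
                    snapshot[path] = f.read()
            except OSError:
                # Some sysctls are write-only or fail on read; skip them.
                continue
    return snapshot


def find_leaks(before, after):
    """Return the sorted paths whose presence or value differs."""
    changed = {p for p in before.keys() & after.keys() if before[p] != after[p]}
    appeared_or_vanished = before.keys() ^ after.keys()
    return sorted(changed | appeared_or_vanished)
```

In this sketch, the parent would call snapshot_sysctls() before the child writes to its own sysctls and again afterwards; any path returned by find_leaks() is a candidate leak.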

Background

The Linux kernel provides runtime-configurable kernel parameters known as "sysctls", which are accessed via /proc/sys/.
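
As a small illustration of the /proc/sys/ interface, the sketch below maps a dotted sysctl name to its file and reads the value. The helper names sysctl_path and read_sysctl are hypothetical. The mapping is deliberately naive: it assumes no path component contains a dot, which does not hold for interface names such as VLAN devices.

```python
def sysctl_path(name, root="/proc/sys"):
    """Map a dotted name like 'net.ipv4.ip_forward' to its /proc/sys path.

    Naive: assumes no component (e.g. an interface name) contains a dot.
    """
    return root + "/" + name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()
```

For example, read_sysctl("net.ipv4.ip_forward") reads /proc/sys/net/ipv4/ip_forward.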

Linux also supports network namespaces (netns), which enable isolated virtual network stacks and are used heavily by containerization platforms like LXC or Docker. See network_namespaces(7).

It's generally understood that the "net" sysctls (under /proc/sys/net) are supposed to be "netns safe", meaning that manipulating sysctls from one network namespace cannot affect any other network namespace. This isn't exactly guaranteed, though.

It may be desirable to allow a container to write to net sysctls, specifically parameters of devices which exist only within the container's netns. However, the latest version of Docker (20.10.6 as of this writing) mounts all of /proc/sys read-only, to prevent changes made in a container from "leaking" out of the container. This protection mechanism makes it more difficult (and less secure) to run a libvirt QEMU VM inside of a Docker container.
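
One way to observe the read-only mount from inside a container is to inspect the mount flags reported by statvfs(3). This is a minimal sketch; the helper name is_readonly_mount is hypothetical, and it assumes the read-only state shows up in the mount flags of the path being checked.

```python
import os


def is_readonly_mount(path="/proc/sys"):
    """Return True if the filesystem mounted at path is flagged read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

Under that assumption, is_readonly_mount() would return True inside a default Docker container and False on a typical host.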

This tool was inspired by conversation on this runc issue.

Results

This tool helped uncover bugs in the Linux kernel's implementation of several sysctls, all of which have since been fixed by the tool's author:

Bug 1: Several nf_conntrack sysctls are global and writable by any netns

Bug 2: tcp_allowed_congestion_control is global and writable by any netns

Bug 3: Setting tcp_congestion_control can globally affect tcp_allowed_congestion_control

Additionally, a safety check was added to the kernel to prevent certain classes of bugs from going unnoticed: