Linux network namespace sysctl safety verifier.
Ensure that net sysctls are network-namespace-safe.
usage: verify.py [-h] [-v]

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  Verbose output
Currently, this must be run as root in order to use CLONE_NEWNET.
$ sudo ./verify.py -v
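The premise behind this tool is simple:
1. Record the value of every sysctl under /proc/sys/net.
2. Fork a child process into a new network namespace (CLONE_NEWNET).
3. In the child, write to every sysctl under /proc/sys/net.
4. In the parent, re-read every sysctl under /proc/sys/net.
Anything in the parent which changed as a result of manipulations in the child is considered a "leak".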
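The premise can be sketched roughly as follows. This is a minimal illustration and not the actual code of verify.py: it assumes the parent snapshots /proc/sys/net, a forked child enters a new network namespace with unshare(CLONE_NEWNET) and runs some caller-supplied manipulation, and the parent then compares a second snapshot against the first. The names snapshot, find_leaks, run_in_new_netns, and check are hypothetical.

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # value from <sched.h>


def snapshot(root="/proc/sys/net"):
    """Map every readable file under root to its raw contents."""
    values = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    values[path] = f.read()
            except OSError:
                # Some sysctls cannot be read (e.g. write-only); skip them.
                continue
    return values


def find_leaks(before, after):
    """Return the paths present in both snapshots whose value changed."""
    common = before.keys() & after.keys()
    return sorted(p for p in common if before[p] != after[p])


def run_in_new_netns(child_fn):
    """Fork, move the child into a new netns, run child_fn there (needs root)."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWNET) == 0:
                child_fn()
                status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    _, wait_status = os.waitpid(pid, 0)
    return wait_status


def check(child_fn, root="/proc/sys/net"):
    """Report parent-visible sysctls that changed while the child ran."""
    before = snapshot(root)
    run_in_new_netns(child_fn)
    return find_leaks(before, snapshot(root))
```

Here child_fn stands in for whatever writes the child performs; a non-empty result from check would be a candidate "leak". Values that change on their own between the two snapshots would also show up, so a real tool would need to account for that.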
The Linux kernel provides runtime-configurable kernel parameters known as
"sysctls", which are accessed via /proc/sys/.
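As a small illustration of the /proc/sys/ interface, the sketch below maps a dotted sysctl name to its file and reads the value. The helper names sysctl_path and read_sysctl are hypothetical and not part of verify.py, and the plain dot-to-slash mapping assumes that no path component (such as an interface name) itself contains a dot.

```python
import os

PROC_SYS = "/proc/sys"


def sysctl_path(name, root=PROC_SYS):
    """Translate a dotted sysctl name into its path under root."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root=PROC_SYS):
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()
```

For example, sysctl_path("net.ipv4.tcp_congestion_control") yields /proc/sys/net/ipv4/tcp_congestion_control, and read_sysctl with the same name returns that file's contents.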
Linux also supports network namespaces (netns), which enable isolated
virtual network stacks and are used heavily by containerization platforms like
LXC or Docker. See network_namespaces(7).
It's generally understood that the "net" sysctls (under /proc/sys/net) are
supposed to be "netns safe", meaning that manipulating sysctls from one network
namespace cannot affect any other network namespace. This isn't exactly
guaranteed, though.
It may be desirable to allow a container to write to net sysctls, specifically
parameters of devices which exist only within the container's netns. However,
the latest version of Docker (20.10.6 as of this writing) mounts all of
/proc/sys read-only, to prevent changes made in a container from "leaking"
out of the container. This protection mechanism makes it more difficult (and
less secure) to run a libvirt QEMU VM inside of a Docker container.
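To see whether a given environment applies this protection, one option is to inspect the mount table. The sketch below is an assumption-laden illustration: it assumes the read-only mount appears as an entry whose mount point is exactly /proc/sys in text formatted like /proc/self/mounts, and the function name proc_sys_is_read_only is hypothetical.

```python
def proc_sys_is_read_only(mounts_text):
    """Return True if the last mount on /proc/sys carries the 'ro' option.

    mounts_text is expected to look like /proc/self/mounts:
    "<source> <mount point> <fstype> <options> <dump> <pass>" per line.
    """
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/proc/sys":
            # Later entries shadow earlier ones, so keep the last match.
            read_only = "ro" in fields[3].split(",")
    return read_only
```

Inside a container, this could be called with the contents of /proc/self/mounts. A False result only means that no read-only mount was found at /proc/sys itself; it does not prove that every sysctl is writable.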
This tool was inspired by a conversation on this runc issue.
This tool helped uncover several bugs in the Linux kernel's sysctl implementations, which have since been fixed by this tool's author:
Bug 1: Several nf_conntrack sysctls are global and writable by any netns
Sysctls:
  net.nf_conntrack_max
  net.netfilter.nf_conntrack_max
  net.netfilter.nf_conntrack_expect_max
Fix: netfilter: conntrack: Make global sysctls readonly in non-init netns
  v5.13-rc1 (2671fa4dc010)
  v5.12.2 (671c54ea8c7f)
  v5.11.19 (fbf85a34ce17)
  v5.10.35 (d3598eb3915c)
  v5.4.120 (baea536cf51f)
  v4.19.191 (9b288479f7a9)
  v4.14.233 (68122479c128)
  v4.9.269 (da50f56e826e)

Bug 2: tcp_allowed_congestion_control is global and writable by any netns
Sysctls:
  net.ipv4.tcp_allowed_congestion_control
Fix: net: Make tcp_allowed_congestion_control readonly in non-init netns
  v5.12-rc8 (97684f0970f6)
  v5.11.16 (1ccdf1bed140)
  v5.10.32 (35d7491e2f77)

Bug 3: Setting tcp_congestion_control can globally affect tcp_allowed_congestion_control
Sysctls:
  net.ipv4.tcp_congestion_control (affects)
  net.ipv4.tcp_allowed_congestion_control (affected)
Fix: net: Only allow init netns to set default tcp cong to a restricted algo
  v5.13-rc1 (8d432592f30f)
  v5.12.4 (e7d7bedd507b)
  v5.11.21 (efe1532a6e1a)
  v5.10.37 (6c1ea8bee75d)
  v5.4.119 (9884f745108f)
  v4.19.191 (992de06308d9)

Additionally, a safety check was added to the kernel to prevent certain classes of bugs from going unnoticed:
  31c4d2f160eb: net: Ensure net namespace isolation of sysctls