Open dtaht opened 2 years ago
This is another futuristic thing. I have to note that this method only works on containers and namespaces, and only for locally sourced TCP stacks, not for routing packets, and I don't think they've ever tried to run an rrul test with this design. They have regular meetings on Wednesdays at 8AM PST, and I might attend one day.
https://isovalent.com/blog/post/addressing-bandwidth-exhaustion-with-cilium-bandwidth-manager/
We're going to end up using namespaces for #153 anyway...
I attempted to use namespaces to simulate 1k users. That didn't go well with dynamic routing in play (1k babel daemons). Doing it without dynamic routing needs a bit more setup, and I haven't got to it yet.
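For the static case, here is a minimal sketch of what the extra setup might look like: one namespace and one veth pair per simulated user, each on its own /30, with no routing daemon involved. The `user_N` and `veth_uN`/`veth_nN` names, the `10.66.0.0/16` range and the dry-run default are all assumptions for illustration, and this has not been tried at 1k scale.

```shell
#!/bin/bash
# Hypothetical sketch: print (or run) the commands for N statically addressed
# user namespaces. Names and the 10.66.0.0/16 range are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Map a 0-based user index to the i-th /30 inside 10.66.0.0/16.
# Prints "<host_addr> <ns_addr>".
user_addrs() {
    local i="$1"
    local third=$(( i / 64 ))         # 64 /30 blocks fit in each /24
    local base=$(( (i % 64) * 4 ))    # first address of this /30
    echo "10.66.${third}.$(( base + 1 )) 10.66.${third}.$(( base + 2 ))"
}

make_user() {
    local i="$1"
    local addrs host_addr ns_addr
    addrs="$(user_addrs "$i")"
    host_addr="${addrs% *}"
    ns_addr="${addrs#* }"
    run sudo ip netns add "user_${i}"
    run sudo ip link add "veth_u${i}" type veth peer name "veth_n${i}"
    run sudo ip link set "veth_n${i}" netns "user_${i}"
    run sudo ip addr add "${host_addr}/30" dev "veth_u${i}"
    run sudo ip netns exec "user_${i}" ip addr add "${ns_addr}/30" dev "veth_n${i}"
    run sudo ip link set "veth_u${i}" up
    run sudo ip netns exec "user_${i}" ip link set "veth_n${i}" up
    # Static default route back to the host side; no babel daemon needed
    run sudo ip netns exec "user_${i}" ip route add default via "${host_addr}"
}

NUM_USERS="${NUM_USERS:-3}"
i=0
while [ "$i" -lt "$NUM_USERS" ]; do
    make_user "$i"
    i=$(( i + 1 ))
done
```

Run as-is, this only prints the commands; something like `DRY_RUN=0 NUM_USERS=1000` would actually create the namespaces.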
I do this daily, and we've deployed it as a means for working with bonded interfaces.
Here's my script:
#!/bin/bash
# Set the number of rx/tx queues to create
NUM_QUEUES=1
## This script creates two `veth` devices, each in their own namespace.
## Each is assigned an address (192.168.66.1/30 and 192.168.66.2/30)
## They won't be able to ping each other until a bridge is made available.
## The idea is to simulate a middle-box setup (like `lqosd`), allowing
## `iperf` and other tests between the two.
#######################################################################
#
# USAGE:
#
# ./testbed.sh params
#   Params can be multiples of the following:
#   q <num_queues>     -- Sets the number of rx/tx queues to create
#   setup              -- Creates the veth devices and namespaces
#   bridge             -- Creates a Linux bridge (br0) and adds the veth devices to it
#                         This is for base-line setup, or if you need a complex setup
#   cleanup            -- Deletes the veth devices and namespaces
#   iperf_server       -- Runs iperf server in ns_external
#   iperf_client       -- Runs iperf client in ns_internal
#   iperf_kill_server  -- Kills iperf server in ns_external
#   checksum           -- Disables checksum calculation with ethtool.
#                         You need this for AF_XDP bridges with veth.
#   offload            -- Disables VLAN, GSO, TSO, LRO, SG and GRO offloads
#                         on both veth devices with ethtool.
#   lossy <delay> <jitter> <loss>
#                      -- Add simulation to the network to make it suck.
#   nat                -- Adds a route between veth_external and main with NAT
#######################################################################
function setup_testbed() {
    sudo ip netns add ns_external
    sudo ip netns add ns_internal
    if ((NUM_QUEUES > 1)); then
        sudo ip link add veth_external numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 123 type veth peer name veth_toexternal numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 124
        sudo ip link add veth_internal numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 125 type veth peer name veth_tointernal numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 126
    else
        echo "(warning) Creating single queue veths"
        sudo ip link add veth_external type veth peer name veth_toexternal
        sudo ip link add veth_internal type veth peer name veth_tointernal
    fi
    sudo ip link set veth_external netns ns_external
    sudo ip link set veth_internal netns ns_internal
    sudo ip netns exec ns_external ip addr add 192.168.66.1/30 dev veth_external
    sudo ip netns exec ns_internal ip addr add 192.168.66.2/30 dev veth_internal
    sudo ip netns exec ns_external ip link set veth_external up
    sudo ip netns exec ns_internal ip link set veth_internal up
    sudo ip link set veth_toexternal up
    sudo ip link set veth_tointernal up
}
function setup_nat() {
    # Create a routed interface to carry data from ns_external back to the main network
    sudo ip link add veth_route_main numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 120 type veth peer name veth_route_ext numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 121
    sudo ip link set veth_route_ext netns ns_external
    sudo ip link set veth_route_main up
    sudo ip netns exec ns_external ip link set veth_route_ext up
    sudo ip netns exec ns_external ip addr add 192.168.65.2/30 dev veth_route_ext
    sudo ip addr add 192.168.65.1/30 dev veth_route_main
    sudo ip route add 192.168.66.0/30 via 192.168.65.2
    sudo ip netns exec ns_external ip route add 0.0.0.0/0 via 192.168.65.1
    sudo ip netns exec ns_internal ip route add 0.0.0.0/0 via 192.168.66.1
    # Enable routing
    sudo sysctl -w net.ipv4.ip_forward=1
    sudo iptables -t nat -A POSTROUTING -o wlo1 -j MASQUERADE
    sudo iptables -A FORWARD -i veth_route_main -o veth_external -m state --state RELATED,ESTABLISHED -j ACCEPT
    # Inside the ns_internal, mount some things so we can run stuff
    sudo ip netns exec ns_internal mount -t cgroup2 cgroup2 /sys/fs/cgroup
    sudo ip netns exec ns_internal mount -t securityfs securityfs /sys/kernel/security/
}
function setup_bridge() {
    echo "Setting up the bridge"
    sudo ip link add name br0 type bridge
    sudo ip link set veth_toexternal master br0
    sudo ip link set veth_tointernal master br0
    sudo ip link set br0 up
}
function no_checksums() {
    sudo ip netns exec ns_internal ethtool -K veth_internal tx off
    sudo ip netns exec ns_external ethtool -K veth_external tx off
}
function cleanup_testbed() {
    sudo ip link del br0
    sudo ip link del veth_toexternal
    sudo ip link del veth_tointernal
    sudo ip link del veth_route_main
    sudo ip netns del ns_external
    sudo ip netns del ns_internal
}
function iperf_server() {
    sudo ip netns exec ns_external iperf -s &
}
function iperf_client() {
    sudo ip netns exec ns_internal iperf -c 192.168.66.1
}
function iperf_kill_server() {
    sudo killall iperf
}
for i in "$@"; do
    case $i in
    setup)
        setup_testbed
        shift # past argument=value
        ;;
    bridge)
        setup_bridge
        shift # past argument=value
        ;;
    nat)
        setup_nat
        shift # past argument=value
        ;;
    cleanup)
        cleanup_testbed
        shift # past argument=value
        ;;
    iperf_server)
        iperf_server
        shift # past argument=value
        ;;
    iperf_client)
        iperf_client
        shift # past argument=value
        ;;
    iperf_kill_server)
        iperf_kill_server
        shift # past argument=value
        ;;
    q)
        NUM_QUEUES=$2
        shift # past argument=value
        shift # Since we're reading two
        ;;
    lossy)
        # DELAY1 is the base delay (ms), DELAY2 the jitter (ms), LOSS the loss (%)
        DELAY1=$2
        DELAY2=$3
        LOSS=$4
        sudo ip netns exec ns_external tc qdisc replace dev veth_external root netem delay ${DELAY1}ms ${DELAY2}ms loss ${LOSS}%
        sudo ip netns exec ns_internal tc qdisc replace dev veth_internal root netem delay ${DELAY1}ms ${DELAY2}ms loss ${LOSS}%
        shift
        shift # Delay
        shift # Delay 2
        shift # Loss
        ;;
    checksum)
        no_checksums
        shift # past argument=value
        ;;
    offload)
        sudo ip netns exec ns_internal ethtool -K veth_internal rxvlan off
        sudo ip netns exec ns_internal ethtool -K veth_internal txvlan off
        sudo ip netns exec ns_internal ethtool -K veth_internal gso off
        sudo ip netns exec ns_internal ethtool -K veth_internal tso off
        sudo ip netns exec ns_internal ethtool -K veth_internal lro off
        sudo ip netns exec ns_internal ethtool -K veth_internal sg off
        sudo ip netns exec ns_internal ethtool -K veth_internal gro off
        sudo ip netns exec ns_external ethtool -K veth_external rxvlan off
        sudo ip netns exec ns_external ethtool -K veth_external txvlan off
        sudo ip netns exec ns_external ethtool -K veth_external gso off
        sudo ip netns exec ns_external ethtool -K veth_external tso off
        sudo ip netns exec ns_external ethtool -K veth_external lro off
        sudo ip netns exec ns_external ethtool -K veth_external sg off
        sudo ip netns exec ns_external ethtool -K veth_external gro off
        shift # past argument=value
        ;;
    esac
done
This setup works pretty well, especially if you want to provide other services on your box. I think we should focus on documenting "recipes" for using it.
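As a starting point for such recipes, here is a rough sketch of a wrapper that chains the script's parameters into two repeatable tests. It assumes the script above is saved as `./testbed.sh`; the `run`, `baseline_recipe` and `lossy_recipe` helpers and the `DRY_RUN` switch are hypothetical names, not part of the script.

```shell
#!/bin/bash
# Hypothetical recipe wrapper; assumes the script above is saved as ./testbed.sh.
TESTBED="${TESTBED:-./testbed.sh}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the invocations, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Baseline throughput through a plain Linux bridge.
# "q" has to come before "setup" because parameters are handled left to right.
baseline_recipe() {
    local queues="${1:-1}"
    run "$TESTBED" q "$queues" setup bridge
    run "$TESTBED" iperf_server
    run sleep 1                       # give the iperf server a moment to start
    run "$TESTBED" iperf_client
    run "$TESTBED" iperf_kill_server cleanup
}

# Same test over an impaired link: delay and jitter in ms, loss in percent.
lossy_recipe() {
    local delay="$1" jitter="$2" loss="$3"
    run "$TESTBED" setup bridge lossy "$delay" "$jitter" "$loss"
    run "$TESTBED" iperf_server
    run sleep 1
    run "$TESTBED" iperf_client
    run "$TESTBED" iperf_kill_server cleanup
}

baseline_recipe 4
lossy_recipe 20 5 1
```

With the default `DRY_RUN=1` this only prints the `testbed.sh` invocations, which makes it easy to review a recipe before running it for real with `DRY_RUN=0`.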
Nice script!
This is an example proof of concept that uses veth interfaces instead of complicated filters and rules. I don't know if this would be better than how libreqos works today, and I don't think it would be faster, but it has a few potential advantages. It allows for routing in addition to bridging. There's been some recent work on preserving the tx timestamp from ingress to egress even through namespaces (kernel 5.18 and later; Cilium has a blog entry on what you do). You don't need to use tc-mirred, you can use the cake integral shaper on the customer interfaces, you can name interfaces after customers, and iptables firewall rules etc. just work.
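To make the idea concrete, here is a hedged sketch of one veth pair per customer, with the host-side interface named after the customer and cake's built-in shaper attached to it. The `ns_<name>` namespace layout, the `eth0` peer name, the rates and the `add_customer` helper are assumptions for illustration, not the proof of concept itself. The only hard constraint encoded is that Linux interface names are limited to 15 characters.

```shell
#!/bin/bash
# Hypothetical sketch of the veth-per-customer idea; names, rates and the
# namespace layout are assumptions for illustration only.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Linux interface names can be at most 15 characters (IFNAMSIZ - 1).
valid_ifname() {
    [ -n "$1" ] && [ "${#1}" -le 15 ]
}

# add_customer <name> <rate_mbit>
# Creates ns_<name>, a veth pair (<name> on the host, eth0 inside the
# namespace), and shapes traffic toward the customer with cake.
add_customer() {
    local name="$1" rate_mbit="$2"
    if ! valid_ifname "$name"; then
        echo "bad interface name: $name" >&2
        return 1
    fi
    run sudo ip netns add "ns_${name}"
    run sudo ip link add "$name" type veth peer name eth0 netns "ns_${name}"
    run sudo ip link set "$name" up
    run sudo ip netns exec "ns_${name}" ip link set eth0 up
    # cake's integral shaper: no separate htb/tbf layer or tc-mirred needed
    run sudo tc qdisc replace dev "$name" root cake bandwidth "${rate_mbit}mbit"
}

add_customer alice 100
add_customer bob 25
```

Because each customer is an ordinary interface, something like `tc -s qdisc show dev alice` or an iptables rule matching `-o alice` would then apply per customer without extra classification.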
Downsides: I don't know how to bridge it properly without thinking hard about it, you end up with multiple route tables with sometimes mysterious side effects, and there is an extra "hop" in the network. I also personally find it hard to wrap my head around how namespaces work in general. Anyway, a quick and dirty example: