Feature/cgroup resource management

This PR adds support for joining cgroups and applying resource quota with cgroups. Resource quotas can be applied to the cpu, memory, and io cgroups.

To use the cgroup feature, an existing cgroup must be accessible on the filesystem. The following examples use the default mount point /sys/fs/cgroup. If the cgroup hierarchy is not mounted there, it can be mounted with:

sudo mount -t cgroup2 cgroup /sys/fs/cgroup

To create a cgroup with the appropriate permissions for bst to join or apply resource quotas in, first ensure that /sys/fs/cgroup/cgroup.subtree_control lists every controller that the bst user intends to use. For example, if bst should control cpu, check that cat /sys/fs/cgroup/cgroup.subtree_control contains "cpu". If it does not, add it with:

echo +cpu | sudo tee /sys/fs/cgroup/cgroup.subtree_control

The /sys/fs/cgroup/bst hierarchy must also have, at the very least, the pids controller so that bst can add itself to and remove itself from the cgroup:

echo +pids | sudo tee /sys/fs/cgroup/cgroup.subtree_control

Next, create the cgroup that bst will operate within:

sudo mkdir /sys/fs/cgroup/bst

Delegate the sub-hierarchy to the bst user:

sudo chown -R $USER:$USER /sys/fs/cgroup/bst

Finally, the bst user must be able to add itself to /sys/fs/cgroup/bst/cgroup.procs, which requires write permission on /sys/fs/cgroup/cgroup.procs:

sudo chown $USER:$USER /sys/fs/cgroup/cgroup.procs
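Before running bst, it can help to confirm that the controllers it needs are actually enabled. The helper below is a minimal sketch and not part of this PR; the function name missing_controllers is hypothetical, and it assumes cgroup.subtree_control holds a space-separated list of controller names.

```shell
# missing_controllers ROOT CONTROLLER...
# Prints each requested controller that is NOT listed in
# ROOT/cgroup.subtree_control, one per line. Prints nothing if all are enabled.
# (Hypothetical helper for illustration; not provided by bst.)
missing_controllers() {
    root=$1
    shift
    enabled=$(cat "$root/cgroup.subtree_control" 2>/dev/null)
    for ctrl in "$@"; do
        # Pad with spaces so that "cpu" does not match "cpuset".
        case " $enabled " in
            *" $ctrl "*) ;;
            *) printf '%s\n' "$ctrl" ;;
        esac
    done
}

# Example: enable whatever is still missing at the default mount point.
#   for ctrl in $(missing_controllers /sys/fs/cgroup cpu memory io pids); do
#       echo "+$ctrl" | sudo tee /sys/fs/cgroup/cgroup.subtree_control
#   done
```

Taking the root directory as a parameter keeps the check usable when the hierarchy is mounted somewhere other than /sys/fs/cgroup.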

If the user then manually applies cgroup quotas to /sys/fs/cgroup/bst, the cgroup can be joined with bst --cgroup=/sys/fs/cgroup/bst. The bst instance is automatically added to the cgroup, and whatever quotas the user has specified apply to it.
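Applying a quota manually amounts to writing a value into the matching interface file of the cgroup. The following is a rough sketch of that step; set_limit is a hypothetical helper name, and the example values are arbitrary. It assumes the relevant controllers are enabled for /sys/fs/cgroup/bst, since otherwise files such as cpu.max do not exist there.

```shell
# set_limit CGROUP FILE VALUE
# Writes VALUE to the interface file FILE inside CGROUP, then prints the
# value read back so a rejected or normalized limit is visible immediately.
# (Hypothetical helper for illustration; not provided by bst.)
set_limit() {
    cg=$1 file=$2 value=$3
    printf '%s\n' "$value" > "$cg/$file" || return 1
    cat "$cg/$file"
}

# Example (assumes /sys/fs/cgroup/bst was delegated as described above):
#   set_limit /sys/fs/cgroup/bst cpu.max "50000 100000"
#   set_limit /sys/fs/cgroup/bst memory.max 1G
#   bst --cgroup=/sys/fs/cgroup/bst
```

Reading the file back is useful because the kernel reports memory limits in bytes, so a value written as 1G is shown in its expanded form.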

If the user wishes for bst to configure quotas within the new bst cgroup, that can be accomplished with the --climit <resource>=<limit> option. This still depends on the user specifying, with --cgroup=, a root cgroup that bst can access: simply running bst --climit cpu.max=5000 will not apply any limits, while running bst --cgroup=/sys/fs/cgroup/ --climit cpu.max=5000 will limit cpu usage. This PR supports the following limits:

bst --cgroup=/sys/fs/cgroup/bst --climit cpu.max=$MAX -- Limits the cgroup to $MAX of CPU bandwidth in each $PERIOD duration. $PERIOD is 100000 by default.
bst --cgroup=/sys/fs/cgroup/bst --climit cpu.max="$MAX $PERIOD" -- Same as above, with an explicit $PERIOD duration.
bst --cgroup=/sys/fs/cgroup/bst --climit cpu.weight=$WEIGHT -- This will control the proportional cpu weight of the bst process within the parent cgroup. A parent's resource (/sys/fs/cgroup/bst in this case) is distributed by "adding up the weights of all active children and giving each the fraction matching the ratio of its weight against the sum" [ref]. The kernel accepts values in the range [1,10000].
bst --cgroup=/sys/fs/cgroup/bst --climit memory.min=$MIN -- when bst memory usage is within the min boundary, the memory won't be reclaimed under any conditions. This can use suffixes like G, M, etc.
bst --cgroup=/sys/fs/cgroup/bst --climit memory.low=$LOW -- similar to memory.min, but a soft limit.
bst --cgroup=/sys/fs/cgroup/bst --climit memory.high=$HIGH -- memory usage throttle limit. As with the other memory bounds, this can use suffixes.
bst --cgroup=/sys/fs/cgroup/bst --climit memory.max=$MAX -- memory hard limit.
bst --cgroup=/sys/fs/cgroup/bst --climit memory.swap.high=$HIGH -- swap usage throttle limit.
bst --cgroup=/sys/fs/cgroup/bst --climit memory.swap.max=$MAX -- swap usage hard limit.
bst --cgroup=/sys/fs/cgroup/bst --climit io.weight=$WEIGHT -- Updates the default weight, a number in the range [1,10000] that specifies the relative amount of IO time the bst instance will use in relation to its siblings.
bst --cgroup=/sys/fs/cgroup/bst --climit io.max="$MAJ:$MIN rbps=$RBPS wbps=$WBPS riops=$RIOPS wiops=$WIOPS" -- Sets the max IO usage for the device specified by the major:minor code.
bst --cgroup=/sys/fs/cgroup/bst --climit io.latency="$MAJ:$MIN target=$MS" -- Sets the number of ms a process can wait before IO from other procs must be given to it.
bst --cgroup=/sys/fs/cgroup/bst --climit pids.max=$MAX -- Max number of processes that can be controlled by the bst cgroup.
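
Two of the values above are easy to get wrong by hand: the "$MAX $PERIOD" pair for cpu.max and the $MAJ:$MIN pair for the io limits. The helpers below are a sketch for computing them; both function names are hypothetical and not part of bst. The first assumes the kernel's convention that cpu.max values are in microseconds, so a $MAX equal to $PERIOD corresponds to one full CPU. The second assumes a stat that supports the -c format option (GNU coreutils or BusyBox).

```shell
# cpu_max_for_percent PERCENT [PERIOD]
# Prints a "$MAX $PERIOD" pair that allows PERCENT of one CPU per period.
# PERIOD defaults to 100000, the default period mentioned above.
# (Hypothetical helper for illustration; not provided by bst.)
cpu_max_for_percent() {
    percent=$1
    period=${2:-100000}
    echo "$(( period * percent / 100 )) $period"
}

# dev_majmin DEVICE
# Prints the decimal MAJ:MIN pair of a device node, as used by io.max and
# io.latency. stat reports both numbers in hexadecimal, so convert them.
# (Hypothetical helper for illustration; not provided by bst.)
dev_majmin() {
    hex=$(stat -c '%t %T' "$1") || return 1
    printf '%d:%d\n' "0x${hex% *}" "0x${hex#* }"
}

# Example (the device path is an assumption; substitute your own disk):
#   bst --cgroup=/sys/fs/cgroup/bst \
#       --climit cpu.max="$(cpu_max_for_percent 50)" \
#       --climit io.max="$(dev_majmin /dev/sda) rbps=1048576"
```

Values above 100 percent are valid on multi-core machines; for instance, 150 percent allows one and a half CPUs per period.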

Multiple limits can be combined for fine-grained resource control on bst:

bst --cgroup=/sys/fs/cgroup/bst --climit cpu.max="5000 7000" --climit cpu.weight=100 --climit memory.max=1G --climit memory.low=1M --climit memory.swap.max=1G --climit io.max="8:0 rbps=100 wbps=100 riops=100 wiops=100" --climit io.latency="8:0 target=20" --climit pids.max=max

To provide a consistent view of the cgroup hierarchy from within bst, if /sys/fs/cgroup is a cgroup mount point, bst mounts over it (a tmpfs mount followed by a cgroup mount), so that the bst cgroup (the root cgroup within bst) appears as the cgroup of the system. If /sys/fs/cgroup is not mounted, or bst is operating from a different rootfs, nothing is done. In addition, bst will not mount over /sys/fs/cgroup if either of the following flags is given:

--no-cgroup-remount
--share cgroup
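
To predict whether the remount applies on a given machine, one can check from the outside whether /sys/fs/cgroup is a cgroup mount point. This PR description does not say how bst performs its own detection, so the following is only a user-side sketch; is_cgroup_mount is a hypothetical helper that reads the standard /proc/mounts table.

```shell
# is_cgroup_mount PATH [MOUNTS_FILE]
# Succeeds if PATH is listed as a cgroup or cgroup2 mount point in
# MOUNTS_FILE (default: /proc/mounts; fields: device mountpoint fstype ...).
# PATH is compared literally, so pass it without a trailing slash.
# (Hypothetical helper for illustration; not how bst itself necessarily checks.)
is_cgroup_mount() {
    awk -v path="$1" '
        $2 == path && ($3 == "cgroup" || $3 == "cgroup2") { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Example:
#   if is_cgroup_mount /sys/fs/cgroup; then
#       echo "bst would mount over /sys/fs/cgroup unless told not to"
#   fi
```

The optional second argument exists only so the check can be exercised against a saved copy of a mount table.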

If this all looks appropriate I can write the corresponding entries in the manual.
