simitt opened this issue 3 years ago (Open)
Pinging @elastic/ingest-management (Team:Ingest Management)
Based on previous internal communication, I started looking into leveraging cgroups to create a child cgroup per child process and apply resource limits to the child cgroup. Because the child cgroup is created from the parent cgroup rather than independently, its resource usage still shows up in the stats for the parent cgroup.
A first POC for playing around with cgroups:
```go
package main

import (
	"fmt"
	"os"
	"os/exec"

	"github.com/containerd/cgroups"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// POC for working with v1 cgroups,
// demonstrating how a process can be assigned to a cgroup and
// how subprocesses can be assigned to child cgroups with resource limits.
//
// Example usage for running on docker:
//   docker build . -t go-cgroup
//   time docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw go-cgroup
// The time command shows that the program needs significantly more
// time to finish when the subprocess resources are limited.
func main() {
	// create a new cgroup and add the current process to it
	mainPid := os.Getpid()
	q := int64(9000)
	p := uint64(10000)
	resources := &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
	mainCgroup, err := cgroups.New(cgroups.V1, cgroups.StaticPath(fmt.Sprintf("%v", mainPid)), resources)
	if err != nil {
		panic(err)
	}
	if err := mainCgroup.Add(cgroups.Process{Pid: mainPid}); err != nil {
		panic(err)
	}

	// create a subprocess:
	// run a script that creates some CPU load, e.g. a fibonacci calculation
	cmd := exec.Cmd{Path: "./fibonacci.sh", Stdout: os.Stdout, Stderr: os.Stdout}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	defer cmd.Wait()
	if cmd.Process == nil {
		panic("subprocess not successfully started")
	}
	childPid := cmd.Process.Pid

	// add the subprocess to a new child cgroup;
	// change the quota-to-period ratio to verify that the CPU quotas applied
	// to the cgroup are limiting the processing of the script:
	// decreasing the quota increases the run time
	q = int64(1000)
	p = uint64(10000)
	resources = &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
	childCgroup, err := mainCgroup.New("childgroup", resources)
	if err != nil {
		panic(err)
	}
	if err := childCgroup.Add(cgroups.Process{Pid: childPid}); err != nil {
		panic(err)
	}

	listProcesses("main", mainCgroup)
	listProcesses("child", childCgroup)
	printStats(fmt.Sprint(mainPid), mainCgroup)
	printStats(fmt.Sprint(childPid), mainCgroup)
}

func listProcesses(name string, cg cgroups.Cgroup) {
	processes, err := cg.Processes(cgroups.Pids, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Processes in %s cgroup\n", name)
	for _, p := range processes {
		fmt.Printf("Pid: %v\n", p.Pid)
	}
}

func printStats(name string, cg cgroups.Cgroup) {
	stats, err := cg.Stat()
	if err != nil {
		panic(err)
	}
	fmt.Printf("CPU usage for %s: %v: %v\n", name, stats.CPU.Usage.Total, stats.CPU.Usage.User)
}
```
cgroups was made for this; getting it to work inside of a container might be more difficult.

Windows has a notion of this with Job Objects. I would bet it's not as feature complete as cgroups, but it provides enough to be usable.
Mac doesn't seem to support anything like this at the process level. It would be interesting to see whether the hypervisor could be used, limiting resources by running the processes in a native VM.
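A minimal sketch of what the Job Objects idea mentioned above could look like, assuming golang.org/x/sys/windows; the 512 MiB limit and the spawned command are illustrative only, not agreed-upon values:

```go
//go:build windows

// Hypothetical sketch: place the current process into a Job Object with a
// per-process memory limit. Child processes spawned afterwards are associated
// with the same job by default, similar to how subprocesses inherit the
// parent's cgroup limits on Linux.
package main

import (
	"os"
	"os/exec"
	"unsafe"

	"golang.org/x/sys/windows"
)

func main() {
	// create an unnamed job object
	job, err := windows.CreateJobObject(nil, nil)
	if err != nil {
		panic(err)
	}
	defer windows.CloseHandle(job)

	// limit each process in the job to 512 MiB (illustrative value)
	info := windows.JOBOBJECT_EXTENDED_LIMIT_INFORMATION{
		ProcessMemoryLimit: 512 * 1024 * 1024,
	}
	info.BasicLimitInformation.LimitFlags = windows.JOB_OBJECT_LIMIT_PROCESS_MEMORY
	if _, err := windows.SetInformationJobObject(
		job,
		windows.JobObjectExtendedLimitInformation,
		uintptr(unsafe.Pointer(&info)),
		uint32(unsafe.Sizeof(info)),
	); err != nil {
		panic(err)
	}

	// put the current process into the job; subprocesses started from here on
	// run under the job's limits unless they explicitly break away
	if err := windows.AssignProcessToJobObject(job, windows.CurrentProcess()); err != nil {
		panic(err)
	}

	cmd := exec.Command("cmd", "/C", "echo running under the job object")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```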
> cgroups was made for this, getting it to work inside of a container might be more difficult.
There are definitely things we need to figure out inside containers, e.g. @axw just recently had to change the cgroup paths for metrics collection when running inside containers. I did run the script shared above inside a docker container though, so we should be able to use it as a starting point.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
@ruflin this should also be added to the V2 architecture, right?
Happy to label this with v2 architecture for tracking. My current take is that we should likely handle this via deployment instead of making elastic-agent responsible for managing resources. It's an ongoing conversation.
Label added.
cc @ph for the V2 Architecture.
On Linux I set the cgroup via systemd limits on the parent process (the Agent). All child processes will inherit this limit, e.g.:
```
systemctl set-property elastic-agent.service MemoryLimit=50G
systemctl status elastic-agent.service
● elastic-agent.service - Elastic Agent is a unified agent to observe, monitor and protect your system.
     Loaded: loaded (/etc/systemd/system/elastic-agent.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system.control/elastic-agent.service.d
             └─50-MemoryLimit.conf
     Active: active (running) since Tue 2022-05-24 21:00:20 CEST; 18h ago
   Main PID: 2756710 (elastic-agent)
      Tasks: 587 (limit: 618877)
     Memory: 724.9M (limit: 50.0G)
     CGroup: /system.slice/elastic-agent.service
             ├─2756710 /opt/Elastic/Agent/elastic-agent
             ├─2842320 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
             ├─2842351 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
             ├─2842651 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquerybeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E logging.level=error>
             ├─2842677 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
             ├─2842706 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
             ├─2842840 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osqueryd --flagfile=osquery/osquery.flags --pack_delimiter=_ --extensions_socket=/var/run/120651786/osquery.sock --database_path=osquery/osquer>
             └─2842901 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquery-extension.ext --socket /var/run/120651786/osquery.sock --timeout 10 --interval 3
```
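Since systemd puts the whole service (the agent and all spawned children) into /system.slice/elastic-agent.service, the same containerd/cgroups library used in the POC above should be able to load that cgroup and confirm the applied limit. A small sketch, assuming a cgroups v1 host with the memory subsystem mounted:

```go
package main

import (
	"fmt"

	"github.com/containerd/cgroups"
)

func main() {
	// load the existing cgroup that systemd created for the service;
	// the path matches the CGroup line from the systemctl status output above
	cg, err := cgroups.Load(cgroups.V1, cgroups.StaticPath("/system.slice/elastic-agent.service"))
	if err != nil {
		panic(err)
	}
	stats, err := cg.Stat()
	if err != nil {
		panic(err)
	}
	if stats.Memory == nil || stats.Memory.Usage == nil {
		panic("no memory stats available for the cgroup")
	}
	// with MemoryLimit=50G this should report roughly 50 * 1024^3 bytes
	fmt.Printf("memory limit: %d bytes, current usage: %d bytes\n",
		stats.Memory.Usage.Limit, stats.Memory.Usage.Usage)
}
```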
## Summary
When the Elastic Agent installs a new input, it starts a new process or restarts an existing process with the additional input configuration. The agent does not apply any resource limits to the spawned subprocesses, which can lead to the processes competing for available resources. This becomes an issue when multiple processes run under high load and hit the limit of the available resources. We need a solution for limiting resource usage per subprocess.
It becomes especially important when the resources for the Elastic Agent are already restricted, which will be the case for the hosted Elastic Agent.
There is currently no concept for how the memory/CPU shares available to the Elastic Agent should be distributed between its subprocesses. Most probably we would not want to limit the subprocesses by default, but only if configured. For hosted agents, the orchestrator should pass a configuration to the container where the agent is running.
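Purely as an illustration of what such a concept could look like (no such policy exists in the agent today), a child cgroup's CPU quota could be derived from the quota applied to the agent's parent cgroup, e.g. by an even split. The helper name childResources and the even-split policy below are made up for this sketch:

```go
package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// childResources derives the CPU resources for one of n child cgroups from
// the quota/period applied to the agent's parent cgroup. A real policy would
// more likely come from configuration or the orchestrator than an even split.
func childResources(agentQuota int64, agentPeriod uint64, n int) *specs.LinuxResources {
	if n < 1 {
		n = 1
	}
	q := agentQuota / int64(n)
	p := agentPeriod
	return &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
}

func main() {
	// with the POC's parent settings (quota 9000, period 10000) and 3
	// subprocesses, each child cgroup would get a quota of 3000
	r := childResources(9000, 10000, 3)
	fmt.Printf("per-child CPU quota: %d / period: %d\n", *r.CPU.Quota, *r.CPU.Period)
}
```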
## TODO
- cgroups?