elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[elastic-agent] support resource limitations on child processes #116

Open simitt opened 3 years ago

simitt commented 3 years ago

Summary

When the Elastic Agent installs a new input, it starts a new process or restarts an existing process with additional input configuration. The agent does not apply any resource limits to the created subprocesses, so they can end up competing for available resources. This becomes an issue when multiple processes run under high load and the limit of available resources is reached. We need a solution for limiting resource usage per subprocess.

This is especially important when the resources for the Elastic Agent are already restricted, as will be the case for the hosted Elastic Agent.

There is currently no concept for how the memory/CPU shares available to the Elastic Agent should be distributed among its processes. Most probably we would not want to limit the subprocesses by default, but only when configured. For hosted agents, the orchestrator should pass a configuration to the container where the agent is running.
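Since no such configuration concept exists yet, the following is only an illustrative sketch of what a per-process limits setting could look like. All names here (`ProcessLimits`, `parseLimits`, the JSON keys) are hypothetical, not an actual agent setting:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProcessLimits is a hypothetical per-process resource limit, mirroring the
// cgroup CPU quota/period model plus a memory ceiling.
type ProcessLimits struct {
	CPUQuota  int64  `json:"cpu_quota"`  // CPU quota in usec per period
	CPUPeriod uint64 `json:"cpu_period"` // CPU period in usec
	MemoryMax int64  `json:"memory_max"` // memory limit in bytes, -1 = unlimited
}

// parseLimits maps a process name (e.g. an input type) to its limits.
func parseLimits(raw []byte) (map[string]ProcessLimits, error) {
	var limits map[string]ProcessLimits
	if err := json.Unmarshal(raw, &limits); err != nil {
		return nil, err
	}
	return limits, nil
}

func main() {
	raw := []byte(`{"filebeat": {"cpu_quota": 5000, "cpu_period": 10000, "memory_max": 536870912}}`)
	limits, err := parseLimits(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("filebeat gets %d/%d of CPU time per period\n",
		limits["filebeat"].CPUQuota, limits["filebeat"].CPUPeriod)
}
```

An orchestrator could pass such a document into the hosted agent's container; unlisted processes would then simply run unlimited.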

TODO

elasticmachine commented 3 years ago

Pinging @elastic/ingest-management (Team:Ingest Management)

simitt commented 3 years ago

Based on previous internal communication I started looking into leveraging cgroups for creating a child cgroup per child process and applying resource limitations to the child cgroup. By not creating an independent cgroup but creating it from the parent cgroup, the resource usage still shows up in the stats for the parent cgroup.

Some first POC for playing around with cgroups:

package main

import (
    "fmt"
    "os"
    "os/exec"

    "github.com/containerd/cgroups"
    specs "github.com/opencontainers/runtime-spec/specs-go"
)

// POC for working with v1 cgroups,
// demonstrating how a process can be assigned to a cgroup and
// how subprocesses can be assigned to child cgroups with resource limitations.
//
// Example usage for running on Docker:
// docker build . -t go-cgroup
// time docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw go-cgroup
// The time command shows that the program needs significantly more
// time to finish when the subprocess resources are limited.
func main() {
    // create a new cgroup and add the current process to it
    mainPid := os.Getpid()
    q := int64(9000)
    p := uint64(10000)
    resources := &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
    mainCgroup, err := cgroups.New(cgroups.V1, cgroups.StaticPath(fmt.Sprintf("%v", mainPid)), resources)
    if err != nil {
        panic(err)
    }
    if err := mainCgroup.Add(cgroups.Process{Pid: mainPid}); err != nil {
        panic(err)
    }

    // create subprocess
    // run a script that creates some CPU load, e.g. fibonacci calculation
    cmd := exec.Cmd{Path: "./fibonacci.sh", Stdout: os.Stdout, Stderr: os.Stderr}
    if err := cmd.Start(); err != nil {
        panic(fmt.Sprintf("subprocess not successfully started: %v", err))
    }
    defer cmd.Wait()
    childPid := cmd.Process.Pid

    // add subprocess to a new child cgroup
    // change the quota to period ratio for verifying that the CPU quotas applied to
    // the cgroup are limiting the processing of the script.
    // decreasing the quota -> increases run time
    q = int64(1000)
    p = uint64(10000)
    resources = &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
    childCgroup, err := mainCgroup.New("childgroup", resources)
    if err != nil {
        panic(err)
    }
    if err := childCgroup.Add(cgroups.Process{Pid: childPid}); err != nil {
        panic(err)
    }
    listProcesses("main", mainCgroup)
    listProcesses("child", childCgroup)
    printStats(fmt.Sprint(mainPid), mainCgroup)
    printStats(fmt.Sprint(childPid), childCgroup)
}

func listProcesses(name string, cg cgroups.Cgroup) {
    processes, err := cg.Processes(cgroups.Pids, true)
    if err != nil {
        panic(err)
    }
    fmt.Printf("Processes in %s cgroup\n", name)
    for _, p := range processes {
        fmt.Printf("Pid: %v\n", p.Pid)
    }
}
}

func printStats(name string, cg cgroups.Cgroup) {
    stats, err := cg.Stat()
    if err != nil {
        panic(err)
    }
    fmt.Printf("CPU usage for %s: total %v, user %v\n", name, stats.CPU.Usage.Total, stats.CPU.Usage.User)
}

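The POC assumes a `fibonacci.sh` next to the binary that just burns CPU. The script itself isn't part of the repo, so here is a minimal hedged sketch of what it could look like (naive recursive Fibonacci; each recursion forks a subshell, which makes it deliberately expensive):

```shell
#!/bin/sh
# fibonacci.sh: hypothetical companion script for the cgroups POC above.
# Naive recursive Fibonacci to generate CPU load; every recursive call runs
# in a command-substitution subshell, so the fork overhead dominates.
fib() {
    if [ "$1" -le 1 ]; then
        echo "$1"
    else
        a=$(fib $(( $1 - 1 )))
        b=$(fib $(( $1 - 2 )))
        echo $(( a + b ))
    fi
}

# fib 10 -> 55; raise the argument to make the CPU-quota effect visible
fib "${1:-15}"
```

With a tight CPU quota on the child cgroup, `time` on the container run should show this script taking noticeably longer, as described in the POC comment.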
blakerouse commented 3 years ago

cgroups were made for this; getting it to work inside of a container might be more difficult.

Windows has a notion of this with Job Objects. I would bet it's not as feature complete as cgroups, but it provides enough to be usable.

Mac doesn't seem to support anything like this at the process level. It would be interesting, if possible, to use the hypervisor and limit resources by running the processes in a native VM.

simitt commented 3 years ago

> cgroups was made for this, getting it to work inside of a container might be more difficult.

There are definitely things we need to figure out inside containers, e.g. @axw just recently had to change the cgroup paths for metrics collection when running inside containers. I did run the script shared above inside a Docker container though, so we should be able to use it as a starting point.
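The container path problem mentioned here boils down to the fact that the paths reported in `/proc/self/cgroup` don't necessarily line up with where the cgroup filesystem is mounted inside the container. A small sketch for inspecting what the current process actually sees (standard `/proc` format, nothing agent-specific):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cgroupLines returns the raw entries from /proc/self/cgroup, one per
// hierarchy, each in the form "hierarchy-ID:controllers:path".
func cgroupLines() ([]string, error) {
	raw, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		return nil, err
	}
	return strings.Split(strings.TrimSpace(string(raw)), "\n"), nil
}

func main() {
	lines, err := cgroupLines()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, l := range lines {
		parts := strings.SplitN(l, ":", 3)
		if len(parts) == 3 {
			fmt.Printf("controllers=%q path=%q\n", parts[1], parts[2])
		}
	}
}
```

Inside a container these paths may still refer to the host hierarchy, so any code creating child cgroups has to resolve them against the container's own `/sys/fs/cgroup` mount first.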

elasticmachine commented 3 years ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

jlind23 commented 3 years ago

@ruflin should be added in architecture V2 also, right?

ruflin commented 3 years ago

Happy to label this with v2 architecture for tracking. My current take is that we should likely handle this via deployment instead of making elastic-agent responsible for managing resources. It's an ongoing conversation.

jlind23 commented 3 years ago

Label added.

jlind23 commented 2 years ago

cc @ph for the V2 Architecture.

zez3 commented 2 years ago

On Linux I set the cgroup via systemd limits on the parent process (the Agent). All child processes will inherit this limit, e.g.:

systemctl set-property elastic-agent.service MemoryLimit=50G

systemctl status elastic-agent.service
● elastic-agent.service - Elastic Agent is a unified agent to observe, monitor and protect your system.
     Loaded: loaded (/etc/systemd/system/elastic-agent.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system.control/elastic-agent.service.d
             └─50-MemoryLimit.conf
     Active: active (running) since Tue 2022-05-24 21:00:20 CEST; 18h ago
   Main PID: 2756710 (elastic-agent)
      Tasks: 587 (limit: 618877)
     Memory: 724.9M (limit: 50.0G)
     CGroup: /system.slice/elastic-agent.service
             ├─2756710 /opt/Elastic/Agent/elastic-agent
             ├─2842320 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
             ├─2842351 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
             ├─2842651 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquerybeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E logging.level=error>
             ├─2842677 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
             ├─2842706 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
             ├─2842840 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osqueryd --flagfile=osquery/osquery.flags --pack_delimiter=_ --extensions_socket=/var/run/120651786/osquery.sock --database_path=osquery/osquer>
             └─2842901 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquery-extension.ext --socket /var/run/120651786/osquery.sock --timeout 10 --interval 3
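From inside any of those child processes the inherited limit can be read back from the cgroup filesystem. A hedged sketch, assuming the common cgroup v1 layout with a v2 fallback (paths differ per distro and inside containers):

```shell
#!/bin/sh
# Print the memory limit the current process inherited from its cgroup --
# the same mechanism by which the Beats children above inherit the
# MemoryLimit set on elastic-agent.service.
CGPATH=$(awk -F: '$2 ~ /memory/ {print $3; exit}' /proc/self/cgroup)
V1="/sys/fs/cgroup/memory${CGPATH}/memory.limit_in_bytes"
V2="/sys/fs/cgroup$(awk -F: '$1 == "0" {print $3; exit}' /proc/self/cgroup)/memory.max"
if [ -r "$V1" ]; then
    cat "$V1"            # cgroup v1: limit in bytes
elif [ -r "$V2" ]; then
    cat "$V2"            # cgroup v2: limit in bytes, or "max"
else
    echo "no readable memory limit file found"
fi
```

Running this from a shell started under `elastic-agent.service` should report the 50G limit set via `systemctl set-property` above.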