elastic / ingest-docs

Elastic Ingest Documentation

Update the elastic-agent resource consumption docs #1414

Closed by alexsapran 2 weeks ago

alexsapran commented 3 weeks ago

Closes: https://github.com/elastic/ingest-dev/issues/4366

github-actions[bot] commented 3 weeks ago

A documentation preview will be available soon.

Request a new doc build by commenting:

* Rebuild this PR: `run docs-build`
* Rebuild this PR and all Elastic docs: `run docs-build rebuild`

`run docs-build` is much faster than `run docs-build rebuild`. A `rebuild` should only be needed in rare situations. If your PR continues to fail for an unknown reason, the doc build pipeline may be broken. Elastic employees can check the pipeline status [here](https://buildkite.com/elastic/docs-build).
mergify[bot] commented 3 weeks ago

This pull request does not have a backport label. Could you fix it @alexsapran? 🙏 To fixup this pull request, you need to add the backport labels for the needed branches, such as:

rowlandgeoff commented 3 weeks ago

I could suggest some updates to wording and formatting, but first I think we should confirm whether this is the data we'd like current and potential customers to see. Going from 2% to 40% CPU usage is a big jump. Should the default preset info be shown as well to indicate that not every scenario is resource intensive? @cmacknz @amitkanfer

cmacknz commented 3 weeks ago

We effectively have two use cases we care about, and now that I've seen this we probably need to publish results for both of them:

  1. High Throughput: How much data can we push, assuming we can use all the resources of the machine? This is what we have here.
  2. Scale/Endpoint Security: What is the resource footprint for low-volume security monitoring? This is a competitive metric for security deployments.

We do not want to publish this without a callout that it is exclusive of use case 2.

amitkanfer commented 2 weeks ago

@alexsapran worth adding a link to our presets whenever mentioned...?
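(For context on the presets mentioned here: Elastic Agent selects a performance preset per output. A minimal standalone `elastic-agent.yml` sketch, with an illustrative host value:)

```yaml
outputs:
  default:
    type: elasticsearch
    hosts: ["https://localhost:9200"]  # illustrative endpoint
    # One of: balanced, throughput, scale, latency, custom.
    # "throughput" corresponds to the high-throughput benchmark in this PR;
    # "scale" is closer to the low-footprint security use case discussed above.
    preset: throughput
```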

strawgate commented 2 weeks ago

> We effectively have two use cases we care about, and now that I've seen this we probably need to publish results for both of them:

Agree, and we should probably present these use-cases separately, as they are quite different.

Also, without something like "events per second" or "disk read throughput", this information doesn't really help with sizing as it stands right now. I'm also a bit worried that we were disk-throughput limited in the benchmark; do we have any info on the disks used?

alexsapran commented 2 weeks ago

> Also, without something like "events per second" or "disk read throughput", this information doesn't really help with sizing as it stands right now. I'm also a bit worried that we were disk-throughput limited in the benchmark; do we have any info on the disks used?

Let's sync up async; I would like to know why you think we are disk throttled. I can rerun the benchmarks and check the disk to make sure it's not the limiting factor.
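(One way to check whether the disk is the limiting factor during a benchmark rerun is to sample throughput directly from the kernel counters. A minimal Linux-only sketch, assuming `/proc/diskstats` is available; `disk_throughput` is a hypothetical helper name, not part of any Elastic tooling:)

```python
import time


def read_diskstats():
    """Read cumulative sector counters per block device from /proc/diskstats.

    Field layout per line: major, minor, device, reads completed, reads
    merged, sectors read, ms reading, writes completed, writes merged,
    sectors written, ... Sector counts are in 512-byte units.
    """
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            dev = fields[2]
            stats[dev] = (int(fields[5]), int(fields[9]))  # (read, written)
    return stats


def disk_throughput(interval=1.0):
    """Return {device: (read_MBps, write_MBps)} averaged over `interval` seconds."""
    before = read_diskstats()
    time.sleep(interval)
    after = read_diskstats()
    result = {}
    for dev, (r1, w1) in after.items():
        r0, w0 = before.get(dev, (r1, w1))
        result[dev] = (
            (r1 - r0) * 512 / interval / 1e6,
            (w1 - w0) * 512 / interval / 1e6,
        )
    return result


if __name__ == "__main__":
    for dev, (rd, wr) in sorted(disk_throughput(1.0).items()):
        print(f"{dev:12s} read {rd:8.1f} MB/s  write {wr:8.1f} MB/s")
```

Sampling this while the agent ingests data and comparing against the device's rated throughput would show whether the benchmark was disk-bound.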