elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Define ILM policy for Kibana APM data #124147

Open lizozom opened 2 years ago

lizozom commented 2 years ago

We recently started collecting APM data for Kibana and the Kibana front end. At this stage we're collecting it for a subset of our monitoring deployments, but the longer-term goal is to sample APM stats for all customer deployments, so that we can monitor them better and troubleshoot performance issues in production.

Kibana APM data size

In the us-east-1 region, kibana generates ~5M records a day. kibana-frontend generates a negligible number of records (usage is low for these clusters). Given an average document size of 1.5 KB, Kibana APM data amounts to roughly 7.5 GB per day for a single region.

For reference, the allocator generates ~680M records a day (>100 GB a day) in the same region. This means that the Kibana data is negligible in size compared to the rest of the data in these indices.
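As a quick sanity check on the figures above, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and it assumes decimal units (1 GB = 1,000,000 KB); the record counts and document size are the approximate values quoted in this issue.

```python
def daily_volume_gb(records_per_day: int, avg_doc_size_kb: float) -> float:
    """Estimate daily data volume in GB (decimal units: 1 GB = 1,000,000 KB)."""
    return records_per_day * avg_doc_size_kb / 1_000_000


# Approximate figures quoted in this issue for us-east-1.
KIBANA_RECORDS_PER_DAY = 5_000_000
ALLOCATOR_RECORDS_PER_DAY = 680_000_000
AVG_DOC_SIZE_KB = 1.5

kibana_gb = daily_volume_gb(KIBANA_RECORDS_PER_DAY, AVG_DOC_SIZE_KB)

# Share of kibana records relative to allocator records, by count.
record_share = KIBANA_RECORDS_PER_DAY / ALLOCATOR_RECORDS_PER_DAY

print(f"kibana volume: {kibana_gb:.1f} GB/day")
print(f"kibana records vs. allocator records: {record_share:.2%}")
```

By record count, kibana comes to well under 1% of what the allocator alone produces, which is consistent with treating it as negligible for now.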

ILM Policy

While it's important to set up an ILM policy for Kibana APM data, its size is negligible compared to other services, so we can ignore this for now. In the longer term, the cloud observability team plans to move all data older than 7 days into searchable snapshots.
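For discussion purposes, a policy along those lines might look roughly like the sketch below. It only builds the request body that would be sent to the `_ilm/policy` API. The repository name, the rollover thresholds, and the helper name are assumptions for illustration, not values agreed on in this issue.

```python
import json


def build_ilm_policy(cold_after: str = "7d",
                     snapshot_repository: str = "my-snapshot-repo") -> dict:
    """Build an ILM policy body that mounts data as a searchable snapshot.

    `snapshot_repository` is a hypothetical repository name, and the
    rollover thresholds are placeholder values.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Placeholder rollover conditions.
                        "rollover": {
                            "max_age": "1d",
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                "cold": {
                    # min_age is measured from rollover, not from ingestion.
                    "min_age": cold_after,
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": snapshot_repository
                        }
                    },
                },
            }
        }
    }


policy = build_ilm_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON could then be submitted with `PUT _ilm/policy/<policy-name>`; the policy name is likewise left open here.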

Some interesting questions to consider

Can we define an ILM policy per service?

Once we upgrade to 8.x and use data streams, each service will have its own stream. We would then be able to control each stream's ILM policy separately, if we choose to.

What should be the ILM policy for the kibana info? Who owns it?

Need to identify owners

How do we make sure that this policy is defined on all monitoring APM servers?

How are those deployed across the monitoring clusters?

lizozom commented 2 years ago

@nikulinivan Do you know who was involved in defining the ILM policy for APM data?

simitt commented 2 years ago

Once we upgrade to 8.x and use data streams, each service will have its own stream.

Data from all services will generally end up in the same data streams for traces* and logs (errors). The exceptions are metrics, which are sent to per-service data streams, and trace events collected by RUM clients, which are stored in traces-apm.rum_traces*.

See apm-data-streams for more details.
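To make the routing described above concrete, here is a rough sketch of which events share a stream and which are split per service. The function and the exact stream names are illustrative assumptions loosely modeled on the description in this comment; the apm-data-streams documentation is the authority on the real names.

```python
def apm_data_stream(event_type: str, service_name: str,
                    namespace: str = "default", from_rum: bool = False) -> str:
    """Return an illustrative data stream name for an APM event.

    Stream names are assumptions for illustration only.
    """
    if event_type == "metric":
        # Metrics are the only event type split per service.
        return f"metrics-apm.app.{service_name}-{namespace}"
    if event_type == "trace":
        # RUM traces go to their own shared stream; all other traces share one.
        if from_rum:
            return f"traces-apm.rum-{namespace}"
        return f"traces-apm-{namespace}"
    if event_type == "error":
        # Errors from all services share one logs stream.
        return f"logs-apm.error-{namespace}"
    raise ValueError(f"unknown event type: {event_type}")


for service in ("kibana", "kibana-frontend"):
    print(service, "->", apm_data_stream("metric", service))
    print(service, "->", apm_data_stream("trace", service))
```

Under this model, a per-service ILM policy would only be possible for the metrics streams. Traces and errors from kibana would share a policy with every other service writing to the same stream.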

lizozom commented 2 years ago

Thanks for the input.

I think that as we move forward to collect data from customer deployments, we'll find the need to be able to customize this, but at the moment, I think this is not a high priority. 🙏🏻

elasticmachine commented 2 years ago

Pinging @elastic/apm-ui (Team:apm)