elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[APM] Service map can cause OOM in elasticsearch #187707

Open neptunian opened 3 weeks ago

neptunian commented 3 weeks ago

Following up from https://github.com/elastic/kibana/pull/186417

When testing the service map under the maximum conditions of 1k trace IDs, each with ~500 spans, the scripted metric aggregation can cause an OOM in Elasticsearch, depending on the memory available. Looking at the Elasticsearch heap dump, I suspect this is due to the number of hash maps and other data structures being created simultaneously: data can be duplicated and exist at the same time within the reduce phase. The issue did not occur when I disabled the parallel async requests and ran them sequentially (sync) when calling fetch_service_paths_from_trace_ids. Further investigation is needed.
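To illustrate the difference between the parallel and sequential behaviour described above, here is a minimal sketch of capping how many requests are in flight at once. It is not the actual Kibana code: `mapWithConcurrency` and `chunk` are hypothetical helpers, and the `task` callback stands in for whatever issues the request per chunk of trace IDs. A limit of 1 corresponds to the sequential case that did not trigger the OOM; higher limits allow more reduce phases to overlap in Elasticsearch.

```typescript
// Hypothetical helper (not part of Kibana): splits a list into fixed-size chunks.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be >= 1');
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical helper (not part of Kibana): runs `task` over `items`
// with at most `limit` calls in flight. Results keep the input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new Error('limit must be >= 1');
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, `mapWithConcurrency(chunk(traceIds, 100), 1, fetchChunk)` would issue one request at a time, where `traceIds`, the chunk size of 100, and `fetchChunk` are placeholders chosen for illustration. Raising the limit trades lower latency for higher peak memory on the Elasticsearch side.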

neptunian commented 3 weeks ago

@crespocarlos did some investigation in https://github.com/elastic/kibana/pull/187445

elasticmachine commented 3 weeks ago

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)