Following up from https://github.com/elastic/kibana/pull/186417
When testing the service map under maximum conditions (1k trace IDs, each trace containing ~500 spans), the scripted metric aggregation can cause an OOM in Elasticsearch, depending on the memory available. Looking at the Elasticsearch heap dump, I suspect this is due to the number of hash maps and other data structures being created simultaneously during the reduce phase, where data can be duplicated and held in memory at the same time. The issue did not occur when the parallel async requests in fetch_service_paths_from_trace_ids were disabled and run sequentially. Further investigation is needed.
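For reference, the workaround described above amounts to serializing the per-chunk searches rather than firing them all at once. A minimal sketch of that pattern, where `fetchPathsForChunk` is a hypothetical stand-in for the per-chunk search performed inside fetch_service_paths_from_trace_ids:

```typescript
// Instead of issuing the scripted-metric-aggregation searches for all
// trace-id chunks in parallel (e.g. via Promise.all), run them one at a
// time so that only one reduce phase holds its intermediate hash maps
// in Elasticsearch memory at any given moment.
async function fetchSequentially<T, R>(
  chunks: T[],
  fetchPathsForChunk: (chunk: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (const chunk of chunks) {
    // Awaiting inside the loop serializes the requests, trading total
    // latency for a bounded peak heap on the Elasticsearch side.
    results.push(await fetchPathsForChunk(chunk));
  }
  return results;
}
```

This trades response time for memory headroom, which may be acceptable only as a stopgap while the duplication in the reduce phase is investigated.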