Open mshustov opened 2 years ago
Pinging @elastic/kibana-core (Team:Core)
https://github.com/elastic/kibana/pull/154022/files added ELU (event loop utilization) measurements "per request" on May 18, 2023.
It emits `WARN` logs whenever a measurement exceeds the defined threshold.

The `blocked` library uses a simple, timer-based mechanism to detect delays, which Node integrated as `perf_hooks#monitorEventLoopDelay`.
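A timer-based check of this kind could be sketched with `monitorEventLoopDelay` roughly as follows. This is a minimal illustration, not Kibana's actual implementation: the helper names (`toMs`, `exceedsThreshold`, `startDelayMonitor`) are hypothetical, and the threshold is assumed to be the 350ms mean delay mentioned in the issue description.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Assumption: mirrors the 350ms mean-delay warning mentioned in this issue.
const THRESHOLD_MS = 350;
const NS_PER_MS = 1e6;

// The histogram reports values in nanoseconds; convert to milliseconds.
function toMs(ns: number): number {
  return ns / NS_PER_MS;
}

// Returns true when the mean delay (in nanoseconds) is above the threshold.
// A NaN mean (no samples recorded yet) never exceeds the threshold.
function exceedsThreshold(meanNs: number, thresholdMs: number = THRESHOLD_MS): boolean {
  return toMs(meanNs) > thresholdMs;
}

// Hypothetical helper: samples the event loop delay and calls `onWarn`
// once per interval if the mean delay was too high. Returns a stop function.
function startDelayMonitor(
  intervalMs: number,
  onWarn: (meanMs: number) => void
): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();

  const timer = setInterval(() => {
    if (exceedsThreshold(histogram.mean)) {
      onWarn(toMs(histogram.mean));
    }
    // Start the next interval with a fresh histogram.
    histogram.reset();
  }, intervalMs);

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Note that a check like this only reports that the loop was delayed. It carries no information about which request or task caused the delay, which is the gap this issue is about.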
blocked-at uses a completely different approach, based on Async Hooks.
`after()` hook, or it will result in an infinite loop.

`Async Hooks` adds a hook for EVERY async callback, so it can be really costly. Quoting the authors of the library:
There's a performance cost to enabling Async Hooks. It's recommended to detect blocking exists with something without the performance overhead and use blocked-at in testing environment to pinpoint where the slowdown happens. Rule of thumb is you should not be running this in production unless desperate.
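To make the cost concrete, here is a rough sketch of the general Async Hooks technique: timing every callback between its `before` and `after` hooks. It is not the `blocked-at` source, and the names `recordIfSlow` and `slowCallbacks` and the 50ms threshold are assumptions made for illustration. The Node.js documentation warns that asynchronous operations inside a hook callback (including `console.log`) trigger the hooks again and recurse indefinitely, so the sketch only does synchronous bookkeeping.

```typescript
import { createHook } from "node:async_hooks";
import { performance } from "node:perf_hooks";

// Hypothetical threshold for what counts as a "slow" callback.
const SLOW_CALLBACK_MS = 50;

interface SlowCallback {
  asyncId: number;
  durationMs: number;
}

// Start time of each callback that is currently executing, keyed by asyncId.
const startTimes = new Map<number, number>();
// Synchronous in-memory buffer; report it from outside the hook callbacks.
const slowCallbacks: SlowCallback[] = [];

// Records the callback if it ran longer than the threshold.
function recordIfSlow(
  asyncId: number,
  durationMs: number,
  thresholdMs: number = SLOW_CALLBACK_MS
): boolean {
  if (durationMs <= thresholdMs) return false;
  slowCallbacks.push({ asyncId, durationMs });
  return true;
}

const hook = createHook({
  // Runs just before the callback of an async resource executes.
  before(asyncId: number) {
    startTimes.set(asyncId, performance.now());
  },
  // Runs right after that callback returns.
  after(asyncId: number) {
    const start = startTimes.get(asyncId);
    if (start === undefined) return;
    startTimes.delete(asyncId);
    // Only synchronous work here: no logging, no I/O, no promises.
    recordIfSlow(asyncId, performance.now() - start);
  },
});

// Once enabled, the two hooks run for every async callback in the process,
// which is where the overhead comes from. Call hook.enable() to start
// and hook.disable() to stop.
```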
I explored the `AsyncLocalStorage` path, which could supposedly help provide context throughout the lifecycle of a request.
`blocked` or `perf_hooks#monitorEventLoopDelay`.

The article mentioned in the description pinpoints 2 main scenarios that can cause event loop delays:
Perhaps rather than trying to detect the blocks with a timer-based strategy, we could try to calculate them per request. We know that most of the flows are `[Browser => ] Kibana => ES`. `AsyncLocalStorage` could help us keep track of the total request time and then subtract the time spent on HTTP requests to ES, effectively obtaining the "self" time for each use case.

UPDATE: Unfortunately, this does not guarantee that a "blocked" request is the culprit.
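The "self" time idea could look something like the sketch below. All names here (`requestContext`, `addEsTime`, `trackEsCall`, `measureSelfTime`) are hypothetical and do not correspond to existing Kibana APIs. It also assumes ES calls within a request do not overlap: concurrent calls would be double-counted, so the result is clamped at zero.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { performance } from "node:perf_hooks";

interface RequestTimings {
  esTimeMs: number; // accumulated time spent waiting on ES for this request
}

// One store per in-flight request, propagated across async boundaries.
const requestContext = new AsyncLocalStorage<RequestTimings>();

// Adds ES time to the current request, if we are inside one.
function addEsTime(ms: number): void {
  const store = requestContext.getStore();
  if (store) store.esTimeMs += ms;
}

// "Self" time = total request time minus time spent waiting on ES.
function selfTimeMs(totalMs: number, esTimeMs: number): number {
  return Math.max(0, totalMs - esTimeMs);
}

// Hypothetical wrapper around any outgoing call to ES.
async function trackEsCall<T>(call: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await call();
  } finally {
    addEsTime(performance.now() - start);
  }
}

// Hypothetical wrapper around a request handler.
async function measureSelfTime<T>(
  handler: () => Promise<T>
): Promise<{ result: T; totalMs: number; selfMs: number }> {
  const timings: RequestTimings = { esTimeMs: 0 };
  const start = performance.now();
  const result = await requestContext.run(timings, handler);
  const totalMs = performance.now() - start;
  return { result, totalMs, selfMs: selfTimeMs(totalMs, timings.esTimeMs) };
}
```

The "self" time computed this way also includes time the request spent waiting for its turn on the event loop while other requests were running, which is one way a request can look slow without being the one that blocked.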
@gsoldevila is there anything we can do on this issue, or should we just close it as won't do?
Since v7.14 PR, Kibana reports a warning if the mean value of event loop delay exceeds `350ms`. It helps users spot a performance problem but not investigate it, since the runtime context is absent.

To overcome the problem, we can borrow a few ideas from [this article](https://www.ashbyhq.com/blog/engineering/detecting-event-loop-blockers). TLDR: The server can capture the runtime context of expensive tasks by implementing a custom `async hook` that tracks the duration of a task and attaches it to an APM transaction. This allows Cloud customers to quickly identify which APM transaction triggers CPU-bound tasks on the Kibana server.