Hot threads cpu time related to stack traces

The _nodes/hot_threads API is a valuable tool for diagnosing performance issues. Today, it first measures how much cpu time each thread uses and only afterwards samples the same threads to provide a rough profile of where time is spent. With this approach, there is a risk that the reported cpu usage is unrelated to the stack traces. In the extreme, it could report 100% cpu usage while the stack traces show threads waiting on IO, on mutexes, or just back waiting on the thread pool queue.

We could do the sampling of thread stacks while measuring the cpu usage. This does add a risk that the sampling affects the cpu usage. For instance, sampling thread stacks requires a safepoint, and this could artificially reduce the cpu usage of some threads.

To have the best of both worlds, I propose to take and report two cpu usages: one taken before sampling thread stacks (and thus unaffected) and one taken during thread sampling (potentially affected). With default request parameters, this would be something like:

snapshot cpu-time of all threads (cputime1)
sleep 500ms
snapshot cpu-time of all threads (cputime2)
take thread stack traces, sleeping 50 ms per sample (to make the cpu usage comparable to the previous one; with 10 samples that would total 500ms)
snapshot cpu-time of all threads (cputime3)

Then report before-cpu-time = cputime2 - cputime1 and during-cpu-time = cputime3 - cputime2.
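
A minimal sketch of the steps above, using the standard `java.lang.management.ThreadMXBean`. This is not the existing hot threads implementation; the class name `TwoPhaseCpuSketch` and the methods `snapshotCpuTimes`, `deltas` and `measure` are hypothetical names chosen for illustration, and the aggregation of the sampled stacks is left out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Elasticsearch's HotThreads class.
final class TwoPhaseCpuSketch {

    // Snapshot the cpu time (nanoseconds) of every live thread.
    // getThreadCpuTime returns -1 for dead threads or when measurement is disabled; those are skipped.
    static Map<Long, Long> snapshotCpuTimes(ThreadMXBean bean) {
        Map<Long, Long> times = new HashMap<>();
        for (long id : bean.getAllThreadIds()) {
            long cpu = bean.getThreadCpuTime(id);
            if (cpu >= 0) {
                times.put(id, cpu);
            }
        }
        return times;
    }

    // For each thread present in all three snapshots, return
    // {before-cpu-time, during-cpu-time} = {cputime2 - cputime1, cputime3 - cputime2}.
    static Map<Long, long[]> deltas(Map<Long, Long> cputime1, Map<Long, Long> cputime2, Map<Long, Long> cputime3) {
        Map<Long, long[]> result = new HashMap<>();
        for (Map.Entry<Long, Long> e : cputime3.entrySet()) {
            Long t1 = cputime1.get(e.getKey());
            Long t2 = cputime2.get(e.getKey());
            if (t1 != null && t2 != null) {
                result.put(e.getKey(), new long[] { t2 - t1, e.getValue() - t2 });
            }
        }
        return result;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", ex);
        }
    }

    // intervalMillis = 500 and samples = 10 correspond to the numbers in the steps above.
    static Map<Long, long[]> measure(long intervalMillis, int samples) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return new HashMap<>(); // nothing to report on this JVM
        }
        Map<Long, Long> cputime1 = snapshotCpuTimes(bean);
        sleep(intervalMillis);
        Map<Long, Long> cputime2 = snapshotCpuTimes(bean);
        for (int i = 0; i < samples; i++) {
            // A real implementation would aggregate these stacks per thread.
            ThreadInfo[] stacks = bean.dumpAllThreads(false, false);
            sleep(intervalMillis / samples);
        }
        Map<Long, Long> cputime3 = snapshotCpuTimes(bean);
        return deltas(cputime1, cputime2, cputime3);
    }
}
```

Calling `TwoPhaseCpuSketch.measure(500, 10)` would take roughly one second and return, per thread id, the two values to report. A large gap between the two for the same thread would hint that the sampling itself changed the thread's behaviour.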

cc: @grcevski

elastic / elasticsearch

Hot threads cpu time related to stack traces #81006