elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

"Inspect" in Discover should show wall clock / total elapsed query time #187051

Open IanLee1521 opened 6 days ago

IanLee1521 commented 6 days ago

It would be great if there was a way in Discover to see the total elapsed (wall clock) time of a KQL or ES|QL request, e.g. via the "Inspect" menu.

Background

Today, it is possible in both KQL and ES|QL to do a query and inspect the parts of that query to see how long they each take, but these numbers are not representative of the total runtime of the query. Other tools do this much better.

KQL

For KQL, I think it's usually pretty close to the times reported in the Inspector. I've always seen this as two Requests, one for Documents and one for the Data (it's possible there are other counts at times, that just wasn't what I was seeing as I was creating this ticket):

image

If I do a big query like data_stream.dataset: system.syslog over Last 7 days, then I get back about 2.25 billion documents, and the total wall clock time just now was about 15 seconds.

image

Looking at the inspect data, it's close, but not quite the time I measured. Could be imprecision in pushing the start / stop button on my stopwatch.

image

image

ES|QL

ES|QL appears to break the requests up into more parts than just two; I've seen up to ~10 requests before. Each one is significantly shorter than the total wall clock time, and just adding them all together doesn't get to the right answer (I assume they overlap).
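To illustrate why adding up per-request times need not match the wall clock, here is a minimal sketch. The start offsets and durations are made up for illustration (the Inspector figures in this ticket only give durations), and `elapsed_ms` is a hypothetical helper, not anything in Kibana.

```python
def elapsed_ms(requests):
    """Wall clock span of a set of requests.

    requests: list of (start_offset_ms, duration_ms) tuples.
    Returns the time from the earliest start to the latest end.
    """
    first_start = min(start for start, _ in requests)
    last_end = max(start + duration for start, duration in requests)
    return last_end - first_start


# Hypothetical timings: the same three durations, scheduled two ways.
overlapping = [(0, 500), (100, 800), (300, 400)]    # requests run concurrently
with_gaps = [(0, 500), (700, 800), (1600, 400)]     # idle time between requests

for name, reqs in [("overlapping", overlapping), ("with_gaps", with_gaps)]:
    total = sum(duration for _, duration in reqs)
    print(f"{name}: sum={total} ms, elapsed={elapsed_ms(reqs)} ms")
```

With overlap the sum overstates the elapsed time; with idle gaps between requests it understates it. Either way, the durations alone are not enough to recover the wall clock time.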

I just used a similar ES|QL query of from logs-* | limit 1000 over 7 days, which had a wall clock time of about 20 seconds.

Inspector says there were 7 requests made, 6x for "Table" and 1x for "Visualization":

image

The response times (green boxes) for the 7 requests are: 554ms, 804ms, 949ms, 1750ms, 1352ms, 9433ms, 258ms. These sum to 15,100 ms (15.1 seconds), which is only about 75% of the roughly 20 seconds I measured.
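The arithmetic can be checked with a few lines of Python. The response times are the ones listed above; the 20-second wall clock figure is the approximate stopwatch measurement, so the resulting percentage is only as precise as that measurement.

```python
# Response times reported by the Inspector for the 7 requests, in ms.
response_times_ms = [554, 804, 949, 1750, 1352, 9433, 258]

# Approximate stopwatch measurement of the whole query (about 20 seconds).
measured_wall_clock_ms = 20_000

total_ms = sum(response_times_ms)
share = total_ms / measured_wall_clock_ms
unaccounted_ms = measured_wall_clock_ms - total_ms

print(f"sum of requests: {total_ms} ms")          # 15100 ms
print(f"share of wall clock: {share:.1%}")        # 75.5%
print(f"unaccounted for: {unaccounted_ms} ms")    # 4900 ms
```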

What should happen?

IMO, in both situations (KQL and ES|QL), there should be some indication, maybe up higher in the UI (mockup below), of the total wall clock time observed from "clicked button to submit query" to "data is fully loaded and page has stopped spinning". This would make it much easier to compare runtimes between queries within Elastic, and between Elastic and other products (e.g. Splunk).
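As a rough sketch of the measurement being requested: take one timestamp when the query is submitted and one when the last request has finished, independent of the per-request timings. This is not how Kibana is implemented; `timed_request` and `run_query` are hypothetical names, and `time.sleep` stands in for real Elasticsearch requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_request(delay_s):
    """Hypothetical stand-in for one request; sleeps instead of querying.

    Returns the duration of this single request in seconds.
    """
    start = time.perf_counter()
    time.sleep(delay_s)
    return time.perf_counter() - start


def run_query(delays_s):
    """Issue all requests concurrently and time the whole operation.

    Returns (wall_clock_s, per_request_s).
    """
    submitted = time.perf_counter()  # "clicked button to submit query"
    with ThreadPoolExecutor(max_workers=len(delays_s)) as pool:
        per_request = list(pool.map(timed_request, delays_s))
    wall_clock = time.perf_counter() - submitted  # "data is fully loaded"
    return wall_clock, per_request


wall_clock_s, per_request_s = run_query([0.05, 0.10, 0.20])
print(f"per request: {[round(t, 3) for t in per_request_s]}")
print(f"sum of requests: {sum(per_request_s):.3f} s")
print(f"wall clock: {wall_clock_s:.3f} s")
```

The wall clock value here is always at least as long as the slowest single request, and it includes scheduling and any other overhead that the per-request numbers leave out.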

Screenshot 2024-06-27 at 01 30 02

Competition

Splunk currently does this with their "Search job inspector" (https://www.splunk.com/en_us/blog/tips-and-tricks/splunk-clara-fication-job-inspector.html), which also provides a ton of other useful information and a waterfall chart of how the various pieces / queries fit together. (There is probably a whole other ticket to be opened for more of THAT functionality.)

A similar query (index=system-syslog over 60 minutes of data / 4.2 M records) in our local Splunk environment produced the following details, showing me both the total wall clock time (5.458 seconds) and how much time the different parts of the workload take to complete:

image

elasticmachine commented 5 days ago

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)