Closed atris closed 2 months ago
The nodes stats, indices stats, and _cat APIs report the total time spent and the number of executions for the query phase (query_time / query_total) and the fetch phase (fetch_time / fetch_total). Is this what you're referring to?
Pinging @elastic/es-core-features
@martijnvg That would not give me a clear picture of the average time a query is spending in the system (right from when it lands in the coordinating node to when it gets the final response). I was looking for a more e-2-e metric that is calculated per query.
(right from when it lands in the coordinating node to when it gets the final response)
The search API does return a `took` field in the response. This is the time measured from when the coordinating node received the request until the final response is returned, and it is reported per search request.
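Since `took` is only available per response, one way to get an aggregate view is to collect it on the client side. The following is a minimal sketch, assuming the caller already has the parsed search responses as dictionaries; the function name `summarize_took` and the sample values are hypothetical, and the 95th percentile uses the simple nearest-rank method.

```python
import math


def summarize_took(responses):
    """Summarize the `took` values (milliseconds) of parsed search responses."""
    took = sorted(r["took"] for r in responses)
    if not took:
        raise ValueError("no responses to summarize")
    # Nearest-rank percentile: the value at position ceil(p * n), 1-based.
    rank = math.ceil(0.95 * len(took))
    return {
        "count": len(took),
        "mean_ms": sum(took) / len(took),
        "p95_ms": took[rank - 1],
        "max_ms": took[-1],
    }


# Hypothetical sample: only the `took` field matters here.
sample = [{"took": t, "timed_out": False}
          for t in (12, 8, 30, 15, 250, 9, 11, 14, 10, 13)]
summary = summarize_took(sample)
print(summary)
```

Note that `took` does not include network time between the client and the coordinating node, so client-observed latency will be somewhat higher.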
Is there a way to see that value from an index perspective? I am trying to quantify the impact of a potential server change, hence the question
No, the `took` field is not being tracked. Tracking it per index is not really possible anyway, because a search request can span multiple indices.
The overall query and fetch phase stats I mentioned in my previous comment are tracked per shard (and thus per index, via the indices stats API), and I think they are worth monitoring to see whether a particular change has an effect. Typically the query phase is the heaviest phase of a search request, so changes in query phase time will have an impact on overall search service time.
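Because these counters are cumulative, comparing two snapshots taken before and after a change gives an average per-execution time for the interval. Here is a minimal sketch, assuming both snapshots are parsed indices stats responses that contain the `search` section with `query_total`, `query_time_in_millis`, `fetch_total`, and `fetch_time_in_millis`; the function name, the index name `logs`, and the numbers are hypothetical.

```python
def avg_phase_latency_ms(before, after, index):
    """Average query/fetch time per shard-level execution between two snapshots."""
    def search_section(snapshot):
        return snapshot["indices"][index]["total"]["search"]

    b, a = search_section(before), search_section(after)
    result = {}
    for phase in ("query", "fetch"):
        count = a[f"{phase}_total"] - b[f"{phase}_total"]
        millis = a[f"{phase}_time_in_millis"] - b[f"{phase}_time_in_millis"]
        # No executions in the interval means there is no average to report.
        result[phase] = millis / count if count > 0 else None
    return result


def snapshot(query_total, query_ms, fetch_total, fetch_ms):
    # Builds a hypothetical, trimmed-down indices stats response.
    return {"indices": {"logs": {"total": {"search": {
        "query_total": query_total, "query_time_in_millis": query_ms,
        "fetch_total": fetch_total, "fetch_time_in_millis": fetch_ms,
    }}}}}


before = snapshot(100, 2000, 50, 250)
after = snapshot(300, 5000, 50, 250)
averages = avg_phase_latency_ms(before, after, "logs")
print(averages)
```

These are per-shard phase executions, not per-request latencies, so a request that fans out to many shards contributes many executions.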
Agreed, query phase time tracking is useful and does reflect the expected patterns for the change. However, it does not always translate directly into customer-facing latency changes, hence my search for a metric that can accurately answer that question.
@atris I think there might be a way you can get this information. We store these stats as an exponentially weighted moving average as part of the adaptive replica selection stats in the nodes stats output, see: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/cluster-nodes-stats.html#adaptive-selection-stats
Since this has both the service and response times, I think it should work for your use case.
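As a rough illustration, the adaptive selection section of a nodes stats response could be read like this. It is a sketch that assumes the response has been parsed into a dictionary and that each entry carries `avg_service_time_ns` and `avg_response_time_ns`; the function name, node IDs, and values are hypothetical.

```python
def response_times_ms(nodes_stats):
    """Map (observing node, target node) to EWMA service/response times in ms."""
    out = {}
    for observer_id, observer in nodes_stats["nodes"].items():
        # Nodes that have not coordinated searches may lack this section.
        for target_id, stats in observer.get("adaptive_selection", {}).items():
            out[(observer_id, target_id)] = {
                "avg_service_ms": stats["avg_service_time_ns"] / 1e6,
                "avg_response_ms": stats["avg_response_time_ns"] / 1e6,
            }
    return out


# Hypothetical, trimmed-down nodes stats response.
sample_stats = {"nodes": {
    "node-a": {"adaptive_selection": {
        "node-b": {"avg_service_time_ns": 4_000_000,
                   "avg_response_time_ns": 6_500_000},
    }},
    "node-c": {},
}}
times = response_times_ms(sample_stats)
print(times)
```

These values describe individual data nodes as seen from a coordinating node, so they still exclude work done on the coordinating node itself.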
@dakrone I think collecting overall search latency per query does make sense in many scenarios. For example, a query with a heavy aggregation computed on the coordinating node spends time that is not recorded in the stats but is useful for performance analysis. I'm already working on these overall coordinating node stats. If there is interest, I can contribute this feature to the ES community.
I am looking for end-to-end monitoring for Elasticsearch as well. My goal is to measure the time from adding a new document to ES until that document is available for search on a different node. I am thinking of writing a dedicated service to monitor ES, but I hope there is something I can reuse.
The idea is:
This has been open for quite a while, and we haven't made much progress on this due to focus in other areas. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.
I was looking for a metric that shows the overall latency a query incurs. By overall, I mean the latency measured from the moment the query begins executing on the coordinating node (not counting queue time) until all query phases have responded.
Is there something that exists today and/or a plan to add something on these lines?