facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.29k stars 1.09k forks

Query execution tracing and replay tool #9668

Open xiaoxmeng opened 2 months ago

xiaoxmeng commented 2 months ago

Description

Add a query execution tracing and replay tool to facilitate query analysis. The tool shall allow us to replay part of a query's execution on a local machine instead of replaying the whole query in a production environment or on a real Prestissimo cluster. The tool consists of two parts: (1) trace collection: run a query with trace collection enabled through query configs (and the corresponding session properties in the Prestissimo context). During execution, the query collects the trace by dumping the input vectors of a specified set of operators (data) and the corresponding query plan info (metadata) into a specified storage location; (2) trace replay: construct a sub-query plan from the dumped query plan metadata, then load the dumped input vectors into memory and feed them into the constructed sub-query plan for replay. If the input is too large to hold in memory, we can build a special source operator that reads the dumped input vectors from storage in batches.
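To make the two parts concrete, here is a minimal, self-contained sketch of the collect-then-replay flow. It does not use any Velox API: `Batch`, `Operator`, `TraceWriter`, `withTrace`, `replay`, and the one-line-per-batch text format are all hypothetical stand-ins chosen for illustration, and a real implementation would serialize Velox vectors and plan nodes instead.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an input vector batch (not a Velox type).
using Batch = std::vector<int64_t>;

// Hypothetical operator: consumes one input batch and returns one output batch.
using Operator = std::function<Batch(const Batch&)>;

// Trace collection: the first line holds the plan metadata, and each
// following line holds one dumped input batch as space-separated values.
class TraceWriter {
 public:
  TraceWriter(const std::string& path, const std::string& planMeta)
      : out_(path) {
    assert(out_.is_open());
    out_ << planMeta << '\n';
  }

  void write(const Batch& batch) {
    for (size_t i = 0; i < batch.size(); ++i) {
      out_ << (i ? " " : "") << batch[i];
    }
    out_ << '\n';
  }

 private:
  std::ofstream out_; // Flushed and closed when the writer is destroyed.
};

// Wraps an operator so that every input batch is dumped before it is processed.
Operator withTrace(Operator op, TraceWriter& writer) {
  return [op, &writer](const Batch& input) {
    writer.write(input);
    return op(input);
  };
}

struct ReplayResult {
  std::string planMeta;
  std::vector<Batch> outputs;
};

// Trace replay: rebuilds the operator from the dumped metadata, then streams
// the dumped batches from storage one at a time instead of loading them all.
ReplayResult replay(
    const std::string& path,
    const std::function<Operator(const std::string&)>& makeOperator) {
  std::ifstream in(path);
  assert(in.is_open());
  ReplayResult result;
  std::getline(in, result.planMeta);
  Operator op = makeOperator(result.planMeta);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    Batch batch;
    for (int64_t value; fields >> value;) {
      batch.push_back(value);
    }
    result.outputs.push_back(op(batch));
  }
  return result;
}
```

In this sketch a caller would wrap the operator of interest with `withTrace` during the traced run, let the `TraceWriter` go out of scope so the file is closed, and later call `replay` with a factory that maps the stored metadata back to an operator. Reading one batch per loop iteration mirrors the idea of a source operator that reads dumped input in batches.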

The replay can be done at different levels: operator level, pipeline level, and task level. We can start with the operator level and then extend to the pipeline and task levels.

cc @mbasmanova @duanmeng @huamn

mbasmanova commented 2 months ago

CC: @aditi-pandit

aditi-pandit commented 2 months ago

+1. This would be really useful.

FelixYBW commented 2 months ago

Similar to Gluten's microbenchmark reproduction tool. It will be super useful for debugging and performance analysis. Good feature!