Description
Add a query execution tracing and replay tool to facilitate query analysis. The tool should let us replay part of a query's execution on a local machine instead of rerunning the whole query in a production environment or on a real Prestissimo cluster. The tool consists of two parts: (1) trace collection: run a query with trace collection enabled through query configs (and the corresponding session properties in the Prestissimo context). During execution, the query dumps the input vectors of a specified set of operators (data) and the corresponding query plan info (metadata) to a specified storage location; (2) trace replay: construct a sub-query plan from the dumped query plan metadata, then load the dumped input vectors into memory and feed them into the constructed sub-query plan for replay. If the input is too large to fit in memory, we can build a special source operator that reads the dumped input vectors from storage in batches.
Replay can be done at different levels: operator level, pipeline level, and task level. We can start with the operator level and extend to the pipeline and task levels next.
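To make the two parts concrete, here is a minimal, language-agnostic sketch of operator-level collection and replay, written in Python for brevity. It is only an illustration of the idea, not the proposed implementation: `TraceWriter`, `read_batches`, `replay`, the `OPERATORS` registry, and the file names `plan_meta.json` and `input.bin` are all hypothetical, and plain lists of integers stand in for input vectors.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, List

Batch = List[int]  # stand-in for an input vector

# Hypothetical registry mapping a plan-node name to its operator logic.
OPERATORS: Dict[str, Callable[[Batch], Batch]] = {
    "filter_even": lambda batch: [x for x in batch if x % 2 == 0],
}


class TraceWriter:
    """Trace collection: dumps plan metadata and the operator's input batches."""

    def __init__(self, trace_dir: Path, plan_meta: dict) -> None:
        trace_dir.mkdir(parents=True, exist_ok=True)
        (trace_dir / "plan_meta.json").write_text(json.dumps(plan_meta))
        self._data = open(trace_dir / "input.bin", "wb")

    def write(self, batch: Batch) -> None:
        # Each batch is appended as its own record so it can be read back alone.
        pickle.dump(batch, self._data)

    def close(self) -> None:
        self._data.close()


def read_batches(trace_dir: Path) -> Iterator[Batch]:
    """Replay source: yields dumped batches one at a time, not all at once."""
    with open(trace_dir / "input.bin", "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def replay(trace_dir: Path) -> List[Batch]:
    """Trace replay: rebuilds the operator from metadata and feeds it the input."""
    meta = json.loads((trace_dir / "plan_meta.json").read_text())
    operator = OPERATORS[meta["operator"]]
    return [operator(batch) for batch in read_batches(trace_dir)]


def demo() -> List[Batch]:
    with tempfile.TemporaryDirectory() as tmp:
        trace_dir = Path(tmp) / "trace"
        writer = TraceWriter(trace_dir, {"operator": "filter_even"})
        for batch in ([1, 2, 3, 4], [5, 6]):
            writer.write(batch)
        writer.close()
        return replay(trace_dir)


if __name__ == "__main__":
    print(demo())
```

The generator in `read_batches` plays the role of the special source operator mentioned above: it keeps only one batch in memory at a time. Extending the sketch to the pipeline level would mean storing a chain of operators in the metadata instead of a single one.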
cc @mbasmanova @duanmeng @huamn