Closed andygrove closed 2 months ago
N/A
Make it easy to see how much of the native ScanExec time is spent casting columns to different types (this usually means unpacking dictionaries).
Example from TPC-DS q9:
DataFusion metrics in native explain output:
metrics=[ output_rows=2097152, elapsed_compute=21.847892ms, cast_time=21.631731ms]
Full plan:
AggregateExec: mode=Partial, gby=[], aggr=[count, avg, avg], metrics=[output_rows=1, elapsed_compute=9.481194ms]
  ProjectionExec: expr=[col_1@1 as col_0, col_2@2 as col_1], metrics=[output_rows=400519, elapsed_compute=50.596µs]
    FilterExec: col_0@0 IS NOT NULL AND col_0@0 >= 81 AND col_0@0 <= 100, metrics=[output_rows=400519, elapsed_compute=4.753725ms]
      ScanExec: source=[CometScan parquet (unknown)], schema=[col_0: Int32, col_1: Decimal128(7, 2), col_2: Decimal128(7, 2)], metrics=[output_rows=2097152, elapsed_compute=21.847892ms, cast_time=21.631731ms]
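To read the numbers above as a ratio, the following is a minimal sketch that parses the `metrics=[...]` suffix of a plan line and computes what share of `elapsed_compute` is `cast_time`. The helper names (`parse_metrics`, `cast_fraction`) are hypothetical and not part of Comet or DataFusion; the parsing assumes the text layout shown in this example and may need adjusting for other explain output.

```python
import re

# Duration suffixes as they appear in the explain output, converted to milliseconds.
_UNIT_TO_MS = {"ns": 1e-6, "µs": 1e-3, "us": 1e-3, "ms": 1.0, "s": 1e3}


def _parse_value(raw: str):
    """Turn '2097152' into an int and '21.8ms' into a float in milliseconds."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)(ns|µs|us|ms|s)?", raw)
    if match is None:
        return raw  # unknown format: keep the raw text
    number, unit = match.groups()
    if unit is None:
        return int(number) if number.isdigit() else float(number)
    return float(number) * _UNIT_TO_MS[unit]


def parse_metrics(plan_line: str) -> dict:
    """Extract name=value pairs from the trailing metrics=[...] of one plan line."""
    marker = "metrics=["
    start = plan_line.rfind(marker)  # rfind skips earlier brackets such as schema=[...]
    if start < 0:
        return {}
    end = plan_line.index("]", start)
    body = plan_line[start + len(marker):end]
    metrics = {}
    for pair in body.split(","):
        name, _, value = pair.strip().partition("=")
        if name:
            metrics[name] = _parse_value(value.strip())
    return metrics


def cast_fraction(metrics: dict) -> float:
    """Share of elapsed_compute that was spent casting."""
    return metrics["cast_time"] / metrics["elapsed_compute"]


scan_line = (
    "ScanExec: source=[CometScan parquet (unknown)], "
    "schema=[col_0: Int32, col_1: Decimal128(7, 2), col_2: Decimal128(7, 2)], "
    "metrics=[output_rows=2097152, elapsed_compute=21.847892ms, cast_time=21.631731ms]"
)
scan_metrics = parse_metrics(scan_line)
print(f"cast share of ScanExec compute: {cast_fraction(scan_metrics):.1%}")
```

For the q9 scan above this comes out at roughly 99%, which is the kind of observation the new `cast_time` metric is meant to make visible.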
@viirya @comphead I have addressed feedback. Thanks.
Which issue does this PR close?
N/A
Rationale for this change
Make it easy to see how much of the native ScanExec time is spent casting columns to different types (this usually means unpacking dictionaries).
Example from TPC-DS q9: see the DataFusion metrics and full plan from the native explain output above.
What changes are included in this PR?
How are these changes tested?