Closed: andygrove closed this issue 1 month ago
More detailed debug info
[CometProject [l_partkey#17L, l_extendedprice#21, l_discount#22], [l_partkey#17L, l_extendedprice#21, l_discount#22]] Creating batch with 12 rows
[CometHashAggregate [l_extendedprice#21, l_discount#22, p_type#78], Partial, [partial_sum(CASE WHEN StartsWith(p_type#78, PROMO) THEN (l_extendedprice#21 * (1 - l_discount#22)) ELSE 0.0000 END), partial_sum((l_extendedprice#21 * (1 - l_discount#22)))]] Creating batch with 1 rows
[CometFilter [p_partkey#74L, p_type#78], isnotnull(p_partkey#74L)] Creating batch with 4816 rows
[CometFilter [p_partkey#74L, p_type#78], isnotnull(p_partkey#74L)] Creating batch with 8192 rows
When running TPC-H q16 with DataFusion, there is a significant difference in performance between runs with coalesce batches enabled vs disabled.
With datafusion.execution.coalesce_batches=true:
Query 16 executed in: 1.907695417s and returned 27840 rows
Query 16 executed in: 1.900346945s and returned 27840 rows
Query 16 executed in: 1.913095568s and returned 27840 rows
With datafusion.execution.coalesce_batches=false:
Query 16 executed in: 3.06827928s and returned 27840 rows
Query 16 executed in: 3.052310983s and returned 27840 rows
Query 16 executed in: 3.222817185s and returned 27840 rows
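For context, the toggle above corresponds to the datafusion.execution.coalesce_batches setting. A minimal benchmark harness sketch, assuming a recent DataFusion release where SessionConfig::with_coalesce_batches and SessionContext::new_with_config are available (table registration and the actual q16 SQL are omitted here), might look like this:

```rust
use datafusion::prelude::*;
use std::time::Instant;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Toggle batch coalescing; the default is true.
    let config = SessionConfig::new().with_coalesce_batches(false);
    let ctx = SessionContext::new_with_config(config);

    // Register the TPC-H tables here, e.g. via ctx.register_parquet(...) (paths omitted).

    let start = Instant::now();
    // Placeholder query; substitute the TPC-H q16 SQL.
    let batches = ctx.sql("SELECT 1").await?.collect().await?;
    let rows: usize = batches.iter().map(|b| b.num_rows()).sum();
    println!(
        "Query executed in: {:?} and returned {} rows",
        start.elapsed(),
        rows
    );
    Ok(())
}
```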
I am closing this for now because I now understand that the shuffle write exec is essentially coalescing batches as well, and I have not yet been able to demonstrate any benefit from adding explicit coalesce batch calls.
What is the problem the feature request solves?
I added some debug logging to CometNativeIterator to show the size of batches being processed when running TPC-H q14, and I see lots of small batches being processed. The query processes 73,456 batches with fewer than 1000 rows and 2,448 batches with at least 1000 rows.
I wonder if there would be a performance benefit in coalescing these small batches into larger batches (if that is even possible -- I do not have full information on the context yet).
My theory is that we have some overhead per batch and that we could reduce that overhead if we had larger batches. This issue is for analyzing this and writing up some findings.
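To make the idea concrete, here is a rough sketch of what coalescing could look like; this is not Comet code, and the helper function and row threshold are hypothetical. It buffers small Arrow batches and merges them with arrow-rs's concat_batches before emitting them downstream:

```rust
use std::sync::Arc;

use arrow::compute::concat_batches;
use arrow::datatypes::Schema;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

/// Hypothetical helper: buffer incoming batches until at least `target_rows`
/// rows have accumulated, then emit them as one larger batch.
fn coalesce(
    schema: Arc<Schema>,
    input: impl Iterator<Item = RecordBatch>,
    target_rows: usize,
) -> Result<Vec<RecordBatch>, ArrowError> {
    let mut output = Vec::new();
    let mut buffer: Vec<RecordBatch> = Vec::new();
    let mut buffered_rows = 0;

    for batch in input {
        buffered_rows += batch.num_rows();
        buffer.push(batch);
        if buffered_rows >= target_rows {
            // One concat per emitted batch, instead of paying per-batch
            // overhead downstream for every tiny input batch.
            output.push(concat_batches(&schema, &buffer)?);
            buffer.clear();
            buffered_rows = 0;
        }
    }
    // Flush any remaining rows.
    if !buffer.is_empty() {
        output.push(concat_batches(&schema, &buffer)?);
    }
    Ok(output)
}
```

The trade-off to analyze would be whether the extra copy from concatenation is cheaper than the per-batch overhead it removes.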
Describe the potential solution
No response
Additional context
No response