Open skyzh opened 3 weeks ago
probably we need a new set of rules there -- the aggregation only needs the first column of the part
table, so we can convert most of the joins into semi joins. otherwise, datafusion will fail with:
attempt to multiply with overflow
when computing the statistics for the cross join operator
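
for illustration, a minimal sketch of why a row-count estimate for a chain of cross joins can overflow. this is not datafusion's actual code: the function names and the row counts are made up. it assumes the estimate is the product of the input row counts, which is what a cross join produces. in a debug build, a plain `a * b` on integers panics with exactly this message once the product no longer fits; `checked_mul` and `saturating_mul` are the standard ways to avoid the panic.

```rust
/// Estimated output rows of a cross join over `inputs` (hypothetical helper).
/// The estimate is the product of the input row counts.
/// Returns `None` when the product does not fit in a `u64`.
fn cross_join_rows_checked(inputs: &[u64]) -> Option<u64> {
    inputs
        .iter()
        .try_fold(1u64, |acc, &rows| acc.checked_mul(rows))
}

/// Same estimate, but clamped to `u64::MAX` instead of failing
/// (hypothetical helper).
fn cross_join_rows_saturating(inputs: &[u64]) -> u64 {
    inputs
        .iter()
        .fold(1u64, |acc, &rows| acc.saturating_mul(rows))
}
```

with five inputs of 100_000 rows each the true product is 10^25, which is larger than `u64::MAX`, so the checked version returns `None` and the saturating one returns `u64::MAX`.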
this seems related to the problem of "how to do column pruning / projection pushdown in cascades?"
the problem is that the distinct aggregation generated by the initial depjoin step has a 5-way nested loop join without any filter as its child, which cannot be executed efficiently. either something is wrong with the depjoin rules, or we need to implement pushdown across aggregation nodes?
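
a small sketch of why the semi join rewrite should be safe when the distinct aggregation only reads columns from one side. this is a toy over in-memory slices, not optd or datafusion code: the function names, the tuple layout and the equi-join predicate are all assumptions for illustration. the point is that a distinct over left-side columns cannot observe how many right rows matched, so an inner join and a semi join give the same result.

```rust
use std::collections::BTreeSet;

/// DISTINCT on the first left column over an inner join `left.0 = right`
/// (hypothetical helper). Every matching pair is visited, so the work is
/// proportional to |left| * |right|.
fn distinct_left_via_inner_join(left: &[(u32, &str)], right: &[u32]) -> BTreeSet<u32> {
    let mut out = BTreeSet::new();
    for (lk, _) in left {
        for rk in right {
            if lk == rk {
                // Duplicate matches collapse in the set, as DISTINCT would do.
                out.insert(*lk);
            }
        }
    }
    out
}

/// Same query with the join turned into a semi join (hypothetical helper):
/// a left row is kept as soon as one matching right row is found.
fn distinct_left_via_semi_join(left: &[(u32, &str)], right: &[u32]) -> BTreeSet<u32> {
    left.iter()
        .filter(|(lk, _)| right.iter().any(|rk| rk == lk))
        .map(|(lk, _)| *lk)
        .collect()
}
```

when the join has no filter at all, as in the plan above, the semi join predicate is just `true`, so each left row is kept if the right side is non-empty and the right side never needs to be fully enumerated.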