Describe the bug
@dispanser pointed out a probable bug in TopKAggregate:
https://github.com/apache/datafusion/blob/77f330c6a2b26f2d1d4d4bf11d456fad466316b4/datafusion/physical-optimizer/src/topk_aggregation.rs#L102
We should probably call it `cardinality_reducing` and allow anything that is `!cardinality_reducing`.
`FilterExec` is cardinality reducing, and should not be whitelisted here.
Things like `CoalescePartitionsExec` should be whitelisted because they don't change cardinality.
In theory, something like a left outer join can only increase cardinality, so it should be allowed as well.
I would suggest we add a `cardinality_reducing()` or `not_cardinality_reducing()` method to the `ExecutionPlan` trait, so we don't need to maintain this downcast list.
i.e. a plan with `CoalescePartitionsExec` between `AggregateExec` and `GlobalLimitExec` is fine, because `CoalescePartitionsExec` means the `100` rows from `AggregateExec` will be passed all the way up to `GlobalLimitExec`.
However, the same plan with `FilterExec` in that position is not, because `AggregateExec` could return `100` rows, `FilterExec` could make that `50`, and then `GlobalLimitExec` doesn't get the `100` it was guaranteed in the original plan.
To Reproduce
We need a new test.
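Until a real regression test exists, the row loss can be illustrated with a toy model. This is only a sketch: plain vectors stand in for operator output, and `aggregate`, `filter_even`, `global_limit`, and `run_plan` are hypothetical helpers, not DataFusion APIs. The "keep even values" predicate is an arbitrary stand-in for whatever `FilterExec` evaluates.

```rust
// Toy model only: vectors stand in for operator output; nothing here is DataFusion API.

/// Stand-in for AggregateExec: one aggregated value per group, sorted descending.
/// `pushed_limit` models the TopK rewrite that keeps only the largest `k` groups.
fn aggregate(groups: &[u32], pushed_limit: Option<usize>) -> Vec<u32> {
    let mut out = groups.to_vec();
    out.sort_unstable_by(|a, b| b.cmp(a));
    if let Some(k) = pushed_limit {
        out.truncate(k);
    }
    out
}

/// Stand-in for FilterExec: an arbitrary predicate that keeps even values.
fn filter_even(rows: Vec<u32>) -> Vec<u32> {
    rows.into_iter().filter(|v| v % 2 == 0).collect()
}

/// Stand-in for the sort plus GlobalLimitExec at the top of the plan.
fn global_limit(mut rows: Vec<u32>, k: usize) -> Vec<u32> {
    rows.sort_unstable_by(|a, b| b.cmp(a));
    rows.truncate(k);
    rows
}

/// GlobalLimitExec <- FilterExec <- AggregateExec, with or without the pushed-down limit.
fn run_plan(groups: &[u32], pushed_limit: Option<usize>, k: usize) -> Vec<u32> {
    global_limit(filter_even(aggregate(groups, pushed_limit)), k)
}
```

With groups `1..=200` and a limit of `100`, `run_plan(&groups, None, 100)` returns 100 rows, while `run_plan(&groups, Some(100), 100)` returns only 50, because the filter discards half of the groups that the pushed-down limit kept.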
Expected behavior
Don't lose rows: `GlobalLimitExec` should receive the same rows it would get from the original, unoptimized plan.
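One way the suggested trait method could enforce this is sketched below. Everything here is an assumption for illustration: `PlanNode` is a simplified, single-input stand-in for the `ExecutionPlan` trait, `cardinality_reducing` is the proposed method rather than an existing one, and `can_push_limit_to_aggregate` is a hypothetical helper. The default of `true` is a conservative choice of this sketch, so an operator must opt in to being treated as safe.

```rust
// Hypothetical sketch: `PlanNode` is a simplified stand-in for ExecutionPlan,
// and `cardinality_reducing` is the proposed method, not an existing DataFusion API.
trait PlanNode {
    fn name(&self) -> &'static str;
    /// Conservative default (an assumption of this sketch): the operator may drop rows.
    fn cardinality_reducing(&self) -> bool {
        true
    }
    /// Single-input model; joins and other multi-input operators are not covered.
    fn input(&self) -> Option<&dyn PlanNode> {
        None
    }
}

struct Aggregate;
struct Filter(Box<dyn PlanNode>);
struct CoalescePartitions(Box<dyn PlanNode>);

impl PlanNode for Aggregate {
    fn name(&self) -> &'static str {
        "AggregateExec"
    }
}

impl PlanNode for Filter {
    fn name(&self) -> &'static str {
        "FilterExec"
    }
    fn input(&self) -> Option<&dyn PlanNode> {
        Some(self.0.as_ref())
    }
}

impl PlanNode for CoalescePartitions {
    fn name(&self) -> &'static str {
        "CoalescePartitionsExec"
    }
    // Merging partitions never changes the number of rows.
    fn cardinality_reducing(&self) -> bool {
        false
    }
    fn input(&self) -> Option<&dyn PlanNode> {
        Some(self.0.as_ref())
    }
}

/// Walks from the limit's input down to the aggregate. The limit may be pushed
/// down only if no operator on the way can reduce cardinality.
fn can_push_limit_to_aggregate(limit_input: &dyn PlanNode) -> bool {
    let mut node = limit_input;
    loop {
        // Name comparison keeps the toy short; real code would identify the node by type.
        if node.name() == "AggregateExec" {
            return true;
        }
        if node.cardinality_reducing() {
            return false;
        }
        match node.input() {
            Some(child) => node = child,
            None => return false,
        }
    }
}
```

In this model, `CoalescePartitions(Box::new(Aggregate))` passes the check and `Filter(Box::new(Aggregate))` does not, which matches the two plans described in the bug report.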