apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

`approx_distinct` should be leveraging bitmap for counting u8/16 and i8/16 #1109

Open jimexist opened 2 years ago

jimexist commented 2 years ago

Is your feature request related to a problem or challenge?

`approx_distinct` should use a bitmap to count distinct values for `u8`/`u16` and `i8`/`i16` inputs.

Describe the solution you'd like

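Using a bitmap is more efficient for these types: an 8-bit column has at most 256 distinct values and a 16-bit column at most 65,536, so one bit per possible value (32 bytes or 8 KiB) is enough to count distinct values exactly.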
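To make the idea concrete, here is a minimal sketch of a bitmap-backed distinct counter for 16-bit values. The names `BitmapDistinct16` and `count_distinct_u16` are hypothetical and are not part of DataFusion's API; the sketch only illustrates the technique and does not show how it would plug into the accumulator interface.

```rust
/// Hypothetical exact distinct counter for 16-bit values.
/// One bit per possible value: 65,536 bits = 1,024 u64 words = 8 KiB.
struct BitmapDistinct16 {
    words: Vec<u64>,
}

impl BitmapDistinct16 {
    fn new() -> Self {
        Self { words: vec![0u64; 1 << 10] }
    }

    /// Mark a value as seen by setting its bit.
    fn insert(&mut self, v: u16) {
        let idx = v as usize;
        self.words[idx >> 6] |= 1u64 << (idx & 63);
    }

    /// Signed values reuse the same bitmap through their bit pattern.
    fn insert_i16(&mut self, v: i16) {
        self.insert(v as u16);
    }

    /// Combine two partial states (for example, from two partitions).
    fn merge(&mut self, other: &Self) {
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            *a |= *b;
        }
    }

    /// The number of distinct values is the number of set bits.
    fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}

/// Convenience helper (also hypothetical) for a single slice of values.
fn count_distinct_u16(values: &[u16]) -> u64 {
    let mut acc = BitmapDistinct16::new();
    for &v in values {
        acc.insert(v);
    }
    acc.count()
}
```

Because merging is a bitwise OR and counting is a popcount, partial states from different partitions combine without any loss of accuracy. An 8-bit variant would need only four `u64` words.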


alamb commented 2 years ago

See code at https://github.com/apache/arrow-datafusion/blob/397110ab6948ea80a14155d65acaf55e23fd624e/datafusion/src/physical_plan/expressions/approx_distinct.rs#L88-L91

Weijun-H commented 9 months ago

Why don't we also support u32/i32 🤔 ?

alamb commented 9 months ago

There is no reason I know of.