apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Alias `APPROX_PERCENTILE_CONT` as `PERCENTILE_CONT`? #12533

Open samuelcolvin opened 1 month ago

samuelcolvin commented 1 month ago

Is your feature request related to a problem or challenge?

See https://github.com/pydantic/logfire/issues/433. It would be great to have a `percentile_cont` function available in DataFusion that behaves like the PostgreSQL function of the same name.

Would it be appropriate/reasonable to simply alias `approx_percentile_cont` as `percentile_cont`? I've read #1539, but I'm not familiar enough with the behaviour to know if it makes sense.

Describe the solution you'd like

Ideally the "fix" is as simple as adding an alias?

Describe alternatives you've considered

We could add the alias just in our own code, but I'd love to hear whether @Dandandan, @domodwyer, and @alamb think that makes sense.

Additional context

I also mentioned the need for WITHIN GROUP support in #11732.

alamb commented 1 month ago

I think the expectation for PERCENTILE_CONT is that it will implement an exact calculation -- and to do so the implementation needs to keep all the actual values (e.g. the same way MEDIAN works).

So in other words, I think aliasing PERCENTILE_CONT to the approximate version would be confusing to anyone who actually needed the real value 🤔
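
To make the distinction concrete, here is a minimal Python sketch of an exact continuous percentile, assuming PostgreSQL-style linear interpolation between adjacent sorted values. The function name and signature are hypothetical and are not DataFusion's API. The point is only that an exact result requires retaining and sorting every input value, whereas an approximate aggregate keeps a bounded summary of the data.

```python
import math


def percentile_cont(values, fraction):
    """Exact continuous percentile (hypothetical helper, not a DataFusion API).

    Assumes PostgreSQL-style semantics: NULLs (None) are ignored and the
    result is linearly interpolated between the two nearest sorted values.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Every non-null value must be kept and sorted; this is the memory
    # cost that an approximate implementation avoids.
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None

    # Fractional position of the requested percentile in the sorted data.
    rank = fraction * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return float(ordered[lower])

    # Interpolate between the two neighbouring values.
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight
```

For example, `percentile_cont([1, 2, 3, 4], 0.5)` interpolates halfway between 2 and 3. An approximate aggregate over the same data may return a nearby value rather than exactly this one, which is why sharing a single name for both behaviours could mislead users.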