apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.51k stars 1.03k forks source link

Convert `Min` and `Max` to UDAF #10943

Open jayzhan211 opened 3 weeks ago

jayzhan211 commented 3 weeks ago

Is your feature request related to a problem or challenge?

Similar to other issues in #8708

I think the change of this might be large, could split it in several PR.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

edmondop commented 2 weeks ago

I am working on this :)

edmondop commented 2 weeks ago

Well I am not anymore. We have a problem :D @jayzhan211 the problem is that if we modified the proto and remove min, we remove the 0 value, and this is not allowed in protobuf see https://protobuf.dev/programming-guides/proto3/#enum

https://github.com/apache/datafusion/pull/10898/files#diff-dfbd8463b9c4a1e7ac21cf0154a33966bb72a86df15d5e1409906a08cbdeb4dfR475

jayzhan211 commented 2 weeks ago

Well I am not anymore. We have a problem :D @jayzhan211 the problem is that if we modified the proto and remove min, we remove the 0 value, and this is not allowed in protobuf see https://protobuf.dev/programming-guides/proto3/#enum

https://github.com/apache/datafusion/pull/10898/files#diff-dfbd8463b9c4a1e7ac21cf0154a33966bb72a86df15d5e1409906a08cbdeb4dfR475

We can change it to 'unknown', as I know Datafusion does not ensure the backward compatibility for proto, so it is fine. We will even remove whole proto after converting all the functions to UDAF

edmondop commented 2 weeks ago

@jayzhan211 https://github.com/apache/datafusion/pull/11013/files in this draft I started to face a challenge, many tests in expr depends on min/max, and I need to remove them to avoid a circular dependency. Shall we add them back as sql tests?

jayzhan211 commented 2 weeks ago

@jayzhan211 https://github.com/apache/datafusion/pull/11013/files in this draft I started to face a challenge, many tests in expr depends on min/max, and I need to remove them to avoid a circular dependency. Shall we add them back as sql tests?

For tests in datafusion_expr, you can use function_stub. See datafusion/expr/src/test/function_stub.rs For tests in optimizer or others, you can import function in dev-dependencies or move them to slt.