apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.21k stars 435 forks source link

[VL] Add support of percentile_approx / approx_percentile #4889

Open zhztheplayer opened 8 months ago

zhztheplayer commented 8 months ago

Description

Part of https://github.com/apache/incubator-gluten/issues/4039

Velox(presto)'s approx_percentile has different intermediate types with Spark's function with same name.

Velox's signature code of approx_percentile :

void addSignatures(
    const std::string& inputType,
    const std::string& percentileType,
    const std::string& returnType,
    std::vector<std::shared_ptr<exec::AggregateFunctionSignature>>&
        signatures) {
  auto intermediateType = fmt::format(
      "row(array(double), boolean, double, integer, bigint, {0}, {0}, array({0}), array(integer))",
      inputType);
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType(percentileType)
                           .build());
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType("bigint")
                           .argumentType(percentileType)
                           .build());
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType(percentileType)
                           .argumentType("double")
                           .build());
  signatures.push_back(exec::AggregateFunctionSignatureBuilder()
                           .returnType(returnType)
                           .intermediateType(intermediateType)
                           .argumentType(inputType)
                           .argumentType("bigint")
                           .argumentType(percentileType)
                           .argumentType("double")
                           .build());
}

Link https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L790C1-L828C1

Spark's code of approx_percentile / percentile_approx: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

zhztheplayer commented 8 months ago

cc @liujiayi771 @ulysses-you I took the function in https://github.com/apache/incubator-gluten/issues/4039 but may delay the implementation. I see you have been actively working on aggregation functions in past a few of days so (you or anyone) please feel free to take this issue if interested.

It looks like the new utility introduced by #4764 could help a lot here but may still require for some changes though. I am not sure and you may have more context in there than me.

ulysses-you commented 8 months ago

Thank you @zhztheplayer for driving this, I think @liujiayi771 is the better one to take it :)

liujiayi771 commented 8 months ago

Thank you @zhztheplayer for driving this, I think @liujiayi771 is the better one to take it :)

@zhztheplayer I can come to add these aggregate functions, thanks.

zhztheplayer commented 8 months ago

Thank you @zhztheplayer for driving this, I think @liujiayi771 is the better one to take it :)

@zhztheplayer I can come to add these aggregate functions, thanks.

@liujiayi771 Glad to hear you'll help. I'll reassign the task to you. Thanks!

WangGuangxin commented 8 months ago

I have a draft version before, and I can take over this if you have not start yet @liujiayi771 @zhztheplayer

liujiayi771 commented 8 months ago

@WangGuangxin I have not started yet, feel free to take it.