apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Improve performance of TPC-H q19 #572

Open andygrove opened 2 weeks ago

andygrove commented 2 weeks ago

What is the problem the feature request solves?

Comet is currently slower than Spark for query 19.

Some initial observations:

  • The Parquet scan of lineitem seems to take ~10% longer than Spark and 60%+ of the time is spent in native decoding, so perhaps we should add criterion benchmarks for decoding for all types in lineitem and look for optimization opportunities there. I tested both before and after the recent changes to this code and saw no difference.

Describe the potential solution

No response

Additional context

No response

parthchandra commented 1 week ago

There is a chance that the time is going not to native decoding but to CometVector.getDecimal, depending on whether useDecimal128 is enabled. This code path does a lot of data copying:

      byte[] bytes = getBinaryDecimal(i);
      BigInteger bigInteger = new BigInteger(bytes);
      BigDecimal javaDecimal = new BigDecimal(bigInteger, scale);
      try {
        return Decimal.apply(javaDecimal, precision, scale);