Improve performance of TPC-H q19

What is the problem the feature request solves?

Comet is currently slower than Spark for query 19.

Some initial observations:

The sort merge join cannot run natively due to https://github.com/apache/datafusion-comet/issues/398
The Parquet scan of lineitem seems to take ~10% longer than Spark and 60%+ of the time is spent in native decoding, so perhaps we should add criterion benchmarks for decoding for all types in lineitem and look for optimization opportunities there. I tested both before and after the recent changes to this code and saw no difference.
Comet avoids a very expensive C2R on 600 million rows from lineitem because it applies a filter before any C2R, so it is surprising that we are still slower.
With Comet, there is a really slow C2R on the part table where it takes 18 seconds for 48k rows. Spark performs the C2R on 20 million rows and then filters down to 48k and that whole process only takes 3.2 seconds.
Spark coalesces down to 9 partitions and the HashAggregate takes 5.7 seconds and produces 9 rows, but we disable coalesce partitions with Comet and the HashAggregate there takes 11.2 seconds and produces 200 rows. Fixing https://github.com/apache/datafusion-comet/issues/387 would help with this

Describe the potential solution

No response

Additional context

No response

The Parquet scan of lineitem seems to take ~10% longer than Spark and 60%+ of the time is spent in native decoding, so perhaps we should add criterion benchmarks for decoding for all types in lineitem and look for optimization opportunities there. I tested both before and after the recent changes to this code and saw no difference.

There is a chance that the time is being spent not in native decoding but in CometVector.getDecimal, depending on whether useDecimal128 is enabled. This code path does a lot of data copying:

      byte[] bytes = getBinaryDecimal(i);
      BigInteger bigInteger = new BigInteger(bytes);
      BigDecimal javaDecimal = new BigDecimal(bigInteger, scale);
      try {
        return Decimal.apply(javaDecimal, precision, scale);
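
To make the copying concern concrete, here is a minimal standalone sketch (not Comet code) comparing the BigInteger route with a direct decode into a long. It assumes the bytes are big-endian two's complement, which is what the `BigInteger(byte[])` constructor expects, and that the precision is at most 18 digits so the unscaled value fits in a long. The class and method names are hypothetical, and whether the same shortcut applies inside CometVector.getDecimal depends on what getBinaryDecimal actually returns.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; none of these names exist in Comet or Spark.
public class DecimalFastPath {
    // Any decimal with at most 18 digits has an unscaled value that fits in a signed long.
    public static final int MAX_LONG_DIGITS = 18;

    /** Baseline: same allocation pattern as the snippet above (BigInteger, then BigDecimal). */
    public static BigDecimal viaBigInteger(byte[] bigEndian, int scale) {
        return new BigDecimal(new BigInteger(bigEndian), scale);
    }

    /**
     * Decodes big-endian two's-complement bytes into a long without allocating.
     * If more than 8 bytes are supplied, only the trailing 8 are used; this is
     * valid only under the assumption that the leading bytes are pure sign
     * extension, which holds when precision <= MAX_LONG_DIGITS.
     */
    public static long unscaledLong(byte[] bigEndian) {
        if (bigEndian.length == 0) {
            throw new IllegalArgumentException("empty decimal bytes");
        }
        int start = Math.max(0, bigEndian.length - 8);
        long value = bigEndian[start]; // widening a byte to long sign-extends it
        for (int i = start + 1; i < bigEndian.length; i++) {
            value = (value << 8) | (bigEndian[i] & 0xFF);
        }
        return value;
    }

    /** Fast path: one BigDecimal allocation, no intermediate BigInteger. */
    public static BigDecimal viaLong(byte[] bigEndian, int scale) {
        return BigDecimal.valueOf(unscaledLong(bigEndian), scale);
    }

    public static void main(String[] args) {
        byte[] bytes = BigInteger.valueOf(-123456L).toByteArray();
        // Both routes should print the same decimal value.
        System.out.println(viaBigInteger(bytes, 2));
        System.out.println(viaLong(bytes, 2));
    }
}
```

Spark's Decimal can hold small-precision values as a long internally, so a long-based path like this could in principle skip the BigInteger and BigDecimal allocations entirely, but that would need to be confirmed with a profile before changing anything.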

apache / datafusion-comet

Improve performance of TPC-H q19 #572
