**Open** · andygrove opened 2 weeks ago
- The Parquet scan of `lineitem` seems to take ~10% longer than Spark, and 60%+ of that time is spent in native decoding, so perhaps we should add Criterion benchmarks for decoding all of the types in `lineitem` and look for optimization opportunities there. I tested both before and after the recent changes to this code and saw no difference.
There is a chance that the time is spent not in native decoding but in `CometVector.getDecimal`, depending on whether `useDecimal128` is enabled. This part has a lot of data copying going on:
```java
byte[] bytes = getBinaryDecimal(i);
BigInteger bigInteger = new BigInteger(bytes);
BigDecimal javaDecimal = new BigDecimal(bigInteger, scale);
try {
  return Decimal.apply(javaDecimal, precision, scale);
```
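The cost of that chain can be seen in miniature below. This is a hedged sketch, not Comet code: `fromBytes` mimics the `getBinaryDecimal` → `BigInteger` → `BigDecimal` path above, while `fromUnscaledLong` is a hypothetical fast path for decimals whose unscaled value fits in a `long` (precision ≤ 18), which skips the intermediate `byte[]` and `BigInteger` allocations entirely:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalPaths {
    // Copy-heavy path: materialize bytes, wrap in BigInteger, then BigDecimal.
    // Each step allocates and copies the unscaled value again.
    public static BigDecimal fromBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    // Hypothetical fast path: when precision <= 18 the unscaled value fits
    // in a long, so BigDecimal.valueOf can build the result directly.
    public static BigDecimal fromUnscaledLong(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        long unscaled = 1234567L;
        int scale = 2;
        byte[] bytes = BigInteger.valueOf(unscaled).toByteArray();
        // Both paths produce the same decimal value, 12345.67
        System.out.println(fromBytes(bytes, scale));
        System.out.println(fromUnscaledLong(unscaled, scale));
    }
}
```

Spark's `Decimal` also accepts an unscaled `long` with a precision and scale, so a similar shortcut may be applicable at the `Decimal.apply` call site for small-precision columns; whether that helps here would need to be confirmed by the benchmarks suggested above.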
**What is the problem the feature request solves?**
Comet is currently slower than Spark for query 19.
Some initial observations:
- Comet performs a C2R (columnar-to-row) conversion on the `part` table that takes 18 seconds for 48k rows. Spark performs the C2R on 20 million rows and then filters down to 48k, and that whole process takes only 3.2 seconds.

**Describe the potential solution**
No response
**Additional context**
No response