ORC-1578: Fix `SparkBenchmark` on `sales` data according to SPARK-40918

### What changes were proposed in this pull request?

This PR aims to fix `SparkBenchmark` according to the requirement of SPARK-40918.

Note that this only fixes the synthetic benchmark on the `sales` data. The real-life datasets (`github` and `taxi`) will be revisited later.

### Why are the changes needed?

Generate Sales data

$ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -f orc -d sales -s 1000000

Run Spark Benchmark


$ java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -d sales -f orc
# Run complete. Total time: 00:10:45

| Benchmark | (compression) | (dataset) | (format) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|---|
| SparkBenchmark.fullRead | gz | sales | orc | avgt | 5 | 686792.235 | ± 4398.971 | us/op |
| SparkBenchmark.fullRead:bytesPerRecord | gz | sales | orc | avgt | 5 | 0.192 | | # |
| SparkBenchmark.fullRead:ops | gz | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.fullRead:perRecord | gz | sales | orc | avgt | 5 | 0.687 | ± 0.004 | us/op |
| SparkBenchmark.fullRead:records | gz | sales | orc | avgt | 5 | 5000000.000 | | # |
| SparkBenchmark.fullRead | snappy | sales | orc | avgt | 5 | 286166.380 | ± 19864.429 | us/op |
| SparkBenchmark.fullRead:bytesPerRecord | snappy | sales | orc | avgt | 5 | 0.201 | | # |
| SparkBenchmark.fullRead:ops | snappy | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.fullRead:perRecord | snappy | sales | orc | avgt | 5 | 0.286 | ± 0.020 | us/op |
| SparkBenchmark.fullRead:records | snappy | sales | orc | avgt | 5 | 5000000.000 | | # |
| SparkBenchmark.fullRead | zstd | sales | orc | avgt | 5 | 384394.233 | ± 10057.315 | us/op |
| SparkBenchmark.fullRead:bytesPerRecord | zstd | sales | orc | avgt | 5 | 0.192 | | # |
| SparkBenchmark.fullRead:ops | zstd | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.fullRead:perRecord | zstd | sales | orc | avgt | 5 | 0.384 | ± 0.010 | us/op |
| SparkBenchmark.fullRead:records | zstd | sales | orc | avgt | 5 | 5000000.000 | | # |
| SparkBenchmark.partialRead | gz | sales | orc | avgt | 5 | 41683.914 | ± 4046.077 | us/op |
| SparkBenchmark.partialRead:bytesPerRecord | gz | sales | orc | avgt | 5 | 0.192 | | # |
| SparkBenchmark.partialRead:ops | gz | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.partialRead:perRecord | gz | sales | orc | avgt | 5 | 0.042 | ± 0.004 | us/op |
| SparkBenchmark.partialRead:records | gz | sales | orc | avgt | 5 | 5000000.000 | | # |
| SparkBenchmark.partialRead | snappy | sales | orc | avgt | 5 | 23981.054 | ± 17874.229 | us/op |
| SparkBenchmark.partialRead:bytesPerRecord | snappy | sales | orc | avgt | 5 | 0.201 | | # |
| SparkBenchmark.partialRead:ops | snappy | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.partialRead:perRecord | snappy | sales | orc | avgt | 5 | 0.024 | ± 0.018 | us/op |
| SparkBenchmark.partialRead:records | snappy | sales | orc | avgt | 5 | 5000000.000 | | # |
| SparkBenchmark.partialRead | zstd | sales | orc | avgt | 5 | 41433.277 | ± 25110.021 | us/op |
| SparkBenchmark.partialRead:bytesPerRecord | zstd | sales | orc | avgt | 5 | 0.192 | | # |
| SparkBenchmark.partialRead:ops | zstd | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.partialRead:perRecord | zstd | sales | orc | avgt | 5 | 0.041 | ± 0.025 | us/op |
| SparkBenchmark.partialRead:records | zstd | sales | orc | avgt | 5 | 5000000.000 | | # |
| SparkBenchmark.pushDown | gz | sales | orc | avgt | 5 | 23760.997 | ± 833.034 | us/op |
| SparkBenchmark.pushDown:bytesPerRecord | gz | sales | orc | avgt | 5 | 19.153 | | # |
| SparkBenchmark.pushDown:ops | gz | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.pushDown:perRecord | gz | sales | orc | avgt | 5 | 2.376 | ± 0.083 | us/op |
| SparkBenchmark.pushDown:records | gz | sales | orc | avgt | 5 | 50000.000 | | # |
| SparkBenchmark.pushDown | snappy | sales | orc | avgt | 5 | 14062.508 | ± 1793.691 | us/op |
| SparkBenchmark.pushDown:bytesPerRecord | snappy | sales | orc | avgt | 5 | 20.105 | | # |
| SparkBenchmark.pushDown:ops | snappy | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.pushDown:perRecord | snappy | sales | orc | avgt | 5 | 1.406 | ± 0.179 | us/op |
| SparkBenchmark.pushDown:records | snappy | sales | orc | avgt | 5 | 50000.000 | | # |
| SparkBenchmark.pushDown | zstd | sales | orc | avgt | 5 | 15597.651 | ± 1307.246 | us/op |
| SparkBenchmark.pushDown:bytesPerRecord | zstd | sales | orc | avgt | 5 | 19.213 | | # |
| SparkBenchmark.pushDown:ops | zstd | sales | orc | avgt | 5 | 40.000 | | # |
| SparkBenchmark.pushDown:perRecord | zstd | sales | orc | avgt | 5 | 1.560 | ± 0.131 | us/op |
| SparkBenchmark.pushDown:records | zstd | sales | orc | avgt | 5 | 50000.000 | | # |
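To compare codecs at a glance, the primary `us/op` scores reported above can be normalized against the fastest codec in each benchmark. The following is only an illustrative sketch and not part of this patch: the helper name `relative_to_fastest` is hypothetical, and the ratios ignore the reported error margins, which are large for `partialRead`, so treat them as indicative only.

```python
# Average time per operation (us/op), copied from the JMH scores reported above.
scores_us_per_op = {
    "fullRead":    {"gz": 686792.235, "snappy": 286166.380, "zstd": 384394.233},
    "partialRead": {"gz": 41683.914,  "snappy": 23981.054,  "zstd": 41433.277},
    "pushDown":    {"gz": 23760.997,  "snappy": 14062.508,  "zstd": 15597.651},
}


def relative_to_fastest(by_codec):
    """Hypothetical helper: divide each codec's score by the smallest score."""
    fastest = min(by_codec.values())
    return {codec: score / fastest for codec, score in by_codec.items()}


# Ratio of 1.00 marks the fastest codec; larger values mean slower reads.
ratios = {
    name: relative_to_fastest(by_codec)
    for name, by_codec in scores_us_per_op.items()
}

for name, by_codec in ratios.items():
    line = ", ".join(f"{codec}={ratio:.2f}x" for codec, ratio in by_codec.items())
    print(f"{name}: {line}")
```

Because the error column is not taken into account, a difference between two codecs whose intervals overlap (for example `gz` and `zstd` on `partialRead`) should not be read as significant.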



### How was this patch tested?

Pass the CIs.

apache / orc

ORC-1578: Fix `SparkBenchmark` on `sales` data according to SPARK-40918 #1734