apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

ORC-1704: Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark #1912

Closed by cxzl25 2 months ago

cxzl25 commented 2 months ago

What changes were proposed in this pull request?

This PR migrates SparkBenchmark to the Scala 2.13 build of Apache Spark 3.5.1.

Why are the changes needed?

Follow-up to the review on #1909: https://github.com/apache/orc/pull/1909#pullrequestreview-2020282867

How was this patch tested?

Tested locally by running the Spark benchmark:

java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -format=parquet -compress zstd -data taxi
Benchmark                                  (compression)  (dataset)  (format)  Mode  Cnt          Score       Error  Units
SparkBenchmark.partialRead                          zstd       taxi   parquet  avgt    5      17211.731 ± 11836.315  us/op
SparkBenchmark.partialRead:bytesPerRecord           zstd       taxi   parquet  avgt    5          0.002                  #
SparkBenchmark.partialRead:ops                      zstd       taxi   parquet  avgt    5         10.000                  #
SparkBenchmark.partialRead:perRecord                zstd       taxi   parquet  avgt    5          0.001 ±     0.001  us/op
SparkBenchmark.partialRead:records                  zstd       taxi   parquet  avgt    5  113791180.000                  #
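
A note on reading the numbers above: the Error column is large relative to the Score for partialRead, so this run is best read as a smoke test that the benchmark still works after the migration, not as a precise timing. The following is a minimal sketch, not part of the patch, that computes the error-to-score ratio from two rows copied from the output above. The helper name `relative_errors` is hypothetical, and the sketch assumes that JMH's Error column is the half-width of its confidence interval (99.9% by default, as far as I know).

```python
# Two rows copied verbatim from the JMH output in this PR description.
JMH_ROWS = """\
SparkBenchmark.partialRead                          zstd       taxi   parquet  avgt    5      17211.731 ± 11836.315  us/op
SparkBenchmark.partialRead:perRecord                zstd       taxi   parquet  avgt    5          0.001 ±     0.001  us/op
"""


def relative_errors(jmh_text):
    """Map benchmark name -> Error / Score for every row that reports an error.

    Hypothetical helper for illustration only. Rows without a "±" column
    (for example the :ops and :records counters) are skipped.
    """
    ratios = {}
    for line in jmh_text.splitlines():
        tokens = line.split()
        if "±" not in tokens:
            continue
        i = tokens.index("±")
        score = float(tokens[i - 1])  # value printed just before "±"
        error = float(tokens[i + 1])  # value printed just after "±"
        ratios[tokens[0]] = error / score
    return ratios


if __name__ == "__main__":
    for name, ratio in relative_errors(JMH_ROWS).items():
        print(f"{name}: error is {ratio:.0%} of the score")
```

With only 5 measurement iterations (the Cnt column), a wide interval like this is not surprising. More iterations or forks would be needed before comparing these scores against another run.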

Was this patch authored or co-authored using generative AI tooling?

No

dongjoon-hyun commented 2 months ago

Merged to main/2.0.