What changes were proposed in this pull request?
This PR aims to fix SparkBenchmark in Parquet format according to SPARK-40918.

Why are the changes needed?
Similar to ORC-1578, reading Parquet-format files in SparkBenchmark fails with:
```
java.lang.IllegalArgumentException: OPTION_RETURNING_BATCH should always be set for ParquetFileFormat. To workaround this issue, set spark.sql.parquet.enableVectorizedReader=false.
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$1(ParquetFileFormat.scala:192)
    at scala.collection.immutable.Map$EmptyMap$.getOrElse(Map.scala:110)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.buildReaderWithPartitionValues(ParquetFileFormat.scala:191)
    at org.apache.orc.bench.spark.SparkBenchmark.pushDown(SparkBenchmark.java:314)
    at org.apache.orc.bench.spark.jmh_generated.SparkBenchmark_pushDown_jmhTest.pushDown_avgt_jmhStub(SparkBenchmark_pushDown_jmhTest.java:219)
```
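Since SPARK-40918, `ParquetFileFormat.buildReaderWithPartitionValues` requires callers to put `FileFormat.OPTION_RETURNING_BATCH` into the options map; when the key is absent, the `getOrElse` fallback seen in the trace above throws the `IllegalArgumentException`. Below is a minimal sketch of the shape of the fix, not the benchmark's actual code: the helper name is illustrative, and the literal key string mirroring Spark's `FileFormat.OPTION_RETURNING_BATCH` constant is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ReaderOptions {
  // Key defined by Spark's FileFormat.OPTION_RETURNING_BATCH (SPARK-40918).
  // The literal value here is an assumption for illustration.
  static final String OPTION_RETURNING_BATCH = "returning_batch";

  // Merge the flag into whatever options the benchmark already passes, so
  // ParquetFileFormat.buildReaderWithPartitionValues can find it instead of
  // hitting the getOrElse fallback that throws.
  static Map<String, String> withReturningBatch(Map<String, String> options,
                                                boolean vectorizedOutput) {
    Map<String, String> merged = new HashMap<>(options);
    merged.put(OPTION_RETURNING_BATCH, Boolean.toString(vectorizedOutput));
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> merged = withReturningBatch(new HashMap<>(), true);
    System.out.println(merged.get(OPTION_RETURNING_BATCH));
  }
}
```

The merged map would then be passed where the benchmark currently passes its (possibly empty) options to `buildReaderWithPartitionValues`.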
How was this patch tested?
Tested locally.
Was this patch authored or co-authored using generative AI tooling?
No