NVIDIA / spark-rapids-benchmarks

Spark RAPIDS Benchmarks – benchmark sets and utilities for the RAPIDS Accelerator for Apache Spark
Apache License 2.0

[QST] Cannot run on GPU because GpuCSVScan only supports UTF8 encoded data #185

Closed · chasen918 closed this issue 4 months ago

chasen918 commented 4 months ago

What is your question?

After generating 100 GB of data with nds_gen_data.py as the README describes, I cannot use my GPU (Tesla T4) to convert it to Parquet. I get the warning below and then the job gets stuck.

(base) [root@localhost 13:41 /home/gpu/spark-rapids-benchmarks/nds]# ./test_transcode.sh

24/05/11 13:42:38 INFO GpuOverrides: Plan conversion to the GPU took 59.87 ms
24/05/11 13:42:38 INFO GpuOverrides: GPU plan transition optimization took 15.15 ms
24/05/11 13:42:38 INFO GpuParquetFileFormat: Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
24/05/11 13:42:38 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
24/05/11 13:42:38 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
24/05/11 13:42:38 INFO SQLHadoopMapReduceCommitProtocol: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
24/05/11 13:42:38 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
24/05/11 13:42:38 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
24/05/11 13:42:38 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
24/05/11 13:42:38 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 367.9 KiB, free 911.9 MiB)
24/05/11 13:42:38 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 37.1 KiB, free 911.9 MiB)
24/05/11 13:42:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.110.13:37311 (size: 37.1 KiB, free: 912.3 MiB)
24/05/11 13:42:38 INFO SparkContext: Created broadcast 0 from execute at GpuRowToColumnarExec.scala:896
24/05/11 13:42:38 INFO FileSourceScanExec: Planning scan with bin packing, max size: 536870912 bytes, open cost is considered as scanning 4194304 bytes.
24/05/11 13:42:38 INFO SparkContext: Starting job: runColumnar at GpuDataWritingCommandExec.scala:117
24/05/11 13:42:38 INFO DAGScheduler: Got job 0 (runColumnar at GpuDataWritingCommandExec.scala:117) with 1 output partitions
24/05/11 13:42:38 INFO DAGScheduler: Final stage: ResultStage 0 (runColumnar at GpuDataWritingCommandExec.scala:117)
24/05/11 13:42:38 INFO DAGScheduler: Parents of final stage: List()
24/05/11 13:42:38 INFO DAGScheduler: Missing parents: List()
24/05/11 13:42:38 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[6] at runColumnar at GpuDataWritingCommandExec.scala:117), which has no missing parents
24/05/11 13:42:38 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 231.5 KiB, free 911.7 MiB)
24/05/11 13:42:38 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 85.1 KiB, free 911.6 MiB)
24/05/11 13:42:38 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.110.13:37311 (size: 85.1 KiB, free: 912.2 MiB)
24/05/11 13:42:38 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1535
24/05/11 13:42:38 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at runColumnar at GpuDataWritingCommandExec.scala:117) (first 15 tasks are for partitions Vector(0))
24/05/11 13:42:38 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0


The task keeps getting stuck here, and I don't know why there is this warning: !Exec cannot run on GPU because GpuCSVScan only supports UTF8 encoded data. I made sure the system encoding is UTF-8 with this command:

(base) [root@localhost 14:04 /home/gpu/spark-rapids-benchmarks/nds]# locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"

Can anybody help me? Thanks so much.

GaryShen2008 commented 4 months ago

Hi @chasen918, thanks for reporting this issue. The warning just means the CSV scan won't be run on the GPU; it will fall back to the CPU CSVScan plan, so it shouldn't block the Spark job from running. From the command, you used local mode to run the job but set --conf spark.executor.instances=4 for 4 executors, which means requesting 4 GPUs. How many GPUs do you have on your node?

It's probably worth removing the spark.executor.instances config and then checking whether the job runs successfully.
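Purely as an illustration, a quick way to confirm the GPU count before sizing that config (a hedged sketch; it assumes nvidia-smi from the NVIDIA driver is on PATH):

# Count the GPUs visible on this node. With one GPU requested per
# executor, spark.executor.instances must not exceed this number.
nvidia-smi --list-gpus | wc -l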

GaryShen2008 commented 4 months ago

About the encoding problem: currently we only support UTF-8 for the CSV scan on the GPU. From the code, the charset set in the reader options is probably not UTF-8.

Is it possible to check the Environment tab on the Spark event log page and search for any encoding or charset value set in your Spark configuration?
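For illustration, a rough way to search the raw event log for such settings without opening the UI (a sketch; the actual location depends on spark.eventLog.dir, and /tmp/spark-events is just the common default):

# Spark event logs are JSON text, so grep can surface any
# encoding/charset entries directly.
grep -ioE '"[^"]*(encoding|charset)[^"]*"' /tmp/spark-events/* | sort -u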

chasen918 commented 4 months ago

> About the encoding problem: currently we only support UTF-8 for the CSV scan on the GPU. From the code, the charset set in the reader options is probably not UTF-8.
>
> Is it possible to check the Environment tab on the Spark event log page and search for any encoding or charset value set in your Spark configuration?

@GaryShen2008 Thanks so much for your help 😊. Right now I haven't had a chance to dig into the code yet; I just want to run some tests with my GPU first. I tried the changes listed below to work around this issue, but they don't seem to help. Please see:

There are no encoding errors on the Spark event log page; I think the Spark log may simply not show charset/encoding information.

Then I used the command file -i to get the encoding of each data file and got these formats: iso-8859-1, us-ascii, utf-8. I haven't made any encoding settings in my CentOS 7 system; this data was generated directly with the command python nds_gen_data.py hdfs 100 100 /data/raw_sf100 --overwrite_output. Based on your description, it seems the data source must be entirely UTF-8 for the GPU scan to work, but I can't be sure 🙃.

I used the 'iconv' tool to transcode the ISO-8859-1 files to UTF-8. The US-ASCII files don't need this treatment, since US-ASCII is a subset of UTF-8. Additionally, I couldn't find any encoding-related settings in the 'nds_gen_data.py' script.
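For illustration, a hedged sketch of that detect-and-convert pass (the .dat suffix and directory layout are assumptions about the dsdgen output, so adjust the glob to the actual files):

# Detect each data file's charset and convert any ISO-8859-1 file to
# UTF-8 in place. US-ASCII and UTF-8 files are left untouched, since
# US-ASCII is already valid UTF-8.
for f in /data/raw_sf100/*/*.dat; do
  enc=$(file -b --mime-encoding "$f")
  if [ "$enc" = "iso-8859-1" ]; then
    iconv -f ISO-8859-1 -t UTF-8 "$f" -o "${f}.utf8" && mv "${f}.utf8" "$f"
  fi
done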

So, currently, I don't have a way to convert all of the datasets to UTF-8 to pass the GpuScan check. Could you please give me some assistance?

Furthermore, I only have one GPU, so I set spark.executor.instances=1, but the job still got stuck at the submission stage. Since the GPU scan wasn't supported, it should have fallen back to the CPU, so I couldn't understand why 😓. Then, by excluding the GPU settings using the configuration in 'convert_submit_cpu.template', the job ran smoothly 🤗.

Thanks Chasen

jlowe commented 4 months ago

> Then, by excluding the GPU settings using the configuration in 'convert_submit_cpu.template', the job ran smoothly 🤗.

You're running in local mode, and Spark will hang in local mode when trying to ask Spark for resources. Remove the GPU resource requests (i.e., spark.executor.resource.* and spark.task.resource.*) from the configs. The RAPIDS Accelerator should locate the one GPU you have and use it, even if Spark didn't explicitly schedule for it.
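For illustration, a trimmed local-mode SPARK_CONF along those lines (a sketch patterned on the templates shown later in this thread, not a tested configuration):

# Local-mode sketch: no executor instances and no GPU resource requests;
# the RAPIDS Accelerator discovers and uses the single GPU on its own.
export SPARK_CONF=("--master" "local[*]"
                   "--conf" "spark.driver.memory=${DRIVER_MEMORY}"
                   "--conf" "spark.plugins=com.nvidia.spark.SQLPlugin"
                   "--conf" "spark.rapids.sql.concurrentGpuTasks=${CONCURRENT_GPU_TASKS}"
                   "--jars" "$SPARK_RAPIDS_PLUGIN_JAR,$NDS_LISTENER_JAR")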

As for the GPU CSV scan fallback: as @GaryShen2008 mentioned, the fallback occurs because the CSV parser was passed a charset or encoding option that was not UTF-8. This happens explicitly in nds_transcode.py, where it sets the encoding to ISO-8859-1. I'm guessing that's what the TPC-DS dsdgen tool emits when it creates the CSV data. The GPU does not support ISO-8859-1, which is almost, but not quite, a subset of UTF-8. As you mentioned above, if you really want the transcode to be performed on the GPU, you can use iconv to convert from ISO-8859-1 to UTF-8 and then modify nds_transcode.py to remove the .option("encoding", "ISO-8859-1") setting when running on the converted data.
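For illustration, once the files themselves are UTF-8 that code change is a one-liner; a hedged sed sketch (the option string is quoted from the paragraph above, and whether it matches the source verbatim is an assumption, so a manual edit works just as well):

# Drop the explicit ISO-8859-1 encoding option from the CSV reader in
# nds_transcode.py so the now-UTF-8 input can be scanned on the GPU.
sed -i 's/\.option("encoding", "ISO-8859-1")//' nds_transcode.py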

As mentioned above, it's not critical that the CSV read occurs on the GPU during the data transcoding from CSV to Parquet. Once it's in Parquet, strings will be UTF-8 and the GPU will read the Parquet files just fine for all the queries. So if you're just concerned about running NDS queries on the GPU, don't worry about the CSV fallback during the one-time transcode from CSV to Parquet.

chasen918 commented 4 months ago

> > Then, by excluding the GPU settings using the configuration in 'convert_submit_cpu.template', the job ran smoothly 🤗.
>
> You're running in local mode, and Spark will hang in local mode when trying to ask Spark for resources. Remove the GPU resource requests (i.e., spark.executor.resource.* and spark.task.resource.*) from the configs. The RAPIDS Accelerator should locate the one GPU you have and use it, even if Spark didn't explicitly schedule for it.
>
> As for the GPU CSV scan fallback: as @GaryShen2008 mentioned, the fallback occurs because the CSV parser was passed a charset or encoding option that was not UTF-8. This happens explicitly in nds_transcode.py, where it sets the encoding to ISO-8859-1. I'm guessing that's what the TPC-DS dsdgen tool emits when it creates the CSV data. The GPU does not support ISO-8859-1, which is almost, but not quite, a subset of UTF-8. As you mentioned above, if you really want the transcode to be performed on the GPU, you can use iconv to convert from ISO-8859-1 to UTF-8 and then modify nds_transcode.py to remove the .option("encoding", "ISO-8859-1") setting when running on the converted data.
>
> As mentioned above, it's not critical that the CSV read occurs on the GPU during the data transcoding from CSV to Parquet. Once it's in Parquet, strings will be UTF-8 and the GPU will read the Parquet files just fine for all the queries. So if you're just concerned about running NDS queries on the GPU, don't worry about the CSV fallback during the one-time transcode from CSV to Parquet.

Thanks @jlowe,

Based on your advice, I successfully converted the data using the CPU and have been using it in subsequent tests. As you said, there shouldn't be any issues with that. However, when I tried running the power_run command on my GPU, I hit an internal error in Spark 😰. Here's the error log:

====== Creating TempView for table customer_address ======
Time taken: 13961 millis for table customer_address
====== Creating TempView for table customer_demographics ======
Time taken: 89 millis for table customer_demographics
====== Creating TempView for table date_dim ======
Time taken: 100 millis for table date_dim
====== Creating TempView for table warehouse ======
Time taken: 74 millis for table warehouse
====== Creating TempView for table ship_mode ======
Time taken: 68 millis for table ship_mode
====== Creating TempView for table time_dim ======
Time taken: 83 millis for table time_dim
====== Creating TempView for table reason ======
Time taken: 59 millis for table reason
====== Creating TempView for table income_band ======
Time taken: 60 millis for table income_band
====== Creating TempView for table item ======
Time taken: 61 millis for table item
====== Creating TempView for table store ======
Time taken: 65 millis for table store
====== Creating TempView for table call_center ======
Time taken: 63 millis for table call_center
====== Creating TempView for table customer ======
Time taken: 58 millis for table customer
====== Creating TempView for table web_site ======
Time taken: 66 millis for table web_site
====== Creating TempView for table store_returns ======
Time taken: 10495 millis for table store_returns
====== Creating TempView for table household_demographics ======
Time taken: 49 millis for table household_demographics
====== Creating TempView for table web_page ======
Time taken: 48 millis for table web_page
====== Creating TempView for table promotion ======
Time taken: 54 millis for table promotion
====== Creating TempView for table catalog_page ======
Time taken: 43 millis for table catalog_page
====== Creating TempView for table inventory ======
Time taken: 1007 millis for table inventory
====== Creating TempView for table catalog_returns ======
Time taken: 6872 millis for table catalog_returns
====== Creating TempView for table web_returns ======
Time taken: 6632 millis for table web_returns
====== Creating TempView for table web_sales ======
Time taken: 5108 millis for table web_sales
====== Creating TempView for table catalog_sales ======
Time taken: 5007 millis for table catalog_sales
====== Creating TempView for table store_sales ======
Time taken: 5015 millis for table store_sales
====== Run query96 ======
TaskFailureListener is registered.
ERROR BEGIN
An error occurred while calling o208.collectToPython.
: java.lang.OutOfMemoryError: Java heap space

File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/PysparkBenchReport.py", line 88, in report_on
fn(*args)
File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_power.py", line 132, in run_one_query
df.collect()
File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1216, in collect
sock_info = self._jdf.collectToPython()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in call
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
^^^^^^^^^^^
File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
ERROR END
Time taken: [18910784] millis for query96
====== Run query7 ======
TaskFailureListener is registered.
ERROR BEGIN
An error occurred while calling o271.collectToPython.
: org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
at org.apache.spark.SparkException$.internalError(SparkException.scala:88)
at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:516)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:528)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:175)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:168)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:221)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:266)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:235)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:112)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4204)
at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:4033)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
at org.apache.spark.sql.rapids.GpuShuffleEnv$.isRapidsShuffleAvailable(GpuShuffleEnv.scala:118)
at org.apache.spark.sql.rapids.GpuShuffleEnv$.useMultiThreadedShuffle(GpuShuffleEnv.scala:141)
at com.nvidia.spark.rapids.GpuPartitioning.$init$(GpuPartitioning.scala:42)
at com.nvidia.spark.rapids.GpuHashPartitioningBase.<init>(GpuHashPartitioningBase.scala:29)
at com.nvidia.spark.rapids.shims.GpuHashPartitioning.<init>(GpuHashPartitioning.scala:42)
at com.nvidia.spark.rapids.GpuOverrides$$anon$201.convertToGpu(GpuOverrides.scala:3915)
at com.nvidia.spark.rapids.GpuOverrides$$anon$201.convertToGpu(GpuOverrides.scala:3897)
at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertShuffleToGpu(GpuShuffleExchangeExecBase.scala:130)
at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:125)
at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:44)
at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1205)
at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1023)
at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
at com.nvidia.spark.rapids.shims.Spark340PlusNonDBShims$$anon$2.convertToGpu(Spark340PlusNonDBShims.scala:93)
at com.nvidia.spark.rapids.shims.Spark340PlusNonDBShims$$anon$2.convertToGpu(Spark340PlusNonDBShims.scala:61)
at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
at com.nvidia.spark.rapids.GpuOverrides$.com$nvidia$spark$rapids$GpuOverrides$$doConvertPlan(GpuOverrides.scala:4415)
at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4760)
at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4620)
at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:454)
at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4617)
at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4583)
at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4637)
at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.$anonfun$apply$1(GpuOverrides.scala:4600)
at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4583)
at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:4603)
at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:4596)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.$anonfun$applyPhysicalRules$2(AdaptiveSparkPlanExec.scala:798)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.applyPhysicalRules(AdaptiveSparkPlanExec.scala:797)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$initialPlan$1(AdaptiveSparkPlanExec.scala:192)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.<init>(AdaptiveSparkPlanExec.scala:191)
at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.applyInternal(InsertAdaptiveSparkPlan.scala:66)
at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.apply(InsertAdaptiveSparkPlan.scala:44)
at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.apply(InsertAdaptiveSparkPlan.scala:41)
at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:457)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:456)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:175)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
... 27 more

File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/PysparkBenchReport.py", line 88, in report_on
fn(*args)
File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_power.py", line 132, in run_one_query
df.collect()
File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1216, in collect
sock_info = self._jdf.collectToPython()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in call
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
^^^^^^^^^^^
File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
ERROR END
Time taken: [218] millis for query7

Starting from the query where this error first occurred, every subsequent query hit the same Spark internal error and failed. I don't know how to avoid this problem 😰. Here is my task configuration:

base.template

export SPARK_MASTER=${SPARK_MASTER:-spark://192.168.110.13:7077}
export DRIVER_MEMORY=${DRIVER_MEMORY:-2G}
export EXECUTOR_CORES=${EXECUTOR_CORES:-4}
export NUM_EXECUTORS=${NUM_EXECUTORS:-1}
export EXECUTOR_MEMORY=${EXECUTOR_MEMORY:-12G}

# The NDS listener jar which is built in jvm_listener directory.
export NDS_LISTENER_JAR=${NDS_LISTENER_JAR:-./jvm_listener/target/nds-benchmark-listener-1.0-SNAPSHOT.jar}
# The spark-rapids jar which is required when running on GPU
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_PLUGIN_JAR:-rapids-4-spark_2.12-24.02.0.jar}
export PYTHONPATH=$SPARK_HOME/python:`echo $SPARK_HOME/python/lib/py4j-*.zip`

power_run_gpu

source base.template
export CONCURRENT_GPU_TASKS=${CONCURRENT_GPU_TASKS:-2}
export SHUFFLE_PARTITIONS=${SHUFFLE_PARTITIONS:-200}

export SPARK_CONF=("--master" "${SPARK_MASTER}"
                   "--deploy-mode" "client"
                   "--conf" "spark.driver.maxResultSize=2GB"
                   "--conf" "spark.driver.memory=${DRIVER_MEMORY}"
                   "--conf" "spark.executor.cores=${EXECUTOR_CORES}"
                   "--conf" "spark.executor.instances=${NUM_EXECUTORS}"
                   "--conf" "spark.executor.memory=${EXECUTOR_MEMORY}"
                   "--conf" "spark.worker.resource.gpu.amount=1"
                   "--conf" "spark.sql.shuffle.partitions=${SHUFFLE_PARTITIONS}"
                   "--conf" "spark.sql.files.maxPartitionBytes=512b"
                   "--conf" "spark.sql.adaptive.enabled=true"
                   "--conf" "spark.executor.resource.gpu.amount=1"
                   "--conf" "spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh"
                   "--conf" "spark.task.resource.gpu.amount=1"
                   "--conf" "spark.plugins=com.nvidia.spark.SQLPlugin"
                   "--conf" "spark.rapids.memory.host.spillStorageSize=1G"
                   "--conf" "spark.rapids.memory.pinnedPool.size=1g"
                   "--conf" "spark.rapids.sql.concurrentGpuTasks=${CONCURRENT_GPU_TASKS}"
                   "--conf" "spark.dynamicAllocation.enabled=false"
                   "--files" "$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh"
                   "--jars" "$SPARK_RAPIDS_PLUGIN_JAR,$NDS_LISTENER_JAR")`

command

./spark-submit-template power_run_gpu nds_power.py /mnt/parquet_sf100 query_streams/query_0.sql result/power_run/power_run_time_gpu.csv --property_file properties/aqe-on.properties

My master and worker are both on the same machine (standalone mode). I'm not sure whether that's related to the issue. Can you give me some suggestions?

GaryShen2008 commented 4 months ago

From the log, you're using Spark 3.4.2, which the 24.02.0 release doesn't support yet.

  • If you want to test on 3.4.2, you can try the latest 24.04.1 release.
  • Or you can change to 3.4.1, which is supported by 24.02.0.
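For illustration, a quick hedged check of which versions are actually in play (paths per the templates above):

# The Spark banner and the plugin jar filename both carry their versions;
# compare them against the plugin's supported-Spark list before running.
$SPARK_HOME/bin/spark-submit --version 2>&1 | head -n 8
ls rapids-4-spark_2.12-*.jar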

chasen918 commented 4 months ago

> From the log, you're using Spark 3.4.2, which the 24.02.0 release doesn't support yet.
>
>   • If you want to test on 3.4.2, you can try the latest 24.04.1 release.
>   • Or you can change to 3.4.1, which is supported by 24.02.0.

I tried switching between different versions of Spark and RAPIDS, but the tasks still couldn't run to completion and hit different errors. Here are the error messages.

Group 1: Spark 3.4.1 + RAPIDS 24.02.0

====== Creating TempView for table customer_address ======
Time taken: 9287 millis for table customer_address
====== Creating TempView for table customer_demographics ======
Time taken: 102 millis for table customer_demographics
====== Creating TempView for table date_dim ======
Time taken: 93 millis for table date_dim
====== Creating TempView for table warehouse ======
Time taken: 67 millis for table warehouse
====== Creating TempView for table ship_mode ======
Time taken: 65 millis for table ship_mode
====== Creating TempView for table time_dim ======
Time taken: 67 millis for table time_dim
====== Creating TempView for table reason ======
Time taken: 59 millis for table reason
====== Creating TempView for table income_band ======
Time taken: 58 millis for table income_band
====== Creating TempView for table item ======
Time taken: 58 millis for table item
====== Creating TempView for table store ======
Time taken: 60 millis for table store
====== Creating TempView for table call_center ======
Time taken: 64 millis for table call_center
====== Creating TempView for table customer ======
Time taken: 55 millis for table customer
====== Creating TempView for table web_site ======
Time taken: 63 millis for table web_site
====== Creating TempView for table store_returns ======
Time taken: 8517 millis for table store_returns
====== Creating TempView for table household_demographics ======
Time taken: 46 millis for table household_demographics
====== Creating TempView for table web_page ======
Time taken: 45 millis for table web_page
====== Creating TempView for table promotion ======
Time taken: 44 millis for table promotion
====== Creating TempView for table catalog_page ======
Time taken: 42 millis for table catalog_page
====== Creating TempView for table inventory ======
Time taken: 972 millis for table inventory
====== Creating TempView for table catalog_returns ======
Time taken: 6552 millis for table catalog_returns
====== Creating TempView for table web_returns ======
Time taken: 6440 millis for table web_returns
====== Creating TempView for table web_sales ======
Time taken: 4907 millis for table web_sales
====== Creating TempView for table catalog_sales ======
Time taken: 4806 millis for table catalog_sales
====== Creating TempView for table store_sales ======
Time taken: 4447 millis for table store_sales
====== Run query96 ======
Traceback (most recent call last):
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_power.py", line 396, in <module>
    run_query_stream(args.input_prefix,
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_power.py", line 260, in run_query_stream
    summary = q_report.report_on(run_one_query,spark_session,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/PysparkBenchReport.py", line 80, in report_on
    listener.register()
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/python_listener/PythonListener.py", line 43, in register
    self.uuid = manager.register(self)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.nvidia.spark.rapids.listener.Manager.register.
: java.lang.NoSuchMethodError: scala.collection.immutable.Map$.apply(Lscala/collection/Seq;)Lscala/collection/GenMap;
    at com.nvidia.spark.rapids.listener.Manager$.<init>(Manager.scala:25)
    at com.nvidia.spark.rapids.listener.Manager$.<clinit>(Manager.scala)
    at com.nvidia.spark.rapids.listener.Manager.register(Manager.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)

Group 2: Spark 3.4.2 + RAPIDS v24.04

====== Creating TempView for table customer_address ======
Time taken: 9514 millis for table customer_address
====== Creating TempView for table customer_demographics ======
Time taken: 90 millis for table customer_demographics
====== Creating TempView for table date_dim ======
Time taken: 85 millis for table date_dim
====== Creating TempView for table warehouse ======
Time taken: 78 millis for table warehouse
====== Creating TempView for table ship_mode ======
Time taken: 89 millis for table ship_mode
====== Creating TempView for table time_dim ======
Time taken: 64 millis for table time_dim
====== Creating TempView for table reason ======
Time taken: 59 millis for table reason
====== Creating TempView for table income_band ======
Time taken: 64 millis for table income_band
====== Creating TempView for table item ======
Time taken: 64 millis for table item
====== Creating TempView for table store ======
Time taken: 61 millis for table store
====== Creating TempView for table call_center ======
Time taken: 61 millis for table call_center
====== Creating TempView for table customer ======
Time taken: 56 millis for table customer
====== Creating TempView for table web_site ======
Time taken: 60 millis for table web_site
====== Creating TempView for table store_returns ======
Time taken: 8827 millis for table store_returns
====== Creating TempView for table household_demographics ======
Time taken: 45 millis for table household_demographics
====== Creating TempView for table web_page ======
Time taken: 48 millis for table web_page
====== Creating TempView for table promotion ======
Time taken: 46 millis for table promotion
====== Creating TempView for table catalog_page ======
Time taken: 43 millis for table catalog_page
====== Creating TempView for table inventory ======
Time taken: 1009 millis for table inventory
====== Creating TempView for table catalog_returns ======
Time taken: 6840 millis for table catalog_returns
====== Creating TempView for table web_returns ======
Time taken: 6821 millis for table web_returns
====== Creating TempView for table web_sales ======
Time taken: 5070 millis for table web_sales
====== Creating TempView for table catalog_sales ======
Time taken: 4982 millis for table catalog_sales
====== Creating TempView for table store_sales ======
Time taken: 4947 millis for table store_sales
====== Run query96 ======
TaskFailureListener is registered.
ERROR BEGIN
An error occurred while calling o208.collectToPython.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.reflect.Array.newInstance(Array.java:75)
    at scala.reflect.ClassTag$GenericClassTag.newArray(ClassTag.scala:171)
    at scala.collection.mutable.ArrayBuilder$ofRef.mkArray(ArrayBuilder.scala:74)
    at scala.collection.mutable.ArrayBuilder$ofRef.resize(ArrayBuilder.scala:79)
    at scala.collection.mutable.ArrayBuilder$ofRef.sizeHint(ArrayBuilder.scala:84)
    at scala.collection.mutable.Builder.sizeHint(Builder.scala:82)
    at scala.collection.mutable.Builder.sizeHint$(Builder.scala:79)
    at scala.collection.mutable.ArrayBuilder.sizeHint(ArrayBuilder.scala:26)
    at scala.collection.TraversableLike.builder$1(TraversableLike.scala:282)
    at scala.collection.TraversableLike.map(TraversableLike.scala:285)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at org.apache.spark.sql.execution.PartitionedFileUtil$.getBlockHosts(PartitionedFileUtil.scala:69)
    at org.apache.spark.sql.execution.PartitionedFileUtil$.$anonfun$splitFiles$1(PartitionedFileUtil.scala:39)
    at org.apache.spark.sql.execution.PartitionedFileUtil$.$anonfun$splitFiles$1$adapted(PartitionedFileUtil.scala:36)
    at org.apache.spark.sql.execution.PartitionedFileUtil$$$Lambda$3919/38916316.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.TraversableLike$$Lambda$64/93199773.apply(Unknown Source)
    at scala.collection.immutable.NumericRange.foreach(NumericRange.scala:75)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.sql.execution.PartitionedFileUtil$.splitFiles(PartitionedFileUtil.scala:36)
    at org.apache.spark.sql.execution.rapids.shims.FilePartitionShims$.$anonfun$splitFiles$5(FilePartitionShims.scala:107)
    at org.apache.spark.sql.execution.rapids.shims.FilePartitionShims$$$Lambda$3915/154373703.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
    at scala.collection.TraversableLike$$Lambda$224/1676605578.apply(Unknown Source)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)

  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/PysparkBenchReport.py", line 88, in report_on
    fn(*args)
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_power.py", line 132, in run_one_query
    df.collect()
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1216, in collect
    sock_info = self._jdf.collectToPython()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
ERROR END
Time taken: [243133] millis for query96
====== Run query7 ======
TaskFailureListener is registered.
ERROR BEGIN
An error occurred while calling o271.collectToPython.
: java.lang.OutOfMemoryError: GC overhead limit exceeded

  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/PysparkBenchReport.py", line 88, in report_on
    fn(*args)
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_power.py", line 132, in run_one_query
    df.collect()
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1216, in collect
    sock_info = self._jdf.collectToPython()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
ERROR END
Time taken: [8756560] millis for query7
====== Run query75 ======
TaskFailureListener is registered.
ERROR BEGIN
An error occurred while calling o334.collectToPython.
: org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
    at org.apache.spark.SparkException$.internalError(SparkException.scala:88)
    at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:516)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:528)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:175)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:168)
    at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:221)
    at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:266)
    at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:235)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:112)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4204)
    at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:4033)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
    at org.apache.spark.sql.rapids.GpuShuffleEnv$.isRapidsShuffleAvailable(GpuShuffleEnv.scala:118)
    at org.apache.spark.sql.rapids.GpuShuffleEnv$.useMultiThreadedShuffle(GpuShuffleEnv.scala:141)
    at com.nvidia.spark.rapids.GpuPartitioning.$init$(GpuPartitioning.scala:42)
    at com.nvidia.spark.rapids.GpuHashPartitioningBase.<init>(GpuHashPartitioningBase.scala:29)
    at com.nvidia.spark.rapids.shims.GpuHashPartitioning.<init>(GpuHashPartitioning.scala:42)
    at com.nvidia.spark.rapids.GpuOverrides$$anon$200.convertToGpu(GpuOverrides.scala:3929)
    at com.nvidia.spark.rapids.GpuOverrides$$anon$200.convertToGpu(GpuOverrides.scala:3911)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertShuffleToGpu(GpuShuffleExchangeExecBase.scala:130)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:125)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:44)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:835)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.$anonfun$convertToGpu$3(GpuSortMergeJoinMeta.scala:84)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.convertToGpu(GpuSortMergeJoinMeta.scala:84)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.convertToGpu(GpuSortMergeJoinMeta.scala:24)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuProjectExecMeta.convertToGpu(basicPhysicalOperators.scala:54)
    at com.nvidia.spark.rapids.GpuProjectExecMeta.convertToGpu(basicPhysicalOperators.scala:45)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:123)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:44)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:835)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.$anonfun$convertToGpu$3(GpuSortMergeJoinMeta.scala:84)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.convertToGpu(GpuSortMergeJoinMeta.scala:84)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.convertToGpu(GpuSortMergeJoinMeta.scala:24)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuProjectExecMeta.convertToGpu(basicPhysicalOperators.scala:54)
    at com.nvidia.spark.rapids.GpuProjectExecMeta.convertToGpu(basicPhysicalOperators.scala:45)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuOverrides$$anon$211.$anonfun$convertToGpu$30(GpuOverrides.scala:4187)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.nvidia.spark.rapids.GpuOverrides$$anon$211.convertToGpu(GpuOverrides.scala:4187)
    at com.nvidia.spark.rapids.GpuOverrides$$anon$211.convertToGpu(GpuOverrides.scala:4185)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1205)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1023)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:123)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:44)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1205)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1023)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1205)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1023)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:123)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:44)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1205)
    at com.nvidia.spark.rapids.GpuBaseAggregateMeta.convertToGpu(GpuAggregateExec.scala:1023)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuOverrides$$anon$210.convertToGpu(GpuOverrides.scala:4162)
    at com.nvidia.spark.rapids.GpuOverrides$$anon$210.convertToGpu(GpuOverrides.scala:4159)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:123)
    at org.apache.spark.sql.rapids.execution.GpuShuffleMetaBase.convertToGpu(GpuShuffleExchangeExecBase.scala:44)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:835)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.$anonfun$convertToGpu$3(GpuSortMergeJoinMeta.scala:84)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.convertToGpu(GpuSortMergeJoinMeta.scala:84)
    at com.nvidia.spark.rapids.GpuSortMergeJoinMeta.convertToGpu(GpuSortMergeJoinMeta.scala:24)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuProjectExecMeta.convertToGpu(basicPhysicalOperators.scala:54)
    at com.nvidia.spark.rapids.GpuProjectExecMeta.convertToGpu(basicPhysicalOperators.scala:45)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.shims.Spark340PlusNonDBShims$$anon$2.convertToGpu(Spark340PlusNonDBShims.scala:93)
    at com.nvidia.spark.rapids.shims.Spark340PlusNonDBShims$$anon$2.convertToGpu(Spark340PlusNonDBShims.scala:61)
    at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
    at com.nvidia.spark.rapids.GpuOverrides$.com$nvidia$spark$rapids$GpuOverrides$$doConvertPlan(GpuOverrides.scala:4429)
    at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4774)
    at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4634)
    at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:455)
    at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4631)
    at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4597)
    at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4651)
    at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.$anonfun$apply$1(GpuOverrides.scala:4614)
    at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4597)
    at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:4617)
    at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:4610)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.$anonfun$applyPhysicalRules$2(AdaptiveSparkPlanExec.scala:798)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.applyPhysicalRules(AdaptiveSparkPlanExec.scala:797)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$initialPlan$1(AdaptiveSparkPlanExec.scala:192)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.<init>(AdaptiveSparkPlanExec.scala:191)
    at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.applyInternal(InsertAdaptiveSparkPlan.scala:66)
    at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.apply(InsertAdaptiveSparkPlan.scala:44)
    at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.apply(InsertAdaptiveSparkPlan.scala:41)
    at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:457)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:456)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:175)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
    ... 27 more

  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/PysparkBenchReport.py", line 88, in report_on
    fn(*args)
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_power.py", line 132, in run_one_query
    df.collect()
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1216, in collect
    sock_info = self._jdf.collectToPython()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
ERROR END

In this group, the tasks ran for over two hours before erroring out and quitting. I noticed two types of errors: the java.lang.OutOfMemoryError (GC overhead limit exceeded) and the Spark [INTERNAL_ERROR] caused by a NullPointerException in GpuShuffleEnv.

I also tried the other version combinations (Spark 3.4.1 and Spark 3.4.2 with RAPIDS 24.02 and 24.04), and they all hit the errors from Group 1 or Group 2 😓. Does this relate to my parameter configuration, or is it another issue?

chasen918 commented 4 months ago

Also, I'm not sure if this belongs in this ticket, but there's a minor issue: running the example command for Data Maintenance directly fails because nds_maintenance.py doesn't support --data_format orc. And when I run the data_maintenance test, I hit this issue:

Traceback (most recent call last):
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_maintenance.py", line 313, in <module>
    register_temp_views(spark_session, args.refresh_data_path)
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_maintenance.py", line 274, in register_temp_views
    "header", "false").csv(refresh_data_path + '/' + table, schema=schema).createOrReplaceTempView(table)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 727, in csv
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
pyspark.errors.exceptions.captured.AnalysisException: [PATH_NOT_FOUND] Path does not exist: file:/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/data_maintenance/s_purchase_lineitem.

I knew this folder only contains some SQL files, so this error confused me.

GaryShen2008 commented 4 months ago

I just ran a quick NDS with similar configurations locally with Spark 3.4.2 and Spark-Rapids v24.04.0.

One little problem, noted below, though I don't think it's the blocker: "--conf" "spark.sql.files.maxPartitionBytes=512b". 512 bytes? I changed it to 512m.

source base.template
export CONCURRENT_GPU_TASKS=${CONCURRENT_GPU_TASKS:-2}
export SHUFFLE_PARTITIONS=${SHUFFLE_PARTITIONS:-200}

export SPARK_CONF=("--master" "${SPARK_MASTER}"
                   "--deploy-mode" "client"
                   "--conf" "spark.driver.maxResultSize=2GB"
                   "--conf" "spark.driver.memory=${DRIVER_MEMORY}"
                   "--conf" "spark.executor.cores=${EXECUTOR_CORES}"
                   "--conf" "spark.executor.instances=${NUM_EXECUTORS}"
                   "--conf" "spark.executor.memory=${EXECUTOR_MEMORY}"
                   "--conf" "spark.sql.shuffle.partitions=${SHUFFLE_PARTITIONS}"
                   "--conf" "spark.sql.files.maxPartitionBytes=512m"
                   "--conf" "spark.sql.adaptive.enabled=true"
                   "--conf" "spark.executor.resource.gpu.amount=1"
                   "--conf" "spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh"
                   "--conf" "spark.task.resource.gpu.amount=1"
                   "--conf" "spark.plugins=com.nvidia.spark.SQLPlugin"
                   "--conf" "spark.rapids.memory.host.spillStorageSize=1G"
                   "--conf" "spark.rapids.memory.pinnedPool.size=1g"
                   "--conf" "spark.rapids.sql.concurrentGpuTasks=${CONCURRENT_GPU_TASKS}"
                   "--conf" "spark.dynamicAllocation.enabled=false"
                   "--files" "$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh"
                   "--jars" "$SPARK_RAPIDS_PLUGIN_JAR,$NDS_LISTENER_JAR")

Can you share the detailed steps you ran? Just to confirm: is your Spark downloaded from the Apache Spark page, not a customized version? And is the Spark-Rapids jar downloaded from Maven Central? (Something is odd here: your 24.02.0 contains a shim layer for Spark 3.4.2, which we didn't release.)
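
If you want to double-check which Spark shims a given plugin jar actually bundles, something like the following should work (a rough sketch; it assumes the dist jar keeps one spark<version>/ directory per supported shim at the jar root):

# list the shim directories packaged in the jar (names like spark342/)
jar tf rapids-4-spark_2.12-24.02.0.jar | grep -oE '^spark[0-9]+/' | sort -u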

chasen918 commented 4 months ago

Hi @GaryShen2008, my Spark is downloaded from https://spark.apache.org/downloads.html, and the Rapids jar is from https://nvidia.github.io/spark-rapids/docs/download.html#release-v24040.

Here are some of my test steps:

  1. generate data with nds_gen_data.py: python nds_gen_data.py local 100 100 /data/raw_sf100 --overwrite_output
  2. then, convert data to parquet
    ./spark-submit-template convert_submit_gpu.template \
    nds_transcode.py  /data/raw_sf100  parquet_sf3k report.txt
  3. generate query streams and build nds-benchmark-listener-1.0-SNAPSHOT.jar
  4. Last, I use this command below to setup power_run test
    ./spark-submit-template power_run_gpu.template \
    nds_power.py \
    parquet_sf3k \
    ./nds_query_streams/query_0.sql \
    time.csv \
    --property_file properties/aqe-on.properties

    This is my configuration: base.template

    
    export SPARK_MASTER=${SPARK_MASTER:-spark://192.168.110.13:7077}
    export DRIVER_MEMORY=${DRIVER_MEMORY:-2G}
    export EXECUTOR_CORES=${EXECUTOR_CORES:-4}
    export NUM_EXECUTORS=${NUM_EXECUTORS:-1}
    export EXECUTOR_MEMORY=${EXECUTOR_MEMORY:-12G}

# The NDS listener jar which is built in the jvm_listener directory.

export NDS_LISTENER_JAR=${NDS_LISTENER_JAR:-./jvm_listener/target/nds-benchmark-listener-1.0-SNAPSHOT.jar}

# The spark-rapids jar which is required when running on GPU.

export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_PLUGIN_JAR:-rapids-4-spark_2.12-24.04.0.jar}
export PYTHONPATH=$SPARK_HOME/python:`echo $SPARK_HOME/python/lib/py4j-*.zip`

**power_run_gpu.template**

source base.template
export CONCURRENT_GPU_TASKS=${CONCURRENT_GPU_TASKS:-2}
export SHUFFLE_PARTITIONS=${SHUFFLE_PARTITIONS:-200}

export SPARK_CONF=("--master" "${SPARK_MASTER}" "--deploy-mode" "client" "--conf" "spark.driver.maxResultSize=2GB" "--conf" "spark.driver.memory=${DRIVER_MEMORY}" "--conf" "spark.executor.cores=${EXECUTOR_CORES}" "--conf" "spark.executor.instances=${NUM_EXECUTORS}" "--conf" "spark.executor.memory=${EXECUTOR_MEMORY}" "--conf" "spark.worker.resource.gpu.amount=1" "--conf" "spark.sql.shuffle.partitions=${SHUFFLE_PARTITIONS}" "--conf" "spark.sql.files.maxPartitionBytes=512b" "--conf" "spark.sql.adaptive.enabled=true" "--conf" "spark.executor.resource.gpu.amount=1" "--conf" "spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh" "--conf" "spark.task.resource.gpu.amount=1" "--conf" "spark.plugins=com.nvidia.spark.SQLPlugin" "--conf" "spark.rapids.memory.host.spillStorageSize=1G" "--conf" "spark.rapids.memory.pinnedPool.size=1g" "--conf" "spark.rapids.sql.concurrentGpuTasks=${CONCURRENT_GPU_TASKS}" "--conf" "spark.dynamicAllocation.enabled=false" "--files" "$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh" "--jars" "$SPARK_RAPIDS_PLUGIN_JAR,$NDS_LISTENER_JAR")

**spark-env.sh**

export SPARK_EXECUTOR_CORES=4
export SPARK_DRIVER_MEMORY=2G
export SPARK_EXECUTOR_MEMORY=16G
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MERMORY=4G  # "MERMORY" (sic) is not a name Spark reads; SPARK_WORKER_MEMORY is the expected variable
export HADOOP_CONF_DIR=/home/tcn/chasen/gpu/hadoop/etc/hadoop/
export YARN_CONF_DIR=/home/tcn/chasen/gpu/hadoop/etc/hadoop/
SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1 -Dspark.worker.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh"



I'll adjust some parameters to verify, and try downloading the Rapids jar from Maven Central later.
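
For reference, the same release jar should be fetchable directly from Maven Central; the URL below is assumed from the standard Maven repository layout for the com.nvidia:rapids-4-spark_2.12:24.04.0 coordinates:

wget https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0.jar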
chasen918 commented 4 months ago

Hi @GaryShen2008 ,

I previously ran this on a workstation and hit the issue; now I run it on a server with the same Spark configuration, and both power_run and throughput_run work normally. Emm..., I guess the problem may have been caused by insufficient resources on my workstation.

So I think the issue may be solved, thanks for your help.🤗

Also, I'm not sure if this belongs in this ticket, but there's a minor issue I found: running the example command for Data Maintenance fails directly, because nds_maintenance.py doesn't support --data_format orc... And when I run the data maintenance test, I hit this issue:

Traceback (most recent call last):
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_maintenance.py", line 313, in <module>
    register_temp_views(spark_session, args.refresh_data_path)
  File "/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/nds_maintenance.py", line 274, in register_temp_views
    "header", "false").csv(refresh_data_path + '/' + table, schema=schema).createOrReplaceTempView(table)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 727, in csv
  File "/home/tcn/chasen/gpu/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/home/tcn/chasen/gpu/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
pyspark.errors.exceptions.captured.AnalysisException: [PATH_NOT_FOUND] Path does not exist: file:/home/tcn/chasen/gpu/spark-rapids-benchmarks/nds/data_maintenance/s_purchase_lineitem.

I know there are only some SQL files in the data_maintenance folder, not an s_purchase_lineitem directory, so I have no idea about this issue😓.

wjxiz1992 commented 4 months ago

Hi @chasen918 ,

Data Maintenance adds/removes data in the existing tables. The refresh data used to do this needs to be generated in the Data Generation step by specifying --update. Can you please check whether that data is already there?
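
For example, something along these lines should produce the refresh data (only a sketch: the update count of 1 and the output path are placeholders, so please check nds_gen_data.py --help for the exact flags):

python nds_gen_data.py local 100 100 /data/raw_sf100_update --update 1 --overwrite_output

Then point the refresh_data_path argument of nds_maintenance.py at that output directory.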

GaryShen2008 commented 4 months ago

Hi @chasen918, good to know the original issue has been solved on the new server. For data maintenance, can you try again following Allen's answer above? If there's still a problem, you can file another issue. I'll close this one if that's OK with you.

chasen918 commented 4 months ago

Hi @wjxiz1992 @GaryShen2008,

I haven't generated the new update data for data maintenance yet, so I will try this step again another time. And yes, you can close this ticket.

Thanks for everyone's help🤗.