NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

GPU OutOfMemory while DISTINCT a partitionedBy column on DeltaTable ? #11064

Open LIN-Yu-Ting opened 2 weeks ago

LIN-Yu-Ting commented 2 weeks ago

We are using Spark RAPIDS + Spark Thrift Server to serve SQL requests on Spark 3.3.0 with RAPIDS 23.10, and we have a Delta table partitioned by a column, runName.

We executed the SQL query SELECT DISTINCT runName FROM table on that table and got the following error messages saying that the GPU is out of memory.

24/06/11 04:14:23.084 [task-result-getter-2] WARN  o.a.spark.scheduler.TaskSetManager - Lost task 38.0 in stage 23.0 (TID 571) (10.0.0.12 executor 2): java.lang.OutOfMemoryError: Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni-release-11-cuda12/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/limiting_resource_adaptor.hpp:144: Exceeded memory limit
    at ai.rapids.cudf.ColumnVector.fromScalar(Native Method)
    at ai.rapids.cudf.ColumnVector.fromScalar(ColumnVector.java:430)
    at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.$anonfun$buildPartitionColumns$1(ColumnarPartitionReaderWithPartitionValues.scala:107)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
    at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.buildPartitionColumns(ColumnarPartitionReaderWithPartitionValues.scala:105)
    at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.$anonfun$addPartitionValues$1(ColumnarPartitionReaderWithPartitionValues.scala:79)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.addPartitionValues(ColumnarPartitionReaderWithPartitionValues.scala:78)
    at com.nvidia.spark.rapids.MultiFileReaderUtils$.$anonfun$addSinglePartitionValuesAndClose$2(GpuMultiFileReader.scala:245)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:56)
    at com.nvidia.spark.rapids.MultiFileReaderUtils$.addSinglePartitionValuesAndClose(GpuMultiFileReader.scala:243)
    at com.nvidia.spark.rapids.MultiFileReaderFunctions.addPartitionValues(GpuMultiFileReader.scala:119)
    at com.nvidia.spark.rapids.MultiFileReaderFunctions.addPartitionValues$(GpuMultiFileReader.scala:114)
    at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader.addPartitionValues(GpuParquetScan.scala:2086)
    at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader.readBatches(GpuParquetScan.scala:2539)
    at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.liftedTree1$1(GpuMultiFileReader.scala:572)
    at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.readBuffersToBatch(GpuMultiFileReader.scala:571)
    at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1(GpuMultiFileReader.scala:764)
    at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1$adapted(GpuMultiFileReader.scala:719)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.next(GpuMultiFileReader.scala:719)
    at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
    at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
    at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1(GpuDataSourceRDD.scala:66)
    at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(GpuDataSourceRDD.scala:66)
    at scala.Option.exists(Option.scala:376)
    at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:66)
    at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.advanceToNextIter(GpuDataSourceRDD.scala:90)
    at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:66)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(aggregate.scala:1922)
    at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(aggregate.scala:1922)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
24/06/11 04:14:48.097 [task-result-getter-0] WARN  o.a.spark.scheduler.TaskSetManager - Lost task 3.2 in stage 23.0 (TID 580) (10.0.0.12 executor 0): com.nvidia.spark.rapids.jni.RetryOOM: GPU OutOfMemory
    at ai.rapids.cudf.Table.contiguousSplit(Native Method)
    at ai.rapids.cudf.Table.contiguousSplit(Table.java:2298)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$4(RmmRapidsRetryIterator.scala:634)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$3(RmmRapidsRetryIterator.scala:632)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$2(RmmRapidsRetryIterator.scala:631)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$1(RmmRapidsRetryIterator.scala:625)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.split(RmmRapidsRetryIterator.scala:442)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:557)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:495)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at com.nvidia.spark.rapids.GpuMergeAggregateIterator.aggregateInputBatches(aggregate.scala:795)
    at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(aggregate.scala:752)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:749)
    at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:711)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$6(aggregate.scala:2034)
    at scala.Option.map(Option.scala:230)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:2034)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:1898)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:333)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
24/06/11 04:14:36.169 [dispatcher-CoarseGrainedScheduler] INFO  o.a.spark.scheduler.TaskSetManager - Starting task 99.1 in stage 22.0 (TID 574) (10.0.0.12, executor 0, partition 99, PROCESS_LOCAL, 8406 bytes) taskResourceAssignments Map(gpu -> [name: gpu, addresses: 0])
24/06/11 04:14:36.172 [task-result-getter-3] WARN  o.a.spark.scheduler.TaskSetManager - Lost task 98.0 in stage 22.0 (TID 526) (10.0.0.12 executor 0): com.nvidia.spark.rapids.jni.SplitAndRetryOOM: GPU OutOfMemory: could not split inputs and retry
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.split(RmmRapidsRetryIterator.scala:439)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:557)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:495)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.drainSingleWithVerification(RmmRapidsRetryIterator.scala:287)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRetryNoSplit(RmmRapidsRetryIterator.scala:132)
    at com.nvidia.spark.rapids.CachedGpuBatchIterator.next(GpuDataProducer.scala:125)
    at com.nvidia.spark.rapids.CachedGpuBatchIterator.next(GpuDataProducer.scala:116)
    at com.nvidia.spark.rapids.GpuColumnarBatchWithPartitionValuesIterator.next(GpuColumnarBatchIterator.scala:116)
    at com.nvidia.spark.rapids.GpuColumnarBatchWithPartitionValuesIterator.next(GpuColumnarBatchIterator.scala:101)
    at scala.collection.TraversableOnce$FlattenOps$$anon$2.next(TraversableOnce.scala:522)
    at com.nvidia.spark.rapids.FilePartitionReaderBase.get(GpuMultiFileReader.scala:392)
    at com.nvidia.spark.rapids.FilePartitionReaderBase.get(GpuMultiFileReader.scala:383)
    at com.nvidia.spark.rapids.PartitionIterator.next(dataSourceUtil.scala:39)
    at com.nvidia.spark.rapids.MetricsBatchIterator.next(dataSourceUtil.scala:49)
    at com.nvidia.spark.rapids.MetricsBatchIterator.next(dataSourceUtil.scala:43)
    at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.next(GpuDataSourceRDD.scala:70)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
    at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.next(GpuFileSourceScanExec.scala:480)
    at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.next(GpuFileSourceScanExec.scala:469)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at com.nvidia.spark.rapids.AbstractProjectSplitIterator.next(basicPhysicalOperators.scala:248)
    at com.nvidia.spark.rapids.AbstractProjectSplitIterator.next(basicPhysicalOperators.scala:228)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(aggregate.scala:751)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:749)
    at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:711)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$6(aggregate.scala:2034)
    at scala.Option.map(Option.scala:230)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:2034)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:1898)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:333)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

However, this error does not appear if we execute SELECT DISTINCT on other columns of the table.

In addition, this error happens when we use an Azure NCas64_T4_v3 server with 64 CPU cores and 4 x T4 GPUs. However, it does not happen when we use a cluster of 2 x NCas8_T4_v3 nodes, each with 8 CPU cores and 1 x T4 GPU.

Intuitively, it seems to be caused by the large size of the partition files (around 100 MB each). However, the error does not show up when we run DISTINCT on other columns, which makes that assumption less convincing. Moreover, since we are running DISTINCT on a partitionBy column, Spark normally does not need to load the Parquet files to figure out the values of runName; it only needs to read the directory names to get the values. This should not make the GPU run out of memory.

Any ideas ?

revans2 commented 2 weeks ago

@LIN-Yu-Ting Generally we treat GPU OutOfMemory errors as bugs that need to be fixed. There are a few cases where an algorithm cannot be split up into smaller pieces and we cannot fix the issue, but most of the time running out of memory is something that we can work around with spilling to either host memory or disk.

If you want to file three separate bugs, one for each of the stack traces, I am happy to try and fix them. If you don't want to, just let me know and I'll file something myself.

In the short term you have a few options to try and mitigate the problem. The goal is to reduce memory pressure on the GPU, and there are several different config settings you can try to that end.

You could try and set spark.sql.files.maxPartitionBytes smaller. This should reduce the amount of compressed data each task gets, and would reduce the total number of rows. This does not always work because it cannot split up a single parquet row group.

You could also try to reduce the number of concurrent tasks on the GPU by setting spark.rapids.sql.concurrentGpuTasks to 1. By default we run with 2 tasks on the GPU to try and keep the GPU as busy as possible, but it does increase the amount of data on the GPU.

You could also try and set the spark.rapids.sql.batchSizeBytes config to be smaller. By default we target each task to process about 1 GiB of data, and use up to 3 more GiB of scratch space. But this is a soft limit and we might use more if the algorithm needs it. If we set the target smaller, then it is likely that the overall memory pressure will be less.
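For reference, a minimal sketch of how these settings could be applied in a spark-shell or Thrift Server session (the values below are illustrative assumptions, not tuned recommendations; they can also be passed as --conf at submit time):

    // Illustrative values only -- smaller input splits, one concurrent GPU task,
    // and a smaller target batch size, all aimed at reducing GPU memory pressure.
    spark.conf.set("spark.sql.files.maxPartitionBytes", "64MB")     // default is 128MB
    spark.conf.set("spark.rapids.sql.concurrentGpuTasks", "1")      // default is 2
    spark.conf.set("spark.rapids.sql.batchSizeBytes", "268435456")  // 256 MiB instead of the ~1 GiB default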

Please note that all of these are also likely to reduce the performance of your query. Also I am just guessing based on the stack traces. The first and third stack traces look to be running out of memory when trying to read data in from a file.

The second stack trace looks like you are doing a round robin partitioning as a part of your job and the sort that is implicit in that operation ran out of memory. That one is harder because I don't know which task it is associated with. Sort tends to be fairly well behaved with spilling, but if you can help us with a repro case, that would let us try and figure out exactly what is happening.

LIN-Yu-Ting commented 2 weeks ago

@revans2 Thanks for your responsive comments. I do not mind if you turn these three exception stack traces into separate issues. If you can link them to this original one, it will be easier to trace.

Regarding your three quick config changes, I have tried setting spark.rapids.sql.concurrentGpuTasks to 1 and it does not seem to help. I will try the other two configs later.

As for a repro case, I will look into how to share the data or other possibilities.
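In the meantime, a purely synthetic sketch of the shape of our workload might look like the following (the table path, row counts, and runName values here are hypothetical placeholders, not our real data):

    // Hypothetical repro sketch: write a Delta table partitioned by runName,
    // then run the DISTINCT query that fails for us. All names and sizes are made up.
    val df = spark.range(0L, 100000000L)
      .selectExpr("id", "concat('run_', cast(id % 50 as string)) as runName")
    df.write.format("delta").partitionBy("runName").mode("overwrite").save("/tmp/repro_table")
    spark.read.format("delta").load("/tmp/repro_table").createOrReplaceTempView("repro_table")
    spark.sql("SELECT DISTINCT runName FROM repro_table").show()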

LIN-Yu-Ting commented 1 week ago

@revans2 Let me describe our Delta table more precisely:

+-------+-------------------+------+--------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-----------------------------------------------------------------------------+------------+-----------------------------------+
|version|timestamp          |userId|userName|operation              |operationParameters                                                                                                                                            |job |notebook|clusterId|readVersion|isolationLevel|isBlindAppend|operationMetrics                                                             |userMetadata|engineInfo                         |
+-------+-------------------+------+--------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-----------------------------------------------------------------------------+------------+-----------------------------------+
|25     |2024-06-11 12:22:16|null  |null    |REPLACE TABLE AS SELECT|{isManaged -> false, description -> null, partitionBy -> [], properties -> {}}                                                                                 |null|null    |null     |24         |Serializable  |false        |{numFiles -> 127, numOutputRows -> 9572955254, numOutputBytes -> 13631153367}|null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|24     |2024-06-11 09:23:06|null  |null    |REPLACE TABLE AS SELECT|{isManaged -> false, description -> null, partitionBy -> ["runName"], properties -> {"delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"10"}}|null|null    |null     |23         |Serializable  |false        |{numFiles -> 215, numOutputRows -> 9572955254, numOutputBytes -> 12304450868}|null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|23     |2024-06-02 12:47:30|null  |null    |REPLACE TABLE AS SELECT|{isManaged -> false, description -> null, partitionBy -> ["runName"], properties -> {}}                                                                        |null|null    |null     |22         |Serializable  |false        |{numFiles -> 280, numOutputRows -> 9572955254, numOutputBytes -> 9649503160} |null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|22     |2024-05-14 11:20:34|null  |null    |WRITE                  |{mode -> Append, partitionBy -> []}                                                                                                                            |null|null    |null     |21         |Serializable  |true         |{numFiles -> 200, numOutputRows -> 549120214, numOutputBytes -> 290521902}   |null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|21     |2024-05-14 09:53:39|null  |null    |WRITE                  |{mode -> Append, partitionBy -> []}                                                                                                                            |null|null    |null     |20         |Serializable  |true         |{numFiles -> 200, numOutputRows -> 551313299, numOutputBytes -> 459494733}   |null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|

Number of rows: 9,572,955,254.

version 22 -> no partitionBy, around 4000 files.
version 23 -> from version 22, 280 files. We executed REPLACE TABLE USING DELTA PARTITIONED BY (runName) AS SELECT * FROM Table.
version 24 -> from version 23, 215 files. We executed REPLACE TABLE USING DELTA PARTITIONED BY (runName) TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = 2, 'delta.minWriterVersion' = 5) AS SELECT * FROM Table.
version 25 -> from version 24, 127 files. We executed REPLACE TABLE USING DELTA AS SELECT * FROM Table.

version 23 -> 8 cores/1 GPU and 16 cores/1 GPU both get the exception when executing SELECT DISTINCT runName FROM Table.
version 24 -> 8 cores/1 GPU is safe, 16 cores/1 GPU gets the exception when executing SELECT DISTINCT runName FROM Table.
version 25 -> no exception when executing SELECT DISTINCT runName FROM Table.

Btw, it seems that modifying spark.rapids.sql.batchSizeBytes, spark.rapids.sql.concurrentGpuTasks and spark.sql.files.maxPartitionBytes does not help. The exception comes out only when running DISTINCT on the partitionBy column, which makes me believe something is wrong when reading the partitionBy column with the GPU.

revans2 commented 4 days ago

Thanks for the updated information. We will try and reproduce this locally and see what we can come up with. For now I think I will just move this over to a bug and then we can figure out if we need to split it up further.

mattahrens commented 3 days ago

P0 scope is to identify opportunities for improved OOM + retry handling. This could potentially be fixed by chunking the data differently, given the partition data being read.

LIN-Yu-Ting commented 1 hour ago

@mattahrens I have downloaded the spark-rapids project and built it successfully myself. Do you have any suggestions for running it in my own IDE with a debugger so I can understand the mechanism behind it?
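For example, would launching a local-mode SparkSession from the IDE with the plugin enabled be a reasonable way to hit breakpoints in the plugin code? The sketch below is just my guess at how it might work (it assumes the built plugin jar is on the classpath and a GPU is available locally):

    // Assumed sketch: run the plugin in a single local JVM so IDE breakpoints
    // in spark-rapids code are hit directly. Config values are my assumptions.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("spark-rapids-debug")
      .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
      .config("spark.rapids.sql.enabled", "true")
      .getOrCreate()

    spark.sql("SELECT 1").show()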