NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

Failing Delta Lake Tests for Databricks 13.3 Due to WriteIntoDeltaCommand #9675

Closed — razajafri closed this issue 10 months ago

razajafri commented 10 months ago

There are currently a few tests failing because the WriteIntoDeltaCommand command is not replaced on the GPU.

Here is the exception seen on the Databricks 13.3 platform:

!Exec <DataWritingCommandExec> cannot run on GPU because not all data writing commands can be replaced
  ! <WriteIntoDeltaCommand> cannot run on GPU because GPU does not currently support the operator class com.databricks.sql.transaction.tahoe.commands.WriteIntoDeltaCommand
  !Exec <WriteFilesExec> cannot run on GPU because WriteFilesExec can't run on GPU because parent can't run on GPU
      !Exec <FilterExec> cannot run on GPU because not all expressions can be replaced
        ! <IncrementMetric> true cannot run on GPU because GPU does not currently support the operator class com.databricks.sql.execution.metric.IncrementMetric
          @Expression <Literal> true could run on GPU

23/11/11 23:51:45 ERROR GpuOverrideUtil: Encountered an exception applying GPU overrides java.lang.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.datasources.WriteFilesExec
WriteFiles
+- GpuColumnarToRow false
   +- GpuProject [0 AS a#1559]
      +- GpuRowToColumnar targetsize(104857600)
         +- Filter true
            +- GpuColumnarToRow false
               +- GpuFileGpuScan parquet [] Batched: true, DataFilters: [], Format: Parquet, Location: TahoeBatchFileIndex[file:/tmp/pyspark_tests/1011-182421-dr4cfwta-10-59-173-177-master-117977-5671..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
razajafri commented 10 months ago

Fixed by marking the affected tests with allow_non_gpu in https://github.com/NVIDIA/spark-rapids/pull/9644
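For context on the fix: the spark-rapids integration tests use an allow_non_gpu pytest mark to tell the test harness which execs/expressions a test permits to fall back to the CPU. The decorator below is a minimal stand-in sketching that pattern, not the project's actual implementation in marks.py; test_delta_write_fallback is a hypothetical test name, and the operator class names are taken from the fallback message in this issue.

```python
# Minimal stand-in for an allow_non_gpu-style marker (a sketch, not the
# real spark-rapids implementation in the integration_tests marks module).
def allow_non_gpu(*operators):
    """Record which operators a test permits to fall back to the CPU."""
    def wrap(test_fn):
        # Attach the allow-list to the test function so a harness could
        # consult it when validating the executed plan.
        test_fn.allowed_non_gpu = set(operators)
        return test_fn
    return wrap

# Hypothetical test; the operator names come from the error message above.
@allow_non_gpu('DataWritingCommandExec', 'WriteFilesExec',
               'WriteIntoDeltaCommand', 'FilterExec', 'IncrementMetric')
def test_delta_write_fallback():
    # A real test would write a Delta table and compare CPU vs. GPU
    # results; here the body is elided to show only where the mark goes.
    pass
```

With a mark like this applied, a plan containing WriteIntoDeltaCommand or IncrementMetric on the CPU no longer fails the test's GPU-replacement check.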