NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

Unshim common classes after spark31x removal #11161

Open jlowe opened 1 month ago

jlowe commented 1 month ago

Is your feature request related to a problem? Please describe.
After #11159, a number of classes still live under a Spark-specific shim directory even though they are now common across all supported Spark versions.

Describe the solution you'd like
Shimmed classes that are now common should be moved to the standard paths, and their shim directives removed, to simplify the code base. The first step is to identify the common classes. Base classes or traits that existed only for shim reasons and no longer need to be shims should be removed (e.g., ShimUnaryExpression and many other classes in TreeNode.scala).
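As a rough illustration of the move described above, the following sketch relocates one file from a shim source root to a common source root and strips its shim directive on the way. It assumes the directive is a block comment whose first line starts with `/*** spark-rapids-shim-json-lines` and whose last line is `spark-rapids-shim-json-lines ***/`. The function name `unshim_file` and its arguments are hypothetical, not part of the repository's tooling.

```shell
#!/usr/bin/env bash
# Sketch only: move one shimmed source file to a common root and drop its
# shim directive. The directive markers below are an assumption about the
# comment format; adjust them if the real files differ.

unshim_file() {
  local shim_root=$1 common_root=$2 rel=$3
  local src="$shim_root/$rel" dest="$common_root/$rel"

  if [[ ! -f $src ]]; then
    echo "missing: $src" >&2
    return 1
  fi
  if [[ -e $dest ]]; then
    echo "already exists: $dest" >&2
    return 1
  fi

  mkdir -p "$(dirname "$dest")"
  # Delete every line from the opening marker through the closing marker,
  # write the result to the common path, then remove the shim copy.
  sed '/^\/\*\*\* spark-rapids-shim-json-lines/,/^spark-rapids-shim-json-lines \*\*\*\//d' \
    "$src" > "$dest" && rm "$src"
}
```

A call might look like `unshim_file sql-plugin/src/main/spark320/scala sql-plugin/src/main/scala com/nvidia/spark/rapids/shims/HashUtils.scala`, assuming `sql-plugin/src/main/scala` is the standard path. In a real change, `git mv` followed by an edit would likely be preferable because it keeps file history easy to follow.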

jlowe commented 1 month ago

Here's a naive approach to identifying candidates for unshimming: list the files under sql-plugin/src/main/spark320/ that have no peers in any other shim directory after #11159:

$ cd sql-plugin/src/main/spark320
$ for i in $(find . -type f);do count=$(ls ../*/$i | wc -l); if [[ $count == 1 ]];then echo $i;fi;done
./scala/com/nvidia/spark/rapids/v1FallbackWriters.scala
./scala/com/nvidia/spark/rapids/shims/GpuOrcDataReaderBase.scala
./scala/com/nvidia/spark/rapids/shims/Spark320PlusShims.scala
./scala/com/nvidia/spark/rapids/shims/HashUtils.scala
./scala/com/nvidia/spark/rapids/shims/YearParseUtil.scala
./scala/com/nvidia/spark/rapids/shims/CudfUnsafeRowBase.scala
./scala/com/nvidia/spark/rapids/shims/ShimBaseSubqueryExec.scala
./scala/com/nvidia/spark/rapids/shims/OffsetWindowFunctionMeta.scala
./scala/com/nvidia/spark/rapids/shims/ShimAQEShuffleReadExec.scala
./scala/com/nvidia/spark/rapids/shims/extractValueShims.scala
./scala/com/nvidia/spark/rapids/shims/TreeNode.scala
./scala/com/nvidia/spark/rapids/shims/ShimPredicateHelper.scala
./scala/com/nvidia/spark/rapids/shims/gpuWindows.scala
./scala/com/nvidia/spark/rapids/shims/TypeSigUtil.scala
./scala/com/nvidia/spark/rapids/shims/GpuOrcDataReader320Plus.scala
./scala/com/nvidia/spark/rapids/shims/RapidsCsvScanMeta.scala
./scala/com/nvidia/spark/rapids/shims/Spark31Xuntil33XShims.scala
./scala/com/nvidia/spark/rapids/shims/AnsiCastRuleShims.scala
./scala/com/nvidia/spark/rapids/shims/RebaseShims.scala
./scala/com/nvidia/spark/rapids/shims/GpuBatchScanExecBase.scala
./scala/com/nvidia/spark/rapids/shims/OrcShims320untilAllBase.scala
./scala/com/nvidia/spark/rapids/shims/Spark320PlusNonDBShims.scala
./scala/com/nvidia/spark/rapids/shims/spark320/SparkShimServiceProvider.scala
./scala/com/nvidia/spark/rapids/spark320/RapidsShuffleManager.scala
./scala/org/apache/spark/rapids/shims/GpuShuffleBlockResolver.scala
./scala/org/apache/spark/rapids/shims/ShuffledBatchRDDUtil.scala
./scala/org/apache/spark/rapids/shims/storage/ShimDiskBlockManager.scala
./scala/org/apache/spark/sql/rapids/shims/misc.scala
./scala/org/apache/spark/sql/rapids/shims/Spark32XShimsUtils.scala
./scala/org/apache/spark/sql/rapids/shims/AvroUtils.scala
./scala/org/apache/spark/sql/rapids/shims/RapidsQueryErrorUtils.scala
./scala/org/apache/spark/sql/rapids/shims/RapidsShuffleThreadedWriter.scala
./scala/org/apache/spark/sql/rapids/shims/datetimeExpressions.scala
./scala/org/apache/spark/sql/rapids/GpuDataSource.scala
./scala/org/apache/spark/sql/execution/ShimTrampolineUtil.scala
./scala/org/apache/spark/sql/hive/rapids/shims/GpuInsertIntoHiveTable.scala
./scala/org/apache/spark/sql/hive/rapids/shims/GpuCreateHiveTableAsSelectCommand.scala
./scala/org/apache/spark/storage/RapidsShuffleBlockFetcherIterator.scala
./scala/org/apache/spark/storage/RapidsPushBasedFetchHelper.scala
./java/com/nvidia/spark/rapids/shims/ShimSupportsRuntimeFiltering.java
./java/com/nvidia/spark/rapids/shims/XxHash64Shims.scala
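
The one-liner above relies on word splitting, so it would misbehave on paths containing spaces. A slightly more defensive sketch of the same peer check is shown below. The function name `list_unshim_candidates` is hypothetical, and like the original it only says that no sibling directory has a file at the same relative path, not that the class is actually safe to unshim.

```shell
#!/usr/bin/env bash
# Sketch only: print files under one shim directory that have no file at the
# same relative path in any sibling directory.

list_unshim_candidates() {
  local shim_dir parent file rel sibling has_peer
  # Resolve to an absolute path so "." and relative arguments work.
  shim_dir=$(cd "$1" && pwd) || return 1
  parent=$(dirname "$shim_dir")

  # -print0 with read -d '' keeps paths containing whitespace intact.
  while IFS= read -r -d '' file; do
    rel=${file#"$shim_dir"/}
    has_peer=0
    for sibling in "$parent"/*/; do
      sibling=${sibling%/}
      [[ $sibling == "$shim_dir" ]] && continue
      if [[ -f $sibling/$rel ]]; then
        has_peer=1
        break
      fi
    done
    (( has_peer )) || printf '%s\n' "./$rel"
  done < <(find "$shim_dir" -type f -print0)
}
```

Running `list_unshim_candidates sql-plugin/src/main/spark320` from the repository root should produce the same kind of list as above, in whatever order `find` returns the files.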