NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

Sync supported operators with plugin changes and update default score #1020

Closed: nartal1 closed this pull request 1 month ago

nartal1 commented 1 month ago

This fixes https://github.com/NVIDIA/spark-rapids-tools/issues/1007

Updated supportedExprs.csv to add the ArrayFilter expression and updated the notes for json_tuple, so that both stay in sync with the plugin changes.
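To illustrate why a missing row in supportedExprs.csv matters, here is a minimal sketch of a lookup that flags plan expressions absent from the file. It is a hypothetical model, not the tool's actual code: the two-column CSV (Expression, SQL Func) is a simplified stand-in for the real file, and the helper names load_supported_sql_funcs and unsupported_exprs are invented for illustration. In Spark, the SQL function filter is implemented by the ArrayFilter expression class.

```python
import csv
import io

# Hypothetical, simplified stand-in for supportedExprs.csv. The real file has
# more columns (per-type support, notes, ...); only two are modeled here.
SUPPORTED_EXPRS_CSV = """Expression,SQL Func
ArrayFilter,filter
JsonTuple,json_tuple
Size,size
"""


def load_supported_sql_funcs(csv_text: str) -> set[str]:
    """Return the set of SQL function names listed in the CSV (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["SQL Func"] for row in reader if row["SQL Func"]}


def unsupported_exprs(plan_exprs: list[str], supported: set[str]) -> list[str]:
    """Return the plan expressions that have no entry in the supported set (hypothetical helper)."""
    return [e for e in plan_exprs if e not in supported]


# Without the ArrayFilter row, "filter" is flagged; with it, nothing is flagged.
before = SUPPORTED_EXPRS_CSV.replace("ArrayFilter,filter\n", "")
print(unsupported_exprs(["filter", "size"], load_supported_sql_funcs(before)))
print(unsupported_exprs(["filter", "size"], load_supported_sql_funcs(SUPPORTED_EXPRS_CSV)))
```

Under these assumptions, the first call reports filter as unsupported and the second reports nothing, which mirrors the change in the Unsupported Expressions column below.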

Spark plan:

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[sum(size(f#4, true))])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=15]
      +- HashAggregate(keys=[], functions=[partial_sum(size(f#4, true))])
         +- Project [filter(array(id#0L, (id#0L + 1), (id#0L - 100), (id#0L + 3), (id#0L + 2)), lambdafunction(((lambda f#5L % 3) = 0), lambda f#5L, false)) AS f#4]
            +- Range (0, 1000, step=1, splits=64)
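The PR shows only the physical plan, not the query that produced it. As a sketch of what the plan computes, the pure-Python model below reproduces it without Spark, assuming standard Spark semantics for array, filter, size and sum. The function names filtered_array and query_result are hypothetical.

```python
def filtered_array(i: int) -> list[int]:
    # Project: filter(array(id, id + 1, id - 100, id + 3, id + 2), x -> x % 3 = 0).
    # Python's % differs from Spark's for negative operands only in the sign of a
    # non-zero remainder, so the "== 0" test gives the same answer.
    arr = [i, i + 1, i - 100, i + 3, i + 2]
    return [x for x in arr if x % 3 == 0]


def query_result(n: int = 1000) -> int:
    # HashAggregate(sum(size(f))) over every row of Range(0, n).
    return sum(len(filtered_array(i)) for i in range(n))


print(query_result())  # 1667
```

A spark-shell query of roughly this shape, such as spark.range(1000).selectExpr("filter(array(id, id + 1, id - 100, id + 3, id + 2), x -> x % 3 = 0) AS f").selectExpr("sum(size(f))"), would plausibly yield a plan like the one above; this is an assumption, not the query used in the PR.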

Qualification output before the change:

===================================================================================
|   App Name|             App ID|App Duration|SQL DF Duration|GPU Opportunity|Estimated GPU Duration|Estimated GPU Speedup|Estimated GPU Time Saved|      Recommendation|Unsupported Execs|Unsupported Expressions|Estimated Job Frequency (monthly)|
=====================================================================================================================================================================================================================================================
|Spark shell|local-1715892862056|       31861|           1346|           1010|              31103.44|                 1.02|                  757.55|     Not Recommended|          Project|                 filter|                               30|
=====================================================================================================================================================================================================================================================

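Qualification output after updating supportedExprs.csv. Project and filter are no longer reported as unsupported, and GPU Opportunity (1346) now equals SQL DF Duration: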

=====================================================================================================================================================================================================================================================
|   App Name|             App ID|App Duration|SQL DF Duration|GPU Opportunity|Estimated GPU Duration|Estimated GPU Speedup|Estimated GPU Time Saved|      Recommendation|Unsupported Execs|Unsupported Expressions|Estimated Job Frequency (monthly)|
=====================================================================================================================================================================================================================================================
|Spark shell|local-1715892862056|       31861|           1346|           1346|               30831.7|                 1.03|                 1029.29|     Not Recommended|                 |                       |                               30|
=====================================================================================================================================================================================================================================================
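The derived columns in the two tables appear consistent with Estimated GPU Speedup = App Duration / Estimated GPU Duration and Estimated GPU Time Saved = App Duration - Estimated GPU Duration. These relationships are inferred from the table values only, not taken from the tool's source; the sketch below checks them against both rows. The name derived_metrics is hypothetical.

```python
# (App Duration, Estimated GPU Duration), copied from the Before/After tables.
ROWS = {
    "before": (31861, 31103.44),
    "after": (31861, 30831.7),
}


def derived_metrics(app_duration: float, est_gpu_duration: float) -> tuple[float, float]:
    """Assumed formulas (inferred from the tables, not from the tool's source)."""
    speedup = app_duration / est_gpu_duration
    time_saved = app_duration - est_gpu_duration
    return speedup, time_saved


for label, (app, gpu) in ROWS.items():
    speedup, saved = derived_metrics(app, gpu)
    print(f"{label}: speedup={speedup:.2f}, time saved={saved:.2f}")
```

This reproduces the speedups 1.02 and 1.03 and gives time savings of about 757.56 and 1029.30, within 0.01 of the reported 757.55 and 1029.29; the small gap is presumably because the displayed Estimated GPU Duration values are rounded.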