NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[BUG] unsupported operator handling logic should use action column and not override table #1131

Closed eordentlich closed 3 months ago

eordentlich commented 3 months ago

Describe the bug

The unsupported operator handling logic here, https://github.com/NVIDIA/spark-rapids-tools/blob/dev/user_tools/src/spark_rapids_tools/tools/qualx/preprocess.py#L1119 should use the Action column of the exec_info dataframe, not the override table. The override table should be removed, leaving the Action column as the sole source of truth for whether an unsupported operator has a perf impact. This was the behavior in the previous version, which is still in the repo but no longer invoked: https://github.com/NVIDIA/spark-rapids-tools/blob/dev/user_tools/src/spark_rapids_tools/tools/model_xgboost.py#L319-L320

The new wholestagecodegen part of the logic should be preserved for now.
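
To illustrate the requested behavior, here is a minimal sketch, not the actual qualx code. It models exec_info rows as plain dicts to stay dependency-free; the same filter maps onto a boolean mask over the dataframe. The column names (`Exec Name`, `Exec Is Supported`), the Action values, the sample rows, and the function name `unsupported_with_perf_impact` are all assumptions made for illustration, and the wholestagecodegen handling is deliberately left out.

```python
# Assumed set of Action values meaning "unsupported, but no perf impact".
# The real values emitted by the tool may differ.
NO_PERF_IMPACT_ACTIONS = {"IgnoreNoPerf"}


def unsupported_with_perf_impact(exec_rows):
    """Return unsupported operator rows whose Action implies a perf impact.

    The decision is driven only by each row's Action value; no separate
    override table is consulted.
    """
    return [
        row
        for row in exec_rows
        if not row["Exec Is Supported"]
        and row["Action"] not in NO_PERF_IMPACT_ACTIONS
    ]


# Hypothetical exec_info rows (illustrative data, not real tool output).
exec_info = [
    {"Exec Name": "Project", "Exec Is Supported": True, "Action": "NONE"},
    {"Exec Name": "CollectLimit", "Exec Is Supported": False, "Action": "Triage"},
    {
        "Exec Name": "Execute CreateViewCommand",
        "Exec Is Supported": False,
        "Action": "IgnoreNoPerf",
    },
]

impacting = unsupported_with_perf_impact(exec_info)
print([row["Exec Name"] for row in impacting])
```

With this shape, changing how an operator is treated only requires changing the Action value produced upstream, rather than maintaining a second table in the preprocessing code.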

Related issue: