Closed: nvliyuan closed this issue 3 months ago
@nvliyuan - I couldn't reproduce this issue. I tried on Spark 3.2, 3.3, and 3.4, and with tools versions 24.02.05-SNAPSHOT, 24.04.0, and the latest 24.04.1-SNAPSHOT. Could you please check again? In my runs, min_by is classified as unsupported.
Output of rapids_4_spark_qualification_output_unsupportedOperators.csv based on the query mentioned in the description:
App ID,SQL ID,Stage ID,ExecId,Unsupported Type,Unsupported Operator,Details,Stage Duration,App Duration,Action
"local-1717711113162",0,2,"0","Exec","HashAggregate","Contains unsupported expr",90,410483,"Triage"
"local-1717711113162",0,2,"0","Expr","min_by","Unsupported",90,410483,"Triage"
"local-1717711113162",0,0,"1","Exec","HashAggregate","Contains unsupported expr",569,410483,"Triage"
"local-1717711113162",0,0,"1","Expr","min_by","Unsupported",569,410483,"Triage"
"local-1717711113162",0,0,"2","Exec","LocalTableScan","Unsupported",569,410483,"IgnorePerf"
"local-1717711113162",0,-1,"3","Exec","AdaptiveSparkPlan","Unsupported",0,410483,"IgnoreNoPerf"
Output of explain in spark-shell:
df.explain
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[course#10], functions=[min_by(year#11, earnings#12)])
+- Exchange hashpartitioning(course#10, 200), ENSURE_REQUIREMENTS, [plan_id=42]
+- HashAggregate(keys=[course#10], functions=[partial_min_by(year#11, earnings#12)])
+- LocalTableScan [course#10, year#11, earnings#12]
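Note that min_by appears in both the final HashAggregate and the partial_min_by of the partial aggregate, which is why two stages are flagged in the CSV. As a rough illustration (this is not the tool's actual event-log-based logic, and the unsupported list here is an assumption), scanning an explain string for a known-unsupported expression could look like:

```python
import re

# Physical plan text copied from the df.explain output above.
PLAN = """
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[course#10], functions=[min_by(year#11, earnings#12)])
   +- Exchange hashpartitioning(course#10, 200), ENSURE_REQUIREMENTS, [plan_id=42]
      +- HashAggregate(keys=[course#10], functions=[partial_min_by(year#11, earnings#12)])
         +- LocalTableScan [course#10, year#11, earnings#12]
"""

# Hypothetical unsupported-expression list, for illustration only.
UNSUPPORTED = {"min_by", "max_by"}

def flagged_lines(plan, unsupported):
    """Return plan lines containing a call to any unsupported expression."""
    # The optional partial_ prefix matches the partial-aggregation variants.
    pattern = re.compile(r"\b(?:partial_)?(%s)\(" % "|".join(map(re.escape, unsupported)))
    return [line.strip() for line in plan.splitlines() if pattern.search(line)]

for line in flagged_lines(PLAN, UNSUPPORTED):
    print(line)  # prints both HashAggregate lines
```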
Hi @nartal1, I know what happened: I ran the query with the plugin enabled. Although the resulting event log looks the same as a CPU run, the tool cannot generate the output from it. Since the Qualification tool is meant for CPU event logs, this issue is invalid. Closing it; thanks for investigating.
I ran the scripts below and expected the Qualification tool to catch the unsupported function MinBy, but it did not.
run tool:
rapids_4_spark_qualification_output_unsupportedOperators.csv was generated with empty content.