NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[FEA] Qualification tool: Report supported expressions in the output file #1325

Open nartal1 opened 2 months ago

nartal1 commented 2 months ago

Is your feature request related to a problem? Please describe. The Qualification tool currently reports unsupported expressions in unsupportedOperators.csv. It also reports Execs (both supported and unsupported) per SQL in qualification_statistics.csv. It would be useful to report supported expressions per SQL as well. That would help determine how often a particular expression appears in a SQL query, job, or application.

Describe the solution you'd like Currently, the Qualification tool doesn't capture supported expressions. We need to update all the ExecParsers to capture the expressions so that we can report them later.

Things to consider:

  1. Since there could be many expressions in an Exec, capturing them would add to memory pressure. We need a data structure that updates the expression count as expressions are parsed, rather than storing every expression in a map and counting at the end.
  2. Discussion is required on how to report it. Should it be a separate file, or should we include it in one of the current output files? If we report it per SQL, the report may get too lengthy and hard to interpret. We also need to discuss the format of the output: should it include the Exec name as well, or just the expression name and count per SQL?
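
To make the first consideration concrete, here is a minimal sketch of a tally that increments counts while Execs are parsed instead of retaining the expressions. It is written in Python only for brevity; `ExpressionTally`, its methods, and the CSV column names are all hypothetical and are not part of the tool. It assumes the simplest of the output options discussed above (expression name and count per SQL, no Exec name).

```python
from collections import Counter, defaultdict
import csv
import io


class ExpressionTally:
    """Hypothetical per-SQL tally of supported expressions."""

    def __init__(self):
        # sql_id -> Counter of expression name -> occurrences
        self._counts = defaultdict(Counter)

    def record(self, sql_id, expressions):
        # Increment counts as each Exec is parsed; the expression
        # list itself is not retained afterwards.
        self._counts[sql_id].update(expressions)

    def rows(self):
        # Yield (sql_id, expression, count) in a stable order.
        for sql_id in sorted(self._counts):
            for expr, count in sorted(self._counts[sql_id].items()):
                yield sql_id, expr, count

    def to_csv(self):
        # Column names are assumptions, not the tool's actual schema.
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["SQL ID", "Expression", "Count"])
        writer.writerows(self.rows())
        return buf.getvalue()


tally = ExpressionTally()
tally.record(1, ["Add", "Cast", "Add"])  # e.g. expressions from one Exec
tally.record(1, ["Cast"])                # another Exec in the same SQL
tally.record(2, ["Sum"])
print(tally.to_csv())
```

Memory then grows with the number of distinct expression names per SQL rather than with the total number of expression occurrences. Adding the Exec name would only require widening the key, at the cost of a longer report.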
### Tasks
- [ ] Refactor ExecParser code, remove duplicate code in Parsers. 
- [ ] Update all Exec Parsers to capture supported expressions in an Exec
- [ ] Add code to report the supported expressions. 

amahussein commented 1 week ago

There is a lot of code redundancy in how the execParsers are created. I would suggest removing the duplicated code so that we don't have to copy/paste again across tens of class files. ExecParser needs to be refactored into a trait plus an abstract class. Then we only implement classes for Execs that require a completely different kind of handling. This applies to the Photon classes as well.
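
A rough sketch of the shape this suggestion describes, rendered in Python with `abc` standing in for the trait and abstract class. All class names, fields, and the comma-splitting logic are hypothetical illustrations, not the tool's actual parsers.

```python
from abc import ABC, abstractmethod


class ExecParser(ABC):
    """Interface every parser exposes (the 'trait' role)."""

    @abstractmethod
    def parse(self):
        ...


class GenericExecParser(ExecParser):
    """Shared behaviour written once (the 'abstract class' role)."""

    def __init__(self, exec_name, expr_string, supported):
        self.exec_name = exec_name
        self.expr_string = expr_string
        self.supported = supported  # set of supported expression names

    def parse_expressions(self):
        # Naive placeholder: treat the string as comma-separated names.
        return [e.strip() for e in self.expr_string.split(",") if e.strip()]

    def parse(self):
        exprs = self.parse_expressions()
        unsupported = [e for e in exprs if e not in self.supported]
        return {
            "exec": self.exec_name,
            "expressions": exprs,
            "is_supported": not unsupported,
        }


class SortExecParser(GenericExecParser):
    """Overrides only the step that differs for this Exec."""

    def parse_expressions(self):
        # Drop the trailing ASC/DESC from each sort key.
        return [e.split()[0] for e in super().parse_expressions()]


project = GenericExecParser("Project", "Add, Cast", {"Add", "Cast"})
sort = SortExecParser("Sort", "a ASC, b DESC", {"a", "b"})
print(project.parse())
print(sort.parse())
```

In this sketch, Execs with no special handling reuse `GenericExecParser` directly, so a dedicated class is only written when `parse_expressions` (or another step) genuinely differs.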

nartal1 commented 1 week ago

Thanks @amahussein! I have updated the description. I will start on the refactor first, followed by adding the supported operators.