Open viadea opened 1 week ago
After taking a look at the event log, get_json_object appears as an expression of a Project exec. Currently, we can link a Project to a stageID iff it is contained inside a WholeStageCodegen, because the latter has metrics that can be linked to a stageID.
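To make the linkage concrete, here is a minimal sketch of metric-based matching. It assumes each plan node exposes the accumulator IDs of its SQL metrics and each stage exposes the accumulator IDs it updated; the function name, the data shapes, and the sample IDs are hypothetical and are not taken from the tool's code.

```python
from typing import Dict, List, Set

UNKNOWN_STAGE = -1  # sentinel reported when no stage can be linked


def link_nodes_to_stages(
    node_accums: Dict[str, Set[int]],
    stage_accums: Dict[int, Set[int]],
) -> Dict[str, List[int]]:
    """Map each plan node to the stages that share one of its accumulator IDs."""
    # Invert stage -> accumulators into accumulator -> stage for direct lookup.
    accum_to_stage = {}
    for stage_id, accums in stage_accums.items():
        for accum_id in accums:
            accum_to_stage[accum_id] = stage_id

    links = {}
    for node, accums in node_accums.items():
        stages = sorted({accum_to_stage[a] for a in accums if a in accum_to_stage})
        # A node without metrics has nothing to match on, so it stays unlinked.
        links[node] = stages or [UNKNOWN_STAGE]
    return links


# Hypothetical data: the codegen node carries metrics, the bare Project does not.
node_accums = {"WholeStageCodegen (1)": {10, 11}, "Project": set()}
stage_accums = {3: {10, 11, 12}}
links = link_nodes_to_stages(node_accums, stage_accums)
print(links)  # {'WholeStageCodegen (1)': [3], 'Project': [-1]}
```

The Project node ends up with -1 only because it has no metrics of its own, which is the situation described above.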
There is no clear path to work around this. We can try adding some heuristics that link an exec to a stage based on the neighboring expressions, but then we need to come up with a well-defined strategy for that. Otherwise, it will become a big mess of heuristics that is hard to understand.
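As one illustration of what a well-defined strategy could look like, the sketch below assigns a stage to an unlinked exec only when all of its already-linked neighbors agree on a single stage. This is an assumed rule for discussion, not something the tool implements; the function, the node names, and the stage IDs are hypothetical.

```python
from typing import Dict, List

UNKNOWN_STAGE = -1  # sentinel for an exec with no linked stage


def infer_stage_from_neighbors(
    node: str,
    stage_of: Dict[str, int],
    neighbors: Dict[str, List[str]],
) -> int:
    """Return the neighbors' stage if they unanimously agree, else UNKNOWN_STAGE."""
    candidates = {stage_of.get(n, UNKNOWN_STAGE) for n in neighbors.get(node, [])}
    candidates.discard(UNKNOWN_STAGE)  # unlinked neighbors carry no information
    if len(candidates) == 1:
        return candidates.pop()
    # Zero or several candidate stages: refuse to guess.
    return UNKNOWN_STAGE


# Hypothetical plan fragment.
stage_of = {"Filter": 3, "FileScan": 3, "HashAggregate": 4}
neighbors = {
    "Project": ["Filter", "FileScan"],    # both neighbors are in stage 3
    "Sort": ["Filter", "HashAggregate"],  # neighbors disagree (3 vs 4)
}
print(infer_stage_from_neighbors("Project", stage_of, neighbors))  # 3
print(infer_stage_from_neighbors("Sort", stage_of, neighbors))     # -1
```

Refusing to guess when neighbors disagree keeps the rule easy to reason about, at the cost of leaving some execs at -1.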
We need to investigate further by checking the Spark History Server (SHS) code that parses the RDD information inside a stage. There might be more information there about the linkage between the execs and their stages. This concern has been raised before in https://github.com/NVIDIA/spark-rapids-tools/issues/794
Describe the bug
unsupportedoperators.csv shows stageID=-1 for certain unsupported operators.
Does this mean the Qual tool could not figure out which stage is associated with those unsupported operators? As a result, the Qual tool thinks the percentage of unsupported duration is very low, which could be wrong.
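A small worked example of why this matters, assuming the unsupported percentage is computed by summing the durations of stages that contain at least one unsupported operator. That formula is an assumption for illustration, and the stage durations below are made-up numbers, not output from the tool.

```python
from typing import Dict, Set


def unsupported_pct(stage_durations_ms: Dict[int, int], unsupported_stage_ids: Set[int]) -> float:
    """Percentage of total stage time spent in stages flagged as unsupported."""
    total = sum(stage_durations_ms.values())
    if total == 0:
        return 0.0
    unsupported = sum(
        duration
        for stage_id, duration in stage_durations_ms.items()
        if stage_id in unsupported_stage_ids
    )
    return 100.0 * unsupported / total


# Made-up durations for three stages.
stage_durations_ms = {0: 1000, 1: 4000, 2: 5000}

# An operator that really ran in stage 2 is reported with stageID=-1,
# so stage 2 never matches and its time is not counted.
reported = unsupported_pct(stage_durations_ms, {0, -1})
actual = unsupported_pct(stage_durations_ms, {0, 2})
print(reported, actual)  # 10.0 60.0
```

Under that assumption, any operator left at stageID=-1 contributes nothing to the unsupported duration, so the reported percentage can only be too low.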