Open wjxiz1992 opened 9 months ago
cc @winningsix
I'm not sure whether the semaphore wait time is tracked per operation or at the stage level. If it is not tracked per operation, this may require a code refactor on the spark-rapids side.
Op time should not include the GPU semaphore wait time. If it does, we need to fix it in each operator where that happens.
We had a semaphore wait time metric per operator, but it was really hard to maintain, and impossible to do in all cases. If someone wants to try to put it back in, I would suggest setting a thread-local metric for it while processing and removing it when done, instead of trying to pass the metric around, as that got to be really hard to maintain.
Is your feature request related to a problem? Please describe.
When analyzing from the GPU kernel point of view, we want to understand the actual compute time of a kernel. Currently, however, the Scan op time includes the GPU semaphore wait time, which distorts the performance analysis.
Describe the solution you'd like
Provide two views: one for op time including the GPU semaphore wait time, and one for op time without it.
With such a clear view, kernel developers can quickly identify kernel performance issues from the op time.
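As a rough illustration of the two views, assuming the semaphore wait for an operator is available as its own counter, the view without wait time is just the measured op time minus that counter. `OpTiming` and its fields are hypothetical names, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class OpTiming:
    """Hypothetical per-operator timing record, in nanoseconds."""

    total_ns: int           # wall time measured around the operator
    semaphore_wait_ns: int  # part of total_ns spent blocked on the GPU semaphore

    @property
    def op_time_with_wait_ns(self):
        # View 1: op time as measured, semaphore wait included.
        return self.total_ns

    @property
    def op_time_without_wait_ns(self):
        # View 2: op time with the semaphore wait removed (clamped at zero).
        return max(self.total_ns - self.semaphore_wait_ns, 0)
```

For example, `OpTiming(total_ns=1000, semaphore_wait_ns=400)` reports 1000 ns in the first view and 600 ns in the second.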