Open jeromeku opened 3 months ago
@jeromeku Are you expecting something like a kernel_dataframe with a callstack column = ["aten:op1", "aten:op", "module name", ...]?
Does the call_stack logic help to achieve something similar to your request? https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/hta/common/call_stack.py
It should be able to link from the kernel up to the operators (and likely user annotations like profiler.profile).
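For illustration only, a minimal sketch of the kind of dataframe being described here; the column names and call-stack entries are hypothetical, not HTA's actual schema or output:

```python
import pandas as pd

# Hypothetical layout: one row per GPU kernel, with the CPU-side call stack of
# operators and user annotations that led to its launch. Column names are
# assumptions for illustration, not HTA's real schema.
kernel_df = pd.DataFrame(
    [
        {
            "kernel_name": "volta_sgemm_128x64_tn",
            "duration_us": 123.4,
            "callstack": ["my_range", "nn.Linear", "aten::linear", "aten::mm"],
        }
    ]
)
print(kernel_df)
```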
Something like the "Events" view in nsys, where you can see a trace of kernels by time, grouped by nvtx range. See this example from this thread.

Essentially what you see when you do prof.key_averages().table(), except: given a region annotated with record_function('my_range'), I should see a top-level my_range followed by the entire call stack of operators and the kernels they ultimately dispatch to, ordered by time, along with other collected stats.
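For reference, a minimal sketch of what the profiler gives today (toy nn.Linear model, CUDA device assumed); the flat key_averages table lists my_range as a single row rather than nesting the operators and kernels dispatched under it, which is the gap described above:

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(32, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with record_function("my_range"):
        model(x)
    torch.cuda.synchronize()

# Flat per-op summary: "my_range" appears as one row, but the kernels it
# ultimately dispatched are not grouped beneath it.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))

# The exported trace retains annotation, operator, and kernel events for
# offline analysis (used in the sketch further below).
prof.export_chrome_trace("trace.json")
```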
Motivation and context
Is it possible to correlate kernel distribution with ranges annotated either through torch.cuda.nvtx or torch.profiler.profile? The use case is model architecture optimization. I'd like to understand where the bottlenecks are in a model's forward / backward passes and where the opportunities are for kernel fusion, CUDA graphs, etc. Exporting a Chrome / TensorBoard trace can be helpful for visualizing such areas when model regions are annotated with torch.profiler.record_function (or nvtx), but it would be helpful to have this information available for further analysis as a dataframe.

Description
It would be useful to have the kernel breakdown by annotation range aggregated into a dataframe to further investigate problematic modules and layers within the model:
- aten / torch ops that dispatched these kernels
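As a rough sketch (under stated assumptions, not an HTA API), the requested per-annotation kernel breakdown could be approximated by walking the Chrome trace exported in the earlier example and attributing each GPU kernel to the user annotation that encloses its launch; the event categories ("user_annotation", "cuda_runtime", "kernel") and the "correlation" field match recent PyTorch/Kineto exports but may differ across versions:

```python
import json

import pandas as pd

with open("trace.json") as f:  # produced by prof.export_chrome_trace above
    events = json.load(f)["traceEvents"]

# Event categories below are assumptions about the Kineto export format.
annotations = [e for e in events if e.get("cat") == "user_annotation"]
launches = {
    e["args"]["correlation"]: e
    for e in events
    if e.get("cat") == "cuda_runtime" and "correlation" in e.get("args", {})
}
kernels = [e for e in events if e.get("cat") == "kernel"]


def enclosing_annotation(launch):
    """Find the record_function range whose CPU-side interval contains this launch."""
    for a in annotations:
        if (
            a.get("pid") == launch.get("pid")
            and a.get("tid") == launch.get("tid")
            and a["ts"] <= launch["ts"] <= a["ts"] + a.get("dur", 0)
        ):
            return a["name"]
    return "<unannotated>"


rows = []
for k in kernels:
    launch = launches.get(k.get("args", {}).get("correlation"))
    rows.append(
        {
            "annotation": enclosing_annotation(launch) if launch else "<unknown>",
            "kernel_name": k["name"],
            "duration_us": k.get("dur", 0),
        }
    )

df = pd.DataFrame(rows)
# Kernel time aggregated per annotation range: the dataframe requested above.
print(df.groupby(["annotation", "kernel_name"])["duration_us"].sum())
```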
Alternatives
No response
Additional context
No response