facebookresearch / HolisticTraceAnalysis

A library to analyze PyTorch traces.
http://hta.readthedocs.io
MIT License

Kernel Breakdown by Annotation Range #180

Open jeromeku opened 3 months ago

jeromeku commented 3 months ago

🚀 Motivation and context

Is it possible to correlate kernel distribution with ranges annotated either through torch.cuda.nvtx or torch.profiler.profile?

The use case is model architecture optimization. I'd like to understand where the bottlenecks are in a model's forward and backward passes and where the opportunities are for kernel fusion, CUDA graphs, etc. Exporting a Chrome / TensorBoard trace can help visualize such areas when model regions are annotated with torch.profiler.record_function (or NVTX), but it would be helpful to have this information available as a dataframe for further analysis.

Description

It would be useful to have the kernel breakdown by annotation range aggregated into a dataframe, to further investigate problematic modules and layers within the model.
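
As a rough illustration of the requested output, the sketch below aggregates kernel records that have already been attributed to an annotation range into one row per (annotation, kernel) pair. The record layout, the `breakdown_by_annotation` helper, and the sample kernel names and durations are all hypothetical; none of this is HTA's API.

```python
from collections import defaultdict


def breakdown_by_annotation(records):
    """Aggregate (annotation, kernel, duration_us) records.

    Returns a list of dict rows, sorted by annotation name and then by
    descending total kernel time within each annotation.
    """
    totals = defaultdict(lambda: {"count": 0, "total_us": 0.0})
    range_totals = defaultdict(float)
    for annotation, kernel, dur_us in records:
        entry = totals[(annotation, kernel)]
        entry["count"] += 1
        entry["total_us"] += dur_us
        range_totals[annotation] += dur_us

    rows = []
    for (annotation, kernel), entry in totals.items():
        range_total = range_totals[annotation]
        # Guard against ranges whose kernels all report zero duration.
        pct = 100.0 * entry["total_us"] / range_total if range_total else 0.0
        rows.append({
            "annotation": annotation,
            "kernel": kernel,
            "count": entry["count"],
            "total_us": entry["total_us"],
            "pct_of_range": pct,
        })
    rows.sort(key=lambda r: (r["annotation"], -r["total_us"]))
    return rows


# Hypothetical sample data: names and durations are made up for illustration.
sample = [
    ("attention", "gemm_kernel", 30.0),
    ("attention", "softmax_kernel", 10.0),
    ("attention", "gemm_kernel", 30.0),
    ("mlp", "gemm_kernel", 50.0),
]
rows = breakdown_by_annotation(sample)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` if a dataframe is wanted for further filtering or plotting.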

Alternatives

No response

Additional context

No response

briancoutinho commented 2 months ago

@jeromeku Are you expecting something like a kernel_dataframe with a call-stack column, e.g. callstack = ["aten:op1", "aten:op", "module name"..]?

Does the call_stack logic help to achieve something similar to your request? https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/hta/common/call_stack.py

It should be able to link from the kernel up to the operators (and likely user annotations like profiler.profile)
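
To make the idea of linking a kernel back to a user annotation concrete, here is a minimal sketch that works on plain event dicts rather than on HTA's call_stack module. It assumes events shaped like Chrome-trace entries with a `cat` of `user_annotation`, `cuda_runtime`, or `kernel`, and a shared `args["correlation"]` id between a launch call and its kernel; those field names, the helper functions, and the sample events are assumptions for illustration. Because kernels run asynchronously, the kernel is attributed using the timestamp of its CPU-side launch, not its own GPU timestamp. Thread and process ids are ignored for brevity.

```python
def innermost_annotation(annotations, ts):
    """Return the name of the shortest annotation range containing ts, or None."""
    enclosing = [a for a in annotations if a["ts"] <= ts <= a["ts"] + a["dur"]]
    if not enclosing:
        return None
    return min(enclosing, key=lambda a: a["dur"])["name"]


def attribute_kernels(events):
    """Map each kernel to the annotation range that encloses its launch call.

    Returns a list of (annotation_or_None, kernel_name, kernel_duration).
    """
    annotations = [e for e in events if e.get("cat") == "user_annotation"]
    # Index CPU-side launch events by correlation id (assumed field name).
    launches = {
        e["args"]["correlation"]: e
        for e in events
        if e.get("cat") == "cuda_runtime" and "correlation" in e.get("args", {})
    }
    attributed = []
    for e in events:
        if e.get("cat") != "kernel":
            continue
        launch = launches.get(e.get("args", {}).get("correlation"))
        label = innermost_annotation(annotations, launch["ts"]) if launch else None
        attributed.append((label, e["name"], e["dur"]))
    return attributed


# Hypothetical events: timestamps, names, and ids are made up for illustration.
events = [
    {"cat": "user_annotation", "name": "forward", "ts": 0, "dur": 100},
    {"cat": "user_annotation", "name": "attention", "ts": 10, "dur": 30},
    {"cat": "cuda_runtime", "name": "cudaLaunchKernel", "ts": 15, "dur": 2,
     "args": {"correlation": 1}},
    {"cat": "cuda_runtime", "name": "cudaLaunchKernel", "ts": 60, "dur": 2,
     "args": {"correlation": 2}},
    {"cat": "kernel", "name": "softmax_kernel", "ts": 120, "dur": 8,
     "args": {"correlation": 1}},
    {"cat": "kernel", "name": "gemm_kernel", "ts": 130, "dur": 20,
     "args": {"correlation": 2}},
    {"cat": "kernel", "name": "orphan_kernel", "ts": 160, "dur": 5,
     "args": {"correlation": 99}},
]
attributed = attribute_kernels(events)
for item in attributed:
    print(item)
```

Kernels whose launch event cannot be found are kept with a `None` label rather than dropped, so unattributed GPU time stays visible in any later aggregation.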

jeromeku commented 2 months ago

Something like the "Events" view in nsys, where you can see a trace of kernels by time, grouped by nvtx range. See this for example from this thread.

Essentially what you see when you print prof.key_averages().table(), except: