Enabling Detailed Profiling of Graph Nodes in OmniTrace

ROCm / omnitrace

Omnitrace: Application Profiling, Tracing, and Analysis

MIT License

Hi, I am currently profiling vLLM with OmniTrace. The tool captures the execution of graph kernels at a high level, but it does not provide detailed insight into the execution of individual graph nodes.

My goal is to obtain detailed profiling information for individual graph nodes, similar to NVIDIA Nsight, which can track individual nodes rather than only graph-level execution.

I am looking for guidance or a workaround to enable detailed profiling of graph nodes in OmniTrace. Are there any configuration options that would help?

Here is the command I use:

omnitrace-run -c ~/.omnitrace.cfg --enable-categories device-critical-trace device_busy device_hip device_hsa device_memory_usage python rocm_hip rocm_hsa rocm_smi rocprofiler roctracer --roctracer-hip-activity --roctracer-hip-api --roctracer-hsa-activity --roctracer-hsa-api -- python -m omnitrace -- vllm_benchmark.py
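For anyone who wants to reproduce this, here is a minimal sketch that assembles the same command from Python so the category list is easier to read and toggle. The helper name `build_omnitrace_command` is hypothetical; every flag and category is copied verbatim from the command above, and nothing here adds node-level profiling by itself.

```python
import os
import shlex

# Categories copied verbatim from the command in this issue.
CATEGORIES = [
    "device-critical-trace", "device_busy", "device_hip", "device_hsa",
    "device_memory_usage", "python", "rocm_hip", "rocm_hsa", "rocm_smi",
    "rocprofiler", "roctracer",
]

# roctracer flags copied verbatim from the same command.
ROCTRACER_FLAGS = [
    "--roctracer-hip-activity", "--roctracer-hip-api",
    "--roctracer-hsa-activity", "--roctracer-hsa-api",
]


def build_omnitrace_command(script, config="~/.omnitrace.cfg"):
    """Return the omnitrace-run invocation as an argument list.

    Hypothetical helper: it only rearranges the arguments already shown
    in the issue. The config path is expanded here because no shell will
    expand "~" when the list is passed to subprocess.run.
    """
    return (
        ["omnitrace-run", "-c", os.path.expanduser(config),
         "--enable-categories", *CATEGORIES]
        + ROCTRACER_FLAGS
        + ["--", "python", "-m", "omnitrace", "--", script]
    )


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(build_omnitrace_command("vllm_benchmark.py")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the same profiling run as the one-line command, assuming `omnitrace-run` is on `PATH`.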

Thanks in advance.

Enabling Detailed Profiling of Graph Nodes in OmniTrace #335