Closed by sethrj 1 month ago
After looking at more Perfetto features, I think integrating it would be useful, at least to support the basic slice and counter events.
First, we still have the ScopedProfiling feature of recording and displaying function calls on a timeline, which can be used if CUDA/HIP is unavailable. Perfetto also supports counters, e.g. we could have an active track counter updated each step which would be displayed on a timeline (similar to the memory usage below).
Then, if we want to support the Linux ftrace API (requires root privilege), we can find out anything that the kernel is doing: e.g. which processes/threads are scheduled on which CPU cores, memory usage, and syscall entry/exit.
The first picture below illustrates which threads are scheduled on CPU5, the status of these threads, and their syscalls. The second picture shows the memory usage for the celer-sim process only.
In addition to visualization in the web interface, there is also a SQL interface (accessible via either a Python API or the web interface). For example, we can compute the average time per Celeritas action:
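A minimal sketch of such a query, assuming each action is recorded as a slice whose name is the action label. Perfetto's trace processor exposes a `slice` table with `name` and `dur` (nanoseconds) columns; here an in-memory SQLite table stands in for it so the example runs without a trace file, and the action names and durations are made up for illustration.

```python
import sqlite3

# Stand-in for Perfetto's `slice` table (only the columns used below).
# Action names and durations are hypothetical, not measured data.
ROWS = [
    ("along-step", 1000, 400),
    ("along-step", 2000, 600),
    ("geo-boundary", 1500, 100),
    ("geo-boundary", 2600, 300),
    ("physics-discrete-select", 1700, 50),
]

# Average slice duration per name, slowest first.
AVG_TIME_PER_ACTION = """
SELECT name, COUNT(*) AS calls, AVG(dur) AS avg_dur_ns
FROM slice
GROUP BY name
ORDER BY avg_dur_ns DESC
"""


def average_time_per_action(rows):
    """Load (name, ts, dur) rows into a mock `slice` table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE slice (name TEXT, ts INTEGER, dur INTEGER)")
    conn.executemany("INSERT INTO slice VALUES (?, ?, ?)", rows)
    result = conn.execute(AVG_TIME_PER_ACTION).fetchall()
    conn.close()
    return result


result = average_time_per_action(ROWS)
for name, calls, avg_dur_ns in result:
    print(f"{name:28s} calls={calls:3d} avg={avg_dur_ns:10.1f} ns")
```

With a real trace, the same query string should work through the `perfetto` Python package, e.g. `TraceProcessor(trace="<trace file>").query(AVG_TIME_PER_ACTION)`, assuming the trace records one slice per action call.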
That's awesome @esseivaju! Let's do it. I'm curious how this Google-developed tool compares to the in-house CERN performance tools.
Are you referring to AdaptivePerf?
It looks like Perfetto, the Google performance/tracing tool, could be useful to integrate into Celeritas to measure performance. It might be straightforward to wrap their tracing SDK underneath ScopedProfiling.